

Streaming and Sketching Algorithms

Sketching (Compressed data structures)

Given data and a function f, compress the data into a sketch so that f can be computed from the sketch alone, without keeping the original data

Streaming

Maintain the sketch on the fly, as the data is updated

https://www.sketchingbigdata.org

https://www.youtube.com/watch?v=xbTM3t26xLk
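
As a concrete illustration of the sketching idea (not taken from the links above), here is a minimal count-min-style frequency sketch in Python: the stream is folded into a small fixed-size table of counters, and approximate per-item frequencies are answered from the table alone. The width, depth, and hashing scheme are arbitrary choices for this sketch.

import hashlib

class CountMinSketch:
    # fixed-size sketch supporting approximate frequency queries;
    # the raw stream is never stored, and estimates never undercount
    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def update(self, item, count=1):
        # streaming update: fold one item into the sketch
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # answer f(item) = frequency of item, using only the sketch
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

sketch = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    sketch.update(word)
print(sketch.estimate("a"))  # 3 (may overestimate under collisions, never underestimates)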

Streaming Algorithms

Reservoir Algorithm

Reservoir sampling is a family of randomized algorithms for choosing a simple random sample without replacement of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large to fit all n items into main memory. The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far.

On the ith iteration of our loop to pick a random element, let's assume we already picked an element uniformly from [0, i - 1]. In order to maintain the loop invariant, we need to pick the ith element as the new random element with probability 1 / (i + 1). For the base case where i = 0, let's say the random element is the first one. Then we know it works because

  • For i = 0, we would've picked uniformly from [0, 0].
  • For i > 0, before the iteration, any element K in [0, i - 1] had a 1 / i chance of being chosen as the random element. We want K to have a 1 / (i + 1) chance of being chosen after the iteration. This is the case, since the chance of K having been chosen already but not getting swapped with the ith element is (1 / i) * (1 - 1 / (i + 1)), which is (1 / i) * (i / (i + 1)), or 1 / (i + 1).
import random

def pick(big_stream):
    # keep one uniformly random element from a stream of unknown length
    random_element = None

    for i, e in enumerate(big_stream):
        if i == 0:
            random_element = e
        # replace the current pick with element i with probability 1 / (i + 1)
        if random.randint(1, i + 1) == 1:
            random_element = e

    return random_element

https://en.wikipedia.org/wiki/Reservoir_sampling
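
The loop above keeps only a single element (k = 1). A minimal sketch of the general k case, following Algorithm R as described in the Wikipedia article linked above: keep the first k items, then replace a uniformly chosen slot so that item i survives with probability k / (i + 1).

import random

def reservoir_sample(stream, k):
    # Algorithm R: uniform sample of k items, without replacement,
    # from a stream of unknown length, in a single pass
    reservoir = []
    for i, e in enumerate(stream):
        if i < k:
            reservoir.append(e)        # fill the reservoir with the first k items
        else:
            j = random.randint(0, i)   # uniform index in [0, i]
            if j < k:
                reservoir[j] = e       # item i is kept with probability k / (i + 1)
    return reservoir

print(reservoir_sample(range(1_000_000), 5))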

Evolutionary Algorithm

https://en.wikipedia.org/wiki/Evolutionary_algorithm

In computational science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formulae over the particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.

https://en.wikipedia.org/wiki/Particle_swarm_optimization
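
A minimal sketch of the update described above. The inertia and attraction coefficients (w, c1, c2), the swarm size, and the search bounds are arbitrary defaults, not prescribed values; each particle's velocity is pulled toward its own best position and toward the swarm's best position.

import random

def pso(f, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5):
    # minimize f over a box using the basic global-best PSO update:
    # v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x = x + v
    lo, hi = bounds
    xs = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                     # each particle's best position
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm's best position

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            val = f(xs[i])
            if val < pbest_val[i]:                 # update particle's best
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:                # update swarm's best
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val

# minimize the sphere function; the optimum is at the origin
best, val = pso(lambda x: sum(t * t for t in x), dim=3)
print(best, val)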

Genetic Algorithm

Genetic Programming

Thompson's construction

Thompson's construction is an algorithm for transforming a regular expression into an equivalent nondeterministic finite automaton (NFA). This NFA can then be used to match strings against the regular expression.
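
A minimal sketch of the construction for a tiny regex subset, assuming the pattern is already in postfix form with '.' as an explicit concatenation operator (the parsing step is omitted): each operator combines NFA fragments with fresh states and epsilon transitions, and matching simulates the NFA with epsilon closures.

from collections import defaultdict

class State:
    # NFA state: character transitions plus epsilon transitions
    def __init__(self):
        self.edges = defaultdict(set)   # char -> set of next states
        self.eps = set()                # epsilon transitions

def thompson(postfix):
    # build an NFA fragment (start, accept) from a postfix regex;
    # operators: '.' concatenation, '|' alternation, '*' Kleene star
    stack = []
    for c in postfix:
        if c == '.':                            # concatenation
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            a1.eps.add(s2)
            stack.append((s1, a2))
        elif c == '|':                          # alternation
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            start, accept = State(), State()
            start.eps |= {s1, s2}
            a1.eps.add(accept)
            a2.eps.add(accept)
            stack.append((start, accept))
        elif c == '*':                          # Kleene star
            s1, a1 = stack.pop()
            start, accept = State(), State()
            start.eps |= {s1, accept}
            a1.eps |= {s1, accept}
            stack.append((start, accept))
        else:                                   # literal character
            start, accept = State(), State()
            start.edges[c].add(accept)
            stack.append((start, accept))
    return stack.pop()

def eps_closure(states):
    # all states reachable through epsilon transitions
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in s.eps:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def matches(postfix, text):
    start, accept = thompson(postfix)
    current = eps_closure({start})
    for c in text:
        nxt = set()
        for s in current:
            nxt |= s.edges.get(c, set())
        current = eps_closure(nxt)
    return accept in current

# 'a(b|c)*' written in postfix with explicit concatenation: 'abc|*.'
print(matches('abc|*.', 'abccb'))   # True
print(matches('abc|*.', 'abd'))     # False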

Approximate counting algorithm

The approximate counting algorithm allows the counting of a large number of events using a small amount of memory. Invented in 1977 by Robert Morris of Bell Labs, it uses probabilistic techniques to increment the counter. It was fully analyzed in the early 1980s by Philippe Flajolet of INRIA Rocquencourt, who coined the name approximate counting and strongly contributed to its recognition among the research community. The algorithm is considered one of the precursors of streaming algorithms, and the more general problem of determining the frequency moments of a data stream has been central to the field.

https://en.wikipedia.org/wiki/Approximate_counting_algorithm
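
A minimal sketch of the idea, assuming the standard base-2 variant: store only a small exponent c, increment it with probability 2^-c, and estimate the true count as 2^c - 1 (an unbiased estimate, though a single counter has high variance).

import random

class MorrisCounter:
    # approximate counter: c needs roughly log log n bits for n events
    def __init__(self):
        self.c = 0

    def increment(self):
        # bump the exponent with probability 1 / 2**c
        if random.random() < 2.0 ** (-self.c):
            self.c += 1

    def estimate(self):
        # unbiased estimate of the number of increments
        return 2 ** self.c - 1

counter = MorrisCounter()
for _ in range(100_000):
    counter.increment()
print(counter.c, counter.estimate())  # e.g. 16 and 65535, or 17 and 131071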

Joining point cloud scans (ICP - Iterative Closest Point)

Joining Point Cloud Scans (ICP) - Computerphile

https://en.wikipedia.org/wiki/Iterative_closest_point