{Agorithm}
Algorithms/Bloom Filter
Data Structureadvancedprobabilistichashingspace-efficient

Bloom Filter

Space-efficient probabilistic set membership test with false positives.

How it works

A Bloom Filter is a probabilistic data structure that answers "is this element in the set?" using only a fixed bit array and multiple hash functions — consuming a tiny fraction of the memory a hash set would need.

How it works: - Insert: run the item through k hash functions; set those k bit positions to 1. - Query: run through the same k functions; if ANY bit is 0, the item is definitely absent. If all bits are 1, it is probably present (with a calculable false-positive rate).

There are NO false negatives — if an item was inserted, every query will say "probably present". There CAN be false positives, where bits set by other items accidentally match a new item's positions.

Tuning: with n expected items and desired false-positive rate p, optimal size is m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions.

Complexity

Best case
O(k)
Average case
O(k)
Worst case
O(k)
Space
O(m)

Visualizer

Generating steps…

Where it's used in the real world

Google Bigtable / LevelDB / RocksDB

Avoid disk reads for missing keys

LSM-tree storage engines keep a Bloom filter per SSTable. Before doing an expensive disk read, they check the filter; a definitive 'absent' answer skips the I/O entirely, saving ~99% of unnecessary reads.

Chrome Safe Browsing

Malicious URL pre-filter

Chrome stores a Bloom filter of known-malicious URL hashes locally. Most legitimate URLs are filtered locally with zero network round trips; only suspected hits trigger a server check.

Akamai CDN

One-hit-wonder cache filtering

Akamai uses a Bloom filter to detect 'one-hit wonders' — objects requested only once. Items not seen before are not cached, saving cache space for genuinely popular objects.

Bitcoin / Ethereum light clients

SPV wallet transaction filtering

BIP 37 (Bitcoin) uses Bloom filters so lightweight wallets can ask full nodes to send only transactions matching their address, without revealing which addresses they own.

Implementation

class BloomFilter {
  private bits: boolean[];
  private size: number;

  constructor(size: number) {
    this.size = size;
    this.bits = new Array(size).fill(false);
  }

  private h1(s: string): number {
    let sum = 0;
    for (const c of s) sum += c.charCodeAt(0);
    return sum % this.size;
  }

  private h2(s: string): number {
    let sum = 0;
    for (const c of s) sum += c.charCodeAt(0);
    return (sum * 31) % this.size;
  }

  private h3(s: string): number {
    return (s.charCodeAt(0) * 17 + s.length * 7) % this.size;
  }

  insert(item: string): void {
    this.bits[this.h1(item)] = true;
    this.bits[this.h2(item)] = true;
    this.bits[this.h3(item)] = true;
  }

  query(item: string): boolean {
    return this.bits[this.h1(item)] &&
           this.bits[this.h2(item)] &&
           this.bits[this.h3(item)];
  }
}

const bf = new BloomFilter(16);
["apple", "banana", "cherry"].forEach(w => bf.insert(w));
console.log(bf.query("apple"));  // true  (definitely present)
console.log(bf.query("grape"));  // false (definitely absent) — or true (false positive)

Related algorithms