False positive dependency checks break package install order
jajcus at jajcus.net
Tue Aug 13 20:08:38 CEST 2013
On Tue, 13 Aug 2013 12:26:38 -0400
Jeffrey Johnson <n3npq at me.com> wrote:
> Yes. The Bloom filters been in use for about 3 years with no known
> problems, including randomized installations for 5-10 linux distros
> under a CI harness.
> THe performance/memory gains are significant.
> > Are there any other places in the code with algos like this?
> Yes. Bloom filters are quite convenient.
I am sure they are. For a _preliminary_ check. This can still a huge
Let's say we have 1000 packages each 1000 files average and
we are looking for 'a package containing given file'.
And let's say the Bloom filter gives 1 in 1000 false positives on
average and a cost of Bloom filter lookup is '2', while cost of
file list scan comparision is '1'.
1. No filter:
1000 * 1000 = 1000000 comparisions, no false positives
2. Bloom filter only:
1000 filter lookups, 2 false positives
500 times faster, but unreliable.
3. Bloom filter plus scan:
1000 filter lookups -> give us 2 'candidates' -> these require 2000
Only twice slower than Bloom filter only and still 250 times faster than
just scanning the file list.
One could also estimate b-tree index lookup cost for this case.
More information about the pld-devel-en