False positive dependency checks break package install order

Jeffrey Johnson n3npq at me.com
Tue Aug 13 16:44:25 CEST 2013


On Aug 13, 2013, at 7:47 AM, Jacek Konieczny wrote:

> 
> Hi,
> 
> Trying to debug my package installation 'LOOP:' problems I have
> instrumented the RPM source code with many 'rpmlog(RPMLOG_DEBUG, ...)',
> so I could understand what is happening there.
> 
> After hours of such debugging I found out what is going on: for each
> dependency rpmalSatisfiesDepend() is called from order.c to build the
> dependency graph. There, for every file dependency (package requires
> '/something') rpmalAllFileSatisfiesDepend() is called which finds out
> what packages provide the required file. And here is the problem:
> rpmalAllFileSatisfiesDepend() uses a Bloom filter look-up to find out
> if a package contain a file. But Bloom filter, to its very nature, gives
> false positives, which are not verified further. So, it can happen that
> rpmalSatisfiesDepend() returns wrong package for a required file or
> directory, which adds an extra edge to the dependency graph. RPM reports
> this as a 'LOOP' problem and tries to work-around it by changing the
> installation order, which totally breaks the order for me.
> 

I see you had some fun. ;-)

Just in case:
	This isn't pleasant code to debug: your analysis is largely correct,
	and the patch is mostly convincing.

(aside)
	You should be calling rpmfiFree(alp->fi) but all objects have a common refcount
	so the patch happens to work.

However the whole point of using a Bloom filter is/was to increase performance,
and reduce memory, with some acceptable risk of false positives.

FWIW, the file names are sorted, so a binary search could be done, if you absolutely
insist on removing false positives. The linear search was attempted in rpm-3.0.4, rather slow.

Meanwhile the simpler fix is to adjust the Bloom filter parameters to decrease the probability
of a false positive.

This can be done like so:

Index: rpmfi.c
===================================================================
RCS file: /v/rpm/cvs/rpm/lib/rpmfi.c,v
retrieving revision 2.160.4.4
diff -p -u -w -r2.160.4.4 rpmfi.c
--- rpmfi.c	4 Jun 2012 15:10:11 -0000	2.160.4.4
+++ rpmfi.c	13 Aug 2013 14:35:13 -0000
@@ -184,7 +184,7 @@ void * rpmfiFNBF(rpmfi fi)
     if (fi != NULL) {
 	if (fi->_fnbf == NULL) {
 	    char * fn = (char *) alloca(fi->fnlen + 1);
-	    static double e = 1.0e-4;
+	    static double e = 1.0e-5;
 	    size_t n = (fi->fc > 10 ? fi->fc : 10);
 	    size_t m = 0;
 	    size_t k = 0;

That reduces the probability of false positives from one in 10,000 to one in 100,000
and SHOULD fix your issue. One might also increase the estimated population "n"
by 5-10%, but I would change "e" first as the intent is clearer.

There is also some modest increase in memory used by the Bloom filter.

See if that fixes your problem please.

> The attached patch is a quick fix to the problem: after rpmbfChk()
> states a file is provided by a package, its answer is additionally
> verified by scanning the file index of the package.
> 
> I am sure this can be implemented in a more elegant way, but I don't
> know the internal RPM architecture well enough and I cannot spend much
> more time on fixing this problem.
> 

You pretty much know all there is to know once you have debugged
a loop ordering issue like this ;-)

73 de Jeff



More information about the pld-devel-en mailing list