False positive dependency checks break package install order

Jan Rękorajski baggins at pld-linux.org
Tue Aug 13 19:55:25 CEST 2013


On Tue, 13 Aug 2013, Jeffrey Johnson wrote:

> 
> On Aug 13, 2013, at 7:47 AM, Jacek Konieczny wrote:
> 
> > 
> > Hi,
> > 
> > Trying to debug my package installation 'LOOP:' problems I have
> > instrumented the RPM source code with many 'rpmlog(RPMLOG_DEBUG, ...)',
> > so I could understand what is happening there.
> > 
> > After hours of such debugging I found out what is going on: for each
> > dependency rpmalSatisfiesDepend() is called from order.c to build the
> > dependency graph. There, for every file dependency (package requires
> > '/something') rpmalAllFileSatisfiesDepend() is called which finds out
> > what packages provide the required file. And here is the problem:
> > rpmalAllFileSatisfiesDepend() uses a Bloom filter look-up to find out
> > if a package contain a file. But Bloom filter, to its very nature, gives
> > false positives, which are not verified further. So, it can happen that
> > rpmalSatisfiesDepend() returns wrong package for a required file or
> > directory, which adds an extra edge to the dependency graph. RPM reports
> > this as a 'LOOP' problem and tries to work-around it by changing the
> > installation order, which totally breaks the order for me.
> > 
> 
> I see you had some fun. ;-)
> 
> Just in case:
> 	This isn't pleasant code to debug: your analysis is largely correct,
> 	and the patch is mostly convincing.
> 
> (aside)
> 	You should be calling rpmfiFree(alp->fi) but all objects have a common refcount
> 	so the patch happens to work.
> 
> However the whole point of using a Bloom filter is/was to increase performance,
> and reduce memory, with some acceptable risk of false positives.
> 
> FWIW, the file names are sorted, so a binary search could be done, if you absolutely
> insist on removing false positives. The linear search was attempted in rpm-3.0.4, rather slow.
> 
> Meanwhile the simpler fix is to adjust the Bloom filter parameters to decrease the probability
> of a false positive.
> 
> This can be done like so:
> 
> Index: rpmfi.c
> ===================================================================
> RCS file: /v/rpm/cvs/rpm/lib/rpmfi.c,v
> retrieving revision 2.160.4.4
> diff -p -u -w -r2.160.4.4 rpmfi.c
> --- rpmfi.c	4 Jun 2012 15:10:11 -0000	2.160.4.4
> +++ rpmfi.c	13 Aug 2013 14:35:13 -0000
> @@ -184,7 +184,7 @@ void * rpmfiFNBF(rpmfi fi)
>      if (fi != NULL) {
>  	if (fi->_fnbf == NULL) {
>  	    char * fn = (char *) alloca(fi->fnlen + 1);
> -	    static double e = 1.0e-4;
> +	    static double e = 1.0e-5;
>  	    size_t n = (fi->fc > 10 ? fi->fc : 10);
>  	    size_t m = 0;
>  	    size_t k = 0;
> 
> That reduces the probability of false positives from one in 10,000 to one in 100,000
> and SHOULD fix your issue. One might also increase the estimated population "n"
> by 5-10%, but I would change "e" first as the intent is clearer.
> 
> There is also some modest increase in memory used by the Bloom filter.
> 
> See if that fixes your problem please.

Even if probablity reduction fixes the problem for now, I will apply Jacek's
patch to rpm in PLD (with s/rpmbfFree/rpmfiFree/). Just changing Bloom
filter parameters won't mean that the problem is gone, it will just make it
harder to occur and I want stable installation order, always.

-- 
Jan Rękorajski                                 | PLD/Linux
SysAdm                                         | http://www.pld-linux.org/
baggins<at>mimuw.edu.pl
baggins<at>pld-linux.org


More information about the pld-devel-en mailing list