Terrible performance of Python dependency generator

Jeffrey Johnson n3npq at me.com
Sun Nov 22 22:03:14 CET 2015


> On Nov 22, 2015, at 3:39 PM, Jacek Konieczny <jajcus at jajcus.net> wrote:
> 
> Hi,
> 
> We will probably need to rebuild the python-* packages again and I
> already hate that. Such python-django takes 45 minutes to build and most
> of that is in the auto-dependency generator. That is insane! It should
> not take that long!
> 
> /usr/lib/rpm/pythoneggs.py is used to find the dependencies and it is
> not that slow by itself… but it is called twice (Provides + Requires)
> for each file in /usr/share/pythonX.Y. And big Python packages have lots
> of files there. Most of them not adding any extra dependency
> information.
> 
> That is strange, as the dependency helpers accept list of file names on
> their stdin… and RPM (in lib/rpmfc.c) always feeds them with one
> filename only. Why is that?
> 

Short answer:
	rpmfc builds multiple cross-referenced arrays dynamically in a single
	pass through the file list, so the external helper is called once for
	each file whose type maps to that helper.
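
A toy model of that single pass may make the constraint clearer. This is
only a sketch, not the code in lib/rpmfc.c: the function name, the
suffix-based helper table and the returned tuple are all made up for
illustration. It shows why each file gets its own helper call: the per-file
index array can only be filled if the pass knows which file produced which
dependency.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical helper type: takes one path, returns dependency strings.
Helper = Callable[[str], List[str]]


def classify_single_pass(
    files: List[str], helpers: Dict[str, Helper]
) -> Tuple[List[str], List[List[int]], int]:
    """Toy single pass over a file list (illustration only, not rpmfc).

    Returns the shared dependency array, a per-file list of indexes into
    that array, and the number of helper calls made.
    """
    deps: List[str] = []             # shared, de-duplicated dependency array
    dep_pos: Dict[str, int] = {}     # dependency string -> index in deps
    file_deps: List[List[int]] = []  # file_deps[i] cross-references deps
    calls = 0
    for path in files:
        refs: List[int] = []
        for suffix, helper in helpers.items():
            if not path.endswith(suffix):
                continue
            calls += 1               # one helper call per matching file
            for dep in helper(path):
                if dep not in dep_pos:
                    dep_pos[dep] = len(deps)
                    deps.append(dep)
                refs.append(dep_pos[dep])
        file_deps.append(refs)
    return deps, file_deps, calls
```

With a thousand *.py files the helper runs a thousand times even if every
call returns the same single dependency, which is the behaviour Jacek is
describing.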

> I can even see a buffer for a file list in the code (iob_python in the
> rpmfc_s struct), but it seems not used.
> 

Yes: that is either ancient history (from when all dependencies were
generated externally) or it is used by rpmdeps (which can serve as a
replacement for an external dependency generator).

> I tried to invent some smart hack to limit number of files examined –
> usually checking a single *.py file and the *.egg-info/PKG-INFO should
> be enough, but I was not able to inject this in the weird rpmfc logic.
> And I do not quite understand what it is supposed to do (what are those
> 'colors' and what files should be python-colored).
> 

Colors (as exposed externally in rpm headers) are logic bits to distinguish
between ELF32 and ELF64 (and the oddball MIPS little endian).

Colors (as used internally in rpmfc) are used for classification based on
libmagic strings and to associate helpers with file types, basically
to avoid having to repeatedly do string compares.
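
As a rough illustration of the internal use, here is a sketch of
colors-as-bits. The bit values, the PYTHON bit, the rule table and the
helper names are assumptions made up for the example and are not copied
from rpmfc; only the idea (match the libmagic string once, then dispatch
on bits) comes from the description above.

```python
from enum import IntFlag
from typing import List


class Color(IntFlag):
    # Illustrative bit assignments, not rpm's actual values.
    NONE = 0
    ELF32 = 1
    ELF64 = 2
    PYTHON = 4


# Hypothetical rules: substring of the libmagic description -> color bit.
MAGIC_RULES = [
    ("ELF 32-bit", Color.ELF32),
    ("ELF 64-bit", Color.ELF64),
    ("Python script", Color.PYTHON),
]

# Hypothetical helper table keyed by a color mask.
HELPERS = {
    Color.ELF32 | Color.ELF64: "elf-helper",
    Color.PYTHON: "python-helper",
}


def classify(magic: str) -> Color:
    """Do the string compares once and keep the result as bits."""
    color = Color.NONE
    for needle, bit in MAGIC_RULES:
        if needle in magic:
            color |= bit
    return color


def helpers_for(color: Color) -> List[str]:
    """Pick helpers with a bit test instead of further string compares."""
    return [name for mask, name in HELPERS.items() if color & mask]
```

For example, classify("ELF 64-bit LSB shared object, x86-64") yields the
ELF64 bit, and helpers_for() on that value selects only the ELF helper.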

> Can this be fixed somehow? How have we ended with this?
> 

There are some quick fixes and then there are some better fixes.

Dependencies are automatically generated only for executable files. So
using %files -f manifest, one can make a pass in %install to generate
the manifest, doing both of the following for each file:
	1) add a %attr marker to the manifest to set the execute bits
	2) chmod -x the file in %buildroot
and then generate the dependencies manually (using a two-pass build to
edit Requires: etc. into the spec file).
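
The manifest pass could be sketched roughly as below, to be run from
%install. The function name, its arguments and the choice of "-" for owner
and group in %attr are assumptions for illustration; the sketch lists only
the files that were executable, so everything else still has to be covered
by the rest of %files.

```python
import os
import stat
from typing import List


def strip_exec_and_write_manifest(
    buildroot: str, subdir: str, manifest_path: str
) -> List[str]:
    """Clear execute bits under buildroot/subdir and record them in %attr lines.

    Hypothetical helper: writes one "%attr(MODE,-,-) /installed/path" line
    per formerly executable regular file and returns those lines.
    """
    entries: List[str] = []
    root = os.path.join(buildroot, subdir.lstrip("/"))
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # leave symlinks alone
            mode = stat.S_IMODE(os.lstat(full).st_mode)
            if mode & 0o111:
                # Step 2: drop the execute bits in the build root.
                os.chmod(full, mode & ~0o111)
                # Step 1: remember the original mode for the package.
                installed = "/" + os.path.relpath(full, buildroot)
                entries.append("%attr({:04o},-,-) {}".format(mode, installed))
    entries.sort()
    with open(manifest_path, "w") as fh:
        for line in entries:
            fh.write(line + "\n")
    return entries
```

The resulting file is what %files -f would consume; the Requires: and
Provides: lines still have to be added to the spec by hand.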

The better fix would be to use the embedded python interpreter to
avoid repeatedly invoking a shell that invokes python.
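
The gain comes from paying interpreter start-up once instead of once per
file. A minimal sketch of the in-process idea, assuming (as Jacek suggests)
that the *.egg-info/PKG-INFO files carry the useful metadata: the function
name and the (name, version) return shape are made up for illustration, and
this is not how pythoneggs.py or rpm's embedded interpreter is actually
wired up.

```python
from email.parser import Parser
from typing import Iterable, List, Tuple


def scan_pkg_info(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Read Name/Version from every PKG-INFO in paths within one process.

    PKG-INFO uses RFC 822 style headers, so the stdlib email parser can
    read it. All other paths are skipped without being opened.
    """
    parser = Parser()
    found: List[Tuple[str, str]] = []
    for path in paths:
        if not path.endswith("PKG-INFO"):
            continue
        with open(path, encoding="utf-8", errors="replace") as fh:
            meta = parser.parse(fh, headersonly=True)
        name, version = meta.get("Name"), meta.get("Version")
        if name and version:
            found.append((name, version))
    return found
```

Called once with the whole file list, this opens only the metadata files
and never spawns a shell or a second interpreter.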

But the fundamental problem is with user-overridable external
helper scripts that conform to ancient expectations of the helper API
and still must classify files and generate cross-referenced tag data
dynamically.

hah

 73 de Jeff
> Jacek


