Terrible performance of Python dependency generator
Jacek Konieczny
jajcus at jajcus.net
Mon Nov 23 10:16:55 CET 2015
On 2015-11-22 22:03, Jeffrey Johnson wrote:
> Dependencies are automatically generated only for executable files.
That is not true for Python dependencies, and restricting the checks
to executable files would not work for them.
There are two useful types of Python dependencies:
1. python(abi) – this is extracted from .pyc or .pyo files. These are
not executable scripts, but non-executable library files under /usr/lib
or /usr/share. Checking a single *.py[co] file would do for the whole
package. On the other hand, this dependency is a bit redundant, because
the files for each Python ABI go into a different directory, so the
directory dependency should be enough.
2. pythonegg(*) – these are extracted from the metadata in *.egg-info
directories. A package usually contains only one such directory.
Currently this works because all /usr/{lib*,share}/pythonX.Y/* files
are passed to pythoneggs.py. Among those files there will be some *.pyc
files and some files from the egg-info directory, so all the important
dependencies get extracted.
Examining only the executables would yield just a '/usr/bin/python',
or even '/bin/sh', dependency.
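To illustrate the two types above, this is roughly what the extraction
boils down to (just my sketch, not the actual helper code; the exact
spelling of the pythonegg(...) dependencies is an assumption):

    # Rough sketch only -- not the real generator code.
    import os, re

    def python_abi_dep(path):
        # e.g. /usr/share/python2.7/foo/bar.pyc -> "python(abi) = 2.7"
        m = re.search(r'/python(\d+\.\d+)/', path)
        if m:
            return 'python(abi) = %s' % m.group(1)
        return None

    def pythonegg_deps(egg_info_dir):
        # requirements listed in the egg-info metadata become
        # pythonegg(...) dependencies
        deps = []
        requires = os.path.join(egg_info_dir, 'requires.txt')
        if not os.path.exists(requires):
            return deps
        for line in open(requires):
            line = line.strip()
            if line and not line.startswith('['):   # skip [extras]
                name = re.split(r'[<>=!]', line)[0].strip()
                deps.append('pythonegg(%s)' % name)
        return deps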
I guess I will hack rpmfc.c to run the Python helper only for a single
*.py[co] file and a single file in every egg-info directory.
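Something along these lines (a Python sketch of the pruning I mean; the
real change would of course be C code in rpmfc.c, and the function name
is made up):

    # Out of all the files handed to the Python helper, keep a single
    # *.py[co] file (enough for python(abi)) and a single file from
    # each *.egg-info (enough for the pythonegg(...) dependencies).
    def prune_python_files(files):
        keep = []
        seen_pyc = False
        seen_egginfo = set()
        for f in files:
            if f.endswith('.pyc') or f.endswith('.pyo'):
                if not seen_pyc:
                    keep.append(f)
                    seen_pyc = True
            elif '.egg-info' in f:
                egg = f.split('.egg-info')[0] + '.egg-info'
                if egg not in seen_egginfo:
                    keep.append(f)
                    seen_egginfo.add(egg)
        return keep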
> So using %files -f manifest, one can make a pass in %install to
> generate the manifest, and do both
> 1) add a %attr marker to set the execute bits
> 2) chmod -x on the files in %buildroot
>
> and then generate dependencies manually (using a two-pass build to
> edit Requires: etc. into the spec file).
Sounds like a very ugly hack.
BTW, we don't need a manifest to preserve proper file permissions,
since in PLD we _always_ provide permissions explicitly in %files. So
we could just chmod -R a-x all the Python files. But that is not what
file permissions are for!
> The better fix would be to use the embedded python interpreter to
> avoid repeatedly involving a shell that invokes python.
That wouldn't help nearly as much as simply not repeating a stupid
check for each file in the first place.
> But the fundamental problem is with user-overridable external
> helper scripts that conform to ancient expectations of the helper API
> and still must classify files and generate cross-referenced tag data
> dynamically.
The 'ancient expectations of the helper API' actually made some sense
in terms of performance (a single process handling the whole file
list). Executing an external process for every single file is plain
stupid.
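In that model the generator is started once per package and processes
the whole file list itself, roughly like this (sketch only, deps_for()
is a placeholder):

    #!/usr/bin/python
    # Old-style batch helper: started once, reads all file names from
    # stdin and prints the dependencies found on stdout -- no per-file
    # fork/exec of a shell plus an interpreter.
    import sys

    def deps_for(path):
        # whatever classification/extraction one file needs
        return []

    if __name__ == '__main__':
        found = set()
        for line in sys.stdin:
            found.update(deps_for(line.strip()))
        for dep in sorted(found):
            print(dep)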
And Python (and probably not only Python) dependencies are not
per-file, but per Python package. Tying dependency checks to specific
files is quite artificial.
Jacek