PLD cooperation

Tue, 11 Oct 2005, 17:50:14 CEST

Looks like I really should have started with a complete description of how our
infrastructure works and what impact it has on the whole distribution's
development philosophy (and vice versa).

The core element of our infrastructure is our CVS repository which is tightly
integrated with a lot of other things (for example the DistFiles system that's
responsible for handling large binary files, so we don't have to store them in
CVS or manually upload them anywhere; more info at
http://cvs.pld-linux.org/cgi-bin/cvsweb/PLD-doc/Distfiles-Quick-HowTo.en).  
All changes to spec files and accompanying patches/scripts/whatever (SPECS
and SOURCES CVS modules respectively) get committed to the repo and, as I've
already written, we don't have any access restrictions and everybody is free
to modify any package they wish.  This is very important because it means all
developers have instant access to full sources of all available packages,
including the ones that are either only missing some polish or are a complete
work in progress, and any modifications can be instantly committed to the
repository (I'll talk about some scripts that aid in retrieving and modifying
packages later on). I can't overemphasize the importance of this -- every
developer has the right to instantly commit any changes she has made without 
having to go through any formal process and get any kind of acceptance from 
anyone (of course this doesn't mean one can commit stuff that violates coding 
practices or breaks something, but those things are checked for after the 
commit has actually happened and developers get punished if they tend to 
constantly ignore those rules). This is crucial, since developers often use 
PLD for getting their work done and after making the necessary modifications, 
they can't afford having to wait for anyone (read: the maintainer) to 
greenlight their changes just to be able to commit them. This would be
extremely counterproductive since it would (a) require a lot more labor
(maintainers would have to take the time to greenlight changes) and (b) slow
down the pace at which new changes are made available to everyone, which in
turn could prevent people from actually committing some changes and leave
developers unable to finish/correct changes started by someone else (in other
words I prefer to have an unfinished spec file in CVS that I can work
with than to have to start from scratch and duplicate work).

Like I've mentioned before, a diff of every change gets sent to the cvs-commit 
mailing list (http://lists.pld-linux.org/mailman/listinfo/pld-cvs-commit) for
anyone interested to review. People can go through the whole daily batch 
(which can be quite large -- even up to 200-300 messages a day) or just 
filter out the interesting packages in their MUAs. We put a lot of pressure
on developers to actually write clear and descriptive commit logs.
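
For those who'd rather filter outside of their MUA, here's a rough sketch of
picking the interesting packages out of the daily batch. It assumes (and this
is only an assumption, the real subject format may differ) that the package
name shows up somewhere in the Subject header; is_interesting and the sample
addresses are made up for illustration.

```python
from email import message_from_string


def is_interesting(raw_message, watched_packages):
    """Return True if the Subject header mentions any watched package.

    Assumption: the package name appears verbatim in the subject line.
    This is a crude substring match, not the list's documented format.
    """
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    return any(name in subject for name in watched_packages)


# Hypothetical commit mail, only here to exercise the function.
SAMPLE_MAIL = (
    "From: dev@example.org\n"
    "Subject: SPECS: glibc.spec - updated\n"
    "\n"
    "diff follows...\n"
)

if __name__ == "__main__":
    print(is_interesting(SAMPLE_MAIL, {"glibc", "poldek"}))
```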

I've already given the URL to details about how DistFiles works, so the two
remaining frequently used components of our CVS infrastructure are scripts:
'adapter.awk' and 'builder'. The first one is a no-brainer to be run on all
new spec files. It mostly takes care of correcting some common formatting 
mistakes (yes, our spec files must even look nice :). The second script's 
main job is to retrieve a spec with all patches, sources and any additional 
files required to build it (try doing the same with glibc by manually doing 
cvs up for all patches mentioned in the spec file; not to mention downloading 
tarballs from distfiles).
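
To give a feel for why doing this by hand is painful, here's a minimal sketch
of the first step any such tool has to do: collecting every file a spec refers
to through its SourceN:/PatchN: tags. This is not how 'builder' itself is
implemented (it's just an illustration), the function name and the sample
spec are made up, and RPM macros like %{version} are deliberately left
unexpanded.

```python
import re

# Matches "Source0: value", "Patch12: value", and the unnumbered forms.
# [ \t]* is used instead of \s* so a match never runs onto the next line.
TAG_RE = re.compile(
    r"^(Source|Patch)(\d*)[ \t]*:[ \t]*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def list_spec_files(spec_text):
    """Return (kind, number, filename) for every Source/Patch tag."""
    files = []
    for kind, number, value in TAG_RE.findall(spec_text):
        # Sources are often URLs; only the last path component is the
        # name of the file that has to be fetched locally.
        filename = value.rsplit("/", 1)[-1]
        files.append((kind.capitalize(), int(number or 0), filename))
    return files


# Hypothetical spec fragment, not taken from the PLD repository.
SAMPLE_SPEC = """\
Name:       foo
Version:    1.0
Source0:    http://example.org/foo-%{version}.tar.gz
Source1:    foo.init
Patch0:     %{name}-makefile.patch
"""

if __name__ == "__main__":
    for entry in list_spec_files(SAMPLE_SPEC):
        print(entry)
```

A real tool would then have to fetch each patch from SOURCES and each tarball
from distfiles; a package with dozens of patches makes the point quickly.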

That's the core of our development methodology and although it probably has
room for minor improvements here and there, it's not subject to any
revolutions (especially introducing precommit maintainer greenlighting).

One common mistake many people make is placing an equal sign between what's in
CVS and what's on FTP. The fact of the matter is that our builder and FTP
infrastructures are (mostly) separate from one another. Building binary
packages and placing them on FTP goes something like this...

First, a developer with a request account for a given distro line (we 
currently have infrastructure for two lines, 2.0 and 3.0, set up and can 
support any number of those at the same time; assuming we have enough 
machines to handle the load of course) sends a request (via a gpg-signed 
email using the make-request.sh utility) to build a given spec file (from a 
given tag/branch; defaults to HEAD). That mail is received by a so-called
source builder that retrieves the spec file along with all sources from our 
repo (using the 'builder' script), tags the spec and sources (well, that 
depends on options given to make-request.sh; I'll skip the details), creates 
a src.rpm, uploads it to our FTP server and publishes it via http along with 
some metadata (all of that data for the PLD 2.0 line can be found here:
http://ep09.pld-linux.org/~buildsrc/; especially check out queue.html). Binary
builders (do note that all builders perform RPM-related tasks inside chroots,
so the machine's environment doesn't have any influence on the RPMs; a couple
of our builders run on Debian :) download the src.rpms, try to build them and
(a) send a status report to the developer that requested the build, cc it to 
a mailing list for other people to be able to follow what's going on and to 
the src.builder for it to update queue.html and do some other stuff, (b) 
upload the buildlogs of the build to http://buildlogs.pld-linux.org (go to 
queue.html again and click on any of the green or red stuff :) and (c) if the 
build is successful, upload the packages to our FTP server.
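
The (a)/(b)/(c) part of what a binary builder does once a build finishes can
be sketched roughly like this. It's only a model of the flow described above,
not our actual builder code; BuildResult, actions_after_build and the example
addresses are all invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BuildResult:
    spec: str
    arch: str
    requester: str
    success: bool
    packages: list = field(default_factory=list)


def actions_after_build(result,
                        list_addr="builder-list@example.org",
                        src_builder="srcbuilder@example.org"):
    """Return the ordered list of (action, target) pairs for one build."""
    actions = [
        # (a) the status report goes to the requester, with copies to the
        #     mailing list and to the source builder (which updates queue.html)
        ("mail_status", result.requester),
        ("mail_status", list_addr),
        ("mail_status", src_builder),
        # (b) the buildlog is uploaded whether the build worked or not
        ("upload_buildlog", f"{result.arch}/{result.spec}"),
    ]
    # (c) packages reach the FTP server only after a successful build
    if result.success:
        actions.extend(("upload_package", pkg) for pkg in result.packages)
    return actions


if __name__ == "__main__":
    ok = BuildResult("acl.spec", "i686", "dev@example.org", True,
                     ["acl-2.2.31-1.i686.rpm"])
    for action in actions_after_build(ok):
        print(action)
```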

Ok, now begins the fun part. Since for a long time I've been the only person
actually doing any active development on our builder/ftp system, it mostly
reflects how I view the problems with our current model of development (no,
this isn't our first version of the system, but earlier versions were a real
PITA to use). So here it goes...

Like I've said before, the way we use our CVS repo is not subject to any major
changes (we can't even figure out if going with SVN would be worth it). More 
or less the same goes for our builder infrastructure, although there's
definitely room for improvement here (you've mentioned some post-build
static checks; sounds yummy). As for the way we handle FTP (and do keep in
mind that we're a multiarch distro and need to keep all archs more or less in
sync), the new infrastructure is nice (for a hands-on tutorial go here:
http://cvs.pld-linux.org/cgi-bin/cvsweb/PLD-doc/PLD_2.0_ftp_administration),
although it's still a work in progress and my great goal of being able to use a
trained monkey for FTP management is still quite a few coding hours away
(screenshot: http://ep09.pld-linux.org/~mmazur/misc/pld-ftp-admin4.png). So as
far as infrastructure (both code and architecture) is concerned, yours most
likely wouldn't be of any use to us, rather the other way around (I'm not
talking about single utilities, which could probably be of great value to us).

So now for the stuff that can (and will) change, maybe under your influence --
the release methodology. We can try out various solutions both with builder
access and FTP management. My take on builder access is the same as with CVS
access -- the fewer restrictions the better -- there's no point in having a
binary distro if most of the goodies are only available from CVS (and require
doing source builds with the 'builder' script -- Gentoo style). That's what 
I'm going to do with PLD 3.0 (I'm the Release Manager).

Handling the FTP is a completely different matter. The general idea is that
distro quality can be achieved by having all new packages tested while they
hang around in test trees, before they get moved to the main FTP tree and
made available to unsuspecting users to upgrade/install (see, the draconian
quality assurance gets shifted from CVS, where it'd stifle innovation, to FTP,
where developers, who usually know what they're doing, can always use
packages from test trees if they're in a hurry). There's enormous room for
improvements here and as always it would be good to have more developers 
doing this stuff :). I have a long list of tests that can be done by the FTP 
management scripts, but going far beyond that, we could go with full 
bugtracker integration, better information management, etc. This is the point
where traditional-style maintainers (with obligatory maintainer reviews,
blocking moves of packages that have unclosed bug reports in the BTS and
whatever else proves to work best) can be introduced.
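
Just to make the idea concrete, a gate on moving a package from a test tree
to the main tree could look something like the sketch below. None of this
exists yet; the function name, the shape of the bug data and the review set
are assumptions made up purely to illustrate the two checks mentioned above.

```python
def can_move_to_main(package, open_bugs, reviewed):
    """Decide whether a package may leave the test tree.

    open_bugs: dict mapping package name -> list of unclosed bug ids
               (hypothetical data pulled from a bugtracker)
    reviewed:  set of package names a maintainer has signed off on
    Returns (allowed, reasons), where reasons explains any refusal.
    """
    reasons = []
    if package not in reviewed:
        reasons.append("no maintainer review")
    bugs = open_bugs.get(package, [])
    if bugs:
        reasons.append("unclosed bug reports: " + ", ".join(map(str, bugs)))
    return (not reasons, reasons)


if __name__ == "__main__":
    bugs = {"mplayer": [101, 102]}
    reviewed = {"acl", "mplayer"}
    print(can_move_to_main("acl", bugs, reviewed))
    print(can_move_to_main("mplayer", bugs, reviewed))
```

Developers in a hurry would still be free to pull the package straight from
the test tree; the gate only guards what ordinary users get.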

Hmm, what else did I forget...

Ah, the builders. Do note that complete builder 'security' isn't a goal, since
most of the time it'd require some additional support from the machine
hosting the builder and we need too many machines to have much of a
choice; currently we're fine with having all builder activity done
inside chroots and as ordinary users (no package requires root to build).
Oh, and our builders are able to install missing dependencies :)

We have (stalled) LiveCD and *x86 and PPC rescuecds (links at
www.pld-linux.org).

Check out cia.navi.cx. We're fourth (I think) on the list of most active
projects (behind KDE, GNOME and Gentoo).

One last thing -- APT sucks. When you try our package management software
(called poldek), you'll dump APT at once :)

Here's a sample session.

[root w klapek ~]# poldek
Loading [pndir]ac...
Loading [pndir]ac-updates-security...
12001 packages read
Loading [rpmdbcache]/var/lib/rpm...
439 packages loaded

Welcome to the poldek shell mode. Type "help" for help with commands.

poldek:/all-avail> ls
[...]
12001 packages
poldek:/all-avail> cd ../
poldek:/> ls
ac/              # Ac is the code name for PLD 2.0 (Ra -- 1.0, Th -- 3.0)
all-avail/
installed/
poldek:/> cd ac
poldek:/ac> upgrade --test a             # I'm using tab completion
aalib-1.4rc5-10           alsa-utils-1.0.9a-2       arj-3.10.22-1
acl-2.2.31-1              alsa-utils-init-1.0.9a-2  atk-1.10.3-1
alsa-lib-1.0.9-1          applnk-1.9.5-19           attr-2.4.23-1
poldek:/ac> upgrade --test acl-2.2.31-1
Processing dependencies...
acl-2.2.28-1 obsoleted by acl-2.2.31-1
There are 1 package to install, 1 to uninstall:
I acl-2.2.31-1
R acl-2.2.28-1
Need to get 63.5KB of archives (63.5KB to download).
After unpacking 103.8KB will be used.
poldek:/ac> upgrade --test a*
Processing dependencies...
arj-3.10.20-2 obsoleted by arj-3.10.22-1
alsa-utils-init-1.0.8-1 obsoleted by alsa-utils-init-1.0.9a-2
alsa-utils-1.0.8-1 obsoleted by alsa-utils-1.0.9a-2
applnk-1.9.5-15 obsoleted by applnk-1.9.5-19
alsa-lib-1.0.8-1 obsoleted by alsa-lib-1.0.9-1
aalib-1.4rc5-9 obsoleted by aalib-1.4rc5-10
aalib-1.4rc5-10 marks slang-2.0.4-1 (cap libslang.so.2)
  slang-1.4.9-8 obsoleted by slang-2.0.4-1
  greedy upgrade mplayer-1.0-1.pre6a.1 to 1.0-2.pre7try2.4 (unresolved libslang.so.1)
        mplayer-1.0-1.pre6a.1 obsoleted by mplayer-1.0-2.pre7try2.4
        mplayer-1.0-2.pre7try2.4 marks mplayer-common-1.0-2.pre7try2.4 (cap mplayer-common = 3:1.0-2.pre7try2.4)
atk-1.10.1-1 obsoleted by atk-1.10.3-1
acl-2.2.28-1 obsoleted by acl-2.2.31-1
attr-2.4.20-1 obsoleted by attr-2.4.23-1
There are 12 packages to install (3 marked by dependencies), 11 to uninstall:
I aalib-1.4rc5-10, acl-2.2.31-1, alsa-lib-1.0.9-1, alsa-utils-1.0.9a-2,
I alsa-utils-init-1.0.9a-2, applnk-1.9.5-19, arj-3.10.22-1, atk-1.10.3-1,
I attr-2.4.23-1
D mplayer-1.0-2.pre7try2.4, mplayer-common-1.0-2.pre7try2.4, slang-2.0.4-1
R arj-3.10.20-2, attr-2.4.20-1, acl-2.2.28-1, atk-1.10.1-1,
R alsa-lib-1.0.8-1, alsa-utils-1.0.8-1, alsa-utils-init-1.0.8-1,
R applnk-1.9.5-15, slang-1.4.9-8, aalib-1.4rc5-9, mplayer-1.0-1.pre6a.1
Need to get 7.1MB of archives (7.1MB to download).
After unpacking 20.0MB will be used.
poldek:/ac>

-- 
In the year eighty five ten
God is gonna shake his mighty head
He'll either say,
"I'm pleased where man has been"
Or tear it down, and start again