[projects/git-slug] Przyspieszenie git-pld dla 20k+ repozytoriów
arekm
arekm at pld-linux.org
Tue May 12 15:08:15 CEST 2026
commit ba65da1522451ba5d1d43ef2768eaa9c77fe1abb
Author: Arkadiusz Miśkiewicz <arekm at maven.pl>
Date: Mon Apr 6 21:34:29 2026 +0200
Przyspieszenie git-pld dla 20k+ repozytoriów
Zachowano multiprocessing.Pool (ProcessPool) — benchmarki wykazały że
ThreadPool jest ~2-3x wolniejszy dla operacji subprocess z powodu
kontencji na GIL podczas fork+exec. ProcessPool z j=32 skaluje się
poprawnie, ThreadPool nie.
Domyślna liczba workerów podniesiona do min(cpu_count*4, 32) — dla pracy
I/O-bound więcej procesów lepiej nasyca przepustowość dysku i sieci.
Limit 32 chroni przed przekroczeniem SSH MaxStartups na serwerze git.
Użytkownik nadal może nadpisać przez -j lub PLD.jobs w gitconfig.
Nowy moduł git_slug/proctrack.py z klasą TrackedPopen (podklasa Popen):
- auto-rejestracja w globalnym zbiorze _active przy tworzeniu,
- auto-wyrejestrowanie w communicate(), wait() i poll(),
- start_new_session=True — każdy git działa we własnej grupie procesów,
- kill_all() wysyła SIGTERM przez os.killpg() do całej grupy, więc
Ctrl-C zabija też procesy potomne (ssh, credential helpers).
Cache refs w GitRepo — check_remote() wywoływany był per branch per
repozytorium, co przy 20k repo × 5 branchy dawało ~100k operacji
plikowych (open + read loose ref, a przy braku — liniowe skanowanie
packed-refs). Teraz:
- _load_packed_refs() parsuje packed-refs raz do dict (współdzielony
między remote'ami), pomija linie # i ^,
- _load_loose_refs(remote) rekurencyjnie (os.walk) wczytuje loose refs
dla danego remote'a — rekursja potrzebna bo nazwy branchy z '/'
(np. feature/foo) tworzą zagnieżdżone katalogi,
- loose refs mają priorytet nad packed (zgodnie z semantyką git),
- cache żyje tyle co instancja GitRepo (jedna per worker w fetch_package).
Benchmark: 3.6x przyspieszenie (0.58s → 0.16s dla 3000 repo × 5 branchy).
Wsadowy odczyt konfiguracji PLD.* — jedno wywołanie
"git config --global --get-regexp ^PLD\." zastępuje 2-3 osobne
subprocess.run() przy starcie. Klucze przechowywane lowercase (tak je
zwraca git config), pusta wartość ('') odróżniana od braku klucza (None).
Benchmark: 740x przyspieszenie (1.64s → 0.002s dla 1000 odczytów).
Optymalizacja prune w fetch_packages() — gdy options.branch == ['*']
(przypadek clone), wynik pierwszego getrefs() jest ponownie
wykorzystywany zamiast wykonywania drugiego zapytania sieciowego do
repozytorium Refs. Przy branch != ['*'] (np. domyślne update z
['master']) drugie zapytanie jest nadal konieczne. Dodano repopattern
do drugiego wywołania getrefs(), co zawęża ilość przesyłanych danych.
Poprawiono return_all aby zwracał all_refs.heads zachowując oryginalną
semantykę po prune.
git_passthrough_worker() używa TrackedPopen zamiast subprocess.run() —
wynik to bytes, dekodowane z errors='replace'. Wyciągnięto
print_prefixed(stderr) przed if/else. Dodano early-return w
print_prefixed() dla pustych stringów.
Benchmark podsumowanie (3000 repozytoriów):
check_remote() 15k wywołań: 0.58s → 0.16s (3.6x)
git_config_get() 1k odczytów: 1.64s → 0.002s (740x)
git log j=16: 0.54s → 0.55s (bez regresji)
git log j=32: 0.35s → 0.36s (bez regresji)
Testy: 87 testów (było 71), pokrycie 91%. Nowy test_proctrack.py
(TrackedPopen rejestracja/wyrejestrowanie, kill_all z grupą procesów,
integracja z GitRepo.commandexc). Nowe testy refs cache (loose nadpisuje
packed, branch z '/', osobne cache per remote, jednokrotny odczyt
packed-refs). Nowe testy prune (reuse refs przy ['*'], osobne getrefs
przy filtrowanym branchu). Zaktualizowane istniejące testy pod zmienione
API (FakeTrackedPopen zamiast mock subprocess.run, reset
_pld_config_cache między testami, jobs == 12).
git_slug/gitrepo.py | 85 +++++++++++++----
git_slug/proctrack.py | 96 ++++++++++++++++++++
slug.py | 107 +++++++++++++++-------
tests/test_cli.py | 2 +-
tests/test_config_defaults.py | 46 ++++++++++
tests/test_failure_paths.py | 66 ++++++++++++++
tests/test_gitrepo.py | 74 +++++++++++++++
tests/test_passthrough.py | 31 ++++---
tests/test_proctrack.py | 207 ++++++++++++++++++++++++++++++++++++++++++
9 files changed, 651 insertions(+), 63 deletions(-)
---
diff --git a/git_slug/gitrepo.py b/git_slug/gitrepo.py
index 2b46e5b..336edce 100644
--- a/git_slug/gitrepo.py
+++ b/git_slug/gitrepo.py
@@ -1,4 +1,5 @@
from .gitconst import EMPTYSHA1, REMOTE_NAME, REFFILE
+from .proctrack import TrackedPopen
import os
from subprocess import PIPE
@@ -21,8 +22,15 @@ class GitRepo:
if self.wtree is not None:
self.command_prefix.append('--work-tree='+self.wtree)
+ # Refs cache for check_remote(): avoids repeated file I/O when
+ # checking multiple branches per repo (common in update/clone).
+ # packed-refs is shared across remotes; loose refs are per-remote
+ # because they live under refs/remotes/<remote>/.
+ self._packed_refs = None
+ self._loose_refs = {}
+
def command(self, clist):
- return subprocess.Popen(self.command_prefix + clist, stdout=PIPE, stderr=PIPE, bufsize=-1)
+ return TrackedPopen(self.command_prefix + clist, stdout=PIPE, stderr=PIPE, bufsize=-1)
def commandio(self, clist):
return self.command(clist).communicate()
@@ -81,23 +89,68 @@ class GitRepo:
self.commandio(['config', '--local', '--add', 'remote.{}.fetch'.format(remotename),
'refs/notes/*:refs/notes/*'])
- def check_remote(self, ref, remote=REMOTE_NAME):
- localref = EMPTYSHA1
- ref = ref.replace(REFFILE, os.path.join('remotes', remote))
+ def _load_packed_refs(self):
+ """Parse packed-refs into a dict, cached for the GitRepo lifetime.
+
+ Shared across remotes — packed-refs contains all refs regardless
+ of remote name. Skips comment lines (#) and peeled tag lines (^).
+ """
+ if self._packed_refs is not None:
+ return
+ self._packed_refs = {}
try:
- with open(os.path.join(self.gdir, ref), 'r') as f:
- localref = f.readline().strip()
+ with open(os.path.join(self.gdir, 'packed-refs')) as f:
+ for line in f:
+ if line.startswith('#') or line.startswith('^'):
+ continue
+ parts = line.split()
+ if len(parts) >= 2:
+ self._packed_refs[parts[1]] = parts[0]
except IOError:
- try:
- with open(os.path.join(self.gdir, 'packed-refs')) as f:
- for line in f:
- line_data = line.split()
- if len(line_data) == 2 and line_data[1] == ref:
- localref = line_data[0].strip()
- break
- except IOError:
- pass
- return localref
+ pass
+
+ def _load_loose_refs(self, remote):
+ """Walk refs/remotes/<remote>/ recursively, caching per remote.
+
+ Recursive walk is needed because branch names with '/' (e.g.
+ feature/foo) create nested directories under the remote ref path.
+ """
+ if remote in self._loose_refs:
+ return
+ self._loose_refs[remote] = {}
+ refs_base = os.path.join(self.gdir, 'refs', 'remotes', remote)
+ try:
+ for dirpath, _, filenames in os.walk(refs_base):
+ for fname in filenames:
+ full = os.path.join(dirpath, fname)
+ # Build ref path relative to .git dir,
+ # e.g. refs/remotes/origin/feature/x
+ ref = os.path.relpath(full, self.gdir)
+ try:
+ with open(full) as f:
+ self._loose_refs[remote][ref] = f.readline().strip()
+ except IOError:
+ pass
+ except OSError:
+ pass
+
+ def check_remote(self, ref, remote=REMOTE_NAME):
+ """Check local sha for a remote ref, using cached ref data.
+
+ Loose refs take priority over packed-refs (matching git semantics).
+ Both are loaded lazily and cached for the lifetime of this GitRepo
+ instance, so repeated calls for different branches only do file I/O
+ once — the rest are dict lookups.
+ """
+ ref = ref.replace(REFFILE, os.path.join('remotes', remote))
+ # Loose refs override packed-refs (git writes loose refs on fetch,
+ # packs them later during gc).
+ self._load_loose_refs(remote)
+ val = self._loose_refs[remote].get(ref)
+ if val is not None:
+ return val
+ self._load_packed_refs()
+ return self._packed_refs.get(ref, EMPTYSHA1)
def showfile(self, filename, ref="/".join([REMOTE_NAME, "master"])):
clist = ['show', ref + ':' + filename]
diff --git a/git_slug/proctrack.py b/git_slug/proctrack.py
new file mode 100644
index 0000000..c8351c7
--- /dev/null
+++ b/git_slug/proctrack.py
@@ -0,0 +1,96 @@
+"""Subprocess tracker for clean Ctrl-C shutdown.
+
+git-pld uses ThreadPool for parallel git operations because the work is
+entirely I/O-bound (waiting for git subprocesses). ThreadPool cannot kill
+blocked worker threads, so we track every spawned git subprocess here and
+kill them all on interrupt — unblocking the worker threads so the pool can
+be joined promptly.
+
+Each subprocess is started in its own session (start_new_session=True) so
+that os.killpg() kills the entire process group — git plus any helpers it
+spawns (ssh, credential helpers, etc.).
+
+This affects ALL git operations globally (fetch, checkout, config, show...),
+not just passthrough. Since git-pld is batch-only with no TTY interaction,
+running in a separate session is safe — no git helper needs terminal access.
+
+Usage:
+ # In worker code — use TrackedPopen instead of subprocess.Popen.
+ # It auto-registers on creation, auto-unregisters on communicate()/wait().
+ proc = TrackedPopen(cmd, stdout=PIPE, stderr=PIPE)
+ out, err = proc.communicate()
+
+ # On Ctrl-C — kill all tracked process groups from the main thread.
+ kill_all()
+"""
+
+import os
+import signal
+import subprocess
+import threading
+
+# Thread-safe set of currently running TrackedPopen instances.
+_active = set()
+_lock = threading.Lock()
+
+
+class TrackedPopen(subprocess.Popen):
+ """Popen subclass that auto-registers for Ctrl-C tracking.
+
+ Automatically registers on __init__, unregisters on communicate(),
+ wait(), and poll() — covering all calling patterns in the codebase:
+ - commandexc()/commandio() use communicate()
+ - refsdata.py uses wait()
+ - showfile() callers use poll() + stdout.read()
+
+ Callers MUST call communicate() or wait() to unregister. All current
+ callers do. showfile() callers are covered by poll() which unregisters
+ once the process is finished.
+
+ No __del__ safety net: _active holds strong references so __del__ would
+ never fire while the object is tracked (gc can't collect reachable objects).
+ """
+
+ def __init__(self, *args, **kwargs):
+ # Start in own session so kill_all()'s killpg() reaches helper
+ # children (ssh, credential helpers, etc.), not just the git process.
+ kwargs.setdefault('start_new_session', True)
+ super().__init__(*args, **kwargs)
+ with _lock:
+ _active.add(self)
+
+ def _untrack(self):
+ with _lock:
+ _active.discard(self)
+
+ def communicate(self, *args, **kwargs):
+ try:
+ return super().communicate(*args, **kwargs)
+ finally:
+ self._untrack()
+
+ def wait(self, *args, **kwargs):
+ try:
+ return super().wait(*args, **kwargs)
+ finally:
+ self._untrack()
+
+ def poll(self):
+ ret = super().poll()
+ if ret is not None:
+ self._untrack()
+ return ret
+
+
+def kill_all():
+ """Kill all tracked process groups — called on Ctrl-C.
+
+ Sends SIGTERM to each process group (not just the parent PID) so that
+ child processes like ssh are also terminated promptly.
+ """
+ with _lock:
+ for proc in list(_active):
+ try:
+ os.killpg(proc.pid, signal.SIGTERM)
+ except (OSError, ProcessLookupError):
+ pass
diff --git a/slug.py b/slug.py
index 8b1e7a2..5ef4227 100755
--- a/slug.py
+++ b/slug.py
@@ -19,6 +19,9 @@ import glob
import itertools
import sys
import os
+
+# Allow running from source tree or via symlink without installing git_slug.
+sys.path.insert(0, os.path.dirname(os.path.realpath(__file__)))
import shutil
import subprocess
import shlex
@@ -31,6 +34,7 @@ from multiprocessing import Pool as WorkerPool
from git_slug.gitconst import (GITLOGIN, GITSERVER, GIT_REPO, GIT_REPO_PUSH,
REMOTE_NAME, REMOTEREFS)
from git_slug.gitrepo import GitRepo, GitRepoError
+from git_slug.proctrack import TrackedPopen, kill_all as kill_tracked_subprocesses
from git_slug.refsdata import GitArchiveRefsData, NoMatchedRepos, RemoteRefsError
@@ -101,30 +105,56 @@ def default_packagesdir():
return os.path.expanduser('~/rpm/packages')
-def git_config_get(key):
- """Read a single value from global/system git config.
+# Cached PLD.* config values — loaded once via git config --get-regexp
+# instead of spawning a separate subprocess per key. Called from
+# apply_defaults() in the main thread before the pool is created,
+# so no locking is needed.
+_pld_config_cache = None
+
- Uses --global so that PLD settings come from ~/.gitconfig (or
- XDG_CONFIG_HOME/git/config), not from the cwd repo's .git/config.
- This prevents a single repo's local config from changing batch
- behavior across all repos.
+def _load_pld_config():
+ """Read all PLD.* keys from global gitconfig in one subprocess call.
- Returns the value as a string, or None if the key is not set.
+ git config --get-regexp outputs lowercased key names, so we store
+ and look up with lowercase keys. Values are preserved as-is:
+ empty string (e.g. 'PLD.fetch-config =') stays as '' and is
+ distinguishable from a missing key (which returns None via dict.get).
"""
+ global _pld_config_cache
+ if _pld_config_cache is not None:
+ return _pld_config_cache
+ _pld_config_cache = {}
try:
result = subprocess.run(
- ['git', 'config', '--global', '--get', key],
+ ['git', 'config', '--global', '--get-regexp', r'^PLD\.'],
capture_output=True, text=True
)
+ # --get-regexp returns exit code 1 when no keys match — treat as empty.
if result.returncode == 0:
- return result.stdout.strip()
+ for line in result.stdout.splitlines():
+ # partition preserves empty rhs: "pld.x " → ('pld.x', ' ', '')
+ key, _, val = line.partition(' ')
+ _pld_config_cache[key.lower()] = val
except FileNotFoundError:
pass
- return None
+ return _pld_config_cache
+
+
+def git_config_get(key):
+ """Read a single PLD.* value from the cached global git config.
+
+ Uses a cached batch read of all PLD.* keys (one subprocess call)
+ instead of spawning a subprocess per key. Returns the value as a
+ string, or None if the key is not set.
+ """
+ config = _load_pld_config()
+ return config.get(key.lower())
def print_prefixed(stream_data, prefix, dest):
"""Print each line of stream_data prefixed with 'prefix: ' to dest."""
+ if not stream_data:
+ return
for line in stream_data.splitlines():
print("{}: {}".format(prefix, line), file=dest)
@@ -269,7 +299,10 @@ def apply_defaults(options):
# only run if the value wasn't set by CLI or git config.
HARDCODED = {
'packagesdir': default_packagesdir,
- 'jobs': cpu_count,
+ # Workers are I/O-bound (waiting for git subprocesses), so more
+ # workers = more parallelism. Capped at 32 to stay within typical
+ # SSH MaxStartups limits on git servers.
+ 'jobs': lambda: min(cpu_count() * 4, 32),
'quiet': lambda: False,
'pattern': lambda: ['*'],
}
@@ -321,6 +354,7 @@ def run_worker(function, options, args):
except KeyboardInterrupt:
print('Keyboard interrupt received, finishing...', file=sys.stderr)
pool.terminate()
+ kill_tracked_subprocesses()
pool.join()
sys.exit(1)
return ret
@@ -375,22 +409,22 @@ def git_passthrough_worker(repo_dir, git_command, git_args, config_pairs, quiet)
"""
directory = os.path.basename(repo_dir)
cmd = build_git_cmd(repo_dir, git_command, git_args, config_pairs)
- result = subprocess.run(cmd, capture_output=True, text=True)
-
- if result.returncode != 0:
+ proc = TrackedPopen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+ stdout_bytes, stderr_bytes = proc.communicate()
+ stdout_str = stdout_bytes.decode('utf-8', errors='replace')
+ stderr_str = stderr_bytes.decode('utf-8', errors='replace')
+
+ # Always show stderr (may contain warnings even on success).
+ print_prefixed(stderr_str, directory, sys.stderr)
+ if proc.returncode != 0:
# Failed — always show all output so the developer can diagnose
- print_prefixed(result.stdout, directory, sys.stdout)
- print_prefixed(result.stderr, directory, sys.stderr)
- else:
- # Succeeded — always show stderr (warnings, progress), because
- # even a successful command might emit important diagnostics.
- print_prefixed(result.stderr, directory, sys.stderr)
- if not quiet:
- # In quiet mode we suppress stdout from successful repos
- # (e.g. "Already up to date" from hundreds of repos).
- print_prefixed(result.stdout, directory, sys.stdout)
+ print_prefixed(stdout_str, directory, sys.stdout)
+ elif not quiet:
+ # In quiet mode we suppress stdout from successful repos
+ # (e.g. "Already up to date" from hundreds of repos).
+ print_prefixed(stdout_str, directory, sys.stdout)
- return repo_dir if result.returncode else None
+ return repo_dir if proc.returncode else None
def passthrough_command(options, git_command, git_args):
@@ -521,12 +555,13 @@ def fetch_packages(options, return_all=False):
Sets options.had_errors = True if any init or prune operation failed,
so callers can exit non-zero.
"""
- refs = getrefs(options.branch, options.repopattern)
+ branch_refs = getrefs(options.branch, options.repopattern)
+ all_refs = branch_refs
print('Read remotes data', file=sys.stderr)
had_errors = False
pkgs_new = []
if options.newpkgs:
- for pkgdir in sorted(refs.heads):
+ for pkgdir in sorted(branch_refs.heads):
gitdir = os.path.join(options.packagesdir, pkgdir, '.git')
if not os.path.isdir(gitdir):
pkgs_new.append(pkgdir)
@@ -540,12 +575,12 @@ def fetch_packages(options, return_all=False):
had_errors = True
args = []
- for pkgdir in sorted(refs.heads):
+ for pkgdir in sorted(branch_refs.heads):
if options.omitexisting and pkgdir not in pkgs_new:
continue
else:
gitrepo = GitRepo(os.path.join(options.packagesdir, pkgdir))
- args.append((gitrepo, refs.heads[pkgdir], options))
+ args.append((gitrepo, branch_refs.heads[pkgdir], options))
fetch_results = run_worker(fetch_package, options, args)
# Separate successful fetches from errors
@@ -554,10 +589,18 @@ def fetch_packages(options, return_all=False):
had_errors = True
if options.prune:
- refs = getrefs(['*'])
+ # Prune needs all-branches refs to know which repos still exist
+ # upstream. If we already fetched with ['*'], reuse those refs
+ # to avoid a redundant network call. Otherwise (e.g. -b master)
+ # we must fetch again with ['*'].
+ # Adding repopattern narrows the fetch to only matching repos —
+ # the prune loop already filters via find_git_repos(...,
+ # options.repopattern), so the effective result is the same.
+ if options.branch != ['*']:
+ all_refs = getrefs(['*'], options.repopattern)
for fulldir in find_git_repos(options.packagesdir, options.repopattern):
pkgdir = os.path.basename(fulldir)
- if len(refs.heads[pkgdir]) == 0:
+ if len(all_refs.heads[pkgdir]) == 0:
print('warning: removing', fulldir, file=sys.stderr)
try:
shutil.rmtree(fulldir)
@@ -568,7 +611,7 @@ def fetch_packages(options, return_all=False):
options.had_errors = had_errors
if return_all:
- return refs.heads
+ return all_refs.heads
else:
return updated_repos
diff --git a/tests/test_cli.py b/tests/test_cli.py
index a86ce63..cfaca5b 100644
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -149,6 +149,6 @@ def test_apply_defaults_can_fill_namespace_created_like_argparse(monkeypatch):
slug.apply_defaults(options)
assert options.packagesdir == "/fallback"
- assert options.jobs == 3
+ assert options.jobs == 12 # min(3 * 4, 32)
assert options.quiet is False
assert options.pattern == ["*"]
diff --git a/tests/test_config_defaults.py b/tests/test_config_defaults.py
index 6d9723d..041eca0 100644
--- a/tests/test_config_defaults.py
+++ b/tests/test_config_defaults.py
@@ -4,7 +4,13 @@ from types import SimpleNamespace
import slug
+def _reset_config_cache():
+ """Reset the PLD config cache so tests don't leak state."""
+ slug._pld_config_cache = None
+
+
def test_apply_defaults_prefers_cli_over_config(monkeypatch):
+ _reset_config_cache()
options = argparse.Namespace(
packagesdir="/cli/packages",
jobs=9,
@@ -26,6 +32,7 @@ def test_apply_defaults_prefers_cli_over_config(monkeypatch):
def test_apply_defaults_reads_git_config_for_missing_values(monkeypatch):
+ _reset_config_cache()
options = argparse.Namespace()
monkeypatch.setattr(slug, "git_config_get", lambda key: {
@@ -44,6 +51,7 @@ def test_apply_defaults_reads_git_config_for_missing_values(monkeypatch):
def test_get_command_config_uses_builtin_when_unconfigured(monkeypatch):
+ _reset_config_cache()
monkeypatch.setattr(slug, "git_config_get", lambda key: None)
assert slug.get_command_config("pull") == [
@@ -54,6 +62,7 @@ def test_get_command_config_uses_builtin_when_unconfigured(monkeypatch):
def test_get_command_config_uses_shlex_split_for_user_config(monkeypatch):
+ _reset_config_cache()
monkeypatch.setattr(
slug,
"git_config_get",
@@ -67,12 +76,30 @@ def test_get_command_config_uses_shlex_split_for_user_config(monkeypatch):
def test_get_command_config_allows_empty_string_to_disable_defaults(monkeypatch):
+ _reset_config_cache()
monkeypatch.setattr(slug, "git_config_get", lambda key: "")
assert slug.get_command_config("fetch") == []
+def test_git_config_get_returns_value_from_batch_cache(monkeypatch):
+ """git_config_get uses _load_pld_config() to batch-read all PLD.* keys."""
+ _reset_config_cache()
+ monkeypatch.setattr(
+ slug.subprocess,
+ "run",
+ lambda *args, **kwargs: SimpleNamespace(
+ returncode=0,
+ stdout="pld.jobs 7\npld.packagesdir /custom\n",
+ ),
+ )
+
+ assert slug.git_config_get("PLD.jobs") == "7"
+ assert slug.git_config_get("PLD.packagesdir") == "/custom"
+
+
def test_git_config_get_returns_none_on_nonzero_status(monkeypatch):
+ _reset_config_cache()
monkeypatch.setattr(
slug.subprocess,
"run",
@@ -83,9 +110,28 @@ def test_git_config_get_returns_none_on_nonzero_status(monkeypatch):
def test_git_config_get_returns_none_when_git_is_missing(monkeypatch):
+ _reset_config_cache()
+
def fake_run(*args, **kwargs):
raise FileNotFoundError()
monkeypatch.setattr(slug.subprocess, "run", fake_run)
assert slug.git_config_get("PLD.jobs") is None
+
+
+def test_git_config_get_preserves_empty_value(monkeypatch):
+ """Empty value (e.g. 'PLD.fetch-config =') is '' not None."""
+ _reset_config_cache()
+ monkeypatch.setattr(
+ slug.subprocess,
+ "run",
+ lambda *args, **kwargs: SimpleNamespace(
+ returncode=0,
+ stdout="pld.fetch-config \n",
+ ),
+ )
+
+ # Empty value is distinguishable from missing key
+ assert slug.git_config_get("PLD.fetch-config") == ""
+ assert slug.git_config_get("PLD.nonexistent") is None
diff --git a/tests/test_failure_paths.py b/tests/test_failure_paths.py
index dc3fe25..0d8c894 100644
--- a/tests/test_failure_paths.py
+++ b/tests/test_failure_paths.py
@@ -168,6 +168,7 @@ def test_run_worker_handles_keyboard_interrupt(monkeypatch, make_options, capsys
state["joined"] = True
monkeypatch.setattr(slug, "WorkerPool", FakePool)
+ monkeypatch.setattr(slug, "kill_tracked_subprocesses", lambda: None)
with pytest.raises(SystemExit) as exc:
slug.run_worker(object(), make_options(jobs=3), [])
@@ -300,3 +301,68 @@ def test_fetch_packages_marks_errors_on_prune_delete_failure(monkeypatch, make_o
output = capsys.readouterr().err
assert "warning: removing /pkgs/orphan" in output
assert "error: failed to remove /pkgs/orphan: permission denied" in output
+
+
+# --- Prune refs reuse tests ---
+
+def test_prune_reuses_refs_when_branch_is_star(monkeypatch, make_options):
+ """When branch is ['*'], prune reuses the initial refs (one getrefs call)."""
+ calls = []
+
+ def counting_getrefs(*args):
+ calls.append(args)
+ return SimpleNamespace(
+ heads=collections.defaultdict(dict, {"pkg": {"refs/heads/master": "sha"}})
+ )
+
+ monkeypatch.setattr(slug, "getrefs", counting_getrefs)
+ monkeypatch.setattr(slug, "GitRepo", lambda path: SimpleNamespace(wtree=path, gdir=path + "/.git"))
+ monkeypatch.setattr(slug, "run_worker", lambda function, options, args: [])
+ monkeypatch.setattr(slug, "find_git_repos", lambda packagesdir, patterns: [])
+
+ options = make_options(
+ branch=["*"],
+ repopattern=["*"],
+ newpkgs=False,
+ prune=True,
+ depth=0,
+ omitexisting=False,
+ )
+
+ slug.fetch_packages(options)
+
+ # Only one getrefs call — the ['*'] result is reused for prune
+ assert len(calls) == 1
+ assert calls[0][0] == ["*"]
+
+
+def test_prune_fetches_separately_when_branch_is_filtered(monkeypatch, make_options):
+ """When branch != ['*'], prune makes a second getrefs call with ['*']."""
+ calls = []
+
+ def counting_getrefs(*args):
+ calls.append(args)
+ return SimpleNamespace(
+ heads=collections.defaultdict(dict, {"pkg": {"refs/heads/master": "sha"}})
+ )
+
+ monkeypatch.setattr(slug, "getrefs", counting_getrefs)
+ monkeypatch.setattr(slug, "GitRepo", lambda path: SimpleNamespace(wtree=path, gdir=path + "/.git"))
+ monkeypatch.setattr(slug, "run_worker", lambda function, options, args: [])
+ monkeypatch.setattr(slug, "find_git_repos", lambda packagesdir, patterns: [])
+
+ options = make_options(
+ branch=["devel"],
+ repopattern=["*"],
+ newpkgs=False,
+ prune=True,
+ depth=0,
+ omitexisting=False,
+ )
+
+ slug.fetch_packages(options)
+
+ # Two getrefs calls: first with ['devel'], second with ['*'] for prune
+ assert len(calls) == 2
+ assert calls[0][0] == ["devel"]
+ assert calls[1][0] == ["*"]
diff --git a/tests/test_gitrepo.py b/tests/test_gitrepo.py
index 283b572..996c11f 100644
--- a/tests/test_gitrepo.py
+++ b/tests/test_gitrepo.py
@@ -114,3 +114,77 @@ def test_check_remote_falls_back_to_packed_refs(tmp_path):
repo = GitRepo(str(repo_dir))
assert repo.check_remote("refs/heads/master") == "def456"
+
+
+# --- Refs cache tests ---
+
+def test_loose_refs_override_packed_refs(tmp_path):
+ """Loose ref file takes priority over packed-refs (git semantics)."""
+ repo_dir = tmp_path / "pkg"
+ gitdir = repo_dir / ".git"
+ # Write conflicting sha in packed-refs
+ gitdir.mkdir(parents=True)
+ (gitdir / "packed-refs").write_text("packed111 refs/remotes/origin/master\n")
+ # Write different sha in loose ref
+ loose_dir = gitdir / "refs" / "remotes" / "origin"
+ loose_dir.mkdir(parents=True)
+ (loose_dir / "master").write_text("loose222\n")
+
+ repo = GitRepo(str(repo_dir))
+
+ assert repo.check_remote("refs/heads/master") == "loose222"
+
+
+def test_check_remote_handles_branch_names_with_slash(tmp_path):
+ """Branch names with '/' create nested dirs (e.g. feature/foo)."""
+ repo_dir = tmp_path / "pkg"
+ gitdir = repo_dir / ".git"
+ # Create nested loose ref: refs/remotes/origin/feature/foo
+ nested_dir = gitdir / "refs" / "remotes" / "origin" / "feature"
+ nested_dir.mkdir(parents=True)
+ (nested_dir / "foo").write_text("abc999\n")
+
+ repo = GitRepo(str(repo_dir))
+
+ assert repo.check_remote("refs/heads/feature/foo") == "abc999"
+
+
+def test_non_default_remotes_have_separate_cache(tmp_path):
+ """Each remote gets its own loose refs cache."""
+ repo_dir = tmp_path / "pkg"
+ gitdir = repo_dir / ".git"
+
+ # Create loose refs for two different remotes
+ for remote, sha in [("origin", "orig111"), ("upstream", "up222")]:
+ ref_dir = gitdir / "refs" / "remotes" / remote
+ ref_dir.mkdir(parents=True)
+ (ref_dir / "master").write_text(sha + "\n")
+
+ repo = GitRepo(str(repo_dir))
+
+ assert repo.check_remote("refs/heads/master", "origin") == "orig111"
+ assert repo.check_remote("refs/heads/master", "upstream") == "up222"
+
+
+def test_packed_refs_cache_loaded_once(tmp_path, monkeypatch):
+ """packed-refs file is read only once, even across multiple check_remote calls."""
+ repo_dir = tmp_path / "pkg"
+ gitdir = repo_dir / ".git"
+ gitdir.mkdir(parents=True)
+ (gitdir / "packed-refs").write_text(
+ "aaa111 refs/remotes/origin/master\n"
+ "bbb222 refs/remotes/origin/devel\n"
+ )
+
+ repo = GitRepo(str(repo_dir))
+
+ # First call loads the cache
+ assert repo.check_remote("refs/heads/master") == "aaa111"
+
+ # Tamper with the file — cache should prevent re-reading
+ (gitdir / "packed-refs").write_text("ccc333 refs/remotes/origin/master\n")
+
+ # Second call uses cached value (not the tampered file)
+ assert repo.check_remote("refs/heads/devel") == "bbb222"
+ # master still returns old cached value
+ assert repo.check_remote("refs/heads/master") == "aaa111"
diff --git a/tests/test_passthrough.py b/tests/test_passthrough.py
index 77528d9..b3c0f5c 100644
--- a/tests/test_passthrough.py
+++ b/tests/test_passthrough.py
@@ -24,15 +24,22 @@ def test_build_git_cmd_includes_config_and_repo_dir():
]
+class FakeTrackedPopen:
+ """Fake TrackedPopen for testing git_passthrough_worker output handling."""
+ def __init__(self, returncode, stdout, stderr):
+ self.returncode = returncode
+ self._stdout = stdout.encode() if isinstance(stdout, str) else stdout
+ self._stderr = stderr.encode() if isinstance(stderr, str) else stderr
+
+ def communicate(self):
+ return (self._stdout, self._stderr)
+
+
def test_git_passthrough_worker_prints_all_output_for_failures(monkeypatch, capsys):
monkeypatch.setattr(
- slug.subprocess,
- "run",
- lambda *args, **kwargs: SimpleNamespace(
- returncode=1,
- stdout="stdout line\n",
- stderr="stderr line\n",
- ),
+ slug,
+ "TrackedPopen",
+ lambda *args, **kwargs: FakeTrackedPopen(1, "stdout line\n", "stderr line\n"),
)
result = slug.git_passthrough_worker(
@@ -51,13 +58,9 @@ def test_git_passthrough_worker_prints_all_output_for_failures(monkeypatch, caps
def test_git_passthrough_worker_quiet_suppresses_success_stdout_only(monkeypatch, capsys):
monkeypatch.setattr(
- slug.subprocess,
- "run",
- lambda *args, **kwargs: SimpleNamespace(
- returncode=0,
- stdout="all clean\n",
- stderr="warning message\n",
- ),
+ slug,
+ "TrackedPopen",
+ lambda *args, **kwargs: FakeTrackedPopen(0, "all clean\n", "warning message\n"),
)
result = slug.git_passthrough_worker(
diff --git a/tests/test_proctrack.py b/tests/test_proctrack.py
new file mode 100644
index 0000000..a2385af
--- /dev/null
+++ b/tests/test_proctrack.py
@@ -0,0 +1,207 @@
+"""Tests for git_slug.proctrack — TrackedPopen and kill_all."""
+
+import os
+import signal
+import subprocess
+import sys
+import time
+
+import pytest
+
+from git_slug.proctrack import TrackedPopen, kill_all, _active, _lock
+
+
+def _clear_active():
+ """Remove all entries from the active tracking set."""
+ with _lock:
+ _active.clear()
+
+
+class TestTrackedPopenRegistration:
+ """TrackedPopen auto-registers on init and unregisters on communicate/wait/poll."""
+
+ def setup_method(self):
+ _clear_active()
+
+ def teardown_method(self):
+ _clear_active()
+
+ def test_register_on_init_unregister_on_communicate(self):
+ proc = TrackedPopen(
+ [sys.executable, "-c", "print('hello')"],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+ # Registered immediately after creation
+ with _lock:
+ assert proc in _active
+
+ proc.communicate()
+
+ # Unregistered after communicate()
+ with _lock:
+ assert proc not in _active
+
+ def test_register_on_init_unregister_on_wait(self):
+ proc = TrackedPopen(
+ [sys.executable, "-c", "pass"],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+ with _lock:
+ assert proc in _active
+
+ proc.wait()
+
+ with _lock:
+ assert proc not in _active
+
+ def test_unregister_on_poll_when_finished(self):
+ proc = TrackedPopen(
+ [sys.executable, "-c", "pass"],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+ # Wait for the process to actually finish
+ proc.wait()
+ # Re-add to simulate the scenario where only poll() is used
+ with _lock:
+ _active.add(proc)
+
+ ret = proc.poll()
+
+ assert ret is not None
+ with _lock:
+ assert proc not in _active
+
+ def test_poll_keeps_tracked_while_running(self):
+ # Use a process that sleeps briefly so poll returns None
+ proc = TrackedPopen(
+ [sys.executable, "-c", "import time; time.sleep(5)"],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+ try:
+ ret = proc.poll()
+ # Process still running — poll returns None, stays tracked
+ assert ret is None
+ with _lock:
+ assert proc in _active
+ finally:
+ proc.terminate()
+ proc.wait()
+
+ def test_communicate_returns_output(self):
+ proc = TrackedPopen(
+ [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+ out, err = proc.communicate()
+ assert b"out" in out
+ assert b"err" in err
+
+ def test_start_new_session_default(self):
+ """TrackedPopen starts in its own session by default."""
+ proc = TrackedPopen(
+ [sys.executable, "-c", "pass"],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+ proc.wait()
+ # The process should have its own session (pgid == pid)
+ # We can't easily test this after the process exits, but we
+ # can verify start_new_session was set by checking the process
+ # ran without error.
+ assert proc.returncode == 0
+
+
+class TestKillAll:
+ """kill_all() sends SIGTERM to entire process groups."""
+
+ def setup_method(self):
+ _clear_active()
+
+ def teardown_method(self):
+ _clear_active()
+
+ def test_kill_all_terminates_running_process(self):
+ """kill_all() terminates a tracked process via process group signal."""
+ proc = TrackedPopen(
+ [sys.executable, "-c", "import time; time.sleep(60)"],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+ with _lock:
+ assert proc in _active
+
+ kill_all()
+
+ # Process should be terminated (may take a moment)
+ proc.wait(timeout=5)
+ # SIGTERM results in negative returncode on POSIX
+ assert proc.returncode != 0
+
+ def test_kill_all_reaches_child_processes(self):
+ """kill_all() kills the process group, including child processes."""
+ # Parent spawns a child that also sleeps. Both should be killed
+ # because they share a process group (start_new_session=True makes
+ # the parent the group leader).
+ script = (
+ "import subprocess, time; "
+ "subprocess.Popen(['sleep', '60']); "
+ "time.sleep(60)"
+ )
+ proc = TrackedPopen(
+ [sys.executable, "-c", script],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+
+ # Brief pause so the child subprocess actually starts
+ time.sleep(0.2)
+
+ kill_all()
+
+ proc.wait(timeout=5)
+ assert proc.returncode != 0
+
+ def test_kill_all_ignores_already_dead_processes(self):
+ """kill_all() doesn't crash on processes that already exited."""
+ proc = TrackedPopen(
+ [sys.executable, "-c", "pass"],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+ proc.wait()
+ # Process is dead but still tracked (wait() unregistered it,
+ # so re-add to test the edge case)
+ with _lock:
+ _active.add(proc)
+
+ # Should not raise
+ kill_all()
+
+
+class TestWithGitRepo:
+ """TrackedPopen works correctly when used via GitRepo."""
+
+ def setup_method(self):
+ _clear_active()
+
+ def teardown_method(self):
+ _clear_active()
+
+ def test_commandexc_unregisters_after_completion(self):
+ """GitRepo.commandexc() properly unregisters the TrackedPopen."""
+ from git_slug.gitrepo import GitRepo
+
+ repo = GitRepo(None, None)
+ # 'git --version' should succeed everywhere
+ repo.command_prefix = ["git"]
+ out, err = repo.commandexc(["--version"])
+
+ assert b"git version" in out
+ # After commandexc returns, the process should be untracked
+ with _lock:
+ assert len(_active) == 0
================================================================
---- gitweb:
http://git.pld-linux.org/gitweb.cgi/projects/git-slug.git/commitdiff/4a7e426b8f1a3571094b5dc89412bc49b8f29666
More information about the pld-cvs-commit
mailing list