APT programming snippets for Debian system maintenance

The Python API for the Debian package manager APT is useful for writing practical system maintenance scripts, which are going beyond shell scripting capabilities. There are Python2 and Python3 libraries for that available , as well as a documentation in the package python-apt-doc. If that’s also installed, the documentation then could be found in /usr/share/doc/python-apt-doc/html/index.html, and there are also a couple of example scripts shipped into /usr/share/doc/python-apt-doc/examples. The mainly consists of Python bindings for the libapt-inst and libapt-pkg C++ core libraries of the , which makes it processing very fast. Debugging symbols are also available as packages (python{,3}-apt-dbg). The module apt_inst provides features like reading from binary packages, while apt_pkg resembles the functions of the package manager. There is also the apt abstraction layer which provides more convenient access to the library, like apt.cache.Cache() could be used to behave like apt-get:

from apt.cache import Cache
mycache = Cache()
mycache.update()                   # apt-get update
mycache.open()                     # re-open
mycache.upgrade(dist_upgrade=True) # apt-get dist-upgrade
mycache.commit()                   # apply

boil out selections

As widely known, there is a feature of dpkg which helps to move a package inventory from one installation to another by just using a text file with a list of installed packages. A selections list containing all installed package could be easily generated with $ dpkg --get-selections > selections.txt. The resulting file then looks something similar like this:

$ cat selections.txt
0ad                                 install
0ad-data                            install
0ad-data-common                     install
a2ps                                install
abi-compliance-checker              install
abi-dumper                          install
abigail-tools                       install
accountsservice                     install
acl                                 install
acpi                                install

The counterpart for this operation (--set-selections) could be used to reinstall (add) the complete package inventory on another installation resp. computer (that needs superuser rights), like that’s explained in the manpage dpkg(1). No problem so far.

The problem is, if that list contains a package which couldn’t be found in any of the package inventories which are set up in /etc/apt/sources.list(.d/) on the target system, dpkg stops the whole process:

# dpkg --set-selections < selections.txt
dpkg: warning: package not in database at line 524: google-chrome-beta
dpkg: warning: found unknown packages; this might mean the available database
is outdated, and needs to be updated through a frontend method

Thus, manually downloaded and installed “wild” packages from unofficial package sources are problematic for this approach, because the package installer simply doesn’t know where to get them.

Luckily, dpkg puts out the relevant package names, but instead of having them removed manually with an editor this little Python script for python3-apt automatically deletes any of these packages from a selections file:

#!/usr/bin/env python3
import sys
import apt_pkg

apt_pkg.init()
cache = apt_pkg.Cache()

infile = open(sys.argv[1])
outfile_name = sys.argv[1] + '.boiled'
outfile = open(outfile_name, "w")

for line in infile:
    package = line.split()[0]
    if package in cache:
        outfile.write(line)

infile.close()
outfile.close()
sys.exit(0)

The script takes one argument which is the name of the selections file which has been generated by dpkg. The low level module apt_pkg first has to been initialized with apt_pkg.init(). Then apt_pkg.Cache() can be used to instantiate a cache object (here: cache). That object is iterable, thus it’s easy to not perform something if a package from that list couldn’t be found in the database, like not copying the corresponding line into the outfile (.boiled), while the others are copied.

The result then looks something like this:

$ diff selections.txt selections.txt.boiled 
3780d3779
< python-timemachine   install
4438d4436
< wlan-supercracker    install

That script might be useful also for moving from one distribution resp. derivative to another (like from Ubuntu to Debian). For productive use, open() should be of course secured against FileNotFound and IOError-s to prevent program crashs on such events.

purge rc-s

Like also widely known, deinstalled packages leave stuff like configuration files, maintainer scripts and logs on the computer, to save that if the package gets reinstalled at some point in the future. That happens if dpkg has been used with -r/--remove instead of -P/--purge, which also removes these files which are left otherwise.

These packages are then marked as rc in the package archive, like:

$ dpkg -l | grep ^rc
rc  firebird2.5-common          2.5.6.27020.ds4-3   amd64   common files for firebird 2.5 servers and clients
rc  firebird2.5-server-common   2.5.6.27020.ds4-3   amd64   common files for firebird 2.5 servers
rc  firebird3.0-common          3.0.1.32609.ds4-8   all     common files for firebird 3.0 server, client and utilities
rc  imagemagick-common          8:6.9.6.2+dfsg-2    all     image manipulation programs -- infrastructure dummy package

It could be purged over them afterwards to completely remove them from the system. There are several shell coding snippets to be found on the net for completing this job automatically, like this one here:

dpkg -l | grep "^rc" | sed ­e "s/^rc //" ­e "s/ .*$//" | \
xargs dpkg ­­purge

The first thing which is needed to handle this by a Python script is the information that in apt_pkg, the package state rc per default is represented by the code 5:

>>> testpackage = cache['firebird2.5-common']
>>> testpackage.current_state
5

For changing things in the database apt_pkg.DepCache() could be docked onto an cache object to manipulate the installation state of a package within, like marking it to be removed resp. purged:

>>> mydepcache = apt_pkg.DepCache(mycache)
>>> mydepcache.mark_delete(testpackage, True) # True = purge
>>> mydepcache.marked_delete(testpackage)
True

That’s basically all which is needed for an old package purging maintenance script in Python 3, another iterator as package filter and there you go:

#!/usr/bin/env python3
import sys
import apt_pkg

from apt.progress.text import AcquireProgress
from apt.progress.base import InstallProgress
acquire = AcquireProgress()
install = InstallProgress()

apt_pkg.init()
cache = apt_pkg.Cache()
depcache = apt_pkg.DepCache(cache)

for paket in cache.packages:
    if paket.current_state == 5:
        depcache.mark_delete(paket, True)

depcache.commit(acquire, install)

The method DepCache.commit() applies the changes to the package archive at the end, and it needs apt_progress to perform.

Of course this script needs superuser rights to run. It then returns something like this:

$ sudo ./rc-purge 
Reading package lists... Done
Building dependency tree
Reading state information... Done
Fetched 0 B in 0s (0 B/s)
custom fork found
got pid: 17984
got pid: 0
got fd: 4
(Reading database ... 701434 files and directories currently installed.)
Purging configuration files for libmimic0:amd64 (1.0.4-2.3) ...
Purging configuration files for libadns1 (1.5.0~rc1-1) ...
Purging configuration files for libreoffice-sdbc-firebird (1:5.2.2~rc2-2) ...
Purging configuration files for vlc-nox (2.2.4-7) ...
Purging configuration files for librlog5v5 (1.4-4) ...
Purging configuration files for firebird3.0-common (3.0.1.32609.ds4-8) ...
Purging configuration files for imagemagick-common (8:6.9.6.2+dfsg-2) ...
Purging configuration files for firebird2.5-server-common (2.5.6.27020.ds4-3)

It’s not yet production ready (like there’s an infinite loop if dpkg returns error code 1 like from “can’t remove non empty folder”). But generally, ATTENTION: be very careful with typos and other mistakes if you want to use that code snippet, a false script performing changes on the package database might destroy the integrity of your system, and you don’t want that to happen.

detect “wild” packages

Like said above, installed Debian packages might be called “wild” if they have been downloaded from somewhere on the net and installed manually, like that is done from time to time on many systems. If you want to remove that whole class of packages again for any reason, the question would be how to detect them. A characteristic element is that there is no source connected to such a package, and that could be detected by Python scripting using again the bindings for the APT libraries.

The package object doesn’t have an associated method to query its source, because the origin is always connected to a specific package version, like some specific version might have come from security updates for example. The current version of a package can be queried with DepCache.get_candidate_ver() which returns a complex apt_pkg.Version object:

>>> import apt_pkg
>>> apt_pkg.init()
>>> mycache = apt_pkg.Cache()
Reading package lists... Done
Building dependency tree
Reading state information... Done
>>> mydepcache = apt_pkg.DepCache(mycache)
>>> testpackage = mydepcache.get_candidate_ver(mycache['nano'])
>>> testpackage

For version objects there is the method file_list available, which returns a list containing PackageFile() objects:

>>> testpackage.file_list
[(, 669901L)]

These file objects contain the index files which are associated with a specific package source (a downloaded package index), which could be read out easily (using a for-loop because there could be multiple file objects):

>>> for files in testpackage.file_list:
...     print(files[0].filename)
/var/lib/apt/lists/httpredir.debian.org_debian_dists_testing_main_binary-amd64_Packages

That explains itself: the nano binary package on this amd64 computer comes from httpredir.debian.org/debian testing main. If a package is “wild” that means it was installed manually, so there is no associated index file to be found, but only /var/lib/dpkg/status (libcudnn5 is not in the official package archives but distributed by Nvidia as a .deb package):

>>> testpackage2 = mydepcache.get_candidate_ver(mycache['libcudnn5'])
>>> for files in testpackage2.file_list:
...     print(files[0].filename)
/var/lib/dpkg/status

The simple trick now is to find all packages which have only /var/lib/dpkg/status as associated system file (that doesn’t refer to what packages contain), an not an index file representing a package source. There’s a little pitfall: that’s truth also for . But virtual packages commonly don’t have an associated version (python-apt docs: “to check whether a package is virtual; that is, it has no versions and is provided at least once”), and that can be queried by Package.has_versions. A filter to find out any packages that aren’t virtual packages, are solely associated to one system file, and that file is /var/lib/dpkg/status, then goes like this:

for package in cache.packages:
    if package.has_versions:
        version = mydepcache.get_candidate_ver(package)
        if len(version.file_list) == 1:
            if 'dpkg/status' in version.file_list[0][0].filename:
                print(package.name)

On my Debian testing system this puts out a quite interesting list. It lists all the wild packages like libcudnn5, but also packages which are recently not in testing because they have been temporarily removed by AUTORM due to release critical bugs. Then there’s all the obsolete stuff which have been installed from the package archives once and then being forgotten like old kernel header packages (“obsolete packages” in dselect). So this snippet brings up other stuff, too. Thus, this might be more experimental stuff so far, though.