APT programming snippets for Debian system maintenance
Feb 15, 2017 | Debian, Python, APT
The Python API for the Debian package manager APT is useful for writing practical system maintenance scripts that go beyond what shell scripting can do. Libraries are available for both Python 2 and Python 3 (python-apt and python3-apt), and the documentation ships in the package python-apt-doc. Once that is installed, the documentation can be found at /usr/share/doc/python-apt-doc/html/index.html, and a couple of example scripts are shipped in /usr/share/doc/python-apt-doc/examples. The library mainly consists of Python bindings for the libapt-inst and libapt-pkg C++ core libraries of APT, which makes processing very fast. Debugging symbols are also available as packages (python{,3}-apt-dbg). The module apt_inst provides features like reading from binary packages, while apt_pkg mirrors the functions of the package manager. There is also the apt abstraction layer, which provides more convenient access to the library; for example, apt.cache.Cache() can be used to behave like apt-get:
from apt.cache import Cache
mycache = Cache()
mycache.update() # apt-get update
mycache.open() # re-open
mycache.upgrade(dist_upgrade=True) # apt-get dist-upgrade
mycache.commit() # apply
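Before calling commit() it can help to look at what would actually change. The following is a minimal sketch, assuming an object that behaves like apt.cache.Cache, whose get_changes() yields packages carrying the marked_install, marked_upgrade and marked_delete flags of apt.package.Package. The helper name summarize_changes is made up for this illustration and is not part of python-apt:

```python
def summarize_changes(cache):
    """Group the pending changes of an apt.cache.Cache-like object by action.

    Assumption: cache.get_changes() yields objects with the attributes
    name, marked_install, marked_upgrade and marked_delete.
    """
    summary = {"install": [], "upgrade": [], "delete": []}
    for pkg in cache.get_changes():
        if pkg.marked_delete:
            summary["delete"].append(pkg.name)
        elif pkg.marked_upgrade:
            summary["upgrade"].append(pkg.name)
        elif pkg.marked_install:
            summary["install"].append(pkg.name)
    return summary
```

It would be called between upgrade() and commit(), for example print(summarize_changes(mycache)), so that nothing gets applied before the list has been reviewed.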
boil out selections
As is widely known, dpkg has a feature that helps to move a package inventory from one installation to another using just a text file listing the installed packages. A selections list containing all installed packages can easily be generated with $ dpkg --get-selections > selections.txt. The resulting file looks something like this:
$ cat selections.txt
0ad install
0ad-data install
0ad-data-common install
a2ps install
abi-compliance-checker install
abi-dumper install
abigail-tools install
accountsservice install
acl install
acpi install
The counterpart of this operation (--set-selections) can be used to reinstall (add) the complete package inventory on another installation or computer (this needs superuser rights), as explained in the manpage dpkg(1). No problem so far.
The problem is that if the list contains a package which can’t be found in any of the package sources set up in /etc/apt/sources.list(.d/) on the target system, dpkg stops the whole process:
# dpkg --set-selections < selections.txt
dpkg: warning: package not in database at line 524: google-chrome-beta
dpkg: warning: found unknown packages; this might mean the available database
is outdated, and needs to be updated through a frontend method
Thus, manually downloaded and installed “wild” packages from unofficial package sources are problematic for this approach, because the package installer simply doesn’t know where to get them.
Luckily, dpkg prints the relevant package names. Instead of removing them manually with an editor, this little Python script for python3-apt automatically deletes any of these packages from a selections file:
#!/usr/bin/env python3
import sys
import apt_pkg

apt_pkg.init()
cache = apt_pkg.Cache()

infile = open(sys.argv[1])
outfile_name = sys.argv[1] + '.boiled'
outfile = open(outfile_name, "w")
for line in infile:
    package = line.split()[0]
    if package in cache:
        outfile.write(line)
infile.close()
outfile.close()
sys.exit(0)
The script takes one argument, the name of the selections file generated by dpkg. The low-level module apt_pkg first has to be initialized with apt_pkg.init(). Then apt_pkg.Cache() can be used to instantiate a cache object (here: cache). That object supports membership tests (package in cache), so it’s easy to skip a package from the list that can’t be found in the database: its line is not copied into the outfile (.boiled), while all the others are.
The result then looks something like this:
$ diff selections.txt selections.txt.boiled
3780d3779
< python-timemachine install
4438d4436
< wlan-supercracker install
That script might also be useful for moving from one distribution or derivative to another (like from Ubuntu to Debian). For production use, the open() calls should of course be guarded against FileNotFoundError and other OSError exceptions to prevent the program from crashing on such events.
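A more defensive variant of the boiling script could look like the following. This is only a sketch: the function names boil_selections and boil_file are invented for this example, and known_packages stands for anything that supports the in operator, such as an apt_pkg.Cache() object or a plain set. In Python 3, IOError is an alias of OSError, so catching OSError also covers FileNotFoundError.

```python
import sys


def boil_selections(lines, known_packages):
    """Return the selection lines whose package name is in known_packages."""
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines instead of raising IndexError
        if fields[0] in known_packages:
            kept.append(line)
    return kept


def boil_file(path, known_packages):
    """Write path + '.boiled'; return its name, or None on I/O errors."""
    outfile_name = path + ".boiled"
    try:
        with open(path) as infile:
            kept = boil_selections(infile, known_packages)
        with open(outfile_name, "w") as outfile:
            outfile.writelines(kept)
    except OSError as err:  # FileNotFoundError, PermissionError, ...
        print("{}: {}".format(path, err), file=sys.stderr)
        return None
    return outfile_name
```

With python3-apt installed, it would be used as boil_file(sys.argv[1], apt_pkg.Cache()) after apt_pkg.init(). Keeping the filter separate from the file handling also makes it possible to try the logic with a plain set of package names.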
purge rc-s
As is also widely known, removed packages leave things like configuration files, maintainer scripts and logs on the computer, so that they are preserved if the package gets reinstalled at some point in the future. That happens if dpkg has been used with -r/--remove instead of -P/--purge, which also removes the files that are otherwise left behind.
These packages are then marked as rc in the package list, like:
$ dpkg -l | grep ^rc
rc firebird2.5-common 2.5.6.27020.ds4-3 amd64 common files for firebird 2.5 servers and clients
rc firebird2.5-server-common 2.5.6.27020.ds4-3 amd64 common files for firebird 2.5 servers
rc firebird3.0-common 3.0.1.32609.ds4-8 all common files for firebird 3.0 server, client and utilities
rc imagemagick-common 8:6.9.6.2+dfsg-2 all image manipulation programs -- infrastructure dummy package
They can be purged afterwards to remove them from the system completely. There are several shell snippets to be found on the net for doing this job automatically, like this one:
dpkg -l | grep "^rc" | sed -e "s/^rc *//" -e "s/ .*$//" | \
xargs dpkg --purge
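The same filtering can be done on the text output of dpkg -l in Python without touching the APT bindings at all. A minimal sketch, assuming the column layout shown above (status flags first, package name second); the function name rc_package_names is made up for this illustration:

```python
def rc_package_names(dpkg_l_output):
    """Return the names of all packages listed with status 'rc'.

    dpkg_l_output is the complete text printed by `dpkg -l`.
    """
    names = []
    for line in dpkg_l_output.splitlines():
        fields = line.split()
        # First column holds the status flags, second column the package name.
        if len(fields) >= 2 and fields[0] == "rc":
            names.append(fields[1])
    return names
```

The input text could come from subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout. This only lists the candidates and doesn’t purge anything.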
The first thing needed to handle this in a Python script is the information that in apt_pkg, the package state rc is represented by the code 5 (also available as the constant apt_pkg.CURSTATE_CONFIG_FILES):
>>> testpackage = cache['firebird2.5-common']
>>> testpackage.current_state
5
To change things in the database, apt_pkg.DepCache() can be attached to a cache object to manipulate the installation state of the packages in it, like marking a package to be removed or purged:
>>> mydepcache = apt_pkg.DepCache(cache)
>>> mydepcache.mark_delete(testpackage, True) # True = purge
>>> mydepcache.marked_delete(testpackage)
True
That’s basically all that is needed for a maintenance script in Python 3 which purges old packages; add another loop as package filter and there you go:
#!/usr/bin/env python3
import sys
import apt_pkg
from apt.progress.text import AcquireProgress
from apt.progress.base import InstallProgress

# progress objects needed by DepCache.commit()
acquire = AcquireProgress()
install = InstallProgress()

apt_pkg.init()
cache = apt_pkg.Cache()
depcache = apt_pkg.DepCache(cache)

for package in cache.packages:
    if package.current_state == 5:  # 5 = rc (only config files left)
        depcache.mark_delete(package, True)  # True = purge

depcache.commit(acquire, install)
The method DepCache.commit() applies the changes to the package database at the end, and it needs the progress objects from apt.progress to do so.
Of course this script needs superuser rights to run. It then prints something like this:
$ sudo ./rc-purge
Reading package lists... Done
Building dependency tree
Reading state information... Done
Fetched 0 B in 0s (0 B/s)
custom fork found
got pid: 17984
got pid: 0
got fd: 4
(Reading database ... 701434 files and directories currently installed.)
Purging configuration files for libmimic0:amd64 (1.0.4-2.3) ...
Purging configuration files for libadns1 (1.5.0~rc1-1) ...
Purging configuration files for libreoffice-sdbc-firebird (1:5.2.2~rc2-2) ...
Purging configuration files for vlc-nox (2.2.4-7) ...
Purging configuration files for librlog5v5 (1.4-4) ...
Purging configuration files for firebird3.0-common (3.0.1.32609.ds4-8) ...
Purging configuration files for imagemagick-common (8:6.9.6.2+dfsg-2) ...
Purging configuration files for firebird2.5-server-common (2.5.6.27020.ds4-3)
It’s not production ready yet (for example, there’s an infinite loop if dpkg returns error code 1, like from “can’t remove non-empty folder”). But generally, ATTENTION: be very careful with typos and other mistakes if you want to use this code snippet. A faulty script performing changes on the package database might destroy the integrity of your system, and you don’t want that to happen.
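One way to reduce that risk is a dry run which only lists what would be purged and never calls commit(). A minimal sketch, assuming objects with the name and current_state attributes of apt_pkg.Package; the helper name rc_candidates and the constant CONFIG_FILES are invented for this example:

```python
CONFIG_FILES = 5  # state code for "rc", as observed in the interpreter session above


def rc_candidates(packages, state=CONFIG_FILES):
    """Return the sorted names of all packages whose current_state equals state."""
    return sorted(p.name for p in packages if p.current_state == state)
```

Called as rc_candidates(apt_pkg.Cache().packages) after apt_pkg.init(), it needs no superuser rights because nothing is changed. The resulting list can be checked by eye before the purging script is run.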
detect “wild” packages
As said above, installed Debian packages might be called “wild” if they have been downloaded from somewhere on the net and installed manually, as is done from time to time on many systems. If you want to remove that whole class of packages again for any reason, the question is how to detect them. A characteristic trait is that there is no source connected to such a package, and that can be detected with a Python script, again using the bindings for the APT libraries.
The package object doesn’t have a method to query its source, because the origin is always tied to a specific package version; some specific version might have come from security updates, for example. The candidate version of a package can be queried with DepCache.get_candidate_ver(), which returns a complex apt_pkg.Version object:
>>> import apt_pkg
>>> apt_pkg.init()
>>> mycache = apt_pkg.Cache()
Reading package lists... Done
Building dependency tree
Reading state information... Done
>>> mydepcache = apt_pkg.DepCache(mycache)
>>> testpackage = mydepcache.get_candidate_ver(mycache['nano'])
>>> testpackage
Version objects have the attribute file_list, which holds a list of tuples, each containing a PackageFile object and an index:
>>> testpackage.file_list
[(<apt_pkg.PackageFile object: ...>, 669901L)]
These file objects represent the index files associated with a specific package source (a downloaded package index), and their names can be read out easily (using a for loop, because there can be multiple file objects):
>>> for files in testpackage.file_list:
...     print(files[0].filename)
/var/lib/apt/lists/httpredir.debian.org_debian_dists_testing_main_binary-amd64_Packages
That is self-explanatory: the nano binary package on this amd64 computer comes from httpredir.debian.org/debian testing main. If a package is “wild”, meaning it was installed manually, there is no associated index file to be found, but only /var/lib/dpkg/status (libcudnn5 is not in the official package archives but distributed by Nvidia as a .deb package):
>>> testpackage2 = mydepcache.get_candidate_ver(mycache['libcudnn5'])
>>> for files in testpackage2.file_list:
...     print(files[0].filename)
/var/lib/dpkg/status
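Reading the package source off such a file name can also be automated. The following is a heuristic sketch only: it assumes the naming pattern seen in the nano example (archive, then _dists_, then suite, component, binary-ARCH and Packages, joined by underscores). APT’s real file name encoding has more cases, for example components that contain a slash, which this simple split would misread. The function name describe_index_file is made up for this illustration.

```python
import os


def describe_index_file(filename):
    """Split an APT list file name into archive, suite, component, architecture.

    Returns None for anything that does not look like a binary Packages
    index, such as /var/lib/dpkg/status.
    """
    base = os.path.basename(filename)
    if "_dists_" not in base or not base.endswith("_Packages"):
        return None
    archive, rest = base.split("_dists_", 1)
    parts = rest.split("_")
    if len(parts) < 4 or not parts[-2].startswith("binary-"):
        return None
    return {
        "archive": archive.replace("_", "/"),
        "suite": "/".join(parts[:-3]),
        "component": parts[-3],
        "architecture": parts[-2][len("binary-"):],
    }
```

For the nano example above this gives the archive httpredir.debian.org/debian, the suite testing, the component main and the architecture amd64, while /var/lib/dpkg/status yields None.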
The simple trick now is to find all packages which have only /var/lib/dpkg/status as associated system file (this doesn’t refer to the files a package contains), and not an index file representing a package source. There’s a little pitfall: that’s also true for virtual packages. But virtual packages commonly don’t have an associated version (python-apt docs: “to check whether a package is virtual; that is, it has no versions and is provided at least once”), and that can be queried with Package.has_versions. A filter to find all packages that aren’t virtual packages, are associated with only one system file, and for which that file is /var/lib/dpkg/status, then goes like this:
for package in mycache.packages:
    if package.has_versions:
        version = mydepcache.get_candidate_ver(package)
        if len(version.file_list) == 1:
            if 'dpkg/status' in version.file_list[0][0].filename:
                print(package.name)
On my Debian testing system this produces quite an interesting list. It contains all the wild packages like libcudnn5, but also packages which are currently not in testing because they have been temporarily removed by AUTORM due to release critical bugs. Then there’s all the obsolete stuff which was installed from the package archives once and then forgotten, like old kernel header packages (“obsolete packages” in dselect). So this snippet brings up other stuff too, and should be considered experimental so far.
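Since the list needs further sorting by hand anyway, it may be handier to collect the names instead of printing them. A sketch of the same filter as a function, with two assumptions: the candidate lookup is passed in as a callable (for example mydepcache.get_candidate_ver), and a missing candidate version is skipped rather than treated as an error. The name status_only_packages is invented for this example.

```python
def status_only_packages(packages, get_candidate_ver):
    """Return names of non-virtual packages known only from dpkg's status file.

    packages: iterable of objects with has_versions and name attributes.
    get_candidate_ver: callable returning a version object with a file_list
    of (package_file, index) tuples, or None.
    """
    names = []
    for package in packages:
        if not package.has_versions:
            continue  # virtual package
        version = get_candidate_ver(package)
        if version is None:
            continue  # assumption: no candidate version available, skip it
        files = [entry[0].filename for entry in version.file_list]
        if len(files) == 1 and files[0].endswith("dpkg/status"):
            names.append(package.name)
    return names
```

Called as status_only_packages(mycache.packages, mydepcache.get_candidate_ver), it returns a list that can then be compared against, say, the output of other tools to separate wild packages from obsolete ones.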