Package system

Joerg Sonnenberger posted a long writeup of possible directions for a packaging system. I’m repasting it verbatim, as there’s no need to sum up yet.

“after the long discussion about system installation and configuration, I
wanted to start discussing another important part of the system.

I want to discuss the pros and cons of using a virtual filesystem like
the one mentioned in the webspace. The filesystem horizon of an
application is influential in exactly two different situations. The
first one is building the application, the second is the runtime.

Restricting the visibility at build time allows semi-automatic
dependency tracing and permits stable builds, that is, reproducible
builds. It is especially useful for packages supporting different
versions of one dependency (e.g. mc supports glib1 and glib2) and those
which compile additional code based on the packages found. Most package
systems, building from source or not, do not provide such a facility and
depend on the maintainer “to do the right thing”. Others tried and
succeeded at least partly (buildlink on NetBSD).

To achieve this restricted visibility the ports system (aka the source
build facility) could automatically create a chroot environment for the
build using the dependencies mentioned or provide a maintainer mode
where filesystem accesses are traced, ldd is execed and so on. This
needs a lot of space (for a copy of all dependencies) and a lot of time
to set up (same reason). But it would work right now.

Having a configurable virtual filesystem like a read-only nullfs with an
ability to selectively filter the accessible directory entries would
have an almost negligible impact in terms of build time and have its
use for normal system operation, too.
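
A minimal sketch of the filtering idea, not taken from the original post:
given the declared dependencies of a port, compute which installed files
the build would be allowed to see. The function names, package names and
file lists are all hypothetical, and a real filter would sit in the
filesystem layer rather than in a script.

```python
def dependency_closure(package, depends):
    """Return the package plus everything it depends on, transitively."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends.get(current, []))
    return seen


def visible_paths(package, depends, files):
    """Return the sorted paths a build of `package` would be allowed to see."""
    visible = set()
    for name in dependency_closure(package, depends):
        visible.update(files.get(name, []))
    return sorted(visible)


# Hypothetical data: this mc port is declared against glib2 only,
# so the glib1 headers stay hidden even though glib1 is installed.
DEPENDS = {"mc": ["glib2"], "glib2": ["pkgconfig"]}
FILES = {
    "glib1": ["/usr/pkg/include/glib-1.2/glib.h"],
    "glib2": ["/usr/pkg/include/glib-2.0/glib.h"],
    "pkgconfig": ["/usr/pkg/bin/pkg-config"],
}

if __name__ == "__main__":
    for path in visible_paths("mc", DEPENDS, FILES):
        print(path)
```

Anything outside the computed set would simply not appear in the build
environment, which is what makes undeclared dependencies show up as
build failures instead of silent linkage.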

For the second situation, the package runtime, having such a restricted
visibility would bring in a lot of hassle. First of all, the only way
would be to install software e.g. under /usr/pkg and map the “visible”
tree under e.g. /usr/act_pkg. Hiding all installed but unneeded
packages from an installed package is neither useful nor workable. Just
think of a filemanager or a package management system.

It is important to differentiate between the various types of
dependencies. The first and simplest is the build dependency. The
autotools are a good generic example. You might want to have four
versions of autoconf installed, but for building a special package (or
port) only one version is used and therefore visible. Those packages can
be updated without touching the dependent package.

The second and also easy to solve one is the library dependency. It
consists of a build dependency (the header files) and either a static or
shared library. The static library is another kind of build dependency,
the shared library is versioned by default. There is no problem having
different versions of the same shared library installed as long as the
binary version is updated as needed.

The third one is an interpreter dependency. For most interpreters it is
possible to adjust a few directory entries to provide versioning. Naming
the interpreter python2.3 or perl5.8 along with the embedded path names
allows having different versions installed without conflicts. Special
handling is often necessary for extensions like modules, which are
installed on those version-specific paths. Therefore modules depend
on the interpreter version.

The last dependency is the need of an application. E.g. tla needs gtar,
gpatch and gdiff. Based on a per-package choice, those could be
versioned and coded into the package (e.g. gtar-1.13.52) or not.
Per-package as in the maintainer of gtar determines how stable and
backward compatible the interface is.

To better support number two and three, adding a slot mechanism like
Gentoo’s portage has is worth a discussion. For those not familiar with
portage: A slot is an addition to the version number allowing multiple
packages of the same name with different slot versions to coexist
peacefully. E.g. autoconf has version 2.13 providing slot 2.13, version
2.53 providing slot 2.53 and version 2.54 providing slot 2.53 as well.
It is now possible to have both version 2.13 and 2.53 installed, but not
version 2.53 and 2.54. If asked to update, version 2.53 can be updated to
version 2.54, which leaves 2.13 intact.
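
The slot rule described above can be sketched as follows. This is an
illustration only, not how portage implements it; the `install` function
and the dictionary layout are assumptions made for the example.

```python
def install(installed, name, version, slot):
    """Return a new package database with (name, version) installed.

    `installed` maps (name, slot) -> version. A package in an occupied
    slot replaces the version already there (an update); a package in a
    different slot is installed alongside the others.
    """
    updated = dict(installed)
    updated[(name, slot)] = version
    return updated


if __name__ == "__main__":
    db = {}
    db = install(db, "autoconf", "2.13", slot="2.13")
    db = install(db, "autoconf", "2.53", slot="2.53")
    # 2.54 provides slot 2.53, so it replaces 2.53 and leaves 2.13 alone.
    db = install(db, "autoconf", "2.54", slot="2.53")
    for (name, slot), version in sorted(db.items()):
        print(f"{name}-{version} (slot {slot})")
```

Keying the database on the (name, slot) pair rather than on the name
alone is what lets 2.13 and 2.53 coexist while 2.53 and 2.54 cannot.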

The second important aspect is the functionality provided and used by
the package management. I like the idea of OpenBSD’s ports system to
always build a package via the port mechanism and afterwards install it
using the pkg tools. Doing it that way is IMO necessary to support the
enforced visibility, because the installation has to go to some kind of
scratch area first. Therefore I will focus on the actual package
management and not the build facilities.

Now, what kinds of package management systems exist? The following are
perhaps the more important and widely used ones:
– BSD pkgs
– RPM packages
– DEB packages
– InstallShield/WISE packages

BSD pkgs and DEB packages have the advantage of using standard tools
available on almost any Unix system (ar, tar, gzip, bzip2). Both can be
read and written by clever shell scripts. One important difference is the
existence of meta information. DEB packages include changelogs,
detailed dependencies, copyright information and a classification of
normal files, documentation and configuration files. The dependencies
supported are “Depend on”, “Conflict with”, “Suggest”. DEB packages do
not support relocation or at least it is not a supported option.

RPM packages are a special-purpose format similar and convertible to
cpio archives. They have a similar feature set to DEB packages and
basic support is a requirement for the Linux Standard Base (just to
mention that). RPM archives offer a relocation flag based on a
filesystem tree base. E.g. /opt/kde might be marked relocatable and end
up as /usr/local.
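
Relocation on a tree base amounts to rewriting a path prefix at install
time. A minimal sketch, assuming a hypothetical `relocate` helper (this
is not RPM's own interface) and a made-up file name under /opt/kde:

```python
import posixpath


def relocate(path, old_prefix, new_prefix):
    """Move `path` from under `old_prefix` to under `new_prefix`.

    Paths outside `old_prefix` are returned unchanged. The prefix must
    match on a whole path component, so /opt/kdelibs is not treated as
    living under /opt/kde.
    """
    old = old_prefix.rstrip("/")
    if path == old:
        return new_prefix
    if path.startswith(old + "/"):
        rest = path[len(old):].lstrip("/")
        return posixpath.join(new_prefix, rest)
    return path


if __name__ == "__main__":
    print(relocate("/opt/kde/bin/kapp", "/opt/kde", "/usr/local"))
    print(relocate("/opt/kdelibs/lib.so", "/opt/kde", "/usr/local"))
```

This only covers the file list; path names compiled into binaries are a
separate problem that a prefix rewrite does not solve.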

The InstallShield setup program commonly known in the M$ world offers a
graphical selection mechanism for choosing those parts of an application
which are to be installed and where. The application is then installed
to the chosen path (minus DLLs installed under C:\Windows) and

Which of those package systems’ features are worth implementing? I
suggest the following feature set:
– fine grained subpackages,
– depend, conflict, suggest information,
– support for relocation on a file tree base,
– tagging of individual files for special handling,
– extensible meta data support.

The first point is useful for libraries to separate RTE and development
files. I’d like to offer a single package file for distribution, but
separating a subpackage should still be possible.

The second point is arguable, but providing “softdeps” for packages like
transcode which can utilize lots of external tools without any change is
valuable for every end user.
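
To make the difference between the three relations concrete, here is a
small sketch. The `Package` record, the `check_install` function and the
package names in the example are hypothetical; only the three relation
kinds come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    depends: list = field(default_factory=list)    # must be installed
    conflicts: list = field(default_factory=list)  # must not be installed
    suggests: list = field(default_factory=list)   # optional "softdeps"


def check_install(pkg, installed):
    """Return (errors, hints) for installing `pkg`.

    `installed` is a set of installed package names. Missing depends and
    present conflicts are errors; missing suggests only produce hints.
    """
    errors = [f"missing dependency: {d}" for d in pkg.depends
              if d not in installed]
    errors += [f"conflicts with: {c}" for c in pkg.conflicts
               if c in installed]
    hints = [f"suggested: {s}" for s in pkg.suggests
             if s not in installed]
    return errors, hints


if __name__ == "__main__":
    # Hypothetical metadata, for illustration only.
    pkg = Package("transcode", depends=["libexample"],
                  conflicts=["transcode-old"],
                  suggests=["tool-a", "tool-b"])
    errors, hints = check_install(pkg, {"libexample", "tool-a"})
    print(errors)
    print(hints)
```

The point of a separate "suggest" list is that it never blocks the
installation; it only tells the user what else would be useful.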

The relocation support might or might not be worth implementing; we
should discuss this. Does anyone use it?

The fourth point simplifies handling of documentation, e.g. info files,
catalogues and other files which need some special registration. It could
be used for automatic byte compiling of Python programs, too.
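
One way to picture per-file tagging is a table that maps a tag to a
post-install action. The sketch below only plans the actions and does not
run anything; the tag names, action names and paths are invented for the
example.

```python
def plan_actions(files, handlers):
    """Return the (action, path) pairs to run after installation.

    `files` is a list of (path, tag) pairs where tag may be None.
    `handlers` maps a tag to the name of the action it triggers.
    Untagged files and unknown tags need no special handling.
    """
    actions = []
    for path, tag in files:
        action = handlers.get(tag)
        if action is not None:
            actions.append((action, path))
    return actions


# Hypothetical tags and actions.
HANDLERS = {
    "info": "register-info-file",
    "python": "byte-compile",
}

FILES = [
    ("/usr/local/info/example.info", "info"),
    ("/usr/local/lib/example/example.py", "python"),
    ("/usr/local/bin/example", None),
]

if __name__ == "__main__":
    for action, path in plan_actions(FILES, HANDLERS):
        print(action, path)
```

New kinds of special handling then only need a new table entry, not a
change to every package that ships such files.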

The last point should be a basic requirement of the whole system. It
should be extensible to future uses we haven’t thought of. It must be
possible to support I18N later.

The third part I want to talk about is the package format in detail.
Employing a standard format like tar has its uses, but there are weak
points, too. ATM we have a list of included files, directories to be
deleted with the package and commands to be executed. We also have a tar
archive including these files. That seems redundant.

An interesting alternative would be the deployment of a hierarchical
format like XML. For example:

<name>Mozilla</name> <version>1.4</version>
<name> Browser </name>
<file content="1" mode="0755">/usr/local/bin/mozilla</file>
<name> Mail-Client </name>
<file content="2" mode="0755">/usr/local/bin/mozilla-mail</file>
<directory mode="0755">/usr/local/share/doc/mozilla-mail</directory>
<file content="3" mode="0644">
<file content="3" mode="0644" type="info">
<content id="1"> … </content>


Actually I don’t really want to use XML, because the handling of binary
data is a headache. An interesting base could be EBML; I just don’t want
to reinvent the wheel ;-). Having such a format can be even more
portable than tarballs, because of the problems of long path names, ACL
support (yes, I want this) and other “portable” problems. The input for
the package creation program could be an enriched PLIST, DESCR and so on
like we already have or direct XML. The latter one is interesting if we
have a real port editor.”