A New Package Manager – Dark Horse Linux

It’s been too long!

Another round of development on Dark Horse Linux is about to take place.

Beyond just a refresh of Pyrois, and consequently Dyad, package management will be introduced.

I thought long and hard about this, and even changed my mind a few times, but Dark Horse Linux will not use RPM as its native package manager as originally intended. It will instead use its own native package manager, and will have a utility that can convert RPM packages (and a few other package formats) to this native package manager’s format.

RPM hits all the requirements, but it’s got too many legacy features and introduces a lot of complexity to what should be a very simple thing. It has many design decisions baked deeply into it that would have certainly been “one way to do it” a long time ago, but even then we could have done much better with a lot less complexity. To top that off, other mainstream package managers are rather derivative as opposed to being a product of a functional need, and so borrowed a lot of those concepts into their implementations as well. New package managers not necessarily tied to a distribution have a different problem: they don’t integrate with the core system packages for whole-system lifecycle management of packages on the OS. That leads to configuration management issues, unnecessary system resource overhead and incompatibilities that are unnecessarily complex to troubleshoot, at a cost of long-term reliability in a lot of cases. We can just do so much better.

It’s easy to say that “Well, RPM can do a lot”, and it can, but it shouldn’t. All of the things that RPM can do can be done better by returning to the UNIX design philosophy, with dedicated purpose components for each type of thing. We won’t lose anything this way, and we stand to do better than Red Hat and Red Hat derivatives if Dark Horse goes its own way with package management.

Designing a package manager is a daunting task. There’s a lot that goes into it because we have overloaded what a package manager is. If you look at the first 2 sentences of the Wikipedia page for “Package manager”, you’ll see it’s a pretty overloaded term there, too:

A package manager or package-management system is a collection of software tools that automates the process of installing, upgrading, configuring, and removing computer programs for a computer in a consistent manner.^[1]

Truthfully, it doesn’t have to be that big to be secure, and usable for all the things consumers (and enterprises) need a package manager to do as of today. Essentially, it just needs to manage the lifecycle of versioned collections of files on the system. That’s it.

If you look at what RPM is doing, or if you’re already familiar with Red Hat systems, you’ll see that creating an RPM often also involves compiling the software as part of the rpmbuild process. If you’ve ever built with it, you know that rpmbuild itself dedicates a great deal of its features to navigating the arduous process of compiling software. Even without compiling, building the package is like navigating a complex maze.

This is an artifact from an earlier time, when the users were regularly compiling software and wanted to reduce or automate their compile burden and just bundle the whole product lifecycle into that process. However, the user profile has since changed to something else: a user base that doesn’t do quite so much of that. The developer profile changed too. RPM is such an involved drag to work with for modern users who write software that they outright refuse to use it at all, even for major products. This happens in both enterprise and FOSS projects. So, what we’ve all ended up with is a huge ecosystem of software whose install process is more often than not just spraying files onto servers like animals, or some big bloated Ansible job that eats everybody’s time to maintain and is prone to failures, or the use of super shitty third party package management systems like snap, or flatpak, or any of the other absolute train wrecks developers are writing to get around just packaging their software. They can compile their software just fine. Their barrier is the complexity that many native package managers for even mainstream distributions introduce, and it’s a lot of unnecessary work. If you want an example, the Signal project, which actually wants to package their software properly due to the security implications, has been arguing about the right approach to build RPMs for four years and counting as of the time of this writing. If you want another project stunted by this issue, there’s Gitea, which is in the same boat. This is hurting those projects. Sure enough, flatpak and snap got packages for them first, because they don’t create unnecessary complexity barriers to format adoption even if they are horrid, awful package management systems to use.

At the end of the day, RPM is a /good/ package manager compared to what currently exists, and I’m going to probably have some pieces that resemble what it currently does, done in much simpler ways than RPM does it.

For example, while RPM is installing the package on the OS, they have a utility called yum/dnf that wraps around it to pull from repos. That’s a good way to do that to keep purposes of components separated. DNF/YUM is pretty solid. I have some complaints about its repo management aspects, but it’s good.

Then they added a bunch of cruft to that too, with plugins and other things that don’t apply for the overwhelming majority of their use cases but get used just enough to create a ridiculous spread in developer practice for building the packages, which really should just be very straightforward and boringly uniform.

Then, at some point, licensing was introduced as yet another barrier. To me, this defeats the purpose of a Linux distribution. Even if you sell support and subscriptions, the OS itself should be freely available. Any DRM you bake into it is going to deviate from the F/OSS approach and introduce fragility and trust issues. The way you make money with it is by selling something of value to people using it: support, add-ons etc.

So, the Dark Horse Linux package manager, yet to be given a name, will operate in a manner that assumes that the software is already compiled. It won’t care how you compiled your software and it expects you to have already done so. Let’s call it “build system agnostic”.

The package creation component will consist of some files that identify which files are in the package, their checksums, and whether they are “controlled by the package” or “not controlled by the package” (such as configuration files that should be left alone during an update). It will have an option for signing and there will be a package manifest for integrity validation. It will do dependency resolution.

There’s obviously an entire cycle that needs to be spent on design, but currently I’m envisioning something like this for the package structure:

package-name-1.0.0.pkg.xz
├── metadata/                    
│   ├── FILES_DIGEST
│   ├── FILES_DIGEST.sig
│   ├── PACKAGE_DIGEST
│   ├── DEPENDENCIES
│   └── contents.sig
└── contents.xz

This contains some elements of slackpkg, as well as RPM. The package itself is just an XZ archive (I may end up using a gzipped tarball instead).

At the top level is a directory named metadata. There is also a contents.xz that contains the package’s files, laid out in the paths they are intended to occupy on the system, relative to the root filesystem:

contents.xz/
├── etc
│   └── myapp
│       └── conf.d
│           └── main.config
└── usr
    └── bin
        └── myapp

Under metadata, I envision a FILES_DIGEST and a PACKAGE_DIGEST. The FILES_DIGEST would contain something like a line delimited table:

C $CHECKSUM /usr/bin/myapp
N $CHECKSUM /etc/myapp/conf.d/main.config

Here you can see files marked as “CONTROLLED” or “NOT CONTROLLED”, to indicate whether update and remove operations will replace those files.
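To make that table concrete, here is a minimal sketch of how a FILES_DIGEST could be generated from an unpacked contents tree. Nothing here is settled design: the use of SHA-256, the function names, and the idea of passing the uncontrolled paths in as a set are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_files_digest(contents_root: Path, uncontrolled: set[str]) -> list[str]:
    """Build FILES_DIGEST lines for every regular file under contents_root.

    Paths are recorded as they would appear on the installed system,
    e.g. /usr/bin/myapp. Paths listed in `uncontrolled` (a hypothetical
    input) are flagged N; everything else is flagged C.
    """
    lines = []
    for path in sorted(contents_root.rglob("*")):
        if not path.is_file():
            continue
        installed_path = "/" + path.relative_to(contents_root).as_posix()
        flag = "N" if installed_path in uncontrolled else "C"
        lines.append(f"{flag} {file_checksum(path)} {installed_path}")
    return lines
```

Calling build_files_digest(Path("contents"), {"/etc/myapp/conf.d/main.config"}) on the example tree above would yield one N line for the config file and one C line for the binary.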

The package digest would then perhaps be the checksum of the concatenated checksums of the files. The signature files, which are optional, and a few other details still need to be fleshed out.
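As a rough sketch of the "checksum of the concatenated checksums" idea, assuming SHA-256 and the three-column FILES_DIGEST layout shown earlier (the function name is made up for illustration):

```python
import hashlib


def package_digest(files_digest_lines: list[str]) -> str:
    """Hash the concatenated per-file checksums, in the order listed."""
    checksums = []
    for line in files_digest_lines:
        line = line.strip()
        if not line:
            continue
        # Each line is "<flag> <checksum> <path>"; the path may contain spaces.
        _flag, checksum, _path = line.split(" ", 2)
        checksums.append(checksum)
    return hashlib.sha256("".join(checksums).encode("ascii")).hexdigest()
```

One thing this sketch makes visible: hashing only the checksums does not cover the paths or the C/N flags, and the result depends on line order, so those are details the real design would need to decide on.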

Another piece is dependencies. I think a line delimited list of rules that it would check would be sufficient:

glibc > 2.21.0
glibc < 2.42.0
libstdc++ > 0
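A minimal sketch of how rules like these might be evaluated. It assumes purely numeric dotted versions, supports only the two operators shown above, and takes the installed packages as a plain name-to-version mapping; all of the names are hypothetical.

```python
import operator

# Only the operators that appear in the example rules.
OPERATORS = {">": operator.gt, "<": operator.lt}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.21.0' into (2, 21, 0). Assumes numeric components only."""
    return tuple(int(part) for part in text.split("."))


def check_rule(rule: str, installed: dict[str, str]) -> bool:
    """Return True if a single 'name op version' rule is satisfied."""
    name, op, wanted = rule.split()
    if name not in installed:
        return False
    have = parse_version(installed[name])
    want = parse_version(wanted)
    # Pad the shorter version with zeros so 2.38 compares as 2.38.0.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return OPERATORS[op](have, want)


def unmet_rules(dependencies_text: str, installed: dict[str, str]) -> list[str]:
    """Return every rule in a DEPENDENCIES file that is not satisfied."""
    rules = [line.strip() for line in dependencies_text.splitlines() if line.strip()]
    return [rule for rule in rules if not check_rule(rule, installed)]
```

With this approach a package is installable when unmet_rules returns an empty list, and the returned rules double as the error message when it is not.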

And then perhaps a few additional files, such as AUTHOR, NAME, VERSION, ARCHITECTURE, DESCRIPTION, SOURCE and the others that we’re all used to seeing.

Next is the database. You do need a database for querying installed packages: their names, descriptions, dependencies, all the things I listed, including which files a package provides. It should also support reverse lookups for which package provides a certain file, and even a check that asks “have any of the controlled files this package installed been modified?” That check would do a little checksum comparison and report accordingly based on what’s in FILES_DIGEST and PACKAGE_DIGEST, perhaps with some additional cryptographic mechanism to ensure it hasn’t been tampered with (this is an example of where I’m going with it, and has issues to work out).
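The "have any controlled files been modified" check could look something like the sketch below. It assumes SHA-256 checksums and the three-column FILES_DIGEST layout from earlier; the function name and the `root` parameter (handy for checking a chroot or a test tree) are illustrative only, and it does not address the tamper-resistance question.

```python
import hashlib
from pathlib import Path


def modified_controlled_files(files_digest_text: str, root: Path = Path("/")) -> list[str]:
    """Return controlled (C) paths that are missing or whose checksum changed."""
    changed = []
    for line in files_digest_text.splitlines():
        line = line.strip()
        if not line:
            continue
        flag, recorded, installed_path = line.split(" ", 2)
        if flag != "C":
            continue  # uncontrolled files are expected to drift
        target = root / installed_path.lstrip("/")
        if not target.is_file():
            changed.append(installed_path)
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != recorded:
            changed.append(installed_path)
    return changed
```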

The problem with a package database is that you can hose the system pretty badly if you rely on the database as a single point of failure. Even RPM suffers from this: if the rpmdb is wiped you can recreate it, but it’s not really “all the way repaired” in a lot of cases.

So, maybe something like a directory structure of the objects in metadata moved to, say, /var/lib/${package_manager_name}/packages/${package_name}

And then having the “database” that is worked with be, say, an sqlite database generated from that tree on demand, so that the database itself is just a caching mechanism to improve query performance on lookups, and can be deleted any time if there is an issue because the next time it runs it’s going to rebuild the package database from that metadata directory tree.
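Here is a small sketch of that "database as a disposable cache" idea using Python's built-in sqlite3 module. The table layout, function names, and the choice to index only FILES_DIGEST are assumptions for illustration; a real cache would presumably also pull in NAME, VERSION, DEPENDENCIES and so on.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def rebuild_cache(metadata_root: Path, db_path: Path) -> sqlite3.Connection:
    """Throw away any existing cache and rebuild it from the metadata tree.

    metadata_root is expected to hold one directory per installed package,
    each containing that package's FILES_DIGEST.
    """
    db_path.unlink(missing_ok=True)  # the cache is always safe to delete
    conn = sqlite3.connect(str(db_path))
    conn.execute(
        "CREATE TABLE files (package TEXT, flag TEXT, checksum TEXT, path TEXT)"
    )
    for package_dir in sorted(p for p in metadata_root.iterdir() if p.is_dir()):
        digest_file = package_dir / "FILES_DIGEST"
        if not digest_file.is_file():
            continue
        for line in digest_file.read_text().splitlines():
            if not line.strip():
                continue
            flag, checksum, path = line.strip().split(" ", 2)
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?)",
                (package_dir.name, flag, checksum, path),
            )
    conn.commit()
    return conn


def owner_of(conn: sqlite3.Connection, path: str) -> Optional[str]:
    """Reverse lookup: which package provides this file, if any?"""
    row = conn.execute(
        "SELECT package FROM files WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None
```

Because the on-disk metadata tree stays the source of truth, a corrupted cache costs nothing more than one rebuild.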

One thing that I have not accounted for yet is post-operation scripting, such as “restart this service after updating this package” or “apply these tuning parameters to such and such after it’s installed”. I may leave this open ended and just have pre- and post-action hooks declarable in files: perhaps a directory named “HOOKS”, as a sibling to “contents.xz” and “metadata”, in which the package manager would check for the existence of reserved filenames, such as:

HOOKS/
├── PRE-INSTALL
├── PRE-INSTALL_ROLLBACK
├── POST-INSTALL
├── POST-INSTALL_ROLLBACK
├── PRE-UPDATE
├── PRE-UPDATE_ROLLBACK
├── POST-UPDATE
├── POST-UPDATE_ROLLBACK
├── PRE-REMOVE
├── PRE-REMOVE_ROLLBACK
├── POST-REMOVE
└── POST-REMOVE_ROLLBACK

And these would just be optionally placed shell scripts that do what is needed for their software for those operations. This way we’re agnostic to the code that’s actually running without hopping on the runaway complexity train.
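A sketch of how the reserved filenames might be looked up and run. The post does not say when a _ROLLBACK script fires, so this assumes it runs when its matching hook exits non-zero; that rule, the use of /bin/sh, and the function names are all assumptions for illustration.

```python
import subprocess
from pathlib import Path
from typing import Optional

# The twelve reserved names from the HOOKS listing.
RESERVED_HOOKS = {
    f"{phase}-{action}{suffix}"
    for phase in ("PRE", "POST")
    for action in ("INSTALL", "UPDATE", "REMOVE")
    for suffix in ("", "_ROLLBACK")
}


def find_hook(hooks_dir: Path, phase: str, action: str,
              rollback: bool = False) -> Optional[Path]:
    """Return the hook script for this phase/action if the package ships one."""
    name = f"{phase}-{action}" + ("_ROLLBACK" if rollback else "")
    if name not in RESERVED_HOOKS:
        raise ValueError(f"not a reserved hook name: {name}")
    candidate = hooks_dir / name
    return candidate if candidate.is_file() else None


def run_hook(hooks_dir: Path, phase: str, action: str) -> bool:
    """Run a hook if present. Returns True on success or when no hook exists.

    Assumption: if the hook fails, its _ROLLBACK sibling (if any) is run.
    """
    hook = find_hook(hooks_dir, phase, action)
    if hook is None:
        return True
    if subprocess.run(["/bin/sh", str(hook)]).returncode == 0:
        return True
    rollback = find_hook(hooks_dir, phase, action, rollback=True)
    if rollback is not None:
        subprocess.run(["/bin/sh", str(rollback)])
    return False
```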

Truthfully this is a naive approach, because software developers do some terribly stupid and destructive things, to the point that implementing an entire command framework to limit what these scripts can do, rather than giving developers free scripting range, is probably the only way to protect users from them. They do things like “the package is just installing the repo on your machine and it then downloads from our repo and pollutes your machine with garbage when all you told it to do was install a software package”. Slack does this. Their engineering department is aware of it and they refuse to fix it. After a certain point of complaint, instead of fixing it, they dropped support for most distros. So maybe some sanity checks would be appropriate there, because software engineering teams don’t care about the system.

And then, obviously, there will be a component that wraps it and facilitates repository interaction to fetch packages remotely, much like DNF/YUM does for RPM but much simpler. Repos should be file-based or accessible via HTTP and will rely on reserved directory names for repository metadata, much like DNF/YUM does. This is not a complex piece to implement.

What I’ve described here is just a stub of the design, and I’m sure it’ll pan out to become wildly different, but, this is where my brain is headed with it. This post is less about what the final design will be and more about the fact that yes, Dark Horse is still moving, and yes, it’s going to have its own package manager.

I will need a name. If someone emails me name suggestions at chris.punches@darkhorselinux.org and their suggestion is used, they will receive a USB stick preloaded with Dark Horse Linux at the next release that contains the package manager.
