The problem we want to solve is to enable developers to create a specialized OS for every single application to ensure the best performance possible, security guarantees or desired target KPI. The requirement of enabling such high modularity across multiple system boundaries has led to a number of key design decisions:
Single address space: Target single application scenarios, with possibly different applications talking to each other through networked communications.
Fully modular system: All components, including operating system primitives, drivers, platform code and libraries should be easy to add and remove as needed; even APIs should be modular.
Single protection level: There should be no user-/kernel-space separation to avoid costly processor mode switches. This does not preclude compartmentalization (e.g., of micro-libraries), which can be achieved at reasonable cost.
Static linking: Enable compiler features, e.g., Dead-Code Elimination and Link-Time-Optimization (LTO), to automatically get rid of unneeded code.
POSIX support: In order to support existing or legacy applications and programming languages while still allowing for specialization under that API.
Platform abstraction: Seamless generation of images for a range of different hypervisors/VMMs.
To derive the core design principles of Unikraft, it is worth analyzing the features and (heavyweight) mechanisms of traditional OSes that are unnecessary or ill-suited to single application use cases:
Protection-domain switches between the application and the kernel might be redundant in a virtualization context because isolation is ensured by the hypervisor, and result in measurable performance degradation.
Multiple address spaces may be useless in a single application domain, but removing such support in standard OSes requires a massive reimplementation effort.
For RPC-style server applications, threading is not needed, with a single, run-to-completion event loop sufficing for high performance. This would remove the need for a scheduler within the VM and its associated overheads, as well as the mismatch between the guest and hypervisor schedulers.
For performance-oriented UDP-based apps, much of the OS networking stack is useless: the app could simply use the driver API, much like DPDK-style applications already do. There is currently no way to easily remove just the network stack but not the entire network sub-system from standard OSes.
Direct access to NVMe storage from apps removes the need for file descriptors, a VFS layer and a filesystem, but removing such support from existing OSes, built around layers of the storage API, is very difficult.
Memory allocators have a large impact on application performance, and general purpose allocators have been shown to be suboptimal for many apps. It would therefore be ideal if each app could choose its own allocator; this is however very difficult to do in today's operating systems because the allocators that kernels use are baked in.
This admittedly non-exhaustive list of application-specific optimizations implies that for each core functionality that a standard OS provides, there exists at least one or a few applications that do not need it. Removing such functionality would reduce code size and resource usage but would often require an important re-engineering effort.
Given these design decisions, the question thus stems of how to implement such a system, either: by minimizing an existing general-purpose OS; by starting from an existing unikernel project; or, starting from scratch. Existing work has taken three directions in tackling this problem.
The first direction takes existing OSes and adds or removes functionality. Key examples add support for a single address space and remove protection domain crossings: OSv and RUMP Kernel which adopt parts of the BSD kernel and re-engineer it to work in a unikernel context. Lupine Linux relies on a minimal, specialized configuration of the Linux kernel with Kernel Mode Linux (KML) patches.
These approaches make application porting easy because they provide binary compatibility or POSIX compatibility, but the resulting kernel is monolithic.
Existing monolithic OSes do have APIs for each component, but most APIs are quite rich as they have evolved organically, and component separation is often blurred to achieve performance (e.g., sendfile short circuits the networking and storage stacks). The Linux kernel, for instance, historically featured highly inter-dependent subsystems.
To better quantify this API complexity, we analyzed dependencies between the
main components of the Linux kernel. As a rough approximation, we used the
subdirectories in the kernel source tree to identify (broad) components. We
cscope to extract all function calls from the sources of all kernel
and then for each call checked to see if the function is defined in the same
component or a different one; in the latter case, we recorded a dependency. We
plot the dependency graph in Figure 1, the annotations on the edges show the
number of dependencies between nodes. This dense graph makes it obvious that
removing or replacing any single component in the Linux kernel requires
understanding and fixing all the dependencies of other components, a daunting
While full modularization is difficult, modularizing certain parts of a monolithic kernel has been done succesfully by Rump. There, the NetBSD kernel was split into base layers (which must be used by all kernels), functions provided by the host (scheduling, memory allocation,etc) and so-called factions that can be run on their own (e.g. network or filesystem support). Rump goes some way towards achieving our goals, however there are still many dependencies left which require that all kernels have the base and hypercall layers. Additionally, the dependencies on the host are limiting in the context of a VM, which is our target deployment.
The second direction is to bypass the OS altogether, mostly for I/O performance,
while leaving the original stack in place -- wasting resources in the process.
Even here, porting effort is required as apps must be coded against the new
Netmap or Linux's
subsystem) or storage (SPDK)
The third direction is to add the required OS functionality from scratch for each target application, possibly by reusing code from existing operating systems. This is the approach taken by ClickOS to support Click modular routers, MirageOS to support OCaml applications, and MiniCache to implement a web cache, to name a few. The resulting images are very lean, have great performance and have small boot times; the big problem is that the porting effort is huge, and that it has to be mostly repeated for every single application or language.
In sum, starting from an existing project is suboptimal since none of the projects in the three directions mentioned were designed to support the key principles we have outlined. We opt for a clean-slate API design approach, though we do reuse components from existing works where relevant. Course borrowing parts of code from existing projects, where relevant, in order not to reinvent the wheel (eg x86_64 KVM boot).
Unikraft consists of three basic components:
Library Components. Unikraft modules TODO, each of which provides some piece of functionality. As is expected of a library, they are the core building blocks of an application. Libraries can be arbitrarily small (e.g., a small library providing a proof-of-concept scheduler) or as large as standard libraries like libc. However, libraries in Unikraft often wrap pre-existing libraries, such as openssl, and as such existing applications can still make use of relevant, existing systems without having to re-work anything.
Configuration. Inspired by Linux's KConfig system, Unikraft uses this approach to quickly and easily allow users to pick and choose which libraries to include in the build process, as well as to configure options for each of them, where available. Like KConfig, the menu keeps track of dependencies and automatically selects them where applicable.
Build Tools. To seamlessly build and manage Unikraft, its components, configuration and its execution is a suite of tools ... core of Unikraft is a suite of tools which aid in the creation of the final unikernel image. Based on GNU Make, it takes care of compiling and linking all the relevant modules and of producing images for the different platforms selected via the configuration menu.
Unikraft libraries are grouped into two large groups: core (or internal) libraries, and external libraries. Core libraries generally provide functionality typically found in operating systems. Such libraries are grouped into categories such as memory allocators, schedulers, filesystems, network stacks and drivers, among others. Core libraries are part of the main Unikraft repo which also contains the build tool and configuration menu.
External libraries consist of existing code not specifically meant for Unikraft. This includes standard libraries such as libc or openssl, but also run-times like Python.
Whether core or external, from a user’s perspective these are indistinguishable: they simply show up as libraries in the menu.
Feel free to ask questions, report issues, and meet new people.