Compatibility

Unikraft is designed to be POSIX- and Linux-compatible, enabling seamless conversion of existing production applications to the newer deployment model.

Parts of this document contains extracts from an article published to USENIX ;login about Unikraft.

One challenge with Unikraft is the potential porting effort required for new applications. To address this, Unikraft includes a binary-compatibility layer which allows the running of Linux binaries (ELFs) on top of Unikraft.

For such a layer to work, Unikraft has the explicit goal to comply, as much as possible, with Linux's ABI (Application Binary Interface), providing a large range of its system call interface. Concretely, in Unikraft this is handled through its system call shim layer (also called syscall shim), which provides Linux-style mappings of system call numbers to actual system call handler functions.

Currently, binary-compatibility is available on x86_64, with work ongoing to make this available for AArch64. Also note that KVM is currently the only supported hypervisor, with QEMU and Firecracker as VMMs (Virtual Machine Monitors).

Currently, binaries have to be built as PIE (Position-Independent Executables). This is the default build mode for the majority of Linux distributions, so it shouldn't cause any problems.

Note that, because Linux binaries are included, constructing new Linux binaries requires a Linux or Linux-compatible development environement (such as WSL - Windows Subsystem for Linux). This is only the case for building binaries -- prebuilt binaries can be used and the ELF loader app itself can be built on multiple platforms (Linux, Windows, macOS).

Binary-compatible apps are located in the Unikraft application catalog repository. Follow the guides below to know how to use and develop for the application catalog:

While Linux has over 350 system calls, previous studies have shown through static analysis that only a subset (224) are needed to run a Ubuntu installation. This number is actually an overestimation due to various reasons, including the fact that not all such applications make sense in a unikernel context (e.g., desktop applications) and the imprecision of static analysis. The proof is that Unikraft's 160+ syscalls are plenty to run complex applications such as Redis, SQLite, NGINX, HAProxy, TFLite and Memcached, and languages like Python, Ruby and Go, to name a few.

The ability to run a wide range of applications and languages is paramount to how much deployment any unikernel project will see. Unikraft addresses this directly through what we term autoporting: we use an application's native build system to build the application against the musl C standard library and link the resulting object files against Unikraft (see Figure below). For this to work, Unikraft includes a port of the musl library, which means, since musl is meant for Linux, that system call support is required. To address this, Unikraft provides (a modular) system call shim layer along with implementations of a growing number of syscalls. Unikraft also supports binary compatibility mode, where unmodified ELFs can be run on top of Unikraft with a slight performance hit; this functionality will be upstreamed in the future.

Unikraft avoids porting of applications by building them using their native build systems with musl and linking the object files into the Unikraft build.

Figure 1: Unikraft avoids porting of applications by building them using their native build systems with musl and linking the object files into the Unikraft build.

A Trip Down POSIX-Compatibility Lane#

As a POSIX-like unikernel, Unikraft strives for a high degree of compatibility with existing applications by supporting the Linux system call API. Some system calls are more popular than others and the degree of compatibility of a given POSIX-like unikernel cannot simply be measured as the percentage of the Linux system call API it supports.

In a 2016 paper, Tsai et al. measured system call usage over the entire set of applications from a typical Ubuntu/Debian distribution. They concluded that to support 100% of these applications, 272 system calls needed to be implemented. That number went down to 202, 145, 81, and 40 system calls for the 90%, 50%, 10% and 1% most popular applications, respectively, suggesting that a large implementation effort would be required for an OS aiming at supporting even a small number of applications.

However, in the process of implementing POSIX system calls and checking whether applications where actually running, we gathered anecdotal evidence that these requirements were not as stringent as they seemed: whenever a system call is missing, the default behavior of Unikraft's system call shim layer is to stub it by returning ENOSYS; the result of this was that some applications where correctly running despite not having some of the system calls they invoked implemented, so we decided to take a closer look.

Looking Under the System Call API Hood#

To understand this behavior better, we performed a study that uses both dynamic as well as source-level static analysis. For the dynamic analysis we rely on seccomp to hook into each system call made by the application and to selectively disable it, returning either -ENOSYS (stubbed) or a success code even though the system call isn't actually implemented (faked). By monitoring the success/failure of the traced application, we can determine which system calls can be stubbed and/or faked. Further, we developed a static analysis tool that takes as input the application's and dependencies' code (e.g., libc) and outputs an estimation of the system calls the application may invoke at runtime. As a baseline we also ran the binary-level static analysis tools used by Tsai et al.

For this initial analysis we focused on 5 applications (Redis, NGINX, Memcached, SQLite and HAProxy), although we are in the process of adding many more to the tool. We selected these because they are (a) popular applications (b) good candidates for running as a unikernel/cloud environment and (c) they have thorough benchmarking tools (redis-benchmark, wrk, etc.) and test suites. We use the benchmarking tools to provide realistic workloads and the test suites good coverage as we measure which syscalls the applications are making actual use of.

Results and Insights#

Figure 2 below shows, for each application and corresponding benchmarking tool and test suite, the number of system calls statically identified and dynamically traced. Traced system calls are broken down between the ones whose implementation is absolutely required for the application/workload, as well as the ones that can be stubbed and/or faked.

Number of syscalls statically identified and dynamically traced for applications under traditional (bench) and unit tests (suite) workloads. Traced syscalls are broken down into the ones that can be stubbed and/or faked and the required ones.

Figure 2: Number of syscalls statically identified and dynamically traced for applications under traditional (bench) and unit tests (suite) workloads. Traced syscalls are broken down into the ones that can be stubbed and/or faked and the required ones.

The key insight is that applications are resilient to a significant portion of syscalls being stubbed and faked, and that the number of implemented syscalls they require to correctly run is significantly lower than the output of the static analysis suggests, let alone the total number of syscalls in the Linux API. Applications require much fewer system calls to run than a static analysis would suggest, and much less than the total system calls in the Linux API.

To confirm this, we took a look at the applications' source code: in cases where the failure of a system call is non-critical for the execution of the program, the program can detect the error and decide to continue as usual, in which case stubbing works. This snippet from the Redis code-base is a good example:

if (getrlimit(RLIMIT_NOFILE,&limit) == -1) {
    serverLog(LL_WARNING,"Unable to obtain the current NOFILE"
        "limit (%s), assuming 1024 and setting the max clients"
        "configuration accordingly.", strerror(errno));
    server.maxclients = 1024-CONFIG_MIN_RESERVED_FDS;
}

Here when we stub getrlimit or prlimit64, Redis handles it gracefully with a default value. In other cases however, the program can interpret the error code conservatively and decide to abort, in which case faking usually works (since the failure of the system call is, in reality, non-critical). This snippet from the NGINX codebase is a good example of such behavior:

if (prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0) == -1) {
    ngx_log_error(NGX_LOG_EMERG, cycle->log, ngx_errno,
                  "prctl(PR_SET_KEEPCAPS, 1) failed");
    /* fatal */
    exit(2);
}

Here prctl fails to force the retaining of capabilities upon UID transition; in the context of an OS that does not have a user/kernel separation (as is the case for unikernels), capabilities make little sense and so it is fine to fake the success of the check.

On average, the proportion of invoked system calls that can be stubbed/faked is 42% for test suites, and 60% for traditional workloads. This shows that the effort to provide comprehensive (test suite level) support for these popular applications is relatively limited, and is even lower when considering partial support i.e. traditional workloads. We observe that the system calls that can be stubbed/faked vary among applications. As an indication, the number of system calls that would need to be effectively implemented for all these 5 applications to be supported is 78 for test suites and only 37 for traditional benchmarks

A second observation is that static analysis produces many false positives and as such yields a large overestimation of the system calls made by an application. For example on Redis, binary-level static analysis identifies 89 system calls, vs. 68 dynamically traced from the test suite. These trends are the same for all programs. This is due to multiple reasons such as dead code but also the imprecision of binary-level static analysis. A concrete example is when such analysis encounters a system call wrapper like setxid: it may mark all the possible system calls that can be made through that wrapper as invoked, independently of those that will actually be made at runtime.

Source-level analysis does not suffer from such issues and as such is more precise than binary-level techniques. For example, on Redis, source-level static analysis reports 71 system calls which is close to the 68 traced at runtime on the test suite. Due to the difficulty of binary-level static analysis, this technique also suffers from a small number of false negatives. Based solely on static analysis, the amount of system calls that would need to be implemented to support all 5 applications is 141 for source-level and 125 for binary-level. We suspect that this smaller number for binary-level analysis may come from its false negatives. We conclude that relying solely on static analysis is not sufficient to get a good understanding of the implementation effort required for an OS aiming at POSIX-like compatibility.

In all, these results bring a message of hope to the level of POSIX-compatibility Unikraft, and unikernels more generally, can provide: the effort, while non-negligible, is not as insurmountable as past studies relying on static analysis seemed to suggest. Unikraft is constantly striving to increase its syscall support and thus its ability to run a large set of mainstream applications.

Compatibility

Unikraft is designed to be POSIX- and Linux-compatible, enabling seamless conversion of existing production applications to the newer deployment model.

Looking Under the System Call API Hood#

Results and Insights#

Connect with the community