In this session we look at how to run applications using the binary compatibility layer and at the inner workings of the system call shim layer.

One obstacle to adopting Unikraft is the effort required to port new applications. Unikraft’s binary compatibility layer removes much of this effort. Binary compatibility is the ability to take pre-built Linux ELF binaries and run them on top of Unikraft. This is done without any porting effort while keeping the benefits of Unikraft: a reduced memory footprint and a high degree of configurability of library components.

For this, Unikraft must provide an ABI (Application Binary Interface) similar to that of the Linux kernel. This means that Unikraft has to provide a system call interface similar to the one the Linux kernel provides, a POSIX-compatible interface. To this end, the system call shim layer (also called syscall shim) was created. The system call shim layer provides Linux-style mappings of system call numbers to the actual system call handler functions.

Reminders

Configuring, Building and Running Unikraft

At this stage, you should be familiar with the steps of configuring, building and running any application within Unikraft and know the main parts of the architecture. Below you can see a list of the commands you have used so far.

| Command | Description |
|---|---|
| kraft list | Get a list of all components that are available for use with kraft |
| kraft up -t <appname> <your_appname> | Download, configure and build existing components into unikernel images |
| kraft run | Run resulting unikernel image |
| kraft init -t <appname> | Initialize the application |
| kraft configure | Configure platform and architecture (interactive) |
| kraft configure -p <plat> -m <arch> | Configure platform and architecture (non-interactive) |
| kraft build | Build the application |
| kraft clean | Clean the application |
| kraft clean -p | Clean the application, fully remove the build/ folder |
| make clean | Clean the application |
| make properclean | Clean the application, fully remove the build/ folder |
| make distclean | Clean the application, also remove .config |
| make menuconfig | Configure application through the main menu |
| make | Build configured application (in .config) |
| qemu-guest -k <kernel_image> | Start the unikernel |
| qemu-guest -k <kernel_image> -e <directory> | Start the unikernel with a filesystem mapping of fs0 id from <directory> |
| qemu-guest -k <kernel_image> -g <port> -P | Start the unikernel in debug mode, with GDB server on port <port> |

System Calls

A system call is the programmatic way in which a process requests a privileged service from the kernel of the operating system.

A system call is not a function, but specific assembly instructions that do the following:

  • set up information to identify the system call and its parameters
  • trigger a kernel mode switch
  • retrieve the result of a system call

In Linux, system calls are identified by a system call ID (a number), and the parameters of system calls are machine-word sized (32 or 64 bits). There can be a maximum of 6 system call parameters. Both the system call number and the parameters are stored in specific registers.

For example, on the 32-bit x86 architecture, the system call identifier is stored in the EAX register, while the parameters are stored in the EBX, ECX, EDX, ESI, EDI and EBP registers.

Usually an application does not make a system call directly, but calls functions in the system libraries (e.g. libc) that implement the actual system call.
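As a minimal sketch of the difference between the two paths, the C code below writes a message once through the libc wrapper write() and once through the generic syscall() function, which takes the system call number explicitly. It assumes a Linux host with glibc or musl; the function names write_via_wrapper and write_via_number are made up for this illustration.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>	/* SYS_write */
#include <unistd.h>		/* write(), syscall() */

/* Write msg to fd through the libc wrapper function. */
long write_via_wrapper(int fd, const char *msg)
{
	return (long)write(fd, msg, strlen(msg));
}

/*
 * Write msg to fd by passing the system call number explicitly.
 * syscall() places the number and the arguments in the registers
 * required by the architecture and executes the trap instruction.
 */
long write_via_number(int fd, const char *msg)
{
	return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Both functions return the number of bytes written, or -1 with errno set on error, because syscall() itself follows the libc error convention.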

Let’s take an example that you can see in the below image:

  1. The application program makes a system call by invoking a wrapper function in the C library.
  2. Each system call has a unique call number which is used by the kernel to identify which system call is invoked. The wrapper function copies the system call number into a specific CPU register.
  3. The wrapper function takes care of copying the arguments to the correct registers.
  4. Now the wrapper function executes a trap instruction (int 0x80 or syscall or sysenter). This instruction causes the processor to switch from user mode to kernel mode.
  5. Control reaches a trap handler, which calls the correct kernel function based on the ID we passed.
  6. The system call service routine is called.

system_call_image

Now, let’s take a quick look at unikernels. As stated above, in Linux, we use system calls to talk to the operating system, but there is a slight problem. Making a system call adds overhead to our application, because of the extra operations needed to switch from user space to kernel space. In unikernels, because there is no separation between kernel space and user space, we do not need system calls, so everything can be done as simple function calls. This is both good and bad. It is good because we avoid the overhead that Linux incurs on each system call. It is bad because we need a way to support applications compiled for Linux, that is, applications that make system calls, even though we don’t need them.

Overview

01. The Process of Loading and Running an Application with Binary Compatibility

For Unikraft to achieve binary compatibility there are two main objectives that need to be met:

  1. The ability to pass the Linux ELF binary to Unikraft at boot time.
  2. The ability to load the passed ELF binary into memory and jump to its entry point.

The dominant format for executables on Linux is the Executable and Linkable Format (ELF), so, in order to run executables, we need an ELF loader. The job of the ELF loader is to load the executable into main memory. It does so by reading the program headers located in the ELF-formatted executable and acting accordingly.
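To make the role of the program headers more concrete, here is a minimal sketch that walks the program header table of a 64-bit ELF file and counts the PT_LOAD segments, which are the ones a loader has to map into memory. It only uses the types from the standard elf.h header; the function name count_load_segments is hypothetical, and this is not the code used by app-elfloader.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the PT_LOAD segments of a 64-bit ELF file.
 * Returns -1 if the file cannot be read or is not a 64-bit ELF.
 */
int count_load_segments(const char *path)
{
	Elf64_Ehdr eh;
	Elf64_Phdr ph;
	int count = 0;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;

	/* The ELF header sits at offset 0 and starts with the magic bytes. */
	if (fread(&eh, sizeof(eh), 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64) {
		fclose(f);
		return -1;
	}

	/* The program header table has e_phnum entries starting at e_phoff. */
	for (int i = 0; i < eh.e_phnum; i++) {
		long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

		if (fseek(f, off, SEEK_SET) != 0 ||
		    fread(&ph, sizeof(ph), 1, f) != 1) {
			fclose(f);
			return -1;
		}
		if (ph.p_type == PT_LOAD)
			count++;
	}

	fclose(f);
	return count;
}
```

On a 64-bit Linux host, calling count_load_segments("/proc/self/exe") inspects the running program itself. A real loader would additionally map each PT_LOAD segment at its p_vaddr (plus a base address for PIE binaries) with the permissions given in p_flags.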

As an overview of the whole process, when we want to run an application on Unikraft using binary compatibility, the first step is to pass the executable file to the unikernel as an initial ram disk. Once the unikernel gets the executable, it reads the executable segments and loads them accordingly. After the program is loaded, the last step is to jump to its entry point and start executing.

The unikernel image is the app-elfloader application. This application parses the ELF file and then loads it accordingly. It’s a custom application developed for Unikraft.

We require PIE (position-independent executable) ELFs. This is rarely a limitation, as most current Linux distributions build executables as PIE by default.
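One way to check this requirement is to look at the e_type field of the ELF header: position-independent executables are marked ET_DYN, while fixed-address executables are marked ET_EXEC. The sketch below shows that check on an already loaded header; the function name is hypothetical. Note that shared libraries are also ET_DYN, so this is a necessary condition rather than a full PIE test.

```c
#include <elf.h>
#include <string.h>

/*
 * Return 1 if the header describes a position-independent ELF object
 * (type ET_DYN), 0 if it describes another ELF type (e.g. ET_EXEC),
 * and -1 if the magic bytes are wrong.
 */
int elf_is_position_independent(const Elf64_Ehdr *eh)
{
	if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
		return -1;
	return eh->e_type == ET_DYN ? 1 : 0;
}
```

From the command line, the same field is shown on the Type line printed by readelf -h.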

We have collected PIE executables in:

  • the dynamic-apps repository - storing dynamically-linked executables
  • the static-pie-apps repository - storing statically-linked executables

02. Unikraft Syscall Shim

As stated previously, the system call shim layer in Unikraft is what we use in order to achieve the same system call behavior as the Linux kernel.

Let’s take a code snippet that does a system call from a binary:

mov	rdx,4		; message length
mov	rsi,msg		; message to write
mov	rdi,1		; file descriptor (stdout)
mov	rax,1		; system call number (sys_write on x86-64)
syscall		    ; call kernel

When the syscall instruction is executed, we have to reach the write function inside our unikernel. A few steps are taken before we reach the system call handler inside Unikraft:

  1. After the syscall instruction gets executed we reach the ukplat_syscall_handler. This function has an intermediate role, printing some debug messages and passing the correct parameters further down. The next function that gets called is the uk_syscall6_r function.

    void ukplat_syscall_handler(struct __regs *r)
    {
    	UK_ASSERT(r);
    
    	uk_pr_debug("Binary system call request \"%s\" (%lu) at ip:%p (arg0=0x%lx, arg1=0x%lx, ...)\n",
    		    uk_syscall_name(r->rsyscall), r->rsyscall,
    		    (void *) r->rip, r->rarg0, r->rarg1);
    	r->rret0 = uk_syscall6_r(r->rsyscall,
    				 r->rarg0, r->rarg1, r->rarg2,
    				 r->rarg3, r->rarg4, r->rarg5);
    }
    
  2. uk_syscall6_r is the function that redirects the flow of the program to the actual system call function inside the unikernel, based on the system call number.

    switch (nr) {
    	case SYS_brk:
    		return uk_syscall_r_brk(arg1);
    	case SYS_arch_prctl:
    		return uk_syscall_r_arch_prctl(arg1, arg2, arg3);
    	case SYS_exit:
    		return uk_syscall_r_exit(arg1);
        ...
    

All the above functions are generated, so the only thing that we have to do when we want to register a system call to the system call shim layer is to use the correct macros.
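The dispatch step can be modelled in isolation with a few lines of C. The sketch below is not the generated Unikraft code: the system call numbers, the handlers and the demo_syscall6_r name are all made up for illustration, and returning -ENOSYS for unknown numbers follows the Linux convention, assumed here.

```c
#include <errno.h>

/* Hypothetical system call numbers, for illustration only. */
enum { DEMO_SYS_getanswer = 0, DEMO_SYS_add = 1 };

/* Hypothetical raw handlers: they return the result directly. */
static long demo_r_getanswer(void)
{
	return 42;
}

static long demo_r_add(long a, long b)
{
	return a + b;
}

/*
 * Model of a raw dispatcher: select the handler by number, forward
 * only the arguments it needs, and report unknown numbers as -ENOSYS.
 */
long demo_syscall6_r(long nr, long arg1, long arg2, long arg3,
		     long arg4, long arg5, long arg6)
{
	/* The demo handlers take at most two arguments. */
	(void)arg3; (void)arg4; (void)arg5; (void)arg6;

	switch (nr) {
	case DEMO_SYS_getanswer:
		return demo_r_getanswer();
	case DEMO_SYS_add:
		return demo_r_add(arg1, arg2);
	default:
		return -ENOSYS;
	}
}
```

The dispatcher always receives six machine-word arguments, matching the maximum number of system call parameters, and each case passes on only as many as the handler declares.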

There are four definition macros that we can use to add a system call to the system call shim layer. The two main ones are:

  • UK_SYSCALL_DEFINE - to implement libc-style system calls, which return -1 and set errno accordingly.
  • UK_SYSCALL_R_DEFINE - to implement the raw variant, which returns a negative error value in case of errors. errno is not used at all.

The above two macros will generate the following functions:

/* libc-style system call that returns -1 and sets errno on errors */
long uk_syscall_e_<syscall_name>(long <arg1_name>, long <arg2_name>, ...);

/* Raw system call that returns negative error codes on errors */
long uk_syscall_r_<syscall_name>(long <arg1_name>, long <arg2_name>, ...);

/* libc-style wrapper (the same as uk_syscall_e_<syscall_name> but with actual types) */
<return_type> <syscall_name>(<arg1_type> <arg1_name>,
                              <arg2_type> <arg2_name>, ...);

For the case where the libc-style wrapper does not match the signature and return type of the underlying system call, so-called low-level variants of these two macros are available: UK_LLSYSCALL_DEFINE and UK_LLSYSCALL_R_DEFINE. These macros only generate the uk_syscall_e_<syscall_name> and uk_syscall_r_<syscall_name> symbols. You can then provide a custom libc-style wrapper on top.
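The relationship between the raw and the libc-style conventions can be sketched in plain C, without the Unikraft macros. In the example below, demo_r_write and demo_e_write are hypothetical stand-ins for a raw handler and a libc-style wrapper built on top of it; the error checks are invented for illustration.

```c
#include <errno.h>

/*
 * Hypothetical raw handler: returns the number of bytes "written",
 * or a negative error code. It never touches errno.
 */
long demo_r_write(long fd, long buf, long count)
{
	if (fd < 0)
		return -EBADF;
	if (buf == 0)
		return -EFAULT;
	return count;
}

/*
 * libc-style wrapper on top: converts a negative raw result into
 * a -1 return value and stores the error code in errno.
 */
long demo_e_write(long fd, long buf, long count)
{
	long ret = demo_r_write(fd, buf, count);

	if (ret < 0) {
		errno = (int)-ret;
		return -1;
	}
	return ret;
}
```

Keeping the raw variant free of errno makes it cheap to call from the binary system call path, where the negative error code is returned to the application in a register.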

Apart from using the macro to define the function, we also have to register the system call by adding it to UK_PROVIDED_SYSCALLS-y within the corresponding Makefile.uk file. Let’s see how this is done with an example for the write system call. We have the following definition of the write system call:

ssize_t write(int fd, const void * buf, size_t count)
{
    ssize_t ret;

    ret = vfs_do_write(fd, buf, count);
    if (ret < 0) {
        errno = EFAULT;
        return -1;
    }
    return ret;
}

The next step is to define the function using the correct macro:

#include <uk/syscall.h>

UK_SYSCALL_DEFINE(ssize_t, write, int, fd, const void *, buf, size_t, count)
{
    ssize_t ret;

    ret = vfs_do_write(fd, buf, count);
    if (ret < 0) {
        errno = EFAULT;
        return -1;
    }
    return ret;
}

Or the raw variant:

#include <uk/syscall.h>

UK_SYSCALL_R_DEFINE(ssize_t, write, int, fd, const void *, buf, size_t, count)
{
    ssize_t ret;

    ret = vfs_do_write(fd, buf, count);
    if (ret < 0) {
        return -EFAULT;
    }
    return ret;
}

The last step is to add the system call to UK_PROVIDED_SYSCALLS-y in the Makefile.uk file. The format is:

UK_PROVIDED_SYSCALLS-$(CONFIG_<YOURLIB>) += <syscall_name>-<number_of_arguments>

So, in our case, we need to add:

UK_PROVIDED_SYSCALLS-$(CONFIG_LIBWRITESYS) += write-3

Summary

The binary compatibility layer is a very important part of the Unikraft unikernel. It helps us run applications that were not built for Unikraft while, at the same time, keeping the classic benefits of Unikraft: speed, security and small memory footprint.

Work Items

Support Files

Session support files are available in the repository. If you already cloned the repository, update it and enter the session directory:

$ cd path/to/repository/clone

$ git pull --rebase

$ cd content/en/community/hackathons/sessions/bincompat/

$ ls
demo  index.md  work

If you haven’t cloned the repository yet, clone it and enter the session directory:

$ git clone https://github.com/unikraft/docs.git

$ cd docs/

$ cd content/en/community/hackathons/sessions/bincompat/

$ ls
demo  index.md  work

00. Setup

The easiest way to set up, build and run Linux ELFs with app-elfloader is to use the scripts repository. Clone the scripts repository on your machine to get started.

$ git clone https://github.com/unikraft-upb/scripts

$ cd scripts/

$ cd make-based/app-elfloader/

$ ./do.sh setup

$ ./do.sh run
'run' command requires target application as argument
Target applications: helloworld_static server_static helloworld_go_static server_go_static helloworld_cpp_static helloworld_rust_static_musl helloworld_rust_static_gnu nginx_static redis_static sqlite3 bc_static gzip_static helloworld server helloworld_go server_go helloworld_cpp helloworld_rust nginx redis sqlite3 bc gzip

$ ./do.sh run helloworld
[...]                       # many messages

$ ./do.sh run sqlite3
[...]                       # many messages

The last commands run the dynamic versions of the helloworld and sqlite3 applications, the ones in the dynamic-apps repository. There is a lot of output because, by default, a pre-built version of app-elfloader is used, with debugging enabled.

01. Run Binary Applications

Run as many executables as possible from the list of applications listed by the command:

$ ./do.sh run
'run' command requires target application as argument
Target applications: helloworld_static server_static helloworld_go_static server_go_static helloworld_cpp_static helloworld_rust_static_musl helloworld_rust_static_gnu nginx_static redis_static sqlite3 bc_static gzip_static helloworld server helloworld_go server_go helloworld_cpp helloworld_rust nginx redis sqlite3 bc gzip

02. Debug Run

See the instructions in the README to run an application in debugging mode. Add breakpoints to system call functions such as uk_syscall_r_open.

03. Build app-elfloader from Existing Config

Build the app-elfloader from an existing configuration.

Copy the .config file from work/03/config to the app-elfloader folder. Now you can build it:

$ WITH_ZYDIS=y make

In the build/ folder you should have the app-elfloader_kvm-x86_64 binary.

To run it, go to the run-app-elfloader folder and run the run_elfloader script by passing it the -k option with the correct path to the built binary.

04. Doing it From Scratch

Inside the app-elfloader folder, remove the previous build and configuration files:

$ make distclean

Now configure it from scratch by running:

$ WITH_ZYDIS=y make menuconfig

In the configuration menu, make the following changes:

  1. Select KVM guest from the Platform Configuration screen.
  2. Under the Platform Configuration -> Platform Interface Options select Virtual Memory API.
  3. Under the Library Configuration screen, unselect ukmmap and select ukvmem and posix-mmap.
  4. Under the Library Configuration -> ukvmem screen, select all the Use dedicated * options.
  5. If you want to use a filesystem with your application, under the Library Configuration -> vfscore: Configuration, select the Automatically mount a root filesysytem option and choose the default root filesystem to be 9PFS.
  6. Change the Default root device to fs0 in the vfscore: Configuration screen above, to be able to use the qemu-guest script.
  7. Select lwip under the Library Configuration screen if the applications that we will run require networking support.

Now you can build it:

$ WITH_ZYDIS=y make

Test it using the run_elfloader script in the run-app-elfloader repository.

05. Build with Debugging

Use different ukdebug configurations and build the app-elfloader with those. Run applications and see the different messages they print.

06. Create your Own Application

Create your own application or get an existing application, build it and run it in binary compatibility mode.

See the existing examples in the dynamic-apps repository or the static-pie-apps repository.

07. Give Us Feedback

We want to know how to make the next sessions better. For this we need your feedback. Thank you!

Further Reading

Elf Loaders, Libraries and Executables on Linux