Coding Conventions

Coding conventions are important to maintain consistency and readability in software development, making code easier to understand, debug, review and maintain by multiple developers over time. Learn how the Unikraft core codebase is organized and how its code is written.

These are conventions for writing code and naming and placing things inside the Unikraft core codebase. Maintainers would be the ones who judge if there are certain corner cases when a rule doesn't need to be applied.

Some example of extreme cases:

Having line with more than 80 characters, as they are easily searchable. You can see an example here.
Using typedef struct to enhance readability, as you can see here.

Basics#

The layout uses the following basic rules. You may install EditorConfig as there is a .editorconfig file in the top-level directory to automatically enforce these.

Unikraft uses tabs + spaces for indentation, with tabs being 8 characters wide. Tabs first, further indentation less than 8 characters are done with spaces (Linux style).
Source files are 80 columns wide.
- There are some scenarios where this limit could be passed, due to readability reasons.
No trailing whitespaces.
Insert a final newline.

Always run support/scripts/checkpatch.uk to make sure your source file complies with the latest Unikraft coding conventions.

License Header#

Each file should have an SPDX identifier and license comment. There is an empty newline after the license. Unikraft uses the BSD-3-Clause.

/* SPDX-License-Identifier: BSD-3-Clause */
/* Copyright (c) 2023, Unikraft GmbH and The Unikraft Authors.
 * Licensed under the BSD-3-Clause License (the "License").
 * You may not use this file except in compliance with the License.
 */

#include <somefile.h>

Using Foreign Code#

When using foreign code, always make sure that the code uses a compatible (permissive) license. We have a strong preference for BSD-3-Clause. Other compatible licenses are, for example: Public domain, BSD-2-Clause, MIT, Apache 2.0. Incompatible licenses are, for example: GPL, LGPL.

This in particular means that basing your work on Linux kernel source code is not permitted.

Minor Foreign Contributions#

If you based only parts of your implementation on some other author's work (e.g., importing a function, using an algorithm etc.) you can add a separate block after the license header or before the particular piece of code:

/* You may not use this file except in compliance with the License. */

/* Parts of the implementation are based on
 * PROJECT (VERSION) by AUTHOR <EMAIL>
 * Git repository, commit ID
 * LICENSE IF NEEDED
 */

#include <somefile.h>

The version information is the software version number. If a Git repository is available, provide a link to the repository and the commit hash used. If the foreign license is the same as the source file’s license you do not need to provide the additional license copy.

Major Foreign Contributions#

If you import a foreign source file, keep the existing author and license information. Note that when you make changes to the file, some licenses (e.g., Apache 2.0) require that you explicitly state that you changed the file.

Adding Additional Authors#

If your changes are small (e.g., a bugfix or a small extension) and it is not required by the license, consider not adding your name to the list of authors (and the corresponding copyright) if present. This improves the visibility of the information about who the original authors are. Your changes will be tracked by the Git history anyway and show up using git blame.

Coding Style#

For improved readability and maintainability your code should adhere to the rules laid out in this chapter. The conventions are similar to the Linux kernel coding style to make developing code for Unikraft a familiar experience for people who have already worked with the Linux kernel code. Thus, some of the examples have been taken from the Linux kernel coding style guide.

Braces#

Opening braces are placed at the end of the line that opens the new scope. Closing braces are put on a separate line:

if (condition) {
        /* ... */
}

struct my_struct {
        /* ... */
};

However, there are some exceptions. Opening braces for functions are placed on a new line:

void func(void)
{
        /* ... */
}

Separate lines for the closing braces are also not used in the following cases to reduce the number of empty lines:

if (condition) {
        /* ... */
} else if (condition) {
        /* ... */
} else {
        /* ... */
}

do {
        /* ... */
} while (condition);

Do not use braces for single line statements:

if (condition)
        single_statement();
else
        single_statement();

while (condition)
        single_statement();

But use braces for the else block if the if block requires them:

if (condition) {
        single_statement();
        single_statement();
} else {
        single_statement();
}

Spaces#

Generally, put a space after keywords like if, switch, case, for, do, while, return.

if (condition)

if(condition)

Also put a space after a comma:

void func(int arg0, int arg2)

void func(int arg0,int arg2)
void func(int arg0 ,int arg2)

And around most operators (except unary operators):

a = b + c;

if (a == b)

a = (condition) ? 1 : 2;

For unary operators and structure member operators do not use a space:

p = &var;

if (!condition)

a = ~c & b;

a->field = 0xdeadb0b0;

Do not put spaces around keywords like sizeof, typeof, alignof,__attribute__ and around (inside) parenthesized expressions:

s = sizeof(p);

s = sizeof ( p );

if ( condition )

Always put the asterisk (*) for pointer definitions adjacent to the data or function name. This is to avoid the misconception that char* a, b defines two pointers, whereas, in fact, only a is a pointer.

char *myfunc(void *p)
{
        char *a, *b, c;
        /* ... */

        return (char *) p;
}

Comments#

Always use multi-line comments escaping even for single line comments:

/* Single line comment */
if (condition)

int var;  /* comment */

int var;  // comment

Use a separate line to close the comment for actual multi-line comments. There is a leading space to align the lines:

/* This is a multi line comment because it is so incredibly
 * useful that the contents does not fit into one line
 */

Ending the comment with a period is not necessary. Just keep it consistent.
Use spaces instead of tabs to indent within comments.
Start comments with capital letters in case of sentences.

Function Documentation#

Functions that are exported by a library or define an otherwise publicly visible interface need to be properly documented in the header file. Use Doxygen-style comments for this. Note the double-asterisk (**) at the beginning:

/**
 * Tries to allocate the requested number of bytes from the default memory
 * allocator. The actual amount of allocated memory can be smaller.
 *
 * @param size
 *   Number of bytes to allocate
 * @param[out] p
 *   Pointer to the newly allocated memory on success, NULL otherwise
 *
 * @return
 *   The number of bytes actually allocated
 */
size_t try_alloc(size_t size, void **p);

Use dots when ending a sentence or when a new sentence starts afterward.
Start the description on a new line for Doxygen variables (@param, @return).
For referencing Doxygen variables, you can use the official documentation.
The return can be omitted for trivial getter functions:
```
/**
 * Returns the size of the given buffer
 */
size_t get_size(void *p);
```
For return values, use the convention:
- \- (0): Successfully initialized
- \- (<0): Negative value with error code
Basically, list return values as list items.

Struct Member Documentation#

Use double asterisk for multi-line comments (/**).
When having flags, point to definition of macros in the same or another file.
Check how Doxygen renders macro comments and then decide on the best approach to format.

Definitions#

Variable Definitions#

Variables are defined at the beginning of the block. The block is separated from the code with a blank line.

int my_func(void)
{
        int *ptr;
        int i, num;

        ptr = malloc(10 * sizeof(int));
        if (unlikely(!ptr))
                return -ENOMEM;

        for (i = 0; i < 10; i++)
                num = /* ... */;
}

int my_func(void)
{
        void *ptr = malloc(10 * sizeof(int));

        if (unlikely(!ptr))
                return -ENOMEM;

        for (int i = 0; i < 10; i++)
                int num = /* ... */;
}

Naming#

Identifiers in Unikraft use snake case:

void my_func(void); /* Snake Case  */

void myFunc(void);  /* Camel Case  */

void MyFunc(void);  /* Pascal Case */

An exception to this are constants and preprocessor macros, which should use all uppercase and words separated by underscores:

const int MAGIC_VALUE = 0xdeadb0b0;

#define IS_DEAD(bobo) ((bobo) == MAGIC_VALUE)

Macro names may also be regular function names if they are just wrappers for actual functions (see uk_vprintk() below) or if the functionality might be either implemented as macro or function (e.g., depending on architecture).

Leading Underscores#

A leading single underscore is discouraged but may be used selectively to mark identifiers which should not be used directly, like in the header <uk/print.h>:

#if CONFIG_LIBUKDEBUG_PRINTK
/* please use the uk_printk(), uk_vprintk() macros because they compile in the
 * function calls only if the configured debug level requires it
 */
void _uk_vprintk(int lvl, const char *libname, const char *srcname,
                 unsigned int srcline, const char *fmt, va_list ap);


#define uk_vprintk(lvl, fmt, ap) \
        do { \
                if ((lvl) <= KLVL_MAX)  \
                        _uk_vprintk((lvl), __STR_LIBNAME__, __STR_BASENAME__, \
                        __LINE__, (fmt), ap); \
        } while (0)

Prefer using UK_PAGE_SIZE instead of _PAGE_SIZE if required to avoid a symbol clash.

However, leading underscores should not be used to name private symbols within the same source file. You should use the static keyword for private symbols.

static void _i_am_private(int arg)
{
        /* ... */
}

void i_am_public(int arg)
{
        _i_am_private(arg);
}

Leading double underscores may only be used for:

low-level native Unikraft types like __u64 and __spinlock.
header include guards.
preprocessor macros to add another level of indirection for macro resolution.

The reason for that may be potential clashes with libc/compiler symbols. So, for example, instead of __NULL use UK_NULL. And instead of __nonnull use uk_nonnull.

Rule of thumb: include libc headers first and then Unikraft headers. This is similar to Linux Kernel.
Furthermore, include files should be added to files in alphabetical order if possible.

Prefixes#

Unikraft uses name prefixes to express that an identifier is part of a certain API, component or namespace.

<UK API>_<COMPONENT>_<NAME>

Unikraft reserves the following API prefixes to describe functions, function-like macros and data types of the Unikraft API. This also applies (CAPITALIZED) for macros:

uk_ - denotes a public Unikraft API at the library level. The implementation is usually found in one of the libraries in /lib or defined in the headers in /include/uk. For example: uk_alloc().
ukplat_ - denotes a public Unikraft API, whose implementation is usually provided in /plat and depends on the platform (e.g., KVM) for which the unikernel is compiled. For example: ukplat_page_map().
ukarch_ - denotes a public Unikraft API, whose implementation is usually provided in /arch and depends on the architecture (e.g., x86-64) for which the unikernel is compiled. For example: ukarch_tlb_flush().

Use a verb at the end of the function name (_init, _clean, _add, _clone).

The component prefix makes it easy to identify related symbols. For example, functions related to thread management use the thread prefix and functions related to page mapping use the page prefix:

void uk_thread_block(struct uk_thread *thread);

int ukplat_page_map(...);

The component prefix should also be used for private functions, but without the public API prefix (cc = coding conventions):

static int cc_myfunc(int arg)
{
        /* ... */
}

int uk_cc_myfunc(int arg)
{
       return cc_myfunc(arg);
}

If the library defines functions that are private, but architecture specific, they should follow the pattern used for the public Unikraft API (here ccarch_):

static int cc_myfunc(int arg)
{
        if (arg == 42)
                ccarch_do_something();
}

Public is defined in exportsyms.uk.

Private is not as part of exportsyms.uk or part of localsyms.uk.

If there is no exportsyms.uk defined, then all functions are exported.

Typedefs#

Rule of thumb: Do not use typedefs.

Do not use the _t suffix when defining types. It seems that _t types are POSIX-reserved. Consider using _func suffix if you require defining function pointer data types (those situations should be rare - such as passing a function pointer to another function or using it in multiple places).

typedef int (*cmp_func)(int, int);
void sort(int *array1, cmp_func *f)

Usage of Unikraft Types#

Unikraft defines aliases for most of the basic data types (e.g., __u8, __sz). Use them where possible, when the component can live without libc dependency (nolibc, musl, newlib); use them instead of libc data types such as uint32_t. For sure, architecture, platform code, drivers.

sizeof#

Prefer to take the size of a variable instead of the variable's type so that the size remains valid if the variable's type changes:

struct my_struct p;
s = sizeof(p);

struct my_struct *p;
s = sizeof(*p);

struct my_struct p;
s = sizeof(struct my_struct);

Return Values#

For functions that should return error values, use negative error codes from errno.h if possible. These are well known and provide adequate values in most cases. Avoid defining custom error codes. Use 0 for success. This is to be used both for Unikraft APIs and for lower-level components.

Rationale
errno values have good coverage in the types of errors. It is easier to use a "standard" set of values (errno) to signal errors, even in lower-level components.

#include <errno.h>

int do_something(void arg)
{
        if (arg > 42)
                return -EINVAL; /* failure */

        return 0; /* success */
}

Include errno.h in the source code files, not in the API header files. Platform API shouldn’t provide errno.h.

Boolean-type functions should use __bool as return type and indicate:

true (success) with a non-zero value.
false (failure) with 0.

Do not assume that true is 1; true may be a non-zero value. When using the return value of a boolean function, use if (result) or if (!result) to cover the case for true being a non-zero value.

__bool is_bobo(int bobo)
{
        if (bobo == 0xb0b0)
                return 1; /* true, it is b0b0 */


        return 0; /* false */
}

Switch Statements#

The cases in switch statements are not indented.

switch (var) {
case 0:
        /* ... */
        break;
case 1:
        /* ... */
        break;
default:
        /* ... */
        break;
}

Goto Statements#

goto labels should NOT be indented. The name of the label should be: err_... or out_..., depending on the context.

Generally, goto should be avoided for simple "retry"-style constructs. Use a while-loop instead.

Gotos can be a good solution to implement a common error/finalization path. This reduces the probability that some error paths may leak resources or do not release all locks (in the correct order). This can easily happen as error paths are usually less well tested. If, however, possible, return directly.

int cc_myfunc(void)
{
        void *a, *b;
        int rc = 0;

        a = malloc(...);
        if (unlikely(!a)) {
                rc = -ENOMEM;
                goto err_out;
        }

        b = malloc(...);
        if (unlikely(!b)) {
                rc = -ENOMEM;
                goto err_free_a;
        }

        /* ... */
        free(b);

err_free_a:
        free(a);
err_out:
        return rc;
}

int cc_myfunc(void)
{
        void *a, *b;

        a = malloc(...);
        if (unlikely(!a))
                return -ENOMEM;


        b = malloc(...);
        if (unlikely(!b)) {
                free(a);
                return -ENOMEM;
        }

        /* ... */
        free(b);
        free(a);
        return 0;
}

Assembly#

Code#

Indentation for ASM code:

Multi line (Tab at the beginning, Tab between instructions and operands, \n at the end)
Removing the Tab could also be considered. But this would lead to having \n\t at the end. Which might not be ideal if the code is later modified and requires the addition of a label.

Inline Assembly#

asm volatile (
    "1:      instr      op, op\n"
    "        instr2     op, op\n"
    :::);

Single-line
Prefer using asm volatile over __asm__ __volatile__.
Aim to add a comment for each inline assembly line, detailing what it does.
Ignore for repetitive operations (such as mov-s).
- Ignore long lines for inline assembly, when including comments.
- Aim to have short comments. We could also have multi-line comments.

Breaking Long Lines#

Break before function name (after function return type).
Align function arguments after break to open parenthesis.

For functions, align arguments on different line:

UK_SYSCALL_R_DEFINE(int, getpeername, int, sock,
                    struct sockaddr *restrict, addr,
                    socklen_t *restrict, addr_len)

For conditions, always align to the parenthesis.

if (a == 0 && \
    (b == 10 || \
     c == 12) || \
    d == 15)

Function name and first argument are atomic.

Prefer the Linux style with the return type on the same line as the function name. In case of long functions + first argument, you can break after the return type.

Preprocessor Macros#

This is referenced in the definitions section for prefixes (i.e. the use of uk_, ukplat_, ukarch_ prefixes). Recommendation for using macros as (inline) functions:

Follow the same conventions as functions.
This brings flexibility in switching between one or the other.
Some macros as functions would still be uppercase (such as UK_MIN, UK_MAX).
Place each argument (that is a variable) between parentheses.
If using a variable multiple times, it’s better to use a temporary variable declaration in the macro that stores the variable.

UK_ASSERT#

Have UK_ASSERT even in the absence of ukdebug.
UK_CRASH is for situations where you can not formalize an assertion. Something like UK_ASSERT(false).
UK_BUG_ON should never be removed from the code (with condition). The same applies for UK_CRASH (without condition).
Change behavior of UK_BUG_ON to always be part of the code, even in release builds.

Parameter Validation#

Do rigorous parameter validation. However, parameter validation should be done using assertions where the parameter can be expected to be not directly determined by external input. This means in particular that uk_, ukplat_, and ukarch_ APIs should use UK_ASSERT():

int uk_cc_myfunc(int bobo)
{
        UK_ASSERT(bobo == 0xdead);

        /* do something */
}

int uk_cc_myfunc(int bobo)
{
        if (unlikely(bobo == 0xdead))
                return -EINVAL;

        /* do something */
}

This is in contrast to, for example, POSIX, where explicit parameter validation is expected. In a unikernel, though, we see the OS components as an (internal) extension to the application. Users of the APIs are thus expected to make sure only sane parameters are used just like with internal application functions.

Generally, use the unlikely() macro when checking for error cases (only). It is defined in <uk/essentials.h>.

When working with the low-level Unikraft API, it is essential to validate parameters using UK_ASSERT when specific parameter ranges need to be enforced. In cases where there is no explicit specification, performing an error check is recommended to ensure parameter validity.

When working with higher-level APIs, parameter validation should be handled by returning appropriate error codes according to the relevant standard (such as POSIX).

Comment Keywords#

NOTE: Description of how the code works (when it isn't self-evident).
XXX: Warning about possible pitfalls, can be used as NOTE:XXX:.
HACK: Not very well written or malformed code to circumvent a problem/bug. Should be used as HACK:FIXME.
FIXME: This works, sort of, but it could be done better (usually code written in a hurry that needs rewriting).
BUG: There is a problem here.
TODO: No problem, but additional code needs to be written, usually when you are skipping something.
UNDONE: A reversal or "roll back" of previous code
OPTIMIZE: Code can / should be optimized.

Magic Numbers#

Any magic number using hexspeak and that is introduced with new code should not be discriminatory or sexual in any way. Rumor has it that our Unikraft mascot is named Bobo. So, newly introduced magic numbers have manifested around Bobo (0xB0B0). This list of magic numbers sorted by categories can help in choosing one:

Magic number	Source
`DEADB0B0`	Dead Bobo
`B0B0DEAD`	Bobo dead
`BADB0B0`	Bad Bobo
`B0B0F00D`	Bobo food (obviously a banana)
`B0B0CAFE`	Bobo coffee
`B0B0C0DE`	Bobo code (was it coded by Bobo?)
`B007B0B0`	Boot Bobo (Unikraft bootinfo magic)
`B0B0B055`	Bobo boss

Only use capital hex digits (B instead of b).

When used in source code, the symbol of the magic number should be used, never the value. For example:
#define UKPLAT_BOOTINFO_MAGIC  0xB007B0B0 /* Boot Bobo */
#define UKTLS_SANITY_MAGIC     0xB0B0F00D /* Bobo food */

Coding Conventions