An Introduction to ADL (or how to double your native interop performance)

Foreword

In today's modern, cross-platform .NET world we're swimming in unprecedented support for different operating systems and processor architectures. It's a bonanza of new and exciting technology and opportunities - one that is partially soured for mixed developers who want or have to work in both the cozy, comfy managed world of .NET and the performance-critical wild west of low-level code.

It might be surprising to hear that P/Invoke hasn't changed much - if at all - since its introduction in .NET 1.1. Developers wishing to leverage the power and freedom of low-level programming in C# are locked to either using static classes and DllImport attributes, or building their own ramshackle solutions with delegates and Marshal.GetDelegateForFunctionPointer.

Unfortunately, both of these solutions have their individual issues - lack of flexibility, performance overhead, reliance on compile-time library names - the list goes on.

To solve most - if not all - of these issues, a friend (BlackCentipede) and I have developed a new solution for native interop in the CLR - AdvancedDLSupport (or ADL, for short).

The library was created with three things in mind: flexibility, modernity, and speed. It takes a new approach to binding to native code, using familiar tools in a new way. Furthermore, it targets .NET Standard 2.0, giving it wide compatibility with existing projects and runtimes.

The library is available for free on GitHub and NuGet - read on to see how you can use it to simplify your life with P/Invoke.

Table of Contents

  1. Basic Usage
  2. Mixed-Mode Classes
  3. Under the Hood
    1. Delegate-based Binding
    2. Indirect Calls
  4. Performance

Basic Usage

Let's take this simple C library.

math.h

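extern int TimesUsed;

int Multiply(int a, int b);
int Subtract(int a, int b);

math.c

#include "math.h"

/* Counts how many times any function in the library has been called. */
int TimesUsed = 0;

int Multiply(int a, int b)
{
    ++TimesUsed;
    return a * b;
}

int Subtract(int a, int b)
{
    ++TimesUsed;
    return a - b;
}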

In your typical DllImport-driven interop, you might declare a static class like this, and import the functions from the library.

using System.Runtime.InteropServices;

public static class NativeMath
{
    [DllImport("math")]
    public static extern int Multiply(int a, int b);

    [DllImport("math")]
    public static extern int Subtract(int a, int b);
}

This is all well and good, but you're now faced with some annoying constraints. In order to use this on multiple platforms, you are forced to rely on platform-specific logic to resolve the location of your library: math.dll on Windows, and libmath.so or libmath.dylib on *nix and macOS.
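To make that constraint concrete, here is a minimal sketch of the kind of platform-specific name resolution you end up writing. NativeLibraryNames and Resolve are hypothetical names invented for this illustration; they are not part of ADL or the framework. Note that the result cannot be fed to a DllImport attribute, which only accepts a compile-time constant.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of ADL or the BCL): maps a bare library
// name to the file name convention of the current operating system.
public static class NativeLibraryNames
{
    public static string Resolve(string baseName)
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            return baseName + ".dll";           // math -> math.dll
        }

        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
        {
            return "lib" + baseName + ".dylib"; // math -> libmath.dylib
        }

        // Assumption: every other platform uses the lib<name>.so convention.
        return "lib" + baseName + ".so";        // math -> libmath.so
    }

    public static void Main()
    {
        Console.WriteLine(Resolve("math"));
    }
}
```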

Additionally, the class is static, and is difficult to use in modern scenarios; because of its static nature, the class can't be passed around, it can't be instantiated, it can't inherit from any class, other classes cannot inherit from it, etc. Finally, it's slow. DllImport carries some overhead, which can be painfully noticeable in high-churn applications.

ADL takes a different approach. Instead of declaring a class, we declare an interface.

public interface IMath : IDisposable  
{
    int TimesUsed { get; }

    int Multiply(int a, int b);
    int Subtract(int a, int b);
}

Using this interface, we can then instantiate a type that implements the interface, and binds to the native functions. Of note is the property, which will bind to a global variable - something DllImport can't do at all.

using (var mathLibrary = NativeLibraryBuilder.Default.ActivateInterface<IMath>(LibraryName))  
{
    int mySubtraction = mathLibrary.Subtract(10, 5);
    int myMultiplication = mathLibrary.Multiply(5, 5);

    int timesUsed = mathLibrary.TimesUsed;

    Console.WriteLine($"Subtraction: {mySubtraction}, Multiplication: {myMultiplication}, Times used: {timesUsed}");
}

// Output:
// Subtraction: 5, Multiplication: 25, Times used: 2

This has several benefits.

  1. You are in complete control of when and how to load and unload the native functions.
  2. You decide, at runtime, the location or name of your library.
  3. The library is now an instance, and is not static. You can inject it into your types, create mocks for unit tests, use it for generic constraints, etc.
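As a rough illustration of the third point, the sketch below injects the interface into a consumer and swaps in a hand-written fake for a unit test. AreaCalculator and FakeMath are hypothetical names made up for this example, and no native library (or ADL itself) is involved.

```csharp
using System;

// Same shape as the IMath interface declared earlier.
public interface IMath : IDisposable
{
    int TimesUsed { get; }

    int Multiply(int a, int b);
    int Subtract(int a, int b);
}

// Hypothetical consumer that receives the library through its constructor.
public sealed class AreaCalculator
{
    private readonly IMath _math;

    public AreaCalculator(IMath math)
    {
        _math = math;
    }

    public int RectangleArea(int width, int height) => _math.Multiply(width, height);
}

// Hand-written test double; nothing native is loaded.
public sealed class FakeMath : IMath
{
    public int TimesUsed { get; private set; }

    public int Multiply(int a, int b) { TimesUsed++; return a * b; }
    public int Subtract(int a, int b) { TimesUsed++; return a - b; }
    public void Dispose() { }
}

public static class InjectionSketch
{
    public static void Main()
    {
        using (var fake = new FakeMath())
        {
            var calculator = new AreaCalculator(fake);
            Console.WriteLine(calculator.RectangleArea(3, 4)); // 12
            Console.WriteLine(fake.TimesUsed);                 // 1
        }
    }
}
```

In production code the same AreaCalculator would simply be handed the instance returned by ActivateInterface instead of the fake.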

Beyond basic usage like this, ADL supports all the typical P/Invoke patterns and mechanisms (passing structs by ref or by value, StringBuilder, passing classes, etc.), outlined in the documentation.

ADL also brings some refreshing new features to the table:

  1. Mixed-mode classes
  2. Per-symbol disposal checks
  3. Integrated support for Mono's DllMap mechanism
  4. Lazy loaded symbols
  5. Support for T? and ref T? parameters as first-class citizens
  6. Support for binding to global variables

These are covered in ADL's advanced configuration documentation, but mixed-mode classes are quite interesting on their own. Let's take a look.

Mixed-Mode Classes

Beyond simple interfaces, ADL also allows you to seamlessly mix managed and native code via the use of mixed-mode classes. In short, you can have a managed class implement an unmanaged interface, making the native methods available right inside your own code.

Using the previously outlined native library, we can create a class in a similar vein to this:

public abstract class MixedModeClass : NativeLibraryBase, IMath  
{
    public MixedModeClass(string path, Type interfaceType, ImplementationConfiguration configuration, TypeTransformerRepository transformerRepository)
        : base(path, interfaceType, configuration, transformerRepository)
    {
    }

    public bool RanManagedSubtract { get; private set; }

    public bool RanManagedSetter { get; private set; }

    public int ManagedAdd(int a, int b)
    {
        return a + b;
    }

    public int TimesUsed 
    {
        get => 32;
        set => this.RanManagedSetter = true;
    }

    public abstract int Multiply(int value, int multiplier);

    public int Subtract(int value, int other) 
    {
        RanManagedSubtract = true;
        return value - other;
    }
}

As you can see, managed functions can coexist with unmanaged ones, and managed code can override implementations of the unmanaged functions - as seen in Subtract. This allows you to more easily wrap object-oriented managed code, or provide mixed access to your native code. Perhaps you want to have more detailed input verification, or perhaps you want to isolate certain portions of your code.

In order to create a mixed-mode class, simply declare an abstract class that inherits from NativeLibraryBase, and implements the interface you want to activate. Any interface members can have explicit managed implementations, or remain abstract to be routed to their corresponding native implementations.

There are a few limitations:

  • Mixed-mode classes must inherit from NativeLibraryBase
  • Mixed-mode classes must be abstract
  • Properties may only be fully managed or fully unmanaged - no mixing of getters and setters

Once you have your class definition, instances of it can be created in much the same way as you would interface instances:

NativeLibraryBuilder.Default.ActivateClass<MixedModeClass, IMath>(LibraryName);

The produced class will inherit from the base class, and implement the given native interface.
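Conceptually, the relationship between your abstract class and the produced class looks something like the sketch below. This is only an illustration: MixedModeSketch and GeneratedSketch are hypothetical names, NativeLibraryBase is left out so the code compiles without ADL, and a managed lambda stands in for the native function.

```csharp
using System;

public interface IMath
{
    int Multiply(int a, int b);
    int Subtract(int a, int b);
}

// Stand-in for a mixed-mode class. A real one would also inherit from
// NativeLibraryBase, which is omitted so the sketch compiles without ADL.
public abstract class MixedModeSketch : IMath
{
    public bool RanManagedSubtract { get; private set; }

    // Left abstract: this is the member that would be routed to native code.
    public abstract int Multiply(int a, int b);

    // Managed implementation: the native Subtract is never called.
    public int Subtract(int a, int b)
    {
        RanManagedSubtract = true;
        return a - b;
    }
}

// Hand-written placeholder for the type that would be generated at runtime.
public sealed class GeneratedSketch : MixedModeSketch
{
    // Assumption: a managed lambda plays the role of the native function.
    private readonly Func<int, int, int> _nativeMultiply = (a, b) => a * b;

    public override int Multiply(int a, int b) => _nativeMultiply(a, b);
}

public static class MixedModeDemo
{
    public static void Main()
    {
        var library = new GeneratedSketch();
        Console.WriteLine(library.Multiply(5, 5));     // 25
        Console.WriteLine(library.Subtract(10, 5));    // 5
        Console.WriteLine(library.RanManagedSubtract); // True
    }
}
```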

Under the Hood

ADL leverages some known techniques and some more arcane approaches to enable flexible and efficient binding to native libraries.

At its core, ADL inspects the interface you pass it, and generates a new type at runtime that implements the interface, forwarding calls to the interface methods to their native counterparts.
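The general technique of generating an interface implementation at runtime can be sketched with System.Reflection.Emit. This is not ADL's actual generator: IAdder, RuntimeTypeSketch and Adder_Generated are hypothetical names, and the method body performs a plain managed addition where a real binding would forward to native code.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Must be public so that the dynamic assembly is allowed to implement it.
public interface IAdder
{
    int Add(int a, int b);
}

public static class RuntimeTypeSketch
{
    public static IAdder CreateAdder()
    {
        var assemblyName = new AssemblyName("SketchAssembly");
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("SketchModule");

        // Define a new class that implements the interface.
        TypeBuilder type = module.DefineType("Adder_Generated", TypeAttributes.Public | TypeAttributes.Class);
        type.AddInterfaceImplementation(typeof(IAdder));

        MethodInfo interfaceMethod = typeof(IAdder).GetMethod(nameof(IAdder.Add));
        MethodBuilder method = type.DefineMethod(
            interfaceMethod.Name,
            MethodAttributes.Public | MethodAttributes.Virtual | MethodAttributes.HideBySig |
            MethodAttributes.NewSlot | MethodAttributes.Final,
            typeof(int),
            new[] { typeof(int), typeof(int) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_1); // a (argument 0 is "this")
        il.Emit(OpCodes.Ldarg_2); // b
        il.Emit(OpCodes.Add);     // managed stand-in for the native call
        il.Emit(OpCodes.Ret);

        type.DefineMethodOverride(method, interfaceMethod);

        // No constructor was defined, so a public default one is provided.
        Type generated = type.CreateTypeInfo().AsType();
        return (IAdder)Activator.CreateInstance(generated);
    }

    public static void Main()
    {
        Console.WriteLine(CreateAdder().Add(2, 3)); // 5
    }
}
```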

First and foremost, it uses the native platform's method to load dynamic libraries at runtime, and look up unmanaged function pointers from them. On Unix-like systems, this means dlopen and dlsym from libdl, and on Windows, the LoadLibrary and GetProcAddress functions from kernel32.

public class Math_Generated : IMath  
{
    private IntPtr _libraryPtr;
}

After this, depending on the way you configure it, there are two primary paths it takes.

Delegate-based Binding

In C#, there exist methods to take an unmanaged function pointer and turn it into a delegate - Marshal.GetDelegateForFunctionPointer(IntPtr ptr, Type delegateType). This lies at the core of ADL's delegate-based approach, and it generates a matching delegate type for your methods.

This method of binding to native code is flexible, but is unfortunately quite slow.

private delegate int Multiply_dt(int a, int b);  
private Multiply_dt Multiply_dtm;

public Math_Generated()
{
    var symbolPtr = LoadSymbol(_libraryPtr, nameof(Multiply));
    Multiply_dtm = (Multiply_dt)Marshal.GetDelegateForFunctionPointer(symbolPtr, typeof(Multiply_dt));
}

public int Multiply(int a, int b)  
{
    return Multiply_dtm(a, b);
}

It's simple, but reliable. All of this can be done by hand, of course, but it ends up being a cumbersome amount of boilerplate code that has to be written and then maintained. Because ADL generates this automatically, it saves you a significant chunk of time and maintenance costs.
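If you want to try the delegate conversion without building a native library first, the following sketch round-trips a managed method through a function pointer. It assumes a managed stand-in (ManagedMultiply) in place of the native symbol, and DelegateBindingSketch is a hypothetical name. Because the pointer originates from a managed delegate, the runtime may simply hand back that delegate, so treat this as a demonstration of the API shape rather than of the marshalling cost.

```csharp
using System;
using System.Runtime.InteropServices;

public static class DelegateBindingSketch
{
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    private delegate int Multiply_dt(int a, int b);

    // Managed stand-in for the native function; an assumption of the sketch.
    private static int ManagedMultiply(int a, int b) => a * b;

    // Held in a field so the delegate stays alive while the pointer is used.
    private static readonly Multiply_dt Source = ManagedMultiply;

    public static int MultiplyThroughPointer(int a, int b)
    {
        // In a real binding, this pointer would come from the loaded library.
        IntPtr symbolPtr = Marshal.GetFunctionPointerForDelegate(Source);

        // Turn the raw pointer back into a callable delegate.
        var bound = (Multiply_dt)Marshal.GetDelegateForFunctionPointer(symbolPtr, typeof(Multiply_dt));
        return bound(a, b);
    }

    public static void Main()
    {
        Console.WriteLine(MultiplyThroughPointer(5, 5)); // 25
    }
}
```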

Indirect Calls

If, however, you want to squeeze some extra speed out of your interop, ADL also offers another way to bind - by using the calli opcode. This is a fairly unknown opcode in the CLR, but it's been used with great success by large projects like OpenTK and SharpDX to speed up their native interop.

calli, in a nutshell, directly calls an unmanaged function pointer described by a callsite, bypassing all of the overhead that stems from type checking, delegate generation, and runtime code verification.

Instead of generating a delegate, we can simply call the symbol pointer directly.

private IntPtr Multiply_ptr;

public Math_Generated()
{
    Multiply_ptr = LoadSymbol(_libraryPtr, nameof(Multiply));
}

public int Multiply(int a, int b)
{
    // Pseudo-IL: the generated method body is emitted as IL, not written in C#.
    ldarg a;
    ldarg b;
    ldarg this;
    ldfld Multiply_ptr;
    calli;
}

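This way of calling the unmanaged pointer produces noticeable speed benefits: in the benchmarks below, calli is roughly 1.6 to 2.8 times faster than delegates on .NET Core and the .NET Framework, and up to about 1.9 times faster than DllImport on .NET Core. Delegates on Mono fall even further behind.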

It is, however, not without its faults. Code with calli is inherently unverifiable, and will not run under partial trust on Windows. That said, the default security policy is to run executables with full trust - so unless you have a restricted platform, it won't be an issue.

Additionally, .NET Core lacks a way to set the unmanaged calling convention at runtime, resulting in unreliable results when running as a 32-bit process on Windows. This is, fortunately, a very uncommon configuration.

This method is normally not available to developers - the various CLR compilers (C#, F#, VB.NET) never emit the calli opcode on their own.
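One way to reach calli from C# is to emit it yourself with System.Reflection.Emit. The sketch below is an illustration under a few assumptions, not ADL's generated code: CalliSketch is a hypothetical name, a managed method exposed through a function pointer stands in for the native symbol, and the runtime is assumed to provide the EmitCalli overload that takes an unmanaged calling convention.

```csharp
using System;
using System.Reflection.Emit;
using System.Runtime.InteropServices;

public static class CalliSketch
{
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    private delegate int Multiply_dt(int a, int b);

    // Managed stand-in for the native function; an assumption of the sketch.
    private static int ManagedMultiply(int a, int b) => a * b;

    // Held in a field so the function pointer stays valid.
    private static readonly Multiply_dt Source = ManagedMultiply;

    // Builds a method of the form (pointer, a, b) => calli pointer(a, b).
    public static Func<IntPtr, int, int, int> BuildCaller()
    {
        var method = new DynamicMethod(
            "Multiply_calli",
            typeof(int),
            new[] { typeof(IntPtr), typeof(int), typeof(int) },
            typeof(CalliSketch).Module,
            true);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_1); // a
        il.Emit(OpCodes.Ldarg_2); // b
        il.Emit(OpCodes.Ldarg_0); // the function pointer goes on top of the stack
        il.EmitCalli(OpCodes.Calli, CallingConvention.Cdecl, typeof(int), new[] { typeof(int), typeof(int) });
        il.Emit(OpCodes.Ret);

        return (Func<IntPtr, int, int, int>)method.CreateDelegate(typeof(Func<IntPtr, int, int, int>));
    }

    public static int MultiplyThroughCalli(int a, int b)
    {
        // In a real binding, this pointer would come from the loaded library.
        IntPtr symbolPtr = Marshal.GetFunctionPointerForDelegate(Source);
        return BuildCaller()(symbolPtr, a, b);
    }

    public static void Main()
    {
        Console.WriteLine(MultiplyThroughCalli(5, 5));
    }
}
```

In a real binding you would build the caller once and cache it, rather than regenerating it on every call as MultiplyThroughCalli does here.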

Performance

EDIT: It was noted that the tests on Linux were with the debugger attached. New data has been uploaded without the debugger.

So we've been banging on about performance for a while now - let's see some numbers. This is a benchmark of Matrix2 inversions, run with BenchmarkDotNet, comparing ADL against managed code and traditional DllImport. The tests have been run under Mono, .NET Core, and the full .NET Framework (v4.7.1).

The Mono and .NET Core tests were performed on Linux Mint 18.3, using an i7-4790K with 16GB RAM.

The full FX tests were performed on Windows 10, using an i7-7600U with 16GB RAM.

Each test case is as follows:

Managed                       : Managed code, no interop  
DllImport                     : Traditional DllImport  
Delegates                     : Delegates, with disposal checks  
DelegatesWithoutDisposeChecks : Delegates, no disposal checks  
calli                         : Using the calli opcode  

Mono

BenchmarkDotNet=v0.10.14, OS=linuxmint 18.3  
Intel Core i7-4790K CPU 4.00GHz (Haswell), 1 CPU, 8 logical and 4 physical cores  
  [Host] : Mono 5.10.1.42 (tarball Wed), 64bit
  Mono   : Mono 5.10.1.42 (tarball Wed), 64bit


                               Method |         Mean |      Error |     StdDev |
------------------------------------- |-------------:|-----------:|-----------:|
                           CalliByRef |     8.774 ns |  0.1943 ns |  0.1723 ns |
                       DllImportByRef |    10.844 ns |  0.0133 ns |  0.0125 ns |
                         ManagedByRef |    12.922 ns |  0.1052 ns |  0.0984 ns |
   DelegatesWithoutDisposeChecksByRef |   948.430 ns | 17.7136 ns | 17.3971 ns |
                       DelegatesByRef |   958.051 ns |  2.0486 ns |  1.7106 ns |

                         CalliByValue |    21.163 ns |  0.0866 ns |  0.0768 ns |
                     DllImportByValue |    21.991 ns |  0.2201 ns |  0.2058 ns |
                       ManagedByValue |    22.528 ns |  0.0371 ns |  0.0347 ns |
 DelegatesWithoutDisposeChecksByValue | 1,215.534 ns |  4.5346 ns |  3.7866 ns |
                     DelegatesByValue | 1,228.969 ns |  9.6864 ns |  8.0886 ns |

.NET Core

BenchmarkDotNet=v0.10.14, OS=linuxmint 18.3  
Intel Core i7-4790K CPU 4.00GHz (Haswell), 1 CPU, 8 logical and 4 physical cores  
.NET Core SDK=2.1.4
  [Host] : .NET Core 2.0.5 (CoreCLR 4.6.0.0, CoreFX 4.6.26018.01), 64bit RyuJIT
  Core   : .NET Core 2.0.5 (CoreCLR 4.6.0.0, CoreFX 4.6.26018.01), 64bit RyuJIT


                               Method |      Mean |     Error |    StdDev |
------------------------------------- |----------:|----------:|----------:|
                         ManagedByRef |  3.660 ns | 0.0045 ns | 0.0042 ns |
                           CalliByRef |  8.010 ns | 0.0373 ns | 0.0312 ns |
                       DllImportByRef | 15.204 ns | 0.1530 ns | 0.1356 ns |
   DelegatesWithoutDisposeChecksByRef | 20.411 ns | 0.4039 ns | 0.3778 ns |
                       DelegatesByRef | 22.027 ns | 0.0572 ns | 0.0478 ns |


                         CalliByValue | 21.912 ns | 0.0149 ns | 0.0132 ns |
                     DllImportByValue | 29.796 ns | 0.0347 ns | 0.0271 ns |
                     DelegatesByValue | 36.662 ns | 0.5188 ns | 0.4853 ns |
 DelegatesWithoutDisposeChecksByValue | 35.504 ns | 0.3495 ns | 0.3269 ns |

.NET FX

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.16299.371 (1709/FallCreatorsUpdate/Redstone3)  
Intel Core i7-7600U CPU 2.80GHz (Kaby Lake), 1 CPU, 4 logical and 2 physical cores  
Frequency=2835937 Hz, Resolution=352.6171 ns, Timer=TSC  
  [Host]     : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2633.0
  Clr        : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2633.0

                               Method |     Mean |     Error |    StdDev |
------------------------------------- |---------:|----------:|----------:|
                         ManagedByRef | 26.74 ns | 0.3286 ns | 0.3074 ns |
                           CalliByRef | 27.00 ns | 0.4233 ns | 0.3960 ns |
                       DllImportByRef | 29.90 ns | 0.1497 ns | 0.1250 ns |
   DelegatesWithoutDisposeChecksByRef | 58.63 ns | 1.2115 ns | 1.0740 ns |
                       DelegatesByRef | 75.16 ns | 1.2799 ns | 1.1972 ns |

                       ManagedByValue | 19.29 ns | 0.2555 ns | 0.2265 ns |                       
                     DllImportByValue | 27.20 ns | 0.1880 ns | 0.1666 ns |
                         CalliByValue | 36.52 ns | 0.4173 ns | 0.3903 ns |
 DelegatesWithoutDisposeChecksByValue | 67.60 ns | 1.3390 ns | 1.3151 ns |
                     DelegatesByValue | 91.47 ns | 0.5847 ns | 0.5469 ns |

Across all platforms, calli remains quite consistent, while delegates and DllImport see some rather worrying fluctuations. Delegates on Mono appear to be an outlier: at roughly 950 to 1,230 ns per call, they are about 50 to 100 times slower than the other methods.
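To put the tables into relative terms, a few lines of C# are enough to turn the mean times into speedup factors. The figures are copied from the by-reference rows above; SpeedupSketch and Speedup are hypothetical names used only for this illustration.

```csharp
using System;

public static class SpeedupSketch
{
    // How many times faster calli is than the given baseline (means in ns).
    public static double Speedup(double baselineNs, double calliNs) => baselineNs / calliNs;

    public static void Main()
    {
        // DllImportByRef versus CalliByRef on each runtime.
        Console.WriteLine($"Mono      DllImport/calli: {Speedup(10.844, 8.774):F2}x");
        Console.WriteLine($".NET Core DllImport/calli: {Speedup(15.204, 8.010):F2}x");
        Console.WriteLine($".NET FX   DllImport/calli: {Speedup(29.90, 27.00):F2}x");

        // DelegatesByRef versus CalliByRef.
        Console.WriteLine($".NET Core Delegates/calli: {Speedup(22.027, 8.010):F2}x");
        Console.WriteLine($".NET FX   Delegates/calli: {Speedup(75.16, 27.00):F2}x");
    }
}
```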

You can view the benchmark's source code and run it yourself here: AdvancedDLSupport.Benchmark

ADL is available for free on GitHub and NuGet. For companies, established open-source projects, or individuals wanting a custom license, please chuck us an email.