Singularity/Library/PackageCache/com.unity.burst@1.8.4/Documentation~/csharp-function-pointers.md
2024-05-06 11:45:45 -07:00

Function pointers

To work with dynamic functions that process data based on other data states, use FunctionPointer<T>. Because Burst treats delegates as managed objects, you can't use C# delegates to work with dynamic functions.

Support details

Function pointers don't support generic delegates. Also, avoid wrapping BurstCompiler.CompileFunctionPointer<T> within another open generic method. If you do this, Burst can't apply required attributes to the delegate, perform additional safety analysis, or perform potential optimizations.
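
As a rough illustration of the wrapping restriction, the sketch below contrasts a helper that forwards an open type parameter with a call that names the concrete delegate type. `UnaryFloatDelegate`, `MathFunctions`, `Square`, `FunctionPointerHelpers`, `CompileAny`, and `CompileSquare` are hypothetical names introduced only for this example, and the IL2CPP-related attribute described later on this page is left out for brevity.

```csharp
using Unity.Burst;

// Hypothetical non-generic delegate (generic delegates aren't supported).
public delegate float UnaryFloatDelegate(float x);

[BurstCompile]
public static class MathFunctions
{
    [BurstCompile]
    public static float Square(float x) => x * x;
}

public static class FunctionPointerHelpers
{
    // Avoid: T is still an open type parameter at this call site, so Burst
    // can't apply required attributes to the delegate or run its extra analysis.
    public static FunctionPointer<T> CompileAny<T>(T method) where T : class
        => BurstCompiler.CompileFunctionPointer(method);

    // Prefer: the delegate type is concrete where CompileFunctionPointer is called.
    public static FunctionPointer<UnaryFloatDelegate> CompileSquare()
        => BurstCompiler.CompileFunctionPointer<UnaryFloatDelegate>(MathFunctions.Square);
}
```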

Argument and return types are subject to the same restrictions as DllImport and internal calls. For more information, see the documentation on DllImport and internal calls.

Interoperability with IL2CPP

Interoperability of function pointers with IL2CPP requires System.Runtime.InteropServices.UnmanagedFunctionPointerAttribute on the delegate. Set the calling convention to CallingConvention.Cdecl. Burst automatically adds this attribute to delegates that are used with BurstCompiler.CompileFunctionPointer<T>.
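
Burst adds the attribute for you, but it can be helpful to see what the annotated delegate looks like when written by hand. This is a minimal sketch; the delegate name is borrowed from the example later on this page.

```csharp
using System.Runtime.InteropServices;

// Explicitly marks the delegate as an unmanaged function pointer that uses
// the Cdecl calling convention, which is what IL2CPP interop requires.
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate float Process2FloatsDelegate(float a, float b);
```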

Using function pointers

To use function pointers, identify the static functions that you want Burst to compile and do the following:

  1. Add a [BurstCompile] attribute to these functions

  2. Add a [BurstCompile] attribute to the containing type. This helps the Burst compiler find the static methods that have [BurstCompile] attribute

  3. Declare a delegate to create the "interface" of these functions

  4. Add a [MonoPInvokeCallbackAttribute] attribute to the functions. You need to add this so that IL2CPP works with these functions. For example:

    using AOT;         // MonoPInvokeCallbackAttribute
    using Unity.Burst; // BurstCompileAttribute

    // Instruct Burst to look for static methods with [BurstCompile] attribute
    [BurstCompile]
    class EnclosingType {
        [BurstCompile]
        [MonoPInvokeCallback(typeof(Process2FloatsDelegate))]
        public static float MultiplyFloat(float a, float b) => a * b;
    
        [BurstCompile]
        [MonoPInvokeCallback(typeof(Process2FloatsDelegate))]
        public static float AddFloat(float a, float b) => a + b;
    
        // A common interface for both MultiplyFloat and AddFloat methods
        public delegate float Process2FloatsDelegate(float a, float b);
    }
    
  5. Compile these function pointers from regular C# code:

        // Contains a compiled version of MultiplyFloat with Burst
        FunctionPointer<Process2FloatsDelegate> mulFunctionPointer = BurstCompiler.CompileFunctionPointer<Process2FloatsDelegate>(MultiplyFloat);
    
        // Contains a compiled version of AddFloat with Burst
        FunctionPointer<Process2FloatsDelegate> addFunctionPointer = BurstCompiler.CompileFunctionPointer<Process2FloatsDelegate>(AddFloat);
    

Using function pointers in a job

To use the function pointers directly from a job, pass them to the job struct:

    // Invoke the function pointers from HPC# jobs
    var resultMul = mulFunctionPointer.Invoke(1.0f, 2.0f);
    var resultAdd = addFunctionPointer.Invoke(1.0f, 2.0f);

Burst compiles function pointers asynchronously for jobs by default. To force synchronous compilation of function pointers, use [BurstCompile(CompileSynchronously = true)].
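
To make "pass them to the job struct" more concrete, here is a minimal sketch of a job that stores a function pointer in a field and invokes it from Execute. `Process2FloatsJob`, `Process2FloatsJobExample`, and `Run` are hypothetical names, and the code assumes the `EnclosingType` class from the previous section is in scope.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
struct Process2FloatsJob : IJob
{
    // FunctionPointer<T> is a plain struct, so a job can carry it as a field.
    public FunctionPointer<EnclosingType.Process2FloatsDelegate> Function;
    public float A;
    public float B;

    // Single-element array used to return the result to the caller.
    public NativeArray<float> Result;

    public void Execute()
    {
        Result[0] = Function.Invoke(A, B);
    }
}

static class Process2FloatsJobExample
{
    public static float Run(FunctionPointer<EnclosingType.Process2FloatsDelegate> function, float a, float b)
    {
        var result = new NativeArray<float>(1, Allocator.TempJob);
        try
        {
            var job = new Process2FloatsJob { Function = function, A = a, B = b, Result = result };
            job.Schedule().Complete();
            return result[0];
        }
        finally
        {
            result.Dispose();
        }
    }
}
```

Calling `Process2FloatsJobExample.Run(mulFunctionPointer, 1.0f, 2.0f)` would then run MultiplyFloat inside the job. A job this small is only meant to show the wiring; the performance section below explains why per-element calls like this are a poor fit for real workloads.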

Using function pointers in C# code

To use these function pointers from regular C# code, cache the FunctionPointer<T>.Invoke property (which is the delegate instance) in a static field to get the best performance:

    private static readonly Process2FloatsDelegate mulFunctionPointerInvoke = BurstCompiler.CompileFunctionPointer<Process2FloatsDelegate>(MultiplyFloat).Invoke;

    // Invoke the delegate from C#
    var resultMul = mulFunctionPointerInvoke(1.0f, 2.0f);

Using Burst-compiled function pointers from C# might be slower than their pure C# version counterparts if the function is too small compared to the overhead of P/Invoke interop.

Performance considerations

Where possible, use a job rather than a function pointer to run Burst-compiled code, because Burst can optimize jobs more effectively. In particular, Burst provides better aliasing calculations for jobs because the job safety system enables more optimizations by default.

You also can't pass most of the [NativeContainer] structs like NativeArray directly to function pointers and must use a job struct to do so. Native container structs contain managed objects for safety checks that the Burst compiler can work around when compiling jobs, but not for function pointers.

The following example shows a poor way to use function pointers in Burst. The function pointer computes math.sqrt from an input pointer and stores the result to an output pointer. MyJob feeds this function pointer one element at a time from two NativeArrays, which isn't optimal:

    using Unity.Burst;
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe; // GetUnsafeReadOnlyPtr, GetUnsafePtr
    using Unity.Jobs;
    using Unity.Mathematics;

    // Bad function pointer example
    [BurstCompile]
    public class MyFunctionPointers
    {
        public unsafe delegate void MyFunctionPointerDelegate(float* input, float* output);

        [BurstCompile]
        public static unsafe void MyFunctionPointer(float* input, float* output)
        {
            *output = math.sqrt(*input);
        }
    }

    [BurstCompile]
    struct MyJob : IJobParallelFor
    {
        public FunctionPointer<MyFunctionPointers.MyFunctionPointerDelegate> FunctionPointer;

        [ReadOnly] public NativeArray<float> Input;
        [WriteOnly] public NativeArray<float> Output;

        public unsafe void Execute(int index)
        {
            // One function pointer call per element
            var inputPtr = (float*)Input.GetUnsafeReadOnlyPtr();
            var outputPtr = (float*)Output.GetUnsafePtr();
            FunctionPointer.Invoke(inputPtr + index, outputPtr + index);
        }
    }

This example isn't optimal for the following reasons:

  • Burst can't vectorize the function pointer because each call receives a single scalar element. The lack of vectorization costs 4-8x in performance.
  • MyJob knows that the Input and Output native arrays can't alias, but this information isn't communicated to the function pointer.
  • There is a non-zero overhead to constantly branching to a function pointer somewhere else in memory.

To use a function pointer in an optimal way, always process batches of data in the function pointer, like so:

    using Unity.Burst;
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe; // GetUnsafeReadOnlyPtr, GetUnsafePtr
    using Unity.Jobs;
    using Unity.Mathematics;

    [BurstCompile]
    public class MyFunctionPointers
    {
        public unsafe delegate void MyFunctionPointerDelegate(int count, float* input, float* output);

        [BurstCompile]
        public static unsafe void MyFunctionPointer(int count, float* input, float* output)
        {
            // Process a whole batch per call so that Burst can vectorize the loop
            for (int i = 0; i < count; i++)
            {
                output[i] = math.sqrt(input[i]);
            }
        }
    }

    [BurstCompile]
    struct MyJob : IJobParallelForBatch
    {
        public FunctionPointer<MyFunctionPointers.MyFunctionPointerDelegate> FunctionPointer;

        [ReadOnly] public NativeArray<float> Input;
        [WriteOnly] public NativeArray<float> Output;

        public unsafe void Execute(int index, int count)
        {
            // index is the first element of this batch; count is the batch size
            var inputPtr = (float*)Input.GetUnsafeReadOnlyPtr() + index;
            var outputPtr = (float*)Output.GetUnsafePtr() + index;
            FunctionPointer.Invoke(count, inputPtr, outputPtr);
        }
    }

The modified MyFunctionPointer takes a count of elements to process, and loops over the input and output pointers to compute the whole batch in a single call. MyJob becomes an IJobParallelForBatch, and the count is passed directly into the function pointer. This is better for performance for the following reasons:

  • Burst vectorizes the MyFunctionPointer call.
  • Because Burst processes count items per function pointer call, the overhead of calling the function pointer is reduced by a factor of count. For example, if you run a batch of 128, the function pointer overhead per index is 1/128th of what it was previously.
  • Batching results in a 1.53x performance gain over not batching.

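A sketch of how the batched job might be compiled and scheduled is shown below. `BatchedSqrtExample` and `Run` are hypothetical names, the batch size of 128 simply mirrors the figure used above, and the `ScheduleBatch` extension method is assumed to be available from the Collections package version in use (its name and overloads have varied between versions).

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

public static class BatchedSqrtExample
{
    // Marked unsafe because the target method's signature contains pointers.
    public static unsafe void Run(NativeArray<float> input, NativeArray<float> output)
    {
        var job = new MyJob
        {
            // In real code, compile once and cache the result rather than
            // compiling on every call.
            FunctionPointer = BurstCompiler.CompileFunctionPointer<MyFunctionPointers.MyFunctionPointerDelegate>(
                MyFunctionPointers.MyFunctionPointer),
            Input = input,
            Output = output,
        };

        // Each Execute call receives a start index and a count of up to 128 elements.
        job.ScheduleBatch(input.Length, 128).Complete();
    }
}
```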
However, to get the best possible performance, use a job. This gives Burst the most visibility over what you want it to do, and the most opportunities to optimize:

    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;

    [BurstCompile]
    struct MyJob : IJobParallelFor
    {
        [ReadOnly] public NativeArray<float> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute(int index)
        {
            Output[index] = math.sqrt(Input[index]);
        }
    }

This runs 1.26x faster than the batched function pointer example, and 1.93x faster than the non-batched function pointer example. Burst has perfect aliasing knowledge and can apply the widest range of optimizations to this code. This code is also a lot simpler than either of the function pointer cases.