Performance of Thunks

So far, I have only mentioned that calls via thunks are slower than method calls that do not cross a managed-unmanaged boundary. To decide whether the overhead of a transition is acceptable, you will likely need more precise information. The next sections quantify the overhead of the different kinds of transitions. In particular, I will examine transitions based on interoperability vtables, P/Invoke metadata, and the CALLI instruction. Furthermore, I will discuss a special optimization option for P/Invoke metadata, as well as certain internals of managed-unmanaged thunks that can affect the way you write code.

To discuss the overhead of transitions, I have written an application that measures different interoperability scenarios. The code for this application is provided in Appendix B. You can also download it from the Source Code/Download section of the Apress web site (www.apress.com) so that you can reproduce the performance tests on your machine or adapt them to your own special scenarios.

To determine the performance of thunks for C functions, the application uses a native and a managed global function defined in the application itself (fManagedLocal and fNativeLocal), as well as a native and a managed function imported from an external DLL (fManagedFromDLL and fNativeFromDLL). The application measures how long it takes to call each of these methods directly, as well as via function pointers from native and managed code. For each measurement, 100 million calls are done. Table 9-2 shows the measured results on my machine (Pentium M 780, 2.26 GHz).
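If you want an idea of how such a measurement can be set up, here is a minimal sketch. It is not the Appendix B code itself; the timing approach (QueryPerformanceCounter) and the lack of safeguards against the optimizer removing the empty calls are simplifications. It assumes compilation with /clr, so that #pragma managed and #pragma unmanaged control whether each function is compiled to IL or to native code, and it reuses the names fManagedLocal and fNativeLocal from the text.

```cpp
// Minimal sketch (not the Appendix B code): timing direct calls from managed
// code to a managed and to a native local function. Compile with /clr.
#include <windows.h>
#include <cstdio>

#pragma unmanaged
void fNativeLocal() { }            // compiled to native code; empty body, only call overhead matters

#pragma managed
void fManagedLocal() { }           // compiled to managed code (IL)

int main()                         // main is compiled to managed code, too
{
    const int calls = 100000000;   // 100 million calls per measurement
    LARGE_INTEGER freq, start, end;
    ::QueryPerformanceFrequency(&freq);

    ::QueryPerformanceCounter(&start);
    for (int i = 0; i < calls; ++i)
        fManagedLocal();           // M > M: no transition
    ::QueryPerformanceCounter(&end);
    std::printf("M > M: %.2fs\n",
        (end.QuadPart - start.QuadPart) / double(freq.QuadPart));

    ::QueryPerformanceCounter(&start);
    for (int i = 0; i < calls; ++i)
        fNativeLocal();            // M > U: call through a managed-to-unmanaged thunk
    ::QueryPerformanceCounter(&end);
    // (a real test must also keep the optimizer from removing the empty calls)
    std::printf("M > U: %.2fs\n",
        (end.QuadPart - start.QuadPart) / double(freq.QuadPart));
}
```

The calls to fManagedFromDLL and fNativeFromDLL, as well as the indirect calls via function pointers, can be measured with the same pattern.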

Table 9-2. Performance of Thunks for Global Functions

| Callee | Direct Call from Managed Code | Direct Call from Native Code | Indirect Call (Function Pointer) from Managed Code | Indirect Call (Function Pointer) from Native Code |
|---|---|---|---|---|
| Managed function from same assembly (fManagedLocal) | 0.32s, M > M | 2.12s, U > M | Via __stdcall*: 2.30s, M > U > M; via __clrcall*: 0.30s, M > M | 2.07s, U > M |
| Native function from same assembly (fNativeLocal) | 0.63s, M > U | 0.41s, U > U | 0.63s, M > U | 0.41s, U > U |
| Managed function from imported DLL (fManagedFromDLL) | 3.54s, M > U > M | 2.12s, U > M | 2.39s, M > U > M | 2.07s, U > M |
| Native function from imported DLL (fNativeFromDLL) | 1.97s, M > U | 0.41s, U > U | 0.63s, M > U | 0.41s, U > U |

In addition to C functions, the application measures calls to virtual and nonvirtual functions of C++ classes defined in the application itself, as well as in an external DLL. The results are shown in Table 9-3. Again, all functions are called from managed code as well as from native code.
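To make concrete what kinds of callees are behind these measurements, the following sketch shows the two flavors of native classes involved. The class names are invented for illustration; it assumes /clr compilation so that #pragma managed and #pragma unmanaged determine how the member functions are compiled, and it is not the Appendix B source code.

```cpp
// Illustrative sketch of the callees measured in Table 9-3 (invented names).
// Compile with /clr.
#pragma managed
class NativeClassWithManagedMembers      // native class, member functions compiled to IL
{
public:
    void NonVirtualFunction() { }                        // managed body, __thiscall entry point
    virtual void VirtualFunction() { }                   // __thiscall virtual function
    virtual void __clrcall ClrCallVirtualFunction() { }  // callable from managed code only
};

#pragma unmanaged
class NativeClassWithNativeMembers       // native class, member functions compiled to native code
{
public:
    void NonVirtualFunction() { }
    virtual void VirtualFunction() { }
};
```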

Table 9-3. Performance of Thunks for Member Functions of Native Classes

| Callee | Nonvirtual Call from Managed Code | Nonvirtual Call from Native Code | Virtual Call from Managed Code | Virtual Call from Native Code |
|---|---|---|---|---|
| Native class from same assembly with managed member functions | 0.27s, M > M | 2.25s, U > M | __thiscall virtual function: 2.39s, M > U > M; __clrcall virtual function: 0.40s, M > M | 2.25s, U > M |
| Native class from same assembly with native member functions | 0.74s, M > U | 0.48s, U > U | 0.72s, M > U | 0.54s, U > U |
| Native class from imported DLL with managed member functions | 3.80s, M > U > M | 2.30s, U > M | 2.39s, M > U > M | 2.25s, U > M |
| Native class from imported DLL with native member functions | 2.12s, M > U | 0.49s, U > U | 0.72s, M > U | 0.55s, U > U |

As you can see in both tables, calls across the managed-unmanaged boundaries produced by C++/CLI can be more than 500 percent slower than calls without transitions. However, unless you perform a very large number of transitions, this overhead can likely be ignored. The difference between the 100 million calls to fManagedLocal from native callers (~2.12s) and the 100 million calls from managed callers (~0.32s) is about 1.8 seconds, or roughly 18 nanoseconds of overhead per call.

In addition to the measured time, both tables also show the transitions that occur in the different scenarios. For example, for the direct call to fManagedLocal from managed code, the text "M > M" shows that a call from managed code to managed code has occurred. Cells with the text "U > M" indicate an unmanaged-to-managed transition. Likewise, "M > U" stands for a managed-to-unmanaged transition.

For the indirect call to fManagedLocal from managed code, the text M > U > M indicates a transition from managed code to unmanaged code and back to managed code. This is the double-thunking scenario discussed earlier. In addition to the double-thunking case, Table 9-2 also shows the cost of an indirect call via a __clrcall function pointer, which, as discussed earlier, avoids double thunking. As you can see, double thunking can easily increase the cost of a method call by more than 600 percent. Table 9-3 shows similar results for the double-thunking problem with virtual function calls.
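As a sketch of how the __clrcall case differs from an indirect call through a native function pointer, consider the following. The function and typedef names are invented for the example, and compilation with /clr is assumed.

```cpp
// Sketch (invented names, /clr compilation assumed): an indirect call from
// managed code through a native function pointer goes M > U > M (double
// thunking); a __clrcall function pointer keeps the call M > M.
void __stdcall fManagedStdCall() { }   // managed body with a native entry point
void __clrcall fManagedClrCall() { }   // managed body, no native entry point

typedef void (__stdcall *PFN_NATIVE)();   // native calling convention
typedef void (__clrcall *PFN_CLRCALL)();  // managed calling convention

int main()   // compiled to managed code
{
    PFN_NATIVE  pNative = &fManagedStdCall;  // address of the native entry point
    PFN_CLRCALL pClr    = &fManagedClrCall;  // address of the managed entry point

    pNative();   // M > U > M: managed caller -> unmanaged thunk -> managed callee
    pClr();      // M > M: no transition
}
```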
