Spin Your Critical Sections

As I mentioned in the previous section, critical sections are the preferred method of synchronization when you are only synchronizing inside a process. However, you can get a considerable performance boost using critical sections if you remember to spin!

Years ago, some folks at Microsoft were wondering about multithreaded application performance, so they came up with several testing scenarios to find out more. After lots of study, they found something quite counterintuitive, though not unheard of in computer science. They found that in certain cases it was much faster to poll than to actually perform an operation. We've all been told since we were wee programmers never to poll, but in the case of critical sections, that's exactly what you want to do.

The vast majority of critical sections protect small pieces of data. As I described in the last section, a critical section is protected by a semaphore, and making the call into kernel mode to acquire that semaphore is extremely expensive. The original implementation of EnterCriticalSection simply looked to see whether the critical section could be acquired. If it couldn't, EnterCriticalSection went right into kernel mode. In most cases, by the time the thread got into kernel mode and back down, the other thread had released the critical section a million years ago in computer time. The counterintuitive idea the Microsoft researchers came up with for multiple-CPU systems was to check whether the critical section was available and, if it wasn't, spin the CPU, then check again. If after the second check the critical section still wasn't available, the thread would finally transition to kernel mode. The idea was that keeping the thread in user mode, even though it was spinning on nothing, was tremendously faster than transitioning to kernel mode. On single-CPU systems, the spin count is ignored, because the thread that owns the critical section can't release it while the waiting thread is spinning on the only CPU.
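The check, spin, check again, then block sequence described above can be sketched in portable C++. This is only an illustration of the idea, not how EnterCriticalSection is actually implemented: std::mutex stands in for the kernel object, and the names SpinThenBlockMutex and CountWithLock are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of "spin first, block last".
class SpinThenBlockMutex {
public:
    explicit SpinThenBlockMutex(unsigned spinCount) : spinCount_(spinCount) {}

    void lock() {
        // First check: in the common case the lock is free.
        if (mutex_.try_lock()) {
            return;
        }
        // Poll in user mode for up to spinCount_ further attempts.
        for (unsigned i = 0; i < spinCount_; ++i) {
            if (mutex_.try_lock()) {
                return;  // acquired without blocking
            }
        }
        // Still busy after spinning, so fall back to a blocking wait.
        mutex_.lock();
    }

    void unlock() { mutex_.unlock(); }

private:
    std::mutex mutex_;
    unsigned spinCount_;
};

// Runs threadCount threads that each add 1 to a shared counter
// `iterations` times under the lock, then returns the final total.
long CountWithLock(unsigned threadCount, unsigned iterations, unsigned spinCount) {
    SpinThenBlockMutex lock(spinCount);
    long counter = 0;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            for (unsigned i = 0; i < iterations; ++i) {
                std::lock_guard<SpinThenBlockMutex> guard(lock);
                ++counter;  // the small piece of protected data
            }
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return counter;
}
```

Calling CountWithLock(4, 10000, 4000) should return 40000, because every increment happens while the lock is held. The protected region is deliberately tiny, which is the case where spinning tends to pay off.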

Two functions allow you to set the critical-section spin count. The first is InitializeCriticalSectionAndSpinCount, which you should use in place of InitializeCriticalSection. You'll need the second function, SetCriticalSectionSpinCount, if you want to change the value you originally started with or if you need to change the value for library code that uses only InitializeCriticalSection. Of course, I am assuming that you can access the critical-section pointer from your code.

Determining your spin count can be problematic. If you work in an environment in which you have two to three weeks to run through all the scenarios, grab all those interns sitting around and have fun. However, most of us aren't that lucky. I always use the value 4,000 for my spin count. That's what Microsoft uses for the operating system heaps, and I always figured that my code was probably less intensive than those. That number is also large enough to keep my code in user mode almost all the time.
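A minimal sketch of the two spin-count functions in use might look like the following. The global g_lock, the counter, and the helper names (InitCounterLock, IncrementCounter, RetuneCounterLock, DestroyCounterLock) are hypothetical and exist only for illustration. The code is Windows-only, so it is wrapped in an _WIN32 guard.

```cpp
#ifdef _WIN32  // Windows-only APIs; the guard keeps other builds intact.
#include <windows.h>

// Hypothetical shared state, for illustration only.
static CRITICAL_SECTION g_lock;
static long g_counter = 0;

// Initialize the critical section with the spin count of 4,000
// suggested in the text.
bool InitCounterLock() {
    return InitializeCriticalSectionAndSpinCount(&g_lock, 4000) != FALSE;
}

// A typical small protected region: bump a counter and return it.
long IncrementCounter() {
    EnterCriticalSection(&g_lock);
    long value = ++g_counter;
    LeaveCriticalSection(&g_lock);
    return value;
}

// Change the spin count later, for example after profiling.
// SetCriticalSectionSpinCount returns the previous spin count.
DWORD RetuneCounterLock(DWORD newSpinCount) {
    return SetCriticalSectionSpinCount(&g_lock, newSpinCount);
}

// Release the critical section's resources once no thread uses it.
void DestroyCounterLock() {
    DeleteCriticalSection(&g_lock);
}
#endif  // _WIN32
```

The same SetCriticalSectionSpinCount call works on a critical section that library code created with plain InitializeCriticalSection, provided you can get at its pointer.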

Don't Use CreateThread/ExitThread

One of the more insidious mistakes that people make in multithreaded development is using CreateThread. Of course, that raises this question: if you can't use CreateThread to start a thread, how can you get any threads cranked up? Instead of CreateThread, you should always use _beginthreadex, the C run-time function, to start your threads. As you'd expect, since ExitThread is paired with CreateThread to end a thread, _beginthreadex has its own matching exit function, _endthreadex, that you'll need to use instead as well.

You might be using CreateThread in your application right now and not be experiencing any problems whatsoever. Unfortunately, some very subtle bugs can occur because the C run time is not initialized when you use CreateThread. The C run time relies on some per-thread data because certain standard C run-time functions were designed before high-speed multithreaded applications were the norm. For example, the function strtok holds the string to parse in per-thread storage. Using _beginthreadex ensures that the per-thread data is there along with other things the C run time needs. To ensure proper thread cleanup, use _endthreadex, which ensures that the C run-time resources are cleaned up when you need to exit the thread prematurely.

The _beginthreadex function works the same way and takes the same type of parameters as CreateThread. To end your thread, simply return from the thread function. If you need to leave early, call the _endthreadex C run-time function instead. As with the CreateThread API function, _beginthreadex returns the thread handle (as a uintptr_t that you cast to HANDLE), which you must pass to CloseHandle to avoid a handle leak.
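A minimal sketch of starting and cleaning up a thread with _beginthreadex might look like this. The worker DoubleWorker and the wrapper RunDoubleWorker are hypothetical names used only for illustration, and the code is Windows-only, hence the _WIN32 guard.

```cpp
#ifdef _WIN32  // Windows-only APIs; the guard keeps other builds intact.
#include <windows.h>
#include <process.h>  // _beginthreadex, _endthreadex
#include <cstdint>

// Hypothetical worker: doubles the int it is handed.
static unsigned __stdcall DoubleWorker(void* param) {
    int* value = static_cast<int*>(param);
    if (value == nullptr) {
        // Early exit. Note that _endthreadex does not run C++
        // destructors for objects still alive in this function.
        _endthreadex(1);
    }
    *value *= 2;
    return 0;  // normal exit: just return from the thread function
}

// Starts the worker, waits for it to finish, and closes the handle.
bool RunDoubleWorker(int* value) {
    unsigned threadId = 0;
    uintptr_t raw = _beginthreadex(nullptr,      // default security
                                   0,            // default stack size
                                   DoubleWorker, // thread function
                                   value,        // argument for the thread
                                   0,            // run immediately
                                   &threadId);
    if (raw == 0) {
        return false;  // _beginthreadex returns 0 on failure
    }
    HANDLE thread = reinterpret_cast<HANDLE>(raw);
    WaitForSingleObject(thread, INFINITE);
    CloseHandle(thread);  // required, or the handle leaks
    return true;
}
#endif  // _WIN32
```

For example, after int v = 21; a call to RunDoubleWorker(&v) leaves v at 42. The sketch waits for the thread before closing the handle only to keep the example deterministic; closing the handle earlier is also valid if you don't need to wait on the thread.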

If you look up _beginthreadex, you'll also see a C run-time function named _beginthread. You'll want to avoid using that function like the plague because its default behavior is a bug, in my opinion. A thread started with _beginthread has its handle closed automatically when the thread ends, so if the thread ends quickly, the handle that _beginthread returned might already be invalid or might refer to a different thread. In fact, the documentation on _beginthread indicates that it's safer to use _beginthreadex. When reviewing your code, make sure to note calls to _beginthread and _endthread so that you can change them to _beginthreadex and _endthreadex, respectively.
