Avoiding Deadlocks

From cxwiki

Deadlocks are scenarios where one or more threads of execution become blocked in a manner that they can never recover. The simple example is a condition where one thread holds lock A and is attempting to take lock B, and where another thread holds lock B and is attempting to take lock A. It is evident that neither thread will succeed in taking their lock, and thus neither thread will ever progress.
 
There are numerous strategies possible for an application to avoid deadlocks. The most basic approaches are as follows:
  1. Where possible, protect data elements which have no outside dependencies. This ensures that you can take a lock, perform any data access, and release the lock without having to take any further locks.
  2. Reduce the number of locks in play. Use locks which cover an entire system, rather than independantly protecting many subcomponents. This not only reduces the chance of a deadlock being possible, but also reduces the CPU time spent locking and unlocking and (in the case of many small objects) the resource cost of constructing the locks and keeping them in memory.
  3. For each function which may be called with a lock already held, annotate which locks MAY be held, which locks MUST be held, and which locks MUST NOT be held. Check that each function which is called in turn is annotated in a compatible fashion.
  4. When it is necessary to take multiple locks simultaneously, always take the locks in a fixed order. "A > B", "A > B", "B > C", "A > C", etc. but never "B > A" or "C > A".
  5. In some cases it is possible to take multiple locks in a single atomic operation. This is appropriate as an alternative to taking those same locks in a fixed order, as long as the locks are not being taken recursively.
  6. Where a large number of elements exist that must be locked independantly (to allow multithreaded operation) and in undefined orders, a deadlock-prevention system must be implemented. This system must track which threads hold which locks, and must reject any attempt to put the system into a deadlocked state. The calling threads must be written to handle such lock rejections, typically by rolling back the current operation and releasing all locks, with the potential to automatically retry the operation when the other parties to the potential deadlock signal completion.

It is also important to consider exactly what can be considered a lock:

  • Obviously, a CXMutex or similar is a lock.
  • Less obviously, using CXWorkerHost::EnqueueTask() and similar may be considered equivalent to a semaphore; if all worker threads are each blocking on Resource A, and the thread which holds Resource A is trying to pass control of the resource to a new worker task, then a deadlock may occur. (Even if care is taken to keep the number of outstanding tasks lower than the number of worker threads, this can still happen by having two different codepaths each hitting a similar problem at the same time, halving the number of worker threads available to each codepath.) One solution here is to ensure that a thread waiting on a given resource is made available to run worker tasks for that resource in the interim.
  • If allowing a thread to be used to run worker tasks, be cautious of which locks are held, and which tasks can be run. If you simply allow any worker task to run while holding locks, then your locking semantics determined in #3 above may be violated. You should only allow specific worker tasks to run while holding a lock, and you should ensure that your lock is documented as legal to hold during the call to that worker task.