Fast Concurrent Queues for x86 Processors
Conventional wisdom in designing concurrent data structures is to use the most powerful synchronization primitive, namely compare-and-swap (CAS), and to avoid contended hot spots. In building concurrent FIFO queues, this reasoning has led researchers to propose combining-based concurrent queues. This work takes a different approach, showing how to rely on fetch-and-add (F&A), a less powerful primitive that is available on x86 processors, to construct the LCRQ: a nonblocking (lock-free) linearizable concurrent FIFO queue which, despite the F&A being a contended hot spot, outperforms combining-based implementations by 1.5x to 2.5x in all concurrency levels on an x86 server with four multicore processors, in both single-processor and multi-processor executions.
Fast Concurrent Queues for x86 Processors. Adam Morrison and Yehuda Afek. PPoPP 2013.
(This manuscript updates Figure 5b with 2 lines that were omitted in the proceedings version.)
- The LCRQ source code package contains a complete benchmark program which is meant to be plugged into the sim and synch synchronization techniques benchmark framework. This version of the package fixes several issues in the original release. The fixes do not affect our published performance results, and they should improve performance for other kinds of workloads.