Overview

While multicore computers have become mainstream, multicore programming remains full of pitfalls. Using P cores to speed up one’s program by a factor close to P turns out to be far more challenging than it might sound at first. Beyond the immediate difficultly of parallelizing existing algorithms, one faces the problem of amortizing the costs of thread creation, the challenge of dealing with dynamic load balancing, the hazards of concurrent programming, and the headaches associated with weak memory models.

Since January 2011, we have been working on the development of new abstractions and scheduling algorithms for multicore programming. We belive that, by using the right programming language abstractions together with a flexible scheduler, multicore programming can become much more accessible. To experiment our ideas, we have been developing a C++ library, called pasl.

Related publications

Oracle Scheduling: Controlling Granularity in Implicitly Parallel Languages

(Acar, Charguéraud, and Rainey 2011)

paper slides

Efficient Primitives for Creating and Scheduling Parallel Computations

(Acar, Charguéraud, and Rainey 2012)

paper slides

Scheduling Parallel Programs by Work Stealing with Private Deques

(Acar, Charguéraud, and Rainey 2013)

paper slides

Theory and Practice of Chunked Sequences

(Acar, Charguéraud, and Rainey 2014)

paper slides

Software

The pasl sources are available via github.

Screenshots

Speedup curves

Speedup curve for matrix-multiply (matrices of size 3500x3500).

Speedup curve for string search (pattern of 10 chars, text of 10 billion chars).

Processor utilization

The plot shows utilization along the y axis and time along the x axis in the computation of matrix-multiply.

Red rectangles correspond to idle time.

Experimental results

Our experimental results show that work stealing algorithms based on private deques are competitive with traditional concurrent work stealing deques.

Parallel runs averaged on 20 runs (usually 5% to 10% difference between a fast and a slow run); see our PPoPP’13 paper for details.

Theoretical result

We have proved the efficiency of a work stealing scheduler based on private deques.

See our PPoPP’13 paper for details.

Team

Collaborators

References

Get the bibtex file used to generate these references.

Acar, Umut A., Arthur Charguéraud, and Mike Rainey. 2011. “Oracle Scheduling: Controlling Granularity in Implicitly Parallel Languages.” In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, (OOPSLA), edited by Cristina Videira Lopes and Kathleen Fisher, 499–518. ACM. http://arthur.chargueraud.org/research/2011/oracle.

———. 2012. “Efficient Primitives for Creating and Scheduling Parallel Computations.” In Declarative Aspects and Applications of Multicore Programming (DAMP).

———. 2013. “Scheduling Parallel Programs by Work Stealing with Private Deques.” In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 219–28. PPoPP ’13. ACM.

———. 2014. “Theory and Practice of Chunked Sequences.” In Algorithms - ESA 2014 - 22th Annual European Symposium. Proceedings, edited by Andreas S. Schulz and Dorothea Wagner, 8737:25–36. Lecture Notes in Computer Science. Springer.