January 19th, 2011 — Determinism and Fail-Stop Races for Sane Multiprocessing — Luis Ceze

Abstract

Current multicore systems are nondeterministic. Each time they execute a multithreaded application, even if supplied with the same input, they can produce a different output. This frustrates debugging, limits the ability to properly test multithreaded code and hinders fault-tolerant scenarios. Moreover, data races often lead to surprising behavior and complicate the semantics of programming languages.

In this talk, I will argue that nondeterminism should be kept to a minimum both during development and in the field, so that the benefits of deterministic execution go beyond debugging. I will also argue that concurrency errors should be treated as exceptions, with fail-stop behavior and precise semantics. I will present our work on fully deterministic shared memory multiprocessing (DMP), exploring multiple deterministic execution strategies with different performance, complexity and scalability trade-offs. I will also present some of our work on architectural support for fail-stop behavior of data races. I will end the talk with a brief overview of our efforts in complex concurrency bug detection and avoidance.

Bio

Luis Ceze is an Assistant Professor in the Computer Science and Engineering Department at the University of Washington. His research focuses on computer architecture, compilers, programming models and operating systems to improve the programmability, reliability and energy efficiency of multiprocessor systems. He has co-authored over 40 papers in these areas, and has had several papers selected as IEEE Micro Top Picks and CACM Research Highlights. He participated in the Blue Gene, Cyclops, and PERCS projects at IBM and is a recipient of several IBM awards. He is also a recipient of an NSF CAREER Award, a Sloan Research Fellowship, and a Microsoft Research Faculty Fellowship. He co-founded Corensic, a UW-CSE spin-off company, where he is a part-time consultant.

Resources

Slides.