Publications
Selected Publications
[DPH+16a] | Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, and Nathalie Drach. Scalable task parallelism for NUMA: A uniform abstraction for coordinated scheduling and memory management. In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT 2016, Haifa, Israel, September 11-15, 2016, pages 125--137, 2016. Best Paper Award - PACT 2016
Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality. Yet these promises are undermined by non-uniform memory access (NUMA). We show that using NUMA-aware task and data placement, it is possible to preserve the uniform abstraction of both computing and memory resources for task-parallel programming models while achieving high data locality. Our data placement scheme guarantees that all accesses to task output data target the local memory of the accessing core. The complementary task placement heuristic improves the locality of task input data on a best effort basis. Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability by eliminating false dependences and enabling fine-grained dynamic control over data placement. The algorithms are fully automatic, application-independent, performance-portable across NUMA machines, and adapt to dynamic changes. Placement decisions use information about inter-task data dependences readily available in the run-time system and placement information from the operating system. We achieve 94% of local memory accesses on a 192-core system with 24 NUMA nodes, up to 5x higher performance than NUMA-aware hierarchical work-stealing, and even 5.6x compared to static interleaved allocation. Finally, we show that state-of-the-art dynamic page migration by the operating system cannot catch up with frequent affinity changes between cores and data and thus fails to accelerate task-parallel applications. |
[PC13] | Antoniu Pop and Albert Cohen. OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs. ACM Trans. Archit. Code Optim., 9(4):53:1--53:25, January 2013.
We present OpenStream, a data-flow extension of OpenMP to express dynamic dependent tasks. The language supports nested task creation, modular composition, variable and unbounded sets of producers/consumers, and first-class streams. These features, enabled by our original compilation flow, allow translating high-level parallel programming patterns, like dependences arising from StarSs' array regions, or universal low-level primitives like futures. In particular, these dynamic features can be embedded efficiently and naturally into an unmanaged imperative language, avoiding the complexity and overhead of a concurrent garbage collector. We demonstrate the performance advantages of a data-flow execution model compared to more restricted task and barrier models. We also demonstrate the efficiency of our compilation and runtime algorithms for the support of complex dependence patterns arising from StarSs benchmarks. |
[PC12] | Antoniu Pop and Albert Cohen. Control-Driven Data Flow. Rapport de recherche RR-8015, INRIA, July 2012.
This paper presents CDDF, a model of computation underpinning the formal semantics of a number of parallel programming languages. CDDF integrates control flow elements for the dynamic construction of task graphs, and data flow elements to express dependent computations and to decouple these using unbounded streams (Kahn process networks). It is a common ground to define the formal semantics of imperative programming languages with dynamic task creation, as well as data-flow or concurrent functional languages, as a special case of more general dependent task languages with channels or streams. We prove essential properties for languages fitting this model of computation, including deadlock-freedom, functional and deadlock determinism, and serializability. We also compare the model's hypotheses with Cilk's strictness and the Kahn principle. |
Related Publications
[DPHC16] | Andi Drebes, Antoniu Pop, Karine Heydemann, and Albert Cohen. Interactive visualization of cross-layer performance anomalies in dynamic task-parallel applications and systems. In 2016 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2016, Uppsala, Sweden, April 17-19, 2016, pages 274--283, 2016. |
[DBP+16] | Andi Drebes, Jean-Baptiste Bréjon, Antoniu Pop, Karine Heydemann, and Albert Cohen. Language-centric performance analysis of OpenMP programs with Aftermath. In OpenMP: Memory, Devices, and Tasks - 12th International Workshop on OpenMP, IWOMP 2016, Nara, Japan, October 5-7, 2016, Proceedings, pages 237--250, 2016. |
[DPH+16b] | Andi Drebes, Antoniu Pop, Karine Heydemann, Nathalie Drach, and Albert Cohen. NUMA-aware scheduling and memory allocation for data-flow task-parallel applications. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, Barcelona, Spain, March 12-16, 2016, pages 44:1--44:2, 2016. |
[Dre15] | Andi Drebes. Dynamic optimization of data-flow task-parallel applications for large-scale NUMA systems. PhD thesis, Université Pierre et Marie Curie-Paris VI, 2015. |
[RNPL15] | Andrey Rodchenko, Andy Nisbet, Antoniu Pop, and Mikel Luján. Effective barrier synchronization on Intel Xeon Phi coprocessor. In Euro-Par 2015: Parallel Processing - 21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24-28, 2015, Proceedings, pages 588--600, 2015. |
[GBB+14] | Roberto Giorgi, Rosa M. Badia, François Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R. Gao, Arne Garbade, Rahulkumar Gayatri, Sylvain Girbal, Daniel Goodman, Behram Khan, Souad Koliai, Joshua Landwehr, Nhat Minh Lê, Feng Li, Mikel Luján, Avi Mendelson, Laurent Morin, Nacho Navarro, Tomasz Patejko, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Ian Watson, Sebastian Weis, Stéphane Zuckerman, and Mateo Valero. TERAFLUX: harnessing dataflow in next generation teradevices. Microprocessors and Microsystems - Embedded Hardware Design, 38(8):976--990, 2014. |
[DHD+14] | Andi Drebes, Karine Heydemann, Nathalie Drach, Antoniu Pop, and Albert Cohen. Topology-aware and dependence-aware scheduling and memory allocation for task-parallel languages. TACO, 11(3):30:1--30:25, 2014. |
[KPP+14] | Martin Kong, Antoniu Pop, Louis-Noël Pouchet, R. Govindarajan, Albert Cohen, and P. Sadayappan. Compiler/runtime framework for dynamic dataflow parallelization of tiled programs. TACO, 11(4):61:1--61:30, 2014. |
[LCG+14] | Mihai T. Lazarescu, Albert Cohen, Adrien Guatto, Nhat Minh Lê, Luciano Lavagno, Antoniu Pop, Manuel Prieto, Andrei Terechko, and Alexandru Sutii. Energy-aware parallelization flow and toolset for C code. In 17th International Workshop on Software and Compilers for Embedded Systems, SCOPES '14, Sankt Goar, Germany, June 10-11, 2014, pages 79--88, 2014. |
[DHP+14a] | Andi Drebes, Karine Heydemann, Antoniu Pop, Albert Cohen, and Nathalie Drach. Automatic detection of performance anomalies in task-parallel programs. CoRR, abs/1405.2916, 2014. |
[DPH+14b] | Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, and Nathalie Drach-Temam. Aftermath: A graphical tool for performance analysis and debugging of fine-grained task-parallel programs and run-time systems. In 7th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2014). |
[PC13] | Antoniu Pop and Albert Cohen. OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs. TACO, 9(4):53:1--53:25, 2013. |
[SBB+13] | Marco Solinas, Rosa M. Badia, François Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R. Gao, Arne Garbade, Sylvain Girbal, Daniel Goodman, Behram Khan, Souad Koliai, Feng Li, Mikel Luján, Laurent Morin, Avi Mendelson, Nacho Navarro, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Mateo Valero, Sebastian Weis, Ian Watson, Stéphane Zuckerman, and Roberto Giorgi. The TERAFLUX project: Exploiting the dataflow paradigm in next generation teradevices. In 2013 Euromicro Conference on Digital System Design, DSD 2013, Los Alamitos, CA, USA, September 4-6, 2013, pages 272--279, 2013. |
[PVB+13] | Hector Posadas, Eugenio Villar, Florian Broekaert, Michel Bourdellès, Albert Cohen, Antoniu Pop, Nhat Minh Lê, Adrien Guatto, Mihai T. Lazarescu, Luciano Lavagno, Andrei Terechko, Miguel Glassee, Daniel Calvo, and Eduardo de las Heras. EU FP7-288307 PHARAON project: Parallel and heterogeneous architecture for real-time applications. In 2013 Euromicro Conference on Digital System Design, DSD 2013, Los Alamitos, CA, USA, September 4-6, 2013, pages 371--378, 2013. |
[LPCN13] | Nhat Minh Lê, Antoniu Pop, Albert Cohen, and Francesco Zappa Nardelli. Correct and efficient work-stealing for weak memory models. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, Shenzhen, China, February 23-27, 2013, pages 69--80, 2013. |
[LGCP13] | Nhat Minh Lê, Adrien Guatto, Albert Cohen, and Antoniu Pop. Correct and efficient bounded FIFO queues. In 25th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2013, Porto de Galinhas, Pernambuco, Brazil, October 23-26, 2013, pages 144--151, 2013. |
[Pop13] | Antoniu Pop. OpenStream: a data-flow approach to solving the von Neumann bottlenecks. In International Workshop on Software and Compilers for Embedded Systems, M-SCOPES '13, Sankt Goar, Germany, June 19-21, 2013, page 2, 2013. |
[LPC12] | Feng Li, Antoniu Pop, and Albert Cohen. Automatic extraction of coarse-grained data-flow threads from imperative programs. IEEE Micro, 32(4):19--31, 2012. |
[PC12] | Antoniu Pop and Albert Cohen. Work-streaming Compilation of Futures. In Proceedings of the Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software (PLACES), March 2012. |
[MAB+11] | Harm Munk, Eduard Ayguadé, Cédric Bastoul, Paul M. Carpenter, Zbigniew Chamski, Albert Cohen, Marco Cornero, Philippe Dumont, Marc Duranton, Mohammed Fellahi, Roger Ferrer, Razya Ladelsky, Menno Lindwer, Xavier Martorell, Cupertino Miranda, Dorit Nuzman, Andrea C. Ornstein, Antoniu Pop, Sebastian Pop, Louis-Noël Pouchet, Alex Ramírez, David Ródenas, Erven Rohou, Ira Rosen, Uzi Shvadron, Konrad Trifunovic, and Ayal Zaks. ACOTES project: Advanced compiler technologies for embedded streaming. International Journal of Parallel Programming, 39(3):397--450, 2011. |
[PC11] | Antoniu Pop and Albert Cohen. A stream-computing extension to OpenMP. In High Performance Embedded Architectures and Compilers, 6th International Conference, HiPEAC 2011, Heraklion, Crete, Greece, January 24-26, 2011. Proceedings, pages 5--14, 2011. |
[Pop11] | Antoniu Pop. Leveraging Streaming for Deterministic Parallelization: an Integrated Language, Compiler and Runtime Approach. PhD thesis, MINES ParisTech, September 2011. |
[MPD+10] | Cupertino Miranda, Antoniu Pop, Philippe Dumont, Albert Cohen, and Marc Duranton. Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes. In Proceedings of the 2010 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2010, Scottsdale, AZ, USA, October 24-29, 2010, pages 11--20, 2010. |
[MDC+10] | Cupertino Miranda, Philippe Dumont, Albert Cohen, Marc Duranton, and Antoniu Pop. ERBIUM: a deterministic, concurrent intermediate representation for portable and scalable performance. In Proceedings of the 7th Conference on Computing Frontiers, 2010, Bertinoro, Italy, May 17-19, 2010, pages 119--120, 2010. |