PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP

Authors:
Jack Dongarra, University of Tennessee, Knoxville, USA
Mark Gates, University of Tennessee, Knoxville, USA
Azzam Haidar, University of Tennessee, Knoxville, USA
Jakub Kurzak, University of Tennessee, Knoxville, USA
Piotr Luszczek, University of Tennessee, Knoxville, USA
Panruo Wu, University of Tennessee, Knoxville, USA
Ichitaro Yamazaki, University of Tennessee, Knoxville, USA
Asim Yarkhan, University of Tennessee, Knoxville, USA
Maksims Abalenkovs, The University of Manchester, Manchester, UK
Negin Bagherpour, The University of Manchester, Manchester, UK
Sven Hammarling, The University of Manchester, Manchester, UK
Jakub Šístek, The University of Manchester, Manchester, UK (ORCID: 0000-0002-5231-7830)
David Stevens, The University of Manchester, Manchester, UK
Mawussi Zounon, The University of Manchester, Manchester, UK
Samuel D. Relton, The University of Leeds, Leeds, UK

ACM Transactions on Mathematical Software, Volume 45, Issue 2, Article No. 16, pp. 1–35. https://doi.org/10.1145/3264491

Published: 3 May 2019


Abstract

The latest version of the Parallel Linear Algebra Software for Multicore Architectures (PLASMA) library is based on tasks with dependencies, as defined by the OpenMP standard. The main functionality of the library is presented. Extensive benchmarks target three recent multicore and manycore architectures: Intel Xeon, Intel Xeon Phi, and IBM POWER8 processors.


Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
      1. Shared memory algorithms
2. Mathematics of computing
  1. Mathematical software
    1. Mathematical software performance
    2. Solvers


Published in: ACM Transactions on Mathematical Software, Volume 45, Issue 2, June 2019 (255 pages)
ISSN: 0098-3500
EISSN: 1557-7295
Issue DOI: 10.1145/3326465
Editors: Zhaojun Bai (University of California at Davis, USA), Wolfgang Bangerth (Colorado State University, USA)
Copyright © 2019 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
- Received: 1 October 2017
- Revised: 1 June 2018
- Accepted: 1 July 2018
- Published: 3 May 2019

Author Tags
Numerical linear algebra libraries
OpenMP
PLASMA
multicore processors
task-based programming
tile algorithms
Qualifiers
- research-article
- Research
- Refereed