Abstract
The recent version of the Parallel Linear Algebra Software for Multicore Architectures (PLASMA) library is based on tasks with dependencies from the OpenMP standard. The main functionality of the library is presented. Extensive benchmarks target three recent multicore and manycore architectures: an Intel Xeon, an Intel Xeon Phi, and an IBM POWER 8 processor.
Index Terms
- PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP