research-article
Open Access

PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP

Published: 03 May 2019

Abstract

The recent version of the Parallel Linear Algebra Software for Multicore Architectures (PLASMA) library is built on tasks with dependencies, as defined by the OpenMP standard. This article presents the main functionality of the library and reports extensive benchmarks on three recent multicore and manycore architectures: Intel Xeon, Intel Xeon Phi, and IBM POWER8 processors.

