List of publications

42 results found

2025 metadata only access

Communication-reduced Conjugate Gradient Variants for GPU-accelerated Clusters

Linear solvers are key components in any software platform for scientific and engineering computing. The solution of large and sparse linear systems lies at the core of physics-driven numerical simulations relying on partial differential equations (PDEs) and often represents a significant bottleneck in data-driven procedures, such as scientific machine learning. In this paper, we present an efficient implementation of the preconditioned s-step Conjugate Gradient (CG) method, originally proposed by Chronopoulos and Gear in 1989, for large clusters of Nvidia GPU-accelerated computing nodes. The method, often referred to as communication-reduced or communication-avoiding CG, reduces global synchronizations and data communication steps compared to the standard approach, enhancing strong and weak scalability on parallel computers. Our main contribution is the design of a parallel solver that fully exploits the aggregation of low-granularity operations inherent to the s-step CG method to leverage the high throughput of GPU accelerators. Additionally, it applies overlap between data communication and computation in the multi-GPU sparse matrix-vector product. Experiments on classic benchmark datasets, derived from the discretization of the Poisson PDE, demonstrate the potential of the method.

communication-reduced algorithms GPUs linear solvers s-step preconditioned Krylov methods
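
To make the synchronization cost mentioned in the abstract concrete, the sketch below runs the standard, unpreconditioned Conjugate Gradient method (not the s-step variant of the paper) and counts the dot products it performs. In a distributed-memory run each of those dot products would typically be a global reduction, which is the cost that s-step formulations aim to cut by grouping several iterations together. The helper names `dot`, `matvec` and `cg` are illustrative and not taken from the paper's software.

```python
def dot(x, y):
    # In a distributed code this would be a global reduction across processes.
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    # Dense row-by-row product; a real solver would use a sparse format.
    return [dot(row, x) for row in A]


def cg(A, b, tol=1e-10, max_iter=100):
    """Standard CG for an s.p.d. matrix A.

    Returns (x, iterations, reductions), where `reductions` counts the
    vector dot products that would need a global synchronization.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual for the zero initial guess
    p = list(r)              # first search direction
    rr = dot(r, r)
    reductions = 1
    for k in range(max_iter):
        if rr ** 0.5 < tol:
            return x, k, reductions
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)          # first reduction of the iteration
        reductions += 1
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)               # second reduction of the iteration
        reductions += 1
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter, reductions


if __name__ == "__main__":
    n = 5
    # 1D Poisson matrix tridiag(-1, 2, -1), used here only as a toy test case.
    A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
          for j in range(n)] for i in range(n)]
    x, iters, reductions = cg(A, [1.0] * n)
    print(iters, reductions, x)
```

The counter makes the pattern visible: two reductions per iteration plus one at start-up, so the number of synchronization points grows linearly with the iteration count in this standard formulation.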
2024 Journal article open access

Why diffusion-based preconditioning of Richards equation works: spectral analysis and computational experiments at very large scale.

We consider here a cell-centered finite difference approximation of the Richards equation in three dimensions, averaging for interface values the hydraulic conductivity, a highly nonlinear function, by arithmetic, upstream and harmonic means. The nonlinearities in the equation can lead to changes in soil conductivity over several orders of magnitude and discretizations with respect to space variables often produce stiff systems of differential equations. A fully implicit time discretization is provided by the backward Euler one-step formula; the resulting nonlinear algebraic system is solved by an inexact Newton Armijo-Goldstein algorithm, requiring the solution of a sequence of linear systems involving Jacobian matrices. We prove some new results concerning the distribution of the Jacobians' eigenvalues and the explicit expression of their entries. Moreover, we explore some connections between the saturation of the soil and the ill-conditioning of the Jacobians. The information on eigenvalues justifies the effectiveness of some preconditioner approaches which are widely used in the solution of the Richards equation. We also propose a new software framework to experiment with scalable and robust preconditioners suitable for efficient parallel simulations at very large scales. Performance results on a literature test case show that our framework is very promising in the advance toward realistic simulations at extreme scale.

algebraic multigrid spectral analysis Richards equation high performance computing
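
The three interface averages named in the abstract can be sketched as below. This is a minimal illustration, not the paper's code: the function names are hypothetical, and the upstream rule shown (take the conductivity of the cell with the higher hydraulic head) is one common convention assumed here for the example.

```python
def arithmetic_mean(k_left, k_right):
    # Plain average of the two cell-centered conductivities.
    return 0.5 * (k_left + k_right)


def harmonic_mean(k_left, k_right):
    # Dominated by the smaller value; defined as 0 when both cells are 0.
    if k_left + k_right == 0.0:
        return 0.0
    return 2.0 * k_left * k_right / (k_left + k_right)


def upstream_value(k_left, k_right, head_left, head_right):
    # Assumed convention: flow goes from higher to lower head, so the
    # interface takes the conductivity of the cell the flow comes from.
    return k_left if head_left >= head_right else k_right


if __name__ == "__main__":
    # Two neighbouring cells whose conductivities differ by four orders of
    # magnitude (illustrative values only).
    k_wet, k_dry = 1e-2, 1e-6
    print(arithmetic_mean(k_wet, k_dry))
    print(harmonic_mean(k_wet, k_dry))
    print(upstream_value(k_wet, k_dry, head_left=1.0, head_right=0.5))
```

When the two conductivities differ by orders of magnitude, the arithmetic mean stays close to half of the larger value while the harmonic mean stays close to twice the smaller one, so the choice of average changes the Jacobian entries substantially.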
2024 Conference proceedings paper restricted access

The TEXTAROSSA Project: Cool all the Way Down to the Hardware

Filgueras, Antonio ; Agosta, Giovanni ; Aldinucci, Marco ; Álvarez, Carlos ; D'Ambra, Pasqua ; Bernaschi, Massimo ; Biagioni, Andrea ; Cattaneo, Daniele ; Celestini, Alessandro ; Celino, Massimo ; Chiarini, Carlotta ; Cicero, Francesca Lo ; Cretaro, Paolo ; Fornaciari, William ; Frezza, Ottorino ; Galimberti, Andrea ; Giacomini, Francesco ; de Haro Ruiz, Juan Miguel ; Iannone, Francesco ; Jaschke, Daniel ; Jiménez-González, Daniel ; Kulczewski, Michal ; Leva, Alberto ; Lonardo, Alessandro ; Martinelli, Michele ; Martorell, Xavier ; Montangero, Simone ; Morais, Lucas ; Oleksiak, Ariel ; Palazzari, Paolo ; Pontisso, Luca ; Reghenzani, Federico ; Rossi, Cristian ; Saponarat, Sergio ; Lodi, Carlo Saverio ; Simula, Francesco ; Terraneo, Federico ; Vicini, Piero ; Vidal, Miguel ; Zoni, Davide ; Zummo, Giuseppe

The TEXTAROSSA project aims to bridge the technology gaps that exascale computing systems will face in the near future in order to overcome their performance and energy efficiency challenges. This project provides solutions for improved energy efficiency and thermal control, seamless integration of heterogeneous accelerators in HPC multi-node platforms, and new arithmetic methods. Challenges are tackled through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models, and tools derived from European research.

High-performance computing heterogeneous computing GPU
2024 Journal article open access

Alya toward exascale: algorithmic scalability using PSCToolkit

In this paper, we describe an upgrade of the Alya code with up-to-date parallel linear solvers capable of achieving reliability, efficiency and scalability in the computation of the pressure field at each time step of the numerical procedure for solving a Large Eddy Simulation formulation of the incompressible Navier–Stokes equations. We developed a software module in Alya's kernel to interface the libraries included in the current version of PSCToolkit, a framework for the iterative solution of sparse linear systems on parallel distributed-memory computers by Krylov methods coupled to Algebraic MultiGrid preconditioners. The Toolkit has undergone various extensions within the EoCoE-II project with the primary goal of facing the exascale challenge. Results on a realistic benchmark for airflow simulations in wind farm applications show that the PSCToolkit solvers significantly outperform the original versions of the Conjugate Gradient method available in Alya's kernel in terms of scalability and parallel efficiency and represent a very promising software layer to move the Alya code toward exascale.

65F08 65F10 65M55 65Y05 65Z05 Algebraic MultiGrid Iterative linear solvers Navier–Stokes equations Parallel scalability
2023 Journal article open access

Parallel Sparse Computation Toolkit

P D'Ambra ; F Durastante ; S Filippone

This paper presents a new software framework for solving large and sparse linear systems on current hybrid architectures, from small servers to high-end supercomputers, embedding multi-core CPUs and Nvidia GPUs at the node level. The framework has a modular structure and is composed of three main components. These separate the basic functionalities for managing distributed sparse matrices and executing the sparse matrix computations involved in iterative Krylov projection methods, possibly exploiting multi-threading and CUDA-based programming models, from the functionalities for setup and application of different types of one-level and multi-level algebraic preconditioners.

Linear solvers Algebraic preconditioners HPC
2023 Journal article restricted access

A multi-GPU aggregation-based AMG preconditioner for iterative linear solvers

We present and release in open source format a sparse linear solver which efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large and sparse linear systems on modern parallel computers made of hybrid nodes hosting Nvidia Graphics Processing Unit (GPU) accelerators. The work extends previous efforts of some of the authors in the exploitation of a single GPU accelerator and proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. Our design for the hybrid implementation has been driven by the best practices for minimizing data communication overhead when multiple GPUs are employed, yet preserving the efficiency of the GPU kernels. Strong and weak scalability results of the new version of the library on well-known benchmark test cases are discussed. Comparisons with the Nvidia AmgX solution show a speedup, in the solve phase, up to 2.0x.

GPU accelerators heterogeneous computing iterative sparse linear solvers parallel numerical algorithms scalability
2023 Journal article open access

Automatic coarsening in Algebraic Multigrid utilizing quality measures for matching-based aggregations

P D'Ambra ; F Durastante ; S Filippone ; L Zikatanov

In this paper, we discuss the convergence of an Algebraic MultiGrid (AMG) method for general symmetric positive-definite matrices. The method relies on an aggregation algorithm, named coarsening based on compatible weighted matching, which exploits the interplay between the principle of compatible relaxation and the maximum product matching in undirected weighted graphs. The results are based on a general convergence analysis theory applied to the class of AMG methods employing unsmoothed aggregation and identifying a quality measure for the coarsening; similar quality measures were originally introduced and applied to other methods as tools to obtain good quality aggregates leading to optimal convergence for M-matrices. The analysis, as well as the coarsening procedure, is purely algebraic and, in our case, allows an a posteriori evaluation of the quality of the aggregation procedure which we apply to analyze the impact of approximate algorithms for matching computation and the definition of graph edge weights. We also explore the connection between the choice of the aggregates and the compatible relaxation convergence, confirming the consistency between theories for designing coarsening procedures in purely algebraic multigrid methods and the effectiveness of the coarsening based on compatible weighted matching. We discuss various completely automatic algorithmic approaches to obtain aggregates for which good convergence properties are achieved on various test cases.

AMG Convergence Compatible relaxation Aggregation Graph matching
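
As a rough illustration of matching-based aggregation, the sketch below pairs vertices of a weighted graph and turns each matched pair into an aggregate, leaving unmatched vertices as singletons. It is not the coarsening based on compatible weighted matching described in the abstract: a simple greedy heaviest-edge matching is swapped in for the maximum product matching, and the edge weights are hypothetical inputs rather than weights derived from the matrix. The function names are illustrative.

```python
def greedy_matching(n, edges):
    """Greedy heaviest-edge matching.

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (weight, i, j) tuples for an undirected graph
    Returns the list of matched pairs (i, j).
    """
    matched = [False] * n
    pairs = []
    # Visit edges from heaviest to lightest and keep an edge only if both
    # endpoints are still free.
    for weight, i, j in sorted(edges, reverse=True):
        if i != j and not matched[i] and not matched[j]:
            matched[i] = matched[j] = True
            pairs.append((i, j))
    return pairs


def aggregates_from_matching(n, pairs):
    """Map each vertex to an aggregate id: one id per pair, then singletons."""
    agg = [-1] * n
    next_id = 0
    for i, j in pairs:
        agg[i] = agg[j] = next_id
        next_id += 1
    for v in range(n):
        if agg[v] == -1:
            agg[v] = next_id
            next_id += 1
    return agg


if __name__ == "__main__":
    # Path graph 0-1-2-3-4 with illustrative edge weights.
    edges = [(1.0, 0, 1), (3.0, 1, 2), (1.0, 2, 3), (2.0, 3, 4)]
    pairs = greedy_matching(5, edges)
    print(pairs)
    print(aggregates_from_matching(5, pairs))
```

Each pass of pairwise aggregation at most halves the number of unknowns, so larger aggregates would be obtained by repeating the matching on the coarse graph.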
2023 Journal article open access

Extending bootstrap AMG for clustering of attributed graphs

P D'Ambra ; PS Vassilevski ; L Cutillo

In this paper we propose a new approach to detect clusters in undirected graphs with attributed vertices. We incorporate structural and attribute similarities between the vertices in an augmented graph by creating additional vertices and edges as proposed in [1, 2]. The augmented graph is then embedded in a Euclidean space associated to its Laplacian and we cluster vertices via a modified K-means algorithm, using a new vector-valued distance in the embedding space. The main novelty of our method, which can be classified as an early fusion method, i.e., a method in which additional information on vertices is fused to the structure information before applying clustering, is the interpretation of attributes as new realizations of graph vertices, which can be dealt with as coordinate vectors in a related Euclidean space. This allows us to extend a scalable generalized spectral clustering procedure which substitutes graph Laplacian eigenvectors with some vectors, named algebraically smooth vectors, obtained by a linear-time complexity Algebraic MultiGrid (AMG) method. We discuss the performance of our proposed clustering method by comparison with recent literature approaches and publicly available results. Extensive experiments on different types of synthetic datasets and real-world attributed graphs show that our new algorithm, embedding attribute information in the clustering, outperforms structure-only-based methods when the attributed network has an ambiguous structure. Furthermore, our new method largely outperforms the method which originally proposed the graph augmentation, showing that our embedding strategy and vector-valued distance are very effective in taking advantage of the augmented-graph representation.

Attributed graphs clustering graph augmentation bootstrap AMG
2023 Conference proceedings paper restricted access

AMG Preconditioners based on Parallel Hybrid Coarsening and Multi-objective Graph Matching

Pasqua D'Ambra ; Fabio Durastante ; S M Ferdous ; Salvatore Filippone ; Mahantesh Halappanavar ; Alex Pothen

We describe preliminary results from a multiobjective graph matching algorithm, in the coarsening step of an aggregation-based Algebraic MultiGrid (AMG) preconditioner, for solving large and sparse linear systems of equations on high-end parallel computers. We have two objectives. First, we wish to improve the convergence behavior of the AMG method when applied to highly anisotropic problems. Second, we wish to extend the parallel package PSCToolkit to exploit multi-threaded parallelism at the node level on multi-core processors. Our matching proposal balances the need to simultaneously compute high weights and large cardinalities by a new formulation of the weighted matching problem combining both these objectives using a parameter ?. We compute the matching by a parallel 2/3 - ?-approximation algorithm for maximum weight matchings. Results with the new matching algorithm show that for a suitable choice of the parameter ? we compute effective preconditioners in the presence of anisotropy, i.e., smaller solve times, setup times, iteration counts, and operator complexity.

Sparse solvers AMG Matching MPI OpenMP Scalability
2022 Working paper metadata only access

Alya towards Exascale: Algorithmic Scalability using PSCToolkit

H Owen ; O Lehmkuhl ; P D'Ambra ; F Durastante ; S Filippone

In this paper, we describe some work aimed at upgrading the Alya code with up-to-date parallel linear solvers capable of achieving reliability, efficiency, and scalability in the computation of the pressure field at each time step of the numerical procedure for solving an LES formulation of the incompressible Navier-Stokes equations. We developed a software module in Alya's kernel to interface the libraries included in the current version of PSCToolkit, a framework for the iterative solution of sparse linear systems on parallel distributed-memory computers by Krylov methods coupled to Algebraic MultiGrid preconditioners. The Toolkit has undergone some extensions within the EoCoE-II project with the primary goal to face the exascale challenge. Results on a realistic benchmark for airflow simulations in wind farm applications show that the PSCToolkit solvers significantly outperform the original versions of the Conjugate Gradient method available in the Alya kernel in terms of scalability and parallel efficiency and represent a very promising software layer to move the Alya code towards exascale.

Navier-Stokes equations iterative linear solvers algebraic multigrid parallel scalability
2022 Working paper metadata only access

Why diffusion-based preconditioning of Richards equation works: spectral analysis and computational experiments at very large scale.

Bertaccini D ; D'Ambra P ; Durastante F ; Filippone S

We consider here a cell-centered finite difference approximation of the Richards equation in three dimensions, averaging for interface values the hydraulic conductivity, a highly nonlinear function, by arithmetic, upstream and harmonic means. The nonlinearities in the equation can lead to changes in soil conductivity over several orders of magnitude and discretizations with respect to space variables often produce stiff systems of differential equations. A fully implicit time discretization is provided by the backward Euler one-step formula; the resulting nonlinear algebraic system is solved by an inexact Newton Armijo-Goldstein algorithm, requiring the solution of a sequence of linear systems involving Jacobian matrices. We prove some new results concerning the distribution of the Jacobians' eigenvalues and the explicit expression of their entries. Moreover, we explore some connections between the saturation of the soil and the ill-conditioning of the Jacobians. The information on eigenvalues justifies the effectiveness of some preconditioner approaches which are widely used in the solution of the Richards equation. We also propose a new software framework to experiment with scalable and robust preconditioners suitable for efficient parallel simulations at very large scales. Performance results on a literature test case show that our framework is very promising in the advance towards realistic simulations at extreme scale.

Richards equation Parallel scalability spectral analysis AMG preconditioners
2022 Journal article open access

Towards EXtreme scale technologies and accelerators for euROhpc hw/Sw supercomputing applications for exascale: The TEXTAROSSA approach

In the near future, Exascale systems will need to bridge three technology gaps to achieve high performance while remaining under tight power constraints: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetic; methods and tools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA addresses these gaps through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models, and tools derived from European research.

HPC Scientific Computing Software
2022 metadata only access

Network Clustering by Embedding of Attribute-augmented Graphs

D'Ambra P ; Vassilevski P ; Cutillo L

In this paper we propose a new approach to detect clusters in undirected graphs with attributed vertices. The aim is to group vertices which are similar not only in terms of structural connectivity but also in terms of attribute values. We incorporate structural and attribute similarities between the vertices in an augmented graph by creating additional vertices and edges as proposed in [6, 38]. The augmented graph is then embedded in a Euclidean space associated to its Laplacian where a modified K-means algorithm is applied to identify clusters. The modified K-means relies on a vector distance measure where to each original vertex we assign a suitable vector-valued set of coordinates depending on both structural connectivity and attribute similarities, so that each original graph vertex is thought of as representative of m + 1 vertices of the augmented graph, if m is the number of vertex attributes. To define the coordinate vectors we employ our recently proposed algorithm based on an adaptive AMG (Algebraic MultiGrid) method, which identifies the coordinate directions in the embedding Euclidean space in terms of algebraically smooth vectors with respect to the augmented graph Laplacian, thus extending our previous result for graphs without attributes. We analyze the effectiveness of our proposed clustering method by comparison with some well-known methods, whose software implementation is freely available, and also with results reported in the literature, on two different types of widely used synthetic graphs and on some real-world attributed graphs.

graph embedding attributed networks spectral clustering algebraically smooth vectors AMG
2022 Abstract in conference proceedings metadata only access

BootCMatchGX: a scalable iterative linear solver for multi-GPU systems

HPC
2022 Abstract in conference proceedings metadata only access

AMG Preconditioners for Computational and Data Science at Extreme Scale

Sparse solvers AMG HPC
2022 Working paper metadata only access

Automatic coarsening in Algebraic Multigrid utilizing quality measures for matching-based aggregations

D'Ambra P ; Durastante F ; Filippone S ; L Zikatanov

In this paper, we discuss the convergence of an Algebraic MultiGrid (AMG) method for general symmetric positive-definite matrices. The method relies on an aggregation algorithm, named coarsening based on compatible weighted matching, which exploits the interplay between the principle of compatible relaxation and the maximum product matching in undirected weighted graphs. The results are based on a general convergence analysis theory applied to the class of AMG methods employing unsmoothed aggregation and identifying a quality measure for the coarsening; similar quality measures were originally introduced and applied to other methods as tools to obtain good quality aggregates leading to optimal convergence for M-matrices. The analysis, as well as the coarsening procedure, is purely algebraic and, in our case, allows an a posteriori evaluation of the quality of the aggregation procedure which we apply to analyze the impact of approximate algorithms for matching computation and the definition of graph edge weights. We also explore the connection between the choice of the aggregates and the compatible relaxation convergence, confirming the consistency between theories for designing coarsening procedures in purely algebraic multigrid methods and the effectiveness of the coarsening based on compatible weighted matching. We discuss various completely automatic algorithmic approaches to obtain aggregates for which good convergence properties are achieved on various test cases.

AMG graph matching aggregation compatible relaxation
2021 Conference proceedings paper metadata only access

AMG4PSBLAS Linear Algebra Package brings Alya one step closer to Exascale

H Owen ; G Houzeaux ; F Durastante ; S Filippone ; P D'Ambra

In this work, we interfaced the Alya code to the development version of a software framework for the efficient and reliable solution of the sparse linear systems arising in the computation of the pressure field at each time step. We developed a software module in Alya's kernel to interface the current development version of the PSBLAS package (Parallel Sparse Basic Linear Algebra Subroutines) and the sibling package AMG4PSBLAS. PSBLAS implements parallel basic linear algebra operations and support routines for sparse matrix management tailored for iterative sparse linear solvers on parallel distributed-memory computers, supporting heterogeneity at the node level. It has undergone extension within the EoCoE-II project with the primary goal to face the exascale challenge. AMG4PSBLAS is a package of Algebraic MultiGrid (AMG) preconditioners built on top of PSBLAS, which inherits all the flexibility and efficiency features of the PSBLAS infrastructure, and implements up-to-date AMG preconditioners exploiting aggregation of unknowns for the setup of the AMG hierarchy. Many preconditioners employing different aggregation schemes, AMG cycles, and parallel smoothers are available and were tested within the simulation carried out with the Alya code. Results show that the new solvers vastly outperform the original Deflated Conjugate Gradient method available in the Alya kernel in terms of scalability and parallel efficiency and represent a very promising software layer to move the Alya code towards exascale.

CFD HPC Scalable linear solvers
2021 Conference proceedings paper metadata only access

Scalable AMG Preconditioners for Computational Science at Extreme Scale

The challenge of exascale requires rethinking numerical algorithms and mathematical software for efficient exploitation of heterogeneous massively parallel supercomputers. In this talk, we present some activities aimed at developing highly scalable and robust sparse linear solvers for solving scientific and engineering applications with a huge number of degrees of freedom (dof)[1]. We discuss algorithmic advances and implementation aspects in the design of Algebraic MultiGrid (AMG) preconditioners based on aggregation, to be used in conjunction with Krylov-subspace projection methods, suitable to exploit high levels of parallelism of current petascale supercomputers. These activities are carried on within two ongoing European Projects, the Energy-oriented Center of Excellence (EoCoE-II) and the EuroHPC TEXTAROSSA project, having the final aim to provide methods and tools for preparing scientific applications in facing and successfully grasping the near future exascale challenge. Beyond possible advances in base software technology to make available programming environments that tend to hide the details of the hardware, we still need to rethink and redesign numerical methods and applications, especially for irregular computations and memory-bound kernels, like sparse solvers.

Parallel Scalability Numerical Linear Algebra
2021 Journal article restricted access

AMG Preconditioners for Linear Solvers towards Extreme Scale

Linear solvers for large and sparse systems are a key element of scientific applications, and their efficient implementation is necessary to harness the computational power of current computers. Algebraic Multigrid (AMG) Preconditioners are a popular ingredient of such linear solvers; this is the motivation for the present work where we examine some recent developments in a package of AMG preconditioners to improve efficiency, scalability and robustness on extreme scale problems. The main novelty is the design and implementation of a new parallel coarsening algorithm based on aggregation of unknowns employing weighted graph matching techniques; this is a completely automated procedure, requiring no information from the user, and applicable to general symmetric positive definite (s.p.d.) matrices. The new coarsening algorithm improves in terms of numerical scalability at low operator complexity over decoupled aggregation algorithms available in previous releases of the package. The preconditioners package is built on the parallel software framework PSBLAS, which has also been updated to progress towards exascale. We present weak scalability results on two of the most powerful supercomputers in Europe, for linear systems with sizes up to O(10^{10}) unknowns.

algebraic multigrid preconditioners parallel scalability
2020 Journal article open access

AMG based on compatible weighted matching on GPUs

GPU version of AMG preconditioner

AMG GPU