Modern supercomputer architectures are following diverging paths, e.g., Intel's XeonPhi processors, NVIDIA's Graphics Processing Units (GPUs), or the Emu systems. Such an environment increases the importance of designing flexible algorithms for performance-critical kernels, and implementations that run well on various platforms. In this work, we develop multithreaded algorithms for sparse matrix-matrix multiply (SpGEMM) kernels. SpGEMM is a fundamental kernel used in applications such as graph analytics and scientific computing, especially in the setup phase of multigrid solvers. The kernel has been studied extensively in the contexts of sequential, shared-memory parallel, and GPU implementations. Optimized kernels are available on different architectures, providing us with good comparison points. In this work, we provide portable algorithms for the SpGEMM kernel and implement them using the Kokkos programming model, with minimal changes to account for the architectures' very different characteristics. For example, traditional CPUs have powerful cores with large caches, XeonPhi processors have many lightweight cores, and GPUs provide extensive hierarchical parallelism with very simple computational units. The algorithms in this paper aim to minimize the need to revisit algorithmic design for each of these architectures. The code divergence in the implementation is limited to the access strategies of different data structures and to how the different levels of parallelism in the algorithm are mapped to computational units.

An earlier version of this paper focused on SpGEMM from the perspective of performance portability. It addressed the issue with an algorithm called KKMEM, and it demonstrated better performance on GPUs and on the current generation of XeonPhi processors, Knights Landing (KNLs). Our contributions in that earlier version are summarized below. We designed two thread-scalable data structures (multilevel hashmap accumulators and a memory pool) to achieve scalability on various platforms, and a graph compression technique to speed up the symbolic phase of SpGEMM. We designed hierarchical, thread-scalable SpGEMM algorithms and implemented them using the Kokkos programming model. Our implementation is publicly available, including in the Trilinos framework.
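To make the terms "symbolic phase" and "hashmap accumulator" concrete, the following is a minimal sequential sketch of two-phase, row-wise SpGEMM on matrices in compressed sparse row (CSR) form. It is not the KKMEM algorithm and uses neither Kokkos nor a memory pool: the function names `spgemm_symbolic` and `spgemm_numeric` are hypothetical, and a Python `set` and `dict` stand in for the thread-scalable hashmap accumulators described in the text.

```python
def spgemm_symbolic(a_ptr, a_idx, b_ptr, b_idx):
    """Symbolic phase: find the row sizes of C = A * B without touching values.

    Returns the CSR row-pointer array of C.
    """
    n_rows = len(a_ptr) - 1
    c_ptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        cols = set()  # stand-in for a hashmap that only records column indices
        for k in a_idx[a_ptr[i]:a_ptr[i + 1]]:
            # Row i of C has the union of the patterns of rows k of B.
            cols.update(b_idx[b_ptr[k]:b_ptr[k + 1]])
        c_ptr[i + 1] = c_ptr[i] + len(cols)
    return c_ptr


def spgemm_numeric(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, c_ptr):
    """Numeric phase: fill the column indices and values of C, given c_ptr."""
    n_rows = len(c_ptr) - 1
    c_idx = [0] * c_ptr[-1]
    c_val = [0.0] * c_ptr[-1]
    for i in range(n_rows):
        acc = {}  # stand-in for a hashmap accumulator: column -> partial sum
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_idx[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_val[p] * b_val[q]
        # Write the accumulated row into the preallocated slots, sorted by column.
        for offset, j in enumerate(sorted(acc)):
            c_idx[c_ptr[i] + offset] = j
            c_val[c_ptr[i] + offset] = acc[j]
    return c_idx, c_val
```

Splitting the work this way lets the output be allocated exactly once, after the symbolic phase has determined its size. In a parallel setting the rows are independent, which is what makes it possible to map rows and the entries within a row onto different levels of hardware parallelism.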