AMD Radeon Open Compute ROCm 3.7 released

Drivers 2882 Published 2020-08-21 06:01 by Philipp Esselbach

News

AMD has released a new version of the Radeon Open Compute Linux stack for AMD graphics cards.

ROCm is designed to be a universal platform for GPU-accelerated computing. Its modular design allows hardware vendors to build drivers that support the ROCm framework. ROCm is also designed to integrate multiple programming languages and to make it easy to add support for other languages.

Note: You can also clone the source code for individual ROCm components from the GitHub repositories.

ROCm Components
The following components for the ROCm platform are released and available for the v3.7 release:
• Drivers
• Tools
• Libraries
• Source Code

You can access the latest supported version of drivers, tools, libraries, and source code for the ROCm platform at the following location: https://github.com/RadeonOpenCompute/ROCm

Supported Operating Systems

The AMD ROCm v3.7.x platform is designed to support the following operating systems:

• Ubuntu 20.04 and 18.04.4 (Kernel 5.3)
• CentOS 7.8 & RHEL 7.8 (Kernel 3.10.0-1127) (using devtoolset-7 runtime support)
• CentOS 8.2 & RHEL 8.2 (Kernel 4.18.0) (devtoolset is not required)
• SLES 15 SP1

What's New in This Release

ROCm COMMUNICATIONS COLLECTIVE LIBRARY

Compatibility with NVIDIA Communications Collective Library v2.7 API

ROCm Communications Collective Library (RCCL) is now compatible with the NVIDIA Communications Collective Library (NCCL) v2.7 API.

RCCL (pronounced "Rickle") is a stand-alone library of standard collective communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, and all-to-all. There is also initial support for direct GPU-to-GPU send and receive operations. It has been optimized to achieve high bandwidth on platforms using PCIe and xGMI, as well as over networks using InfiniBand Verbs or TCP/IP sockets. RCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.

The collective operations are implemented using ring and tree algorithms and have been optimized for throughput and latency. For best performance, small operations can be either batched into larger operations or aggregated through the API.
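To illustrate what a ring algorithm for all-reduce looks like, here is a minimal pure-Python simulation. It is only a sketch and is not RCCL code: the function name ring_all_reduce, the list-based "ranks", and the requirement that the buffer length be divisible by the number of ranks are all assumptions made for this example. Each simulated rank only ever sends one chunk to its right-hand neighbour per step, first in a reduce-scatter phase and then in an all-gather phase.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a ring of ranks (illustrative only).

    buffers: one equal-length list per simulated rank. The length must be
    divisible by the number of ranks (an assumption of this sketch).
    Returns a new list of per-rank buffers, each holding the elementwise sum.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers)
    assert length % n == 0
    chunk = length // n
    data = [list(b) for b in buffers]

    def span(c):
        # Index range covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    def exchange(chunk_for_rank, combine):
        # Snapshot all outgoing chunks first so the sends of one step
        # behave as if they happened simultaneously.
        outgoing = []
        for r in range(n):
            c = chunk_for_rank(r)
            outgoing.append((c, [data[r][i] for i in span(c)]))
        for r, (c, payload) in enumerate(outgoing):
            dst = (r + 1) % n  # right-hand neighbour on the ring
            for i, v in zip(span(c), payload):
                data[dst][i] = combine(data[dst][i], v)

    # Phase 1: reduce-scatter. Afterwards rank r holds the fully
    # reduced chunk (r + 1) mod n.
    for s in range(n - 1):
        exchange(lambda r: (r - s) % n, lambda old, new: old + new)

    # Phase 2: all-gather. Reduced chunks travel around the ring and
    # overwrite the partial sums held by the other ranks.
    for s in range(n - 1):
        exchange(lambda r: (r + 1 - s) % n, lambda old, new: new)

    return data


result = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result[0])  # [111, 222, 333]
```

Each rank sends 2*(n-1) chunks of size length/n, which is why ring algorithms are usually associated with good bandwidth for large messages, while small operations benefit from being batched as described above.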

For more information about RCCL APIs and compatibility with NCCL v2.7, see https://rccl.readthedocs.io/en/develop/index.html

Singular Value Decomposition of Bi-diagonal Matrices

rocsolver_bdsqr now computes the Singular Value Decomposition (SVD) of bidiagonal matrices. It is an auxiliary function for the SVD of general matrices (function rocsolver_gesvd).

BDSQR computes the singular value decomposition (SVD) of an n-by-n bidiagonal matrix B.

The SVD of B has the following form:

B = Ub * S * Vb'

where
• S is the n-by-n diagonal matrix of singular values of B
• the columns of Ub are the left singular vectors of B
• the columns of Vb are its right singular vectors

The computation of the singular vectors is optional; this function accepts input matrices U (of size nu-by-n) and V (of size n-by-nv) that are overwritten with U*Ub and Vb’*V. If nu = 0 no left vectors are computed; if nv = 0 no right vectors are computed.

Optionally, this function can also compute Ub’*C for a given n-by-nc input matrix C.
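As a small worked example of what the singular values of a bidiagonal matrix are, the following pure-Python sketch builds a dense matrix from its diagonal and off-diagonal entries and computes the two singular values of a 2-by-2 case in closed form. It does not call rocSOLVER and does not reflect how rocsolver_bdsqr works internally; the helper names bidiagonal_dense and singular_values_2x2 are hypothetical and exist only for this illustration.

```python
import math


def bidiagonal_dense(uplo, d, e):
    """Build a dense n-by-n bidiagonal matrix from diagonal d and off-diagonal e.

    uplo is "upper" or "lower" (a stand-in for rocblas_fill in this sketch).
    """
    n = len(d)
    assert len(e) == max(n - 1, 0)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        b[i][i] = float(d[i])
    for i in range(n - 1):
        if uplo == "upper":
            b[i][i + 1] = float(e[i])  # superdiagonal
        else:
            b[i + 1][i] = float(e[i])  # subdiagonal
    return b


def singular_values_2x2(b):
    """Closed-form singular values of a 2-by-2 matrix, in decreasing order.

    Uses s1^2 + s2^2 = squared Frobenius norm and s1 * s2 = |det(B)|.
    """
    t = sum(x * x for row in b for x in row)
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    disc = math.sqrt(max(t * t - 4.0 * det * det, 0.0))
    s_max = math.sqrt((t + disc) / 2.0)
    s_min = abs(det) / s_max if s_max > 0.0 else 0.0
    return [s_max, s_min]


# Upper bidiagonal B = [[1, 1], [0, 1]]
B = bidiagonal_dense("upper", [1.0, 1.0], [1.0])
s = singular_values_2x2(B)
print(s)  # approximately [1.618, 0.618]
```

For this matrix the singular values are the golden ratio and its reciprocal, so their product equals |det(B)| = 1, which gives a quick sanity check for any SVD routine applied to the same input.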

PARAMETERS

• [in] handle: rocblas_handle.

• [in] uplo: rocblas_fill.

Specifies whether B is upper or lower bidiagonal.

• [in] n: rocblas_int. n >= 0.

The number of rows and columns of matrix B.

• [in] nv: rocblas_int. nv >= 0.

The number of columns of matrix V.

• [in] nu: rocblas_int. nu >= 0.

The number of rows of matrix U.

• [in] nc: rocblas_int. nc >= 0.

The number of columns of matrix C.

• [inout] D: pointer to real type. Array on the GPU of dimension n.

On entry, the diagonal elements of B. On exit, if info = 0, the singular values of B in decreasing order; if info > 0, the diagonal elements of a bidiagonal matrix orthogonally equivalent to B.

• [inout] E: pointer to real type. Array on the GPU of dimension n-1.

On entry, the off-diagonal elements of B. On exit, if info > 0, the off-diagonal elements of a bidiagonal matrix orthogonally equivalent to B; if info = 0, the off-diagonal elements have converged to zero.

• [inout] V: pointer to type. Array on the GPU of dimension ldv*nv.

On entry, the matrix V. On exit, it is overwritten with Vb’*V. (Not referenced if nv = 0).

• [in] ldv: rocblas_int. ldv >= n if nv > 0, or ldv >= 1 if nv = 0.

Specifies the leading dimension of V.

• [inout] U: pointer to type. Array on the GPU of dimension ldu*n.

On entry, the matrix U. On exit, it is overwritten with U*Ub. (Not referenced if nu = 0).

• [in] ldu: rocblas_int. ldu >= nu.

Specifies the leading dimension of U.

• [inout] C: pointer to type. Array on the GPU of dimension ldc*nc.

On entry, the matrix C. On exit, it is overwritten with Ub’*C. (Not referenced if nc = 0).

• [in] ldc: rocblas_int. ldc >= n if nc > 0, or ldc >= 1 if nc = 0.

Specifies the leading dimension of C.

• [out] info: pointer to a rocblas_int on the GPU.

If info = 0, successful exit. If info = i > 0, i elements of E have not converged to zero.
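The size and leading-dimension constraints in the parameter list above can be collected into a small pre-flight check. The sketch below is plain Python and does not call rocSOLVER; the helper names check_bdsqr_args and bdsqr_element_counts are hypothetical, and the rules simply mirror the constraints as they are stated in this list.

```python
def check_bdsqr_args(n, nv, nu, nc, ldv, ldu, ldc):
    """Return a list of violated constraints (empty if all are satisfied).

    The rules are copied from the parameter list above; this is not
    the validation code used by the library itself.
    """
    problems = []
    for name, value in (("n", n), ("nv", nv), ("nu", nu), ("nc", nc)):
        if value < 0:
            problems.append(f"{name} must be >= 0")
    if ldv < (n if nv > 0 else 1):
        problems.append("ldv must be >= n if nv > 0, or >= 1 if nv = 0")
    if ldu < nu:
        problems.append("ldu must be >= nu")
    if ldc < (n if nc > 0 else 1):
        problems.append("ldc must be >= n if nc > 0, or >= 1 if nc = 0")
    return problems


def bdsqr_element_counts(n, nv, nu, nc, ldv, ldu, ldc):
    """Number of elements each GPU array must hold, per the list above."""
    return {
        "D": n,
        "E": max(n - 1, 0),
        "V": ldv * nv,
        "U": ldu * n,
        "C": ldc * nc,
    }


# Example: 4-by-4 bidiagonal matrix, both sets of singular vectors, no C.
args = dict(n=4, nv=4, nu=4, nc=0, ldv=4, ldu=4, ldc=1)
print(check_bdsqr_args(**args))      # []
print(bdsqr_element_counts(**args))  # {'D': 4, 'E': 3, 'V': 16, 'U': 16, 'C': 0}
```

Running a check like this on the host before allocating device memory makes it easier to spot a mismatched leading dimension than decoding an error status after the call.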

For more information, see https://rocsolver.readthedocs.io/en/latest/userguide_api.html#rocsolver-type-bdsqr

rocsparse_gemmi() Operations for Sparse Matrices

This enhancement provides a dense matrix sparse matrix multiplication using the CSR storage format. rocsparse_gemmi multiplies the scalar $\alpha$ with a dense $m \times k$ matrix $A$ and the sparse $k \times n$ matrix $B$ defined in the CSR storage format, and adds the result to the dense $m \times n$ matrix $C$ that is multiplied by the scalar $\beta$, such that

$$C := \alpha \cdot op(A) \cdot op(B) + \beta \cdot C$$

with

$$op(A) = \begin{cases} A, & \text{if trans\_A == rocsparse\_operation\_none} \\ A^T, & \text{if trans\_A == rocsparse\_operation\_transpose} \\ A^H, & \text{if trans\_A == rocsparse\_operation\_conjugate\_transpose} \end{cases}$$

and

$$op(B) = \begin{cases} B, & \text{if trans\_B == rocsparse\_operation\_none} \\ B^T, & \text{if trans\_B == rocsparse\_operation\_transpose} \\ B^H, & \text{if trans\_B == rocsparse\_operation\_conjugate\_transpose} \end{cases}$$

Note: This function is non-blocking and executed asynchronously with the host. It may return before the actual computation has finished.
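To make the operation concrete, here is a minimal host-side reference in pure Python for the case where neither matrix is transposed, so that C := alpha * A * B + beta * C with A dense and B stored in CSR format. It is a sketch for checking results on small inputs, not the rocSPARSE implementation: the function name gemmi_reference, the use of row-major nested lists for the dense matrices, and zero-based CSR indices are assumptions made for this example.

```python
def gemmi_reference(alpha, a, csr_row_ptr, csr_col_ind, csr_val, beta, c):
    """Compute alpha * A * B + beta * C with dense A, C and CSR-format B.

    a: dense m-by-k matrix as a list of rows.
    csr_row_ptr, csr_col_ind, csr_val: zero-based CSR arrays of the k-by-n B.
    c: dense m-by-n matrix as a list of rows (not modified).
    Returns the updated m-by-n matrix as a new list of rows.
    """
    m = len(a)
    k = len(csr_row_ptr) - 1
    assert all(len(row) == k for row in a)

    # Start from beta * C, then accumulate alpha * A * B on top.
    out = [[beta * x for x in row] for row in c]

    for p in range(k):  # row p of B
        for idx in range(csr_row_ptr[p], csr_row_ptr[p + 1]):
            j = csr_col_ind[idx]  # column of this non-zero in B
            b_pj = csr_val[idx]
            for i in range(m):
                out[i][j] += alpha * a[i][p] * b_pj
    return out


# A is 2-by-2, B = [[5, 0, 0], [0, 6, 7]] is 2-by-3 in CSR form.
A = [[1, 2], [3, 4]]
row_ptr, col_ind, val = [0, 1, 3], [0, 1, 2], [5, 6, 7]
C = [[1, 1, 1], [1, 1, 1]]
print(gemmi_reference(2, A, row_ptr, col_ind, val, 1, C))
# [[11, 25, 29], [31, 49, 57]]
```

Only the stored non-zeros of B are visited, so the work is proportional to m times the number of non-zeros rather than to m * k * n.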

For more information and examples, see https://rocsparse.readthedocs.io/en/master/usermanual.html#rocsparse-gemmi

Download ROCm 3.7.0
Release Notes