
AMD has released ROCm 6.4.3, a quality release that resolves several issues through AMDGPU driver fixes, a ROCm SMI fix, and continued documentation updates. One driver fix resolves increased latency in certain RCCL applications that was degrading the performance of communication operations. Another corrects the AMDGPU driver's scheduler constraints, which could cause queue preemption to fail during workload execution. The ROCm documentation continues to be updated to offer clearer and more comprehensive guidance for a wider range of user needs and use cases.

The release includes five new tutorials specifically designed for AI developers, which cover topics such as inference, ChatQnA vLLM deployment and performance evaluation, text-to-video generation with ComfyUI, DeepSeek Janus Pro on CPU or GPU, DeepSeek-R1 with vLLM V1, and GPU development and optimization. AMD ROCm offers a robust ecosystem for deep learning development, featuring support for Taichi, a streamlined library designed for mixture-of-experts training, along with updated information on hardware and library support. The support for the operating system and hardware remains consistent in this release.



AMD also outlines notable changes planned for ROCm, such as the relocation of AMD SMI from the ROCm organization repository to a new AMDTools repository. Also planned are the phase-out of ROCm SMI, ROCTracer, ROCProfiler, rocprof, and rocprofv2, and the deprecation of the AMDGPU wavefront size compiler macros, the HIPCC Perl scripts, and the ROCm Object Tooling tools. Upcoming HIP runtime API changes are designed to increase alignment between HIP and CUDA APIs or behavior, clean up header files, remove namespace collisions, and establish a clear separation between hipRTC and the HIP runtime.

Several changes to the ROCm software stack are anticipated in future releases, including the phase-out of ROCm SMI, ROCTracer, ROCProfiler, rocprof, and rocprofv2, as well as the removal of the AMDGPU wavefront size compiler macros and the HIPCC Perl scripts. The ROCm Object Tooling tools roc-obj-ls, roc-obj-extract, and roc-obj are deprecated in ROCm 6.4 and will be removed in a future release, with their functionality available through the llvm-objdump --offloading tool option.

Here is the full announcement:

ROCm 6.4.3 Release

The release notes provide a summary of notable changes since the previous ROCm release.

If you’re using AMD Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility/native_linux/native_linux_compatibility.html) documentation to verify compatibility and system requirements.

Release highlights

ROCm 6.4.3 is a quality release that resolves the following issues. For changes to individual components, see Detailed component changes.

AMDGPU driver updates

  • Resolved an issue causing performance degradation in communication operations, caused by increased latency in certain RCCL applications. The fix prevents unnecessary queue eviction during the fork process.
  • Fixed an issue in the AMDGPU driver’s scheduler constraints that could cause queue preemption to fail during workload execution.

ROCm SMI update

  • Fixed the failure to load GPU data like System Clock (SCLK) by adjusting the logic for retrieving GPU board voltage.

ROCm documentation updates

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

Operating system and hardware support changes

Operating system and hardware support remain unchanged in this release.

See the Compatibility matrix for more information about operating system and hardware compatibility.

ROCm components

The following table lists the versions of ROCm components for ROCm 6.4.3.

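| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.12.0 |
| | | MIOpen | 3.4.0 |
| | | MIVisionX | 3.2.0 |
| | | rocAL | 2.2.0 |
| | | rocDecode | 0.10.0 |
| | | rocJPEG | 0.8.0 |
| | | rocPyDecode | 0.3.1 |
| | | RPP | 1.9.10 |
| | Communication | RCCL | 2.22.3 |
| | | rocSHMEM | 2.0.1 |
| | Math | hipBLAS | 2.4.0 |
| | | hipBLASLt | 0.12.1 |
| | | hipFFT | 1.0.18 |
| | | hipfort | 0.6.0 |
| | | hipRAND | 2.12.0 |
| | | hipSOLVER | 2.4.0 |
| | | hipSPARSE | 3.2.0 |
| | | hipSPARSELt | 0.2.3 |
| | | rocALUTION | 3.2.3 |
| | | rocBLAS | 4.4.1 |
| | | rocFFT | 1.0.32 |
| | | rocRAND | 3.3.0 |
| | | rocSOLVER | 3.28.2 |
| | | rocSPARSE | 3.4.0 |
| | | rocWMMA | 1.7.0 |
| | | Tensile | 4.43.0 |
| | Primitives | hipCUB | 3.4.0 |
| | | hipTensor | 1.5.0 |
| | | rocPRIM | 3.4.1 |
| | | rocThrust | 3.3.0 |
| Tools | System management | AMD SMI | 25.5.1 |
| | | ROCm Data Center Tool | 0.3.0 |
| | | rocminfo | 1.0.0 |
| | | ROCm SMI | 7.5.0 ⇒ 7.7.0 |
| | | ROCm Validation Suite | 1.1.0 |
| | Performance | ROCm Bandwidth Test | 1.4.0 |
| | | ROCm Compute Profiler | 3.1.1 |
| | | ROCm Systems Profiler | 1.0.2 |
| | | ROCProfiler | 2.0.0 |
| | | ROCprofiler-SDK | 0.6.0 |
| | | ROCTracer | 4.1.0 |
| | Development | HIPIFY | 19.0.0 |
| | | ROCdbgapi | 0.77.2 |
| | | ROCm CMake | 0.14.0 |
| | | ROCm Debugger (ROCgdb) | 15.2 |
| | | ROCr Debug Agent | 2.0.4 |
| Compilers | | HIPCC | 1.1.1 |
| | | llvm-project | 19.0.0 |
| Runtimes | | HIP | 6.4.3 |
| | | ROCr Runtime | 1.15.0 |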

Detailed component changes

The following sections describe key changes to ROCm components.

For a historical overview of ROCm component updates, see the ROCm consolidated changelog.

ROCm SMI (7.7.0)

Added

  • Support for getting the GPU board voltage.

See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.

ROCm known issues

ROCm known issues are noted on GitHub. For known issues related to individual components, review the Detailed component changes.

ROCm upcoming changes

The following changes to the ROCm software stack are anticipated for future releases.

AMD SMI migration to AMDGPU driver repository

In a future release, AMD SMI will be relocated from the ROCm organization repository to a new AMDTools repository to better align with its system-level functionality. amd-smi-lib will no longer be part of the rocm-developer-tools meta-package that comes with a standard ROCm installation. Instead, it will be packaged with the AMDGPU driver installation.

ROCm SMI deprecation

ROCm SMI will be phased out in an
upcoming ROCm release and will enter maintenance mode. After this transition,
only critical bug fixes will be addressed and no further feature development
will take place.

It's strongly recommended to transition your projects to AMD SMI, the successor to ROCm SMI. AMD SMI includes all the features of ROCm SMI and will continue to receive regular updates, new functionality, and ongoing support. For more information on AMD SMI, see the AMD SMI documentation.

ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation

Development and support for ROCTracer, ROCProfiler, rocprof, and rocprofv2 are being phased out in favor of ROCprofiler-SDK in upcoming ROCm releases. Starting with ROCm 6.4, only critical defect fixes will be addressed for older versions of the profiling tools and libraries. All users are encouraged to upgrade to the latest version of the ROCprofiler-SDK library and the (rocprofv3) tool to ensure continued support and access to new features. ROCprofiler-SDK is still in beta today and will be production-ready in a future ROCm release.

ROCTracer, ROCProfiler, rocprof, and rocprofv2 are anticipated to reach end-of-life in a future release, around Q1 of 2026.

AMDGPU wavefront size compiler macro deprecation

Access to the wavefront size as a compile-time constant via the __AMDGCN_WAVEFRONT_SIZE
and __AMDGCN_WAVEFRONT_SIZE__ macros or the constexpr warpSize variable is deprecated
and will be disabled in a future release.

  • The __AMDGCN_WAVEFRONT_SIZE__ macro and __AMDGCN_WAVEFRONT_SIZE alias will be removed in an upcoming release.
    It is recommended to remove any use of this macro. For more information, see
    AMDGPU support.
  • warpSize will only be available as a non-constexpr variable. Where required,
    the wavefront size should be queried via the warpSize variable in device code,
    or via hipGetDeviceProperties in host code. Neither of these will result in a compile-time constant. For more information, see warpSize.
  • For cases where compile-time evaluation of the wavefront size cannot be avoided,
    uses of __AMDGCN_WAVEFRONT_SIZE, __AMDGCN_WAVEFRONT_SIZE__, or warpSize
    can be replaced with a user-defined macro or constexpr variable with the wavefront
    size(s) for the target hardware. For example:
   #if defined(__GFX9__)
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 64
   #else
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 32
   #endif
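The same idea can also be written with a constexpr variable rather than a macro. The following is a minimal host-compilable sketch, not AMD's recommended implementation: the names kMyWavefrontSize, WavefrontScratch, and wavefronts_for are hypothetical, and only the __GFX9__ check comes from the example above. It shows one compile-time use (sizing an array) and one helper that takes the wavefront size as a run-time value, which in a real HIP program would come from warpSize in device code or from hipGetDeviceProperties in host code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical user-defined constant replacing the deprecated macros.
// __GFX9__ is the target check used in the release-note example; in a
// host-only build it is undefined, so the 32-lane fallback is selected.
#if defined(__GFX9__)
constexpr unsigned kMyWavefrontSize = 64;
#else
constexpr unsigned kMyWavefrontSize = 32;
#endif

// Compile-time use: one scratch slot per lane of a wavefront.
struct WavefrontScratch {
    float lanes[kMyWavefrontSize];
};
static_assert(sizeof(WavefrontScratch) == kMyWavefrontSize * sizeof(float),
              "one float slot per lane");

// Run-time use: number of wavefronts needed to cover `threads` work-items.
// The wavefront size is an ordinary argument here because, after the
// deprecation, a queried value is no longer a compile-time constant.
inline unsigned wavefronts_for(unsigned threads, unsigned wavefront_size) {
    assert(wavefront_size > 0);
    return (threads + wavefront_size - 1) / wavefront_size;  // round up
}
```

Keeping the run-time helper free of any compile-time wavefront assumption means the same code path works whether the device reports 32 or 64 lanes; the constexpr constant is then needed only where a constant expression is unavoidable, such as an array bound.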

HIPCC Perl scripts deprecation

The HIPCC Perl scripts (hipcc.pl and hipconfig.pl) will be removed in an upcoming release.

Changes to ROCm Object Tooling

ROCm Object Tooling tools roc-obj-ls, roc-obj-extract, and roc-obj are
deprecated in ROCm 6.4, and will be removed in a future release. Functionality
has been added to the llvm-objdump --offloading tool option to extract all
clang-offload-bundles into individual code objects found within the objects
or executables passed as input. The llvm-objdump --offloading tool option also
supports the --arch-name option, and only extracts code objects found with
the specified target architecture. See llvm-objdump for more information.

HIP runtime API changes

A number of changes are planned for the HIP runtime API in an upcoming major release
that are not backward compatible with prior releases. Most of these changes increase
alignment between HIP and CUDA APIs or behavior. Others clean up header files,
remove namespace collisions, and establish a clear separation between
hipRTC and the HIP runtime. For more information, see HIP 7.0 Is Coming: What You Need to Know to Stay Ahead.

Release ROCm 6.4.3 Release · ROCm/ROCm