Mesa Clover for OpenCL #45

Closed
opened 2021-03-03 09:50:05 +01:00 by simon · 4 comments
Owner

This issue supersedes #33, since the problem will not be fixed with ROCm. It can be fixed, however, by switching to Mesa Clover.

NixOS PR: https://github.com/NixOS/nixpkgs/pull/82729

davidak uses the PR in his config: https://codeberg.org/davidak/nixos-config/src/commit/36be7df14e5b62d26adcea07b07c7d6fd01bba73/modules/amd/default.nix

Mesa in NixOS 20.09 does not yet support OpenCL 1.2, which is required by many tools, so I have to wait until 21.05.

- [X] Look into how big of a rebuild this triggers: **only mesa, since it does not override packages**
- [ ] Incorporate into sayuri’s configuration (see https://git.sbruder.de/simon/nixos-config/src/branch/sayuri-mesa-clover)
simon added the type bug, blocked by/testing needed, affects/hardware, blocked by/upstream labels 2021-03-03 09:50:05 +01:00
simon added blocked by/release 21.05 and removed blocked by/upstream labels 2021-03-21 22:29:33 +01:00
Author
Owner
diff of `clinfo` (from rocm to clover)
diff --git a/clinfo.rocm b/clinfo.clover
index ba1de81..415748e 100644
--- a/clinfo.rocm
+++ b/clinfo.clover
@@ -1,68 +1,47 @@
 Number of platforms                               1
-  Platform Name                                   AMD Accelerated Parallel Processing
-  Platform Vendor                                 Advanced Micro Devices, Inc.
-  Platform Version                                OpenCL 2.0 AMD-APP (3182.0)
+  Platform Name                                   Clover
+  Platform Vendor                                 Mesa
+  Platform Version                                OpenCL 1.1 Mesa 20.1.10
   Platform Profile                                FULL_PROFILE
-  Platform Extensions                             cl_khr_icd cl_amd_event_callback 
-  Platform Extensions function suffix             AMD
+  Platform Extensions                             cl_khr_icd
+  Platform Extensions function suffix             MESA
 
-  Platform Name                                   AMD Accelerated Parallel Processing
+  Platform Name                                   Clover
 Number of devices                                 1
-  Device Name                                     gfx803
-  Device Vendor                                   Advanced Micro Devices, Inc.
+  Device Name                                     AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.35.0, 5.4.105, LLVM 9.0.1)
+  Device Vendor                                   AMD
   Device Vendor ID                                0x1002
-  Device Version                                  OpenCL 1.2 
-  Driver Version                                  3182.0 (HSA1.1,LC)
-  Device OpenCL C Version                         OpenCL C 2.0 
+  Device Version                                  OpenCL 1.1 Mesa 20.1.10
+  Driver Version                                  20.1.10
+  Device OpenCL C Version                         OpenCL C 1.1 
   Device Type                                     GPU
-  Device Board Name (AMD)                         Device 67df
-  Device Topology (AMD)                           PCI-E, 02:00.0
   Device Profile                                  FULL_PROFILE
   Device Available                                Yes
   Compiler Available                              Yes
-  Linker Available                                Yes
   Max compute units                               36
-  SIMD per compute unit (AMD)                     4
-  SIMD width (AMD)                                16
-  SIMD instruction width (AMD)                    1
   Max clock frequency                             1266MHz
-  Graphics IP (AMD)                               8.3
-  Device Partition                                (core)
-    Max number of sub-devices                     36
-    Supported partition types                     None
-    Supported affinity domains                    (n/a)
   Max work item dimensions                        3
-  Max work item sizes                             1024x1024x1024
+  Max work item sizes                             256x256x256
   Max work group size                             256
-  Preferred work group size (AMD)                 256
-  Max work group size (AMD)                       1024
   Preferred work group size multiple              64
-  Wavefront width (AMD)                           64
   Preferred / native vector sizes                 
-    char                                                 4 / 4       
-    short                                                2 / 2       
-    int                                                  1 / 1       
-    long                                                 1 / 1       
-    half                                                 1 / 1        (cl_khr_fp16)
-    float                                                1 / 1       
-    double                                               1 / 1        (cl_khr_fp64)
-  Half-precision Floating-point support           (cl_khr_fp16)
-    Denormals                                     No
-    Infinity and NANs                             No
-    Round to nearest                              No
-    Round to zero                                 No
-    Round to infinity                             No
-    IEEE754-2008 fused multiply-add               No
-    Support is emulated in software               No
+    char                                                16 / 16      
+    short                                                8 / 8       
+    int                                                  4 / 4       
+    long                                                 2 / 2       
+    half                                                 0 / 0        (n/a)
+    float                                                4 / 4       
+    double                                               2 / 2        (cl_khr_fp64)
+  Half-precision Floating-point support           (n/a)
   Single-precision Floating-point support         (core)
     Denormals                                     No
     Infinity and NANs                             Yes
     Round to nearest                              Yes
-    Round to zero                                 Yes
-    Round to infinity                             Yes
-    IEEE754-2008 fused multiply-add               Yes
+    Round to zero                                 No
+    Round to infinity                             No
+    IEEE754-2008 fused multiply-add               No
     Support is emulated in software               No
-    Correctly-rounded divide and sqrt operations  Yes
+    Correctly-rounded divide and sqrt operations  No
   Double-precision Floating-point support         (cl_khr_fp64)
     Denormals                                     Yes
     Infinity and NANs                             Yes
@@ -73,71 +52,43 @@ Number of devices                                 1
     Support is emulated in software               No
   Address bits                                    64, Little-Endian
   Global memory size                              8589934592 (8GiB)
-  Global free memory (AMD)                        8388608 (8GiB)
-  Global memory channels (AMD)                    8
-  Global memory banks per channel (AMD)           4
-  Global memory bank width (AMD)                  256 bytes
   Error Correction support                        No
-  Max memory allocation                           7301444403 (6.8GiB)
+  Max memory allocation                           6871947673 (6.4GiB)
   Unified memory for Host and Device              No
   Minimum alignment for any data type             128 bytes
-  Alignment of base address                       1024 bits (128 bytes)
-  Global Memory cache type                        Read/Write
-  Global Memory cache size                        16384 (16KiB)
-  Global Memory cache line size                   64 bytes
-  Image support                                   Yes
-    Max number of samplers per kernel             26591
-    Max size for 1D images from buffer            65536 pixels
-    Max 1D or 2D image array size                 2048 images
-    Base address alignment for 2D image buffers   256 bytes
-    Pitch alignment for 2D image buffers          256 pixels
-    Max 2D image size                             16384x16384 pixels
-    Max 3D image size                             2048x2048x2048 pixels
-    Max number of read image args                 128
-    Max number of write image args                8
+  Alignment of base address                       32768 bits (4096 bytes)
+  Global Memory cache type                        None
+  Image support                                   No
   Local memory type                               Local
-  Local memory size                               65536 (64KiB)
-  Local memory syze per CU (AMD)                  65536 (64KiB)
-  Local memory banks (AMD)                        32
-  Max number of constant args                     8
-  Max constant buffer size                        7301444403 (6.8GiB)
-  Preferred constant buffer size (AMD)            16384 (16KiB)
+  Local memory size                               32768 (32KiB)
+  Max number of constant args                     16
+  Max constant buffer size                        2147483392 (2GiB)
   Max size of kernel argument                     1024
   Queue properties                                
     Out-of-order execution                        No
     Profiling                                     Yes
-  Prefer user sync for interop                    Yes
-  Number of P2P devices (AMD)                     0
-  P2P devices (AMD)                               <printDeviceInfo:147: get number of CL_DEVICE_P2P_DEVICES_AMD : error -30>
-  Profiling timer resolution                      1ns
-  Profiling timer offset since Epoch (AMD)        0ns (Thu Jan  1 01:00:00 1970)
+  Profiling timer resolution                      0ns
   Execution capabilities                          
     Run OpenCL kernels                            Yes
     Run native kernels                            No
-    Thread trace supported (AMD)                  No
-    Number of async queues (AMD)                  8
-    Max real-time compute queues (AMD)            8
-    Max real-time compute units (AMD)             36
-  printf() buffer size                            4194304 (4MiB)
-  Built-in kernels                                (n/a)
-  Device Extensions                               cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 
+  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64
 
 NULL platform behavior
-  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  AMD Accelerated Parallel Processing
-  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [AMD]
-  clCreateContext(NULL, ...) [default]            Success [AMD]
+  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
+  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
+  clCreateContext(NULL, ...) [default]            Success [MESA]
   clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
-    Platform Name                                 AMD Accelerated Parallel Processing
-    Device Name                                   gfx803
+    Platform Name                                 Clover
+    Device Name                                   AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.35.0, 5.4.105, LLVM 9.0.1)
   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
   clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
-    Platform Name                                 AMD Accelerated Parallel Processing
-    Device Name                                   gfx803
+    Platform Name                                 Clover
+    Device Name                                   AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.35.0, 5.4.105, LLVM 9.0.1)
   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
-    Platform Name                                 AMD Accelerated Parallel Processing
-    Device Name                                   gfx803
+    Platform Name                                 Clover
+    Device Name                                   AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.35.0, 5.4.105, LLVM 9.0.1)
 
 ICD loader properties
   ICD loader Name                                 OpenCL ICD Loader
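
The most practical difference in the diff above is the shorter `Device Extensions` list under Clover. As a rough sketch (not part of the original comment), the two lists can be compared as sets. The strings are copied from the diff; the helper names `extension_set` and `missing_extensions` are made up for illustration.

```python
from typing import List, Set

# "Device Extensions" values copied from the clinfo diff (ROCm side, then Clover side).
ROCM_EXTENSIONS = (
    "cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics "
    "cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics "
    "cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes "
    "cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing "
    "cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 "
    "cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images "
    "cl_amd_copy_buffer_p2p cl_amd_assembly_program"
)
CLOVER_EXTENSIONS = (
    "cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics "
    "cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics "
    "cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics "
    "cl_khr_int64_extended_atomics cl_khr_fp64"
)


def extension_set(value: str) -> Set[str]:
    """Split a whitespace-separated extension list into a set of names."""
    return set(value.split())


def missing_extensions(before: str, after: str) -> List[str]:
    """Return the extensions present in `before` but absent from `after`, sorted."""
    return sorted(extension_set(before) - extension_set(after))


lost = missing_extensions(ROCM_EXTENSIONS, CLOVER_EXTENSIONS)
gained = missing_extensions(CLOVER_EXTENSIONS, ROCM_EXTENSIONS)

print("lost with Clover:")
for name in lost:
    print("  " + name)
print("gained with Clover:", gained)
```

The set difference in the other direction is empty, so Clover exposes a strict subset of the ROCm extensions on this device.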
simon added blocked by/testing needed/sayuri and removed blocked by/testing needed labels 2021-04-06 23:43:20 +02:00
Author
Owner

As of yesterday, the patch from the PR no longer applies on 21.05’s mesa.

simon added blocked by/upstream and removed blocked by/release 21.05 labels 2021-05-28 15:05:30 +02:00
simon started working 2021-08-29 12:25:00 +02:00
simon canceled time tracking 2021-08-29 12:25:06 +02:00
Author
Owner

See https://github.com/NixOS/nixpkgs/pull/136402 for something that will probably get merged soon™.

Sadly, Clover still only seems to support OpenCL 1.1 on my hardware, so some things don’t work (e.g. NNEDI3CL).
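
To make the version gap concrete, here is a minimal sketch of checking a `clinfo` version string against a required OpenCL version. The sample strings come from the `clinfo` diff earlier in this issue. The function names and the `(1, 2)` default (taken from the "OpenCL 1.2 is required by many tools" remark) are assumptions for illustration, not anything a particular tool actually runs.

```python
import re
from typing import Optional, Tuple

# Matches "OpenCL 1.1 Mesa 20.1.10" as well as "OpenCL C 2.0".
_VERSION_RE = re.compile(r"OpenCL (?:C )?(\d+)\.(\d+)")


def parse_opencl_version(value: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a clinfo version string, or None if absent."""
    match = _VERSION_RE.search(value)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_requirement(value: str, required: Tuple[int, int] = (1, 2)) -> bool:
    """True if the reported version is at least `required` (hypothetical default: 1.2)."""
    version = parse_opencl_version(value)
    return version is not None and version >= required


clover_device_version = "OpenCL 1.1 Mesa 20.1.10"  # Clover "Device Version" from the diff
rocm_device_version = "OpenCL 1.2 "                # ROCm "Device Version" from the diff

print(meets_requirement(clover_device_version))
print(meets_requirement(rocm_device_version))
```

Comparing the versions as integer tuples avoids the string-comparison pitfall where a hypothetical "1.10" would sort before "1.2".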

simon removed the blocked by/testing needed/sayuri label 2021-09-01 23:23:19 +02:00
simon removed the blocked by/upstream label 2021-10-08 17:13:44 +02:00
simon added the blocked by/release 21.11 label 2021-10-08 19:17:51 +02:00
simon added a new dependency 2022-03-10 18:49:44 +01:00
simon removed a dependency 2022-03-10 18:49:48 +01:00
Author
Owner

sayuri (now hitagi) no longer has an AMD GPU, so this no longer applies.

AMD not supporting GPGPU well on any but the newest generations was one of the reasons why it no longer has an AMD GPU.

simon closed this issue 2023-02-11 23:00:02 +01:00
simon removed the blocked by/testing needed, blocked by/testing needed/sayuri labels 2023-02-11 23:00:08 +01:00
Reference: simon/nixos-config#45