diff --git a/README.md b/README.md
index 22deb92..c5242b2 100644
--- a/README.md
+++ b/README.md
@@ -1,39 +1,39 @@
# tuned-amdgpu
-Hacky solution to integrate AMDGPU power profile control in `tuned` with Ansible
+Hacky solution to integrate AMDGPU power control and overclocking in `tuned` with Ansible
Takes a list of existing `tuned` profiles and creates new ones based on them. These new profiles include AMDGPU power/clock parameters
An attempt is made to discover the active GPU using the 'connected' state in the `DRM` subsystem, example:
-```
-$ grep -ls ^connected /sys/class/drm/*/status | grep -o card[0-9] | sort | uniq | sort -h | tail -1
+
+```bash
+~ $ grep -ls ^connected /sys/class/drm/*/status | grep -o card[0-9] | sort | uniq | sort -h | tail -1
card1
```
-
-_Warning_: This is only tested with `RX6000` series GPUs, it is probable that older AMD GPUs will not work properly. Use at your own risk!
+_Warning_: This is only tested with `RX6000` series GPUs; other generations will probably *not* work properly. Use at your own risk!
## Profiles
-An example of the output/provided profiles follow
+Each output profile name combines two _'profiles'_:
+
+- before `amdgpu` is the source profile provided with `tuned`
+- after `amdgpu` is the GPU clock profile applied, outlined below
| Output profile | Description |
|:---|---|
| `balanced-amdgpu-default` | Includes the (assumed) existing `balanced` tuned profile.
Only adjusts the GPU power limit (typically lower). Clocks/voltage curve remain the default. |
-| `desktop-amdgpu-VR` | Includes the (assumed) existing `desktop` tuned profile.
Adjusts the GPU power limit, clocks, _and_ the voltage curve.
Uses the predefined `VR` profile in the driver. See `/sys/class/drm/card*/device/pp_power_profile_mode` |
-| `latency-performance-amdgpu-custom` | Includes the existing `latency-performance` tuned profile.
Like the existing GPU profiles (eg: _VR)), this also adjusts the GPU power limit, clocks, _and_ the voltage curve.
This differs by using the `custom` profile in the driver. This opens up further tweaking of the power/clock heuristics through the driver (currently manual). see: [pp-dpm](https://docs.kernel.org/gpu/amdgpu/thermal.html#pp-dpm) |
-
-**Note**: This is non-exhaustive, see the variables `base_profiles` and `amdgpu_profiles` below for the (default) sources of the merged profile mapping
+| `desktop-amdgpu-overclock` | Includes the (assumed) existing `desktop` tuned profile.
Adjusts the GPU power limit, clocks, _and_ the voltage curve. |
+| `desktop-amdgpu-peak` | Includes the (assumed) existing `desktop` tuned profile.
Same as the `overclock` profile, but locks clocks to their highest configured values |
## Notable variables
These are the variables you're likely to want to change. They are defined in [playbook.yml](playbook.yml)
-| Variable | Description | In-playbook |
-|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| gpu_clock_min | Sets the minimum (dynamic) GPU clock (in `Mhz`) for the non-default `amdgpu` profiles | `700` |
-| gpu_clock_max | Sets the maximum (dynamic) GPU clock (in `MHz`) for the non-default `amdgpu` profiles | `2600`, results in `2.6GHz` (rounded); mild overclock |
-| gpumem_clock_static | Sets the _static_ memory clock for the GPU (in `MHz`). This is *not* the _effective_ data rate. That is a multiple of this depending on the type of VRAM.
To avoid flickering this does *not* change dynamically with load. | `1050`, results in just over `1GHz`; mild overclock
Actual effective clock depends on this being multiplied against the data/pump rate of the `GDDR?` GPU memory |
-| gpu_mv_offset | GPU core voltage offset. Takes +/- some integer in millivolts. Can be used to both over _and_ under volt. | `-50` (undervolt `50mV` or `0.05V`) |
-| base_profiles | List of base tuned profiles to clone in the new AMDGPU profiles. Defaults based on `Fedora` |
- `balanced`
- `desktop`
- `latency-performance`
- `network-latency`
- `network-throughput`
- `powersave`
- `virtual-host`
|
-| amdgpu_profiles | Dictionary mapping the AMDGPU power profiles found in `/sys/class/drm/card*/device/pp_power_profile_mode` and custom power limits.
For each item, two keys: `pwrmode` and `pwr_cap_multi`.
`pwrmode` maps to the number assigned in `/sys` above.
`pwr_cap_multi` is a multiplier against board power capability. Must be a float, eg: `0.5` for *50%* |
default:
pwrmode: 0
pwr_cap_multi: 0.75
# 75% relatively safe default
VR:
pwrmode: 4
pwr_cap_multi: 0.8
# 80%, likely slight boost
custom:
pwrmode: 6
pwr_cap_multi: 1.0
# 100%, full GPU board capability
# warning: significantly increased heat
|
-
+| Variable | Description |
+|------------------------|---------------------------------------------------------------------------------------|
+| gpu_clock_min | Sets the minimum (dynamic) GPU clock (in `MHz`) for the non-default `amdgpu` profiles |
+| gpu_clock_max | Sets the maximum (dynamic) GPU clock (in `MHz`) for the non-default `amdgpu` profiles |
+| gpumem_clock_static | Sets the _static_ memory clock for the GPU (in `MHz`). This is *not* the _effective_ data rate. That is a multiple of this depending on the type of VRAM.
To avoid flickering this does *not* change dynamically with load. |
+| gpu_mv_offset | GPU core voltage offset. Takes +/- some integer in millivolts. Can be used to both over _and_ under volt. eg: `-50` _(undervolt `50mV` or `0.05V`)_ |
+| base_profiles | List of base tuned profiles to clone in the new AMDGPU profiles. Defaults based on `Fedora` |
+| gpu_power_multi | Dictionary with two keys, `default` and `overclock`. Expects two floats to set a power limit relative to the board _capability_. Example: `1.0` is full board capability, `0.5` is 50%. |
diff --git a/playbook.yml b/playbook.yml
index 76d2f95..eec99dd 100644
--- a/playbook.yml
+++ b/playbook.yml
@@ -7,12 +7,23 @@
- role: tuned_amdgpu
# note: 'gpu_*' vars only apply with the 'custom' suffixed profiles created by this tooling
# profiles based on the 'default' amdgpu power profile mode use default clocks
- gpu_clock_min: "750" # default 500
- gpu_clock_max: "2600" # default 2529
- gpumem_clock_static: "1050"
+ #
+ # the connected AMD GPU is automatically discovered - assumes one
+ # when swapping to another AMD card, remove the old profiles to avoid instability:
+ # 'rm -rfv /etc/tuned/*amdgpu*'
+ gpu_clock_min: "750" # default 500, for best performance: near maximum. applies with 'overclock' tuned profile
+ gpu_clock_max: "2675" # default somewhere around 2529 to 2660
+ gpumem_clock_static: "1075"
+ gpu_power_multi:
+ default: 0.869969040247678 # 281W - real default
+ overclock: 0.928792569659443 # 300W - slight boost
+# overclock: 1.0 # 323W - full board capability
# optional, applies offset (+/-) to GPU voltage by provided mV
- gpu_mv_offset: "-50"
+ # gpu_mv_offset: "-25"
+ # gpu_mv_offset: "+50" # add 50mV or 0.05V
+ gpu_mv_offset: "+25" # add 25mV or 0.025V
# '-50' undervolts GPU core voltage 50mV or 0.05V
+ # mostly untested, there be dragons/instability
#
# list of source tuned profiles available on Fedora (TODO: should dynamically discover)
base_profiles:
@@ -23,27 +34,3 @@
- network-throughput
- powersave
- virtual-host
- #
- # mapping of typical Navi generation power profiles from:
- # /sys/class/drm/card*/device/pp_power_profile_mode
- # ref: https://www.kernel.org/doc/html/v4.20/gpu/amdgpu.html#pp-power-profile-mode
- # 'pwr_cap_multi' is multiplied against board *limit* to determine profile wattage; 0.5 = 50%
- # values below reflect my 6900XT
- amdgpu_profiles:
- default:
- pwrmode: 0
- pwr_cap_multi: 0.789473684210526 # 255W - default
- 3D:
- pwrmode: 1
- pwr_cap_multi: 0.789473684210526 # 255W - default
- VR:
- pwrmode: 4
- pwr_cap_multi: 0.789473684210526 # 255W - default
- compute:
- pwrmode: 5
- pwr_cap_multi: 0.789473684210526 # 255W - default
- custom:
- pwrmode: 6
- pwr_cap_multi: 0.869969040247678 # 281W - slight boost
- # both dictionaries are merged to create new 'tuned' profiles. eg:
- # 'balanced-amdgpu-default', 'balanced-amdgpu-3D', 'balanced-amdgpu-video'
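Review note: the `gpu_power_multi` values in this hunk read as target watts divided by board capability (e.g. 281 / 323). A minimal sketch of that arithmetic, assuming the 323W board from the playbook comments and the microwatt units used by `power1_cap_max`. The helper name `power_limit_uw` is hypothetical and not part of the role; it only mirrors the `awk` call used by the new templates.

```bash
#!/bin/bash
# Sketch only: compute a power limit in microwatts from the board capability
# and a multiplier, mirroring the awk call in the new profile templates.
# 'power_limit_uw' is a hypothetical helper name, not part of the role.
power_limit_uw() {
  local cap_uw="$1"   # board capability in microwatts (power1_cap_max)
  local multi="$2"    # multiplier, eg: 0.5 for 50%
  awk -v m="$cap_uw" -v n="$multi" 'BEGIN { printf "%.0f\n", m * n }'
}

# assumed 323W board; multipliers taken from the playbook comments
power_limit_uw 323000000 0.869969040247678   # 'default' multiplier
power_limit_uw 323000000 0.928792569659443   # 'overclock' multiplier
```

On a different board the same multipliers yield different wattages, so they are worth recalculating per card.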
diff --git a/power_max multi tab calculator.ods b/power_max multi tab calculator.ods
index de93923..ea8c9b0 100644
Binary files a/power_max multi tab calculator.ods and b/power_max multi tab calculator.ods differ
diff --git a/roles/tuned_amdgpu/defaults/main.yml b/roles/tuned_amdgpu/defaults/main.yml
index de80a28..6792dcb 100644
--- a/roles/tuned_amdgpu/defaults/main.yml
+++ b/roles/tuned_amdgpu/defaults/main.yml
@@ -1,15 +1,12 @@
---
# defaults file for tuned_amdgpu
#
-# vars handling unit conversion RE: power capabilities/limits
-# the discovered board limit for power capability; in microWatts, then converted
-power_max: "{{ power_max_b64['content'] | b64decode }}"
-board_watts: "{{ power_max | int / 1000000 }}"
# internals for profile power calculations
# item in the context of the with_nested loops in the play
-profile_name: "{{ item.0.key }}"
-profile_percentage: "{{ (item.0.value.pwr_cap_multi * 100.0) | round(2) }}"
-profile_multi: "{{ item.0.value.pwr_cap_multi }}"
-profile_microwatts: "{{ power_max | float * profile_multi | float }}"
-profile_watts: "{{ profile_microwatts | int / 1000000 }}"
+profile_name: "{{ item.0 }}"
+
+amdgpu_profiles:
+ - default
+ - overclock
+ - peak
diff --git a/roles/tuned_amdgpu/files/profile-common.sh b/roles/tuned_amdgpu/files/profile-common.sh
new file mode 100644
index 0000000..5970513
--- /dev/null
+++ b/roles/tuned_amdgpu/files/profile-common.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+#
+# 'common' file sourced by other scripts under tuned profile
+#
+# dynamically determine the connected GPU using the DRM subsystem
+CARD=$(/usr/bin/grep -ls ^connected /sys/class/drm/*/status | /usr/bin/grep -o 'card[0-9]' | /usr/bin/sort | /usr/bin/uniq | /usr/bin/sort -h | /usr/bin/tail -1)
+
+function get_hwmon_dir() {
+ CARD_DIR="/sys/class/drm/${1}/device/"
+ for CANDIDATE in "${CARD_DIR}"/hwmon/hwmon*; do
+ if [[ -f "${CANDIDATE}"/power1_cap ]]; then
+ # found a valid hwmon dir
+ echo "${CANDIDATE}"
+ fi
+ done
+}
+
+# determine the hwmon directory
+HWMON_DIR=$(get_hwmon_dir "${CARD}")
+
+# read all of the power profiles, used to get the IDs for assignment later
+PROFILE_MODES=$(< /sys/class/drm/"${CARD}"/device/pp_power_profile_mode)
+
+# get power capability; later used to determine limits
+read -r -d '' POWER_CAP < "$HWMON_DIR"/power1_cap_max
+
+# enable THP; profile enables the 'vm.compaction_proactiveness' sysctl
+# improves allocation latency
+echo 'always' | tee /sys/kernel/mm/transparent_hugepage/enabled
+
+# export determinations
+export CARD
+export HWMON_DIR
+export PROFILE_MODES
+export POWER_CAP
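Review note: `get_hwmon_dir` echoes every candidate that has `power1_cap`, so `HWMON_DIR` could hold more than one line if a device ever exposed several matching hwmon directories. A hedged sketch of a first-match variant follows; `find_hwmon_dir` is a hypothetical name and this is not the code in the commit.

```bash
#!/bin/bash
# Sketch only: return the first hwmon directory under a device directory
# that exposes 'power1_cap'. 'find_hwmon_dir' is a hypothetical name.
find_hwmon_dir() {
  local device_dir="${1%/}"   # drop a trailing slash, if any
  local candidate
  for candidate in "${device_dir}"/hwmon/hwmon*; do
    if [[ -f "${candidate}/power1_cap" ]]; then
      echo "${candidate}"
      return 0                # stop at the first match
    fi
  done
  return 1                    # nothing found; the caller can bail out
}

# usage on a real system (card name is an example):
#   find_hwmon_dir "/sys/class/drm/card1/device"
```

The non-zero return on failure lets a caller skip writing to `power1_cap` when no hwmon directory was found.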
diff --git a/roles/tuned_amdgpu/handlers/main.yml b/roles/tuned_amdgpu/handlers/main.yml
index 60384eb..c9a9ad5 100644
--- a/roles/tuned_amdgpu/handlers/main.yml
+++ b/roles/tuned_amdgpu/handlers/main.yml
@@ -4,3 +4,4 @@
ansible.builtin.service:
name: tuned
state: restarted
+ become: true
diff --git a/roles/tuned_amdgpu/tasks/main.yml b/roles/tuned_amdgpu/tasks/main.yml
index 5f93274..4cc4e42 100644
--- a/roles/tuned_amdgpu/tasks/main.yml
+++ b/roles/tuned_amdgpu/tasks/main.yml
@@ -28,70 +28,57 @@
when: (fed_ppdtuned_swap is not defined) or ('tuned' not in ansible_facts.packages)
become: true
-- name: Determine GPU device in drm subsystem
- ansible.builtin.shell:
- cmd: grep -ls ^connected /sys/class/drm/*/status | grep -o card[0-9] | sort | uniq | sort -h | tail -1
- executable: /bin/bash
- changed_when: false
- register: card
-
-- name: Find hwmon/max power capability file for {{ card.stdout }}
- ansible.builtin.find:
- paths: /sys/class/drm/{{ card.stdout }}/device/hwmon
- file_type: file
- recurse: true
- use_regex: true
- patterns:
- - '^power1_cap_max$'
- register: hwmon
-
-- name: Find hwmon/current power limit file for {{ card.stdout }}
- ansible.builtin.find:
- paths: /sys/class/drm/{{ card.stdout }}/device/hwmon
- file_type: file
- recurse: true
- use_regex: true
- patterns:
- - '^power1_cap$'
- register: powercap_set
-
-- name: Get max power capability for {{ card.stdout }}
- ansible.builtin.slurp:
- src: "{{ hwmon.files.0.path }}"
- register: power_max_b64
+- name: Ensure dynamic tuning is disabled
+ ansible.builtin.lineinfile:
+ path: /etc/tuned/tuned-main.conf
+ regexp: '^dynamic_tuning.*='
+ line: 'dynamic_tuning = 0'
+ notify: Restart tuned
+ become: true
- name: Create custom profile directories
ansible.builtin.file:
state: directory
- path: /etc/tuned/{{ item.1 }}-amdgpu-{{ item.0.key }}
+ path: /etc/tuned/{{ item.1 }}-amdgpu-{{ item.0 }}
mode: "0755"
with_nested:
- - "{{ lookup('dict', amdgpu_profiles) }}"
+ - "{{ amdgpu_profiles }}"
- "{{ base_profiles }}"
become: true
-- name: Template AMDGPU control/reset scripts
+- name: Copy 'common' AMDGPU script for all profiles
+ ansible.builtin.copy:
+ src: profile-common.sh
+ dest: "/etc/tuned/{{ item.1 }}-amdgpu-{{ item.0 }}/amdgpu-common.sh"
+ mode: "0644" # sourced, doesn't require executable bit
+ owner: root
+ group: root
+ notify: Restart tuned
+ with_nested:
+ - "{{ amdgpu_profiles }}"
+ - "{{ base_profiles }}"
+ become: true
+
+- name: Template custom AMDGPU profile scripts
ansible.builtin.template:
- src: templates/amdgpu-clock.sh.j2
- dest: /etc/tuned/{{ item.1 }}-amdgpu-{{ item.0.key }}/amdgpu-clock.sh
+ src: amdgpu-profile-{{ item.0 }}.sh.j2
+ dest: /etc/tuned/{{ item.1 }}-amdgpu-{{ item.0 }}/amdgpu-clock.sh
owner: root
group: root
mode: "0755"
- with_nested:
- - "{{ lookup('dict', amdgpu_profiles) }}"
- - "{{ base_profiles }}"
+ loop: "{{ amdgpu_profiles | product(base_profiles) | list }}"
notify: Restart tuned
become: true
-- name: Template custom tuned profiles
+- name: Template tuned.conf for custom profiles
ansible.builtin.template:
src: templates/tuned.conf.j2
- dest: /etc/tuned/{{ item.1 }}-amdgpu-{{ item.0.key }}/tuned.conf
+ dest: /etc/tuned/{{ item.1 }}-amdgpu-{{ item.0 }}/tuned.conf
owner: root
group: root
mode: "0644"
with_nested:
- - "{{ lookup('dict', amdgpu_profiles) }}"
+ - "{{ amdgpu_profiles }}"
- "{{ base_profiles }}"
notify: Restart tuned
become: true
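Review note: both `with_nested` and `product()` pair every GPU profile with every base profile. A small sketch of the directory names that result, assuming the three `amdgpu_profiles` from the role defaults and the Fedora base profile list named in the old README table:

```bash
#!/bin/bash
# Sketch only: list the directory names produced by nesting the GPU
# profiles with the base tuned profiles. Lists mirror the role defaults;
# the base list is assumed from the Fedora defaults in the old README.
amdgpu_profiles=(default overclock peak)
base_profiles=(balanced desktop latency-performance network-latency
               network-throughput powersave virtual-host)

list_profile_dirs() {
  local gpu base
  for gpu in "${amdgpu_profiles[@]}"; do
    for base in "${base_profiles[@]}"; do
      echo "/etc/tuned/${base}-amdgpu-${gpu}"
    done
  done
}

list_profile_dirs
```

With these lists that is 3 x 7 = 21 directories, each receiving `amdgpu-common.sh`, `amdgpu-clock.sh`, and `tuned.conf`.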
diff --git a/roles/tuned_amdgpu/templates/amdgpu-clock.sh.j2 b/roles/tuned_amdgpu/templates/amdgpu-clock.sh.j2
deleted file mode 100644
index 90c5f0b..0000000
--- a/roles/tuned_amdgpu/templates/amdgpu-clock.sh.j2
+++ /dev/null
@@ -1,72 +0,0 @@
-#!/bin/bash
-# script for tuned AMDGPU clock control
-# configures GPU power/clock characteristics
-# clocks/power in 3D are dynamic based on need/usage
-#
-# for 'amdgpu-default' tuned profiles, this will reset the characteristics to default
-# for others this will apply overclocking settings -- leaving clock choices to the associated power profile (eg: VR)
-#
-# rendered by Ansible with environment-appropriate values:
-# card #, eg: card0
-# path to discovered sysfs device files (power/clock/voltage control)
-#
-# AMDGPU driver/sysfs references:
-# https://01.org/linuxgraphics/gfx-docs/drm/gpu/amdgpu.html
-# https://docs.kernel.org/gpu/amdgpu/thermal.html
-
-{# done this way to avoid issues with the card number possibly shifting after playbook run #}
-# dynamically determine the connected GPU using the DRM subsystem
-CARD=$(/usr/bin/grep -ls ^connected /sys/class/drm/*/status | /usr/bin/grep -o 'card[0-9]' | /usr/bin/sort | /usr/bin/uniq | /usr/bin/sort -h | /usr/bin/tail -1)
-
-{# begin the templated script for 'default' profiles to reset state #}
-{% if 'default' in profile_name %}
-# set power state transition heuristics to default
-echo '{{ item.0.value.pwrmode }}' | tee /sys/class/drm/"${CARD}"/device/pp_power_profile_mode
-
-# set control mode back to auto
-# attempts to dynamically set optimal power profile for (load) conditions
-echo 'auto' | tee /sys/class/drm/"${CARD}"/device/power_dpm_force_performance_level
-
-# reset any existing profile clock changes
-echo 'r' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
-
-# give '{{ profile_name }}' profile ~{{ profile_percentage }}% (rounded) of the max power capability
-# {{ profile_watts }} Watts of {{ board_watts }} total
-echo '{{ profile_microwatts | int }}' | tee '{{ powercap_set.files.0.path }}'
-{% else %}
-{# begin the templated script for non-default AMD GPU profiles, eg: 'VR' or '3D_FULL_SCREEN' #}
-# set manual control mode
-# allows control via 'pp_dpm_mclk', 'pp_dpm_sclk', 'pp_dpm_pcie', 'pp_dpm_fclk', and 'pp_power_profile_mode' files
-# only interested in 'pp_power_profile_mode' for power and 'pp_dpm_mclk' for memory clock (flickering).
-# GPU clocks are dynamic based on (load) condition
-echo 'manual' | tee /sys/class/drm/"${CARD}"/device/power_dpm_force_performance_level
-
-# set power state transition heuristics to '{{ profile_name }}' profile
-echo '{{ item.0.value.pwrmode }}' | tee /sys/class/drm/"${CARD}"/device/pp_power_profile_mode
-
-# give '{{ profile_name }}' profile ~{{ profile_percentage }}% (rounded) of the max power capability
-# {{ profile_watts }} Watts of {{ board_watts }} total
-echo '{{ profile_microwatts | int }}' | tee '{{ powercap_set.files.0.path }}'
-
-# set the minimum GPU clock
-echo 's 0 {{ gpu_clock_min }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
-
-# set the maximum GPU clock
-echo 's 1 {{ gpu_clock_max }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
-
-# set the maximum GPU *memory* clock
-echo 'm 1 {{ gpumem_clock_static }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
-{% if gpu_mv_offset is defined %}
-
-# offset GPU voltage {{ gpu_mv_offset }}mV
-echo 'vo {{ gpu_mv_offset }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
-{% endif %}
-
-# commit the changes
-echo 'c' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
-
-# force GPU memory into highest clock (fix flickering)
-# pp_dpm_*clk settings are unintuitive, giving profiles that may be used
-# opt not to set the others (eg: sclk/fclk) - those should remain for benefits from the curve
-echo '3' | tee /sys/class/drm/"${CARD}"/device/pp_dpm_mclk
-{% endif %}
diff --git a/roles/tuned_amdgpu/templates/amdgpu-profile-default.sh.j2 b/roles/tuned_amdgpu/templates/amdgpu-profile-default.sh.j2
new file mode 100644
index 0000000..4bd1282
--- /dev/null
+++ b/roles/tuned_amdgpu/templates/amdgpu-profile-default.sh.j2
@@ -0,0 +1,36 @@
+#!/bin/bash
+# script for tuned AMDGPU clock control
+# configures GPU power/clock characteristics
+# clocks/power in 3D are dynamic based on need/usage
+#
+# for 'amdgpu-default' tuned profiles, this will reset the characteristics to default
+# for others this will apply overclocking settings -- leaving clock choices to the associated power profile (eg: VR)
+#
+# rendered by Ansible with environment-appropriate values:
+# card #, eg: card0
+# path to discovered sysfs device files (power/clock/voltage control)
+#
+# AMDGPU driver/sysfs references:
+# https://01.org/linuxgraphics/gfx-docs/drm/gpu/amdgpu.html
+# https://docs.kernel.org/gpu/amdgpu/thermal.html
+#
+# start by including the 'common' script; determines card/hwmon dir/power profiles/power capability
+. "$(dirname "${BASH_SOURCE[0]}")/amdgpu-common.sh"
+
+{# begin the templated script for 'default' profiles to reset state #}
+# set control mode back to auto
+# attempts to dynamically set optimal power profile for (load) conditions
+echo 'auto' | tee /sys/class/drm/"${CARD}"/device/power_dpm_force_performance_level
+
+# reset any existing profile clock changes
+echo 'r' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+
+# adjust power limit using multiplier against board capability
+POWER_LIM_DEFAULT=$(/usr/bin/awk -v m="$POWER_CAP" -v n={{ gpu_power_multi.default }} 'BEGIN {printf "%.0f", (m*n)}')
+echo "$POWER_LIM_DEFAULT" | tee "${HWMON_DIR}/power1_cap"
+
+# extract the 'BOOTUP_DEFAULT' profile ID number
+PROF_DEFAULT_NUM=$(/usr/bin/awk '$0 ~ /BOOTUP_DEFAULT.*:/ {print $1}' <<< "$PROFILE_MODES")
+
+# reset power/clock heuristics to the bootup default
+echo "${PROF_DEFAULT_NUM}" | tee /sys/class/drm/"${CARD}"/device/pp_power_profile_mode
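Review note: the templates look up the profile index by name instead of hardcoding `pwrmode` numbers. A minimal sketch of that lookup against sample text. The sample is illustrative only and not captured from a real card, and `profile_id` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch only: extract a power profile index by name from the text of
# 'pp_power_profile_mode'. 'profile_id' is a hypothetical helper name.
profile_id() {
  local name="$1"
  # first field of the first line containing "<name>...:" is the index
  awk -v name="$name" '$0 ~ (name ".*:") { print $1; exit }'
}

# illustrative sample only; real driver output has more columns and rows
sample_modes=' 0 BOOTUP_DEFAULT :
 1 3D_FULL_SCREEN :
 4             VR*:
 6         CUSTOM :'

profile_id 'BOOTUP_DEFAULT' <<< "$sample_modes"
profile_id 'VR' <<< "$sample_modes"
```

Matching by name keeps the scripts working if the index numbering differs between GPU generations, which is presumably the motivation for dropping the old `pwrmode` mapping.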
diff --git a/roles/tuned_amdgpu/templates/amdgpu-profile-overclock.sh.j2 b/roles/tuned_amdgpu/templates/amdgpu-profile-overclock.sh.j2
new file mode 100644
index 0000000..1f0aa3a
--- /dev/null
+++ b/roles/tuned_amdgpu/templates/amdgpu-profile-overclock.sh.j2
@@ -0,0 +1,58 @@
+#!/bin/bash
+# script for tuned AMDGPU clock control
+# configures GPU power/clock characteristics
+# clocks/power in 3D are dynamic based on need/usage
+#
+# for 'amdgpu-default' tuned profiles, this will reset the characteristics to default
+# for others this will apply overclocking settings -- leaving clock choices to the associated power profile (eg: VR)
+#
+# rendered by Ansible with environment-appropriate values:
+# card #, eg: card0
+# path to discovered sysfs device files (power/clock/voltage control)
+#
+# AMDGPU driver/sysfs references:
+# https://01.org/linuxgraphics/gfx-docs/drm/gpu/amdgpu.html
+# https://docs.kernel.org/gpu/amdgpu/thermal.html
+#
+# start by including the 'common' script; determines card/hwmon dir/power profiles/power capability
+. "$(dirname "${BASH_SOURCE[0]}")/amdgpu-common.sh"
+
+{# begin the templated script for 'overclocked' AMD GPU profiles based on the existing tuned profiles #}
+# set the minimum GPU clock - for best performance, this should be near the maximum
+# RX6000 series power management *sucks*
+echo 's 0 {{ gpu_clock_min }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+
+# set the maximum GPU clock
+echo 's 1 {{ gpu_clock_max }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+
+# set the GPU *memory* clock
+# normally this would appear disregarded, memory clocked at the minimum allowed by the overdrive (OD) range
+# it follows the core clock; if both 0/1 profiles for _it_ are high enough, the memory will follow
+echo 'm 1 {{ gpumem_clock_static }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+{% if gpu_mv_offset is defined %}
+
+# offset GPU voltage {{ gpu_mv_offset }}mV
+echo 'vo {{ gpu_mv_offset }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+{% endif %}
+
+# commit the changes
+echo 'c' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+
+# force GPU core and memory into highest clocks (fix flickering and poor power management)
+# set manual control mode
+# allows control via 'pp_dpm_mclk', 'pp_dpm_sclk', 'pp_dpm_pcie', 'pp_dpm_fclk', and 'pp_power_profile_mode' files
+echo 'manual' | tee /sys/class/drm/"${CARD}"/device/power_dpm_force_performance_level
+
+# adjust power limit using multiplier against board capability
+POWER_LIM_OC=$(/usr/bin/awk -v m="$POWER_CAP" -v n={{ gpu_power_multi.overclock }} 'BEGIN {printf "%.0f", (m*n)}')
+echo "$POWER_LIM_OC" | tee "${HWMON_DIR}/power1_cap"
+
+# avoid display flickering, force OC'd memory to highest clock
+echo '3' | tee /sys/class/drm/"${CARD}"/device/pp_dpm_mclk
+
+# extract the VR power profile ID number
+PROF_VR_NUM=$(/usr/bin/awk '$0 ~ /VR.*:/ {print $1}' <<< "$PROFILE_MODES")
+
+# force 'overclocked' profile to 'VR' power/clock heuristics
+# latency/frame timing seemed favorable with relatively-close minimum clocks
+echo "${PROF_VR_NUM}" | tee /sys/class/drm/"${CARD}"/device/pp_power_profile_mode
diff --git a/roles/tuned_amdgpu/templates/amdgpu-profile-peak.sh.j2 b/roles/tuned_amdgpu/templates/amdgpu-profile-peak.sh.j2
new file mode 100644
index 0000000..14105a8
--- /dev/null
+++ b/roles/tuned_amdgpu/templates/amdgpu-profile-peak.sh.j2
@@ -0,0 +1,66 @@
+#!/bin/bash
+# script for tuned AMDGPU clock control
+# configures GPU power/clock characteristics
+# clocks/power in 3D are dynamic based on need/usage
+#
+# for 'amdgpu-default' tuned profiles, this will reset the characteristics to default
+# for others this will apply overclocking settings -- leaving clock choices to the associated power profile (eg: VR)
+#
+# rendered by Ansible with environment-appropriate values:
+# card #, eg: card0
+# path to discovered sysfs device files (power/clock/voltage control)
+#
+# AMDGPU driver/sysfs references:
+# https://01.org/linuxgraphics/gfx-docs/drm/gpu/amdgpu.html
+# https://docs.kernel.org/gpu/amdgpu/thermal.html
+#
+# start by including the 'common' script; determines card/hwmon dir/power profiles/power capability
+. "$(dirname "${BASH_SOURCE[0]}")/amdgpu-common.sh"
+
+{# begin the templated script for 'overclocked' AMD GPU profiles based on the existing tuned profiles #}
+# set the minimum GPU clock - for best performance, this should be near the maximum
+# RX6000 series power management *sucks*
+echo 's 0 {{ gpu_clock_min }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+
+# set the maximum GPU clock
+echo 's 1 {{ gpu_clock_max }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+
+# set the GPU *memory* clock
+# normally this would appear disregarded, memory clocked at the minimum allowed by the overdrive (OD) range
+# it follows the core clock; if both 0/1 profiles for _it_ are high enough, the memory will follow
+echo 'm 1 {{ gpumem_clock_static }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+{% if gpu_mv_offset is defined %}
+
+# offset GPU voltage {{ gpu_mv_offset }}mV
+echo 'vo {{ gpu_mv_offset }}' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+{% endif %}
+
+# commit the changes
+echo 'c' | tee /sys/class/drm/"${CARD}"/device/pp_od_clk_voltage
+
+# force GPU core and memory into highest clocks (fix flickering and poor power management)
+# set manual control mode
+# allows control via 'pp_dpm_mclk', 'pp_dpm_sclk', 'pp_dpm_pcie', 'pp_dpm_fclk', and 'pp_power_profile_mode' files
+echo 'manual' | tee /sys/class/drm/"${CARD}"/device/power_dpm_force_performance_level
+
+# adjust power limit using multiplier against board capability
+POWER_LIM_OC=$(/usr/bin/awk -v m="$POWER_CAP" -v n={{ gpu_power_multi.overclock }} 'BEGIN {printf "%.0f", (m*n)}')
+echo "$POWER_LIM_OC" | tee "${HWMON_DIR}/power1_cap"
+
+# pp_dpm_*clk files take the index of a clock level (not a frequency); pin each to its highest level
+echo '1' | tee /sys/class/drm/"${CARD}"/device/pp_dpm_sclk
+echo '3' | tee /sys/class/drm/"${CARD}"/device/pp_dpm_mclk
+echo '2' | tee /sys/class/drm/"${CARD}"/device/pp_dpm_fclk
+echo '2' | tee /sys/class/drm/"${CARD}"/device/pp_dpm_socclk
+
+# extract the VR power profile ID number
+PROF_VR_NUM=$(/usr/bin/awk '$0 ~ /VR.*:/ {print $1}' <<< "$PROFILE_MODES")
+
+# force 'overclocked' profile to 'VR' power/clock heuristics
+# latency/frame timing seemed favorable with relatively-close minimum clocks
+echo "${PROF_VR_NUM}" | tee /sys/class/drm/"${CARD}"/device/pp_power_profile_mode
+
+# note 4/8/2023: instead of 'manual'... try dealing with broken power management, force clocks to high
+# ref: https://gitlab.freedesktop.org/drm/amd/-/issues/1500
+# followup: doesn't work that well in practice, still flaky on clocks/frame times
+#echo 'high' | tee /sys/class/drm/"${CARD}"/device/power_dpm_force_performance_level
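Review note: to check that the `peak` profile actually pinned a clock, the active level can be read back from the same `pp_dpm_*clk` files. A hedged sketch, assuming the driver marks the active level with a trailing `*`. The sample text is illustrative and `active_level` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch only: print the index of the active level from the text of a
# 'pp_dpm_*clk' file. 'active_level' is a hypothetical helper name.
active_level() {
  # the active level's line carries a trailing '*'; field 1 is "<index>:"
  awk '/\*/ { sub(":", "", $1); print $1; exit }'
}

# illustrative sample only; clock values will differ per card
sample_sclk='0: 500Mhz
1: 2675Mhz *'

active_level <<< "$sample_sclk"
```

On a real system the input would come from a file such as `/sys/class/drm/card1/device/pp_dpm_sclk` (card name is an example).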
diff --git a/roles/tuned_amdgpu/templates/tuned.conf.j2 b/roles/tuned_amdgpu/templates/tuned.conf.j2
index 636abad..729e025 100644
--- a/roles/tuned_amdgpu/templates/tuned.conf.j2
+++ b/roles/tuned_amdgpu/templates/tuned.conf.j2
@@ -1,16 +1,22 @@
[main]
include={{ item.1 }}
-summary={{ item.1 }} + TCP/RAID tweaks + AMDGPU pp_power_profile_mode = {{ item.0.value.pwrmode }} ({{ item.0.key }})
+summary={{ item.1 }} + TCP/RAID tweaks + AMDGPU {{ item.0 }}
[sysctl]
+# allow regular users to see the kernel ring buffer
+kernel.dmesg_restrict=0
net.core.default_qdisc=fq
# 'bbr2' requires a [modified] supporting kernel - stock Fedora kernels do *not* support it (currently)
# eg: 'kernel-xanmode-edge' from COPR 'rmnscnce/kernel-xanmod'
net.ipv4.tcp_congestion_control=bbr2
net.core.rmem_max=33554432
net.core.wmem_max=33554432
-dev.raid.speed_limit_min=600000
-dev.raid.speed_limit_max=9000000
+dev.raid.speed_limit_min=1000000
+dev.raid.speed_limit_max=6000000
+# improve THP allocation latency, compact in background
+vm.compaction_proactiveness=30
+# make page lock theft slightly more fair
+vm.page_lock_unfairness=1
# allow some games to run (eg: DayZ)
vm.max_map_count=1048576
@@ -20,3 +26,11 @@ vm.max_map_count=1048576
[gpuclockscript]
type=script
script=${i:PROFILE_DIR}/amdgpu-clock.sh
+
+# for SSDs (rotation rate of 0 RPM), use the 'kyber' IO scheduler
+[ssdnosched]
+type=disk
+devices_udev_regex=(ID_ATA_ROTATION_RATE_RPM=0)
+# elevator=none
+elevator=kyber
+# elevator=mq-deadline
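Review note: to confirm the `[ssdnosched]` section took effect, the active scheduler can be read from the block device's `queue/scheduler` file, where the kernel wraps the active entry in square brackets. A minimal sketch; `active_scheduler` is a hypothetical helper name and the sample line is illustrative.

```bash
#!/bin/bash
# Sketch only: print the active IO scheduler from the text of a
# 'queue/scheduler' file. 'active_scheduler' is a hypothetical name.
active_scheduler() {
  # the active scheduler is the entry wrapped in [brackets]
  sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# illustrative sample; the available schedulers depend on the kernel
active_scheduler <<< 'mq-deadline [kyber] bfq none'
```

On a real system this would read from something like `/sys/block/sda/queue/scheduler`, where the device name is an assumption.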