Fan speed control not working on Pi 5 under Ubuntu 23.10

Bug #2041741 reported by Steve Pringle
This bug affects 16 people

Affects                       Status        Importance  Assigned to
linux-raspi (Ubuntu)          Fix Released  High        Unassigned
linux-raspi (Ubuntu Mantic)   Fix Released  High        Juerg Haefliger

Bug Description

[Impact]

The fan speed control does not appear to work when running Ubuntu 23.10 on a Raspberry Pi 5. The fan runs at near maximum speed all the time, and cur_state in /sys/class/thermal/cooling_device0/ is stuck at 4 regardless of the CPU temperature. I suspect it is related to this fault reported on Raspberry Pi OS: https://forums.raspberrypi.com/viewtopic.php?t=356881
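To check the two values mentioned above from a program rather than with cat, a minimal C sketch could look like this. It assumes the sysfs paths from the description (cooling_device0/cur_state for the fan state, thermal_zone0/temp in millidegrees Celsius); the helper name read_sysfs_int is hypothetical, and the device index may differ on other systems.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysfs-style attribute file,
 * e.g. /sys/class/thermal/cooling_device0/cur_state (fan state) or
 * /sys/class/thermal/thermal_zone0/temp (millidegrees Celsius).
 * Stores the value in *out and returns 0, or returns -1 on error. */
static int read_sysfs_int(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int rc = (fscanf(f, "%ld", out) == 1) ? 0 : -1;
    fclose(f);
    return rc;
}
```

Per the report, on an affected kernel the cur_state value read this way stays at 4 while the temp value changes.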

[Test Case]

Plug in a fan and verify that it does not run at full speed all the time.

[Regression Potential]

A simple modification to the step_wise thermal governor. If faulty, it might result in fans always spinning or never spinning.

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/2041741/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → linux-raspi (Ubuntu)
Juerg Haefliger (juergh)
tags: added: kern-8417
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-raspi (Ubuntu):
status: New → Confirmed
Jürgen Kreileder (jk) wrote :

This affects at least the pwm-fan and gpio-fan overlays, as both use the step_wise governor.

The problem has reportedly been fixed in raspberrypi rpi-6.5.y: https://github.com/raspberrypi/linux/commit/51c8b070cbf3f5df97a633c5203f729013ab60d0

Jürgen Kreileder (jk)
summary: - Fan speed Control not working on RASPERRY PI 5 RUNNING UBUNTU 23.10
+ Fan speed Control not working on RASPERRY PI RUNNING UBUNTU 23.10
Dave Jones (waveform)
Changed in linux-raspi (Ubuntu):
importance: Undecided → High
Changed in linux-raspi (Ubuntu Mantic):
status: New → Confirmed
importance: Undecided → High
summary: - Fan speed Control not working on RASPERRY PI RUNNING UBUNTU 23.10
+ Fan speed control not working on Pi 5 under Ubuntu 23.10
Dave Jones (waveform) wrote :

The step-wise governor certainly appears to have some issues. Our version appears to read trip_temp before it is ever initialized (https://git.launchpad.net/ubuntu/+source/linux-raspi/tree/drivers/thermal/gov_step_wise.c#n82). The patch linked from the forum post in the description corrects this; I will build a test kernel to see whether it fixes the issue.

Dave Jones (waveform) wrote :

Incidentally, here is a workaround to make things more bearable until this is fixed:

  $ echo 2 | sudo tee /sys/class/thermal/cooling_device0/cur_state

Here 2 is a medium (and, in my opinion, largely silent) fan speed. Valid values range from 0 (off) to 4 (full speed).
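The same workaround as a small C sketch, assuming the cooling_device0 path from the command above and the 0 to 4 range quoted for the Pi 5 fan. Rather than hard-coding the upper bound, a caller could read it from the device's max_state attribute. The function name set_fan_state is hypothetical, and writing the real sysfs file needs root, just as the command above uses sudo tee.

```c
#include <stdio.h>

/* Hypothetical helper: write a fixed fan state to a cur_state file,
 * e.g. /sys/class/thermal/cooling_device0/cur_state.
 * max_state is the highest valid state (4 for the Pi 5 fan per this report).
 * Returns 0 on success, -1 for an out-of-range state or an I/O error. */
static int set_fan_state(const char *path, int state, int max_state)
{
    if (state < 0 || state > max_state)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%d\n", state) > 0;
    /* fclose() flushes the buffer, so a write error reported by the
     * kernel surfaces here rather than at fprintf(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, set_fan_state("/sys/class/thermal/cooling_device0/cur_state", 2, 4) corresponds to the echo 2 command.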

Dave Jones (waveform) wrote :

@juergh Confirmed that the uninitialized variable in drivers/thermal/gov_step_wise.c, which the patch addresses, is indeed the root cause. In thermal_zone_trip_update, I changed:

  hyst_temp = trip.temperature;

To:

  hyst_temp = trip_temp = trip.temperature;

This ensures trip_temp is initialized before it is read. I also removed the (redundantly added) second call to get_tz_trend; the fan operated correctly after I installed the recompiled kernel.
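To illustrate the one-line fix, here is a simplified C sketch. The struct trip type and both functions are hypothetical stand-ins, not the kernel's gov_step_wise.c; only the chained assignment mirrors the change described above. In the unpatched code only hyst_temp was assigned, so a later comparison against trip_temp read an indeterminate value.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a thermal trip point. */
struct trip {
    int temperature; /* millidegrees Celsius */
};

struct trip_temps {
    int trip_temp;
    int hyst_temp;
};

/* Before the fix (simplified):
 *     hyst_temp = trip.temperature;   (trip_temp is never assigned)
 *     ... tz->temperature >= trip_temp ...   (reads an indeterminate value)
 * After the fix, both are assigned before either is read. */
static struct trip_temps init_trip_temps(const struct trip *trip)
{
    struct trip_temps t;
    t.hyst_temp = t.trip_temp = trip->temperature;
    return t;
}

/* Throttle (step the fan up) once the zone reaches the trip temperature. */
static bool should_throttle(const struct trip *trip, int zone_temp)
{
    struct trip_temps t = init_trip_temps(trip);
    return zone_temp >= t.trip_temp;
}
```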

Dave Jones (waveform) wrote :

I've applied the patch to a kernel build in ppa:waveform/fan-fix (https://launchpad.net/~waveform/+archive/ubuntu/fan-fix) which should serve as a temporary fix until the next kernel release.

Juerg Haefliger (juergh)
description: updated
Steve Pringle (stevepringle) wrote :

I have applied the temporary fix from the PPA and can confirm that the fan speed is now controlled correctly to regulate the CPU temperature.

Juerg Haefliger (juergh)
Changed in linux-raspi (Ubuntu Mantic):
status: Confirmed → In Progress
Changed in linux-raspi (Ubuntu):
status: Confirmed → Invalid
Changed in linux-raspi (Ubuntu Mantic):
assignee: nobody → Juerg Haefliger (juergh)
Jürgen Kreileder (jk) wrote :

@waveform I'm not sure that is the final solution. With your build, speed control works, but the configured hysteresis (default: 5°C) seems to be ignored. This results in very frequent state changes around the trip points.

Did you just fix the uninitialized variable or apply the whole commit (https://github.com/raspberrypi/linux/commit/29d2bdf66191b6e50deef2110792c43c10cccfd9)?

For example, there are many changes between states 0 and 1, with the fan sometimes stopping for just a fraction of a second:

$ cat /sys/class/thermal/cooling_device0/stats/time_in_state_ms
state0 6365509
state1 5434504
state2 0
state3 0
state4 0
$ cat /sys/class/thermal/cooling_device0/stats/total_trans
5371
$ while true ; do cat /sys/class/thermal/thermal_zone0/temp; cat /sys/class/thermal/cooling_device0/cur_state; sleep 1; done
49600
0
49600
1
49600
0
49600
0
50150
1
49050
0
49600
0
49600
1
49050
0
49600
0
49600
0
49600
0
48500
0
49600
0
48500
0
49050
0
49050
0
50150
1
49600
1
51250
0
49050
0
50700
0
50150
0
51250
0
50700
1
49600
1
50700
1
49600
1
49050
0
50700
0
50150
0
50150
1
51250
1
50150
1
50700
1
50150
0
50150
0
49050
0
50150
1
50150
1
49600
1
49600
1
49050
1
49050
0
49050
0
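Why a missing hysteresis causes this flapping can be shown with a simplified C model of a single trip point. It assumes one trip at 50000 (50°C) with the default 5°C hysteresis mentioned above, and ignores the temperature trend and the other trips that the real step_wise governor considers; all names are hypothetical. The sample temperatures are taken from the output above, but the model is not meant to reproduce the exact state sequence, since the shell loop samples once per second rather than at each governor update.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one trip point (millidegrees Celsius).
 * Not the kernel code: trend and multiple trips are ignored. */
struct trip_model {
    int temperature; /* trip temperature, e.g. 50000 */
    int hysteresis;  /* e.g. 5000; 0 models the ignored hysteresis */
};

/* Fan on (1) at or above the trip; once on, stay on until the temperature
 * drops below trip - hysteresis. With hysteresis 0 the fan switches off
 * as soon as the temperature dips below the trip point. */
static int next_state(const struct trip_model *t, int temp, int state)
{
    int hyst_temp = t->temperature - t->hysteresis;
    bool throttle = temp >= t->temperature ||
                    (state > 0 && temp >= hyst_temp);
    return throttle ? 1 : 0;
}

/* Count how often the fan switches state over a series of samples,
 * starting with the fan off. */
static int count_transitions(const struct trip_model *t,
                             const int *temps, size_t n)
{
    int state = 0, transitions = 0;
    for (size_t i = 0; i < n; i++) {
        int next = next_state(t, temps[i], state);
        if (next != state)
            transitions++;
        state = next;
    }
    return transitions;
}
```

With the samples 49600, 50150, 49050, 50700, 49600, 51250, 49050, this model switches state six times with a hysteresis of 0 but only once with a hysteresis of 5000.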

Juerg Haefliger (juergh)
Changed in linux-raspi (Ubuntu Mantic):
status: In Progress → Fix Committed
Janåke Rönnblom (jan-ake) wrote :

It seems the patch is incomplete, as Jürgen Kreileder (jk) pointed out.

Compare https://github.com/raspberrypi/linux/commit/29d2bdf66191b6e50deef2110792c43c10cccfd9

with https://launchpadlibrarian.net/697908072/linux-raspi_6.5.0-1005.7_6.5.0-1006.9~fanfix1.diff.gz

Also, fanfix contains other fixes.

Jürgen Kreileder (jk) wrote :

I stand corrected. The missing part of https://github.com/raspberrypi/linux/commit/29d2bdf66191b6e50deef2110792c43c10cccfd9 is already in the code. So the fix is correct.

As for the missing hysteresis support: get_trip_hyst is an optional operation (hence the "if (tz->ops->get_trip_hyst)"). It has actually been removed in the 6.7 kernel: https://github.com/torvalds/linux/commit/35d8dbbb25add265a880ab0dc48a229f06b08325

Apparently, the way to go is to read the hysteresis values directly. I've opened an issue on GitHub: https://github.com/raspberrypi/linux/issues/5726

Dave Jones (waveform) wrote :

I have noted, as @jk does above, that with my patch in place the fan does sometimes wobble between on/off states, which it really shouldn't if the hysteresis is sufficiently wide and working correctly. Still, not being fully versed in kernel patching, I opted for a minimal change whose effect I could definitely understand (on the basis that the results were "good enough for now").

On the subject of fanfix containing other fixes: it doesn't, but I can understand why it looks that way from the diff provided by the PPA. Unfortunately, Launchpad has become confused about which version to diff against: it is using 1005.7 rather than 1006.8 as the base, which is why other changes show up there.

Unfortunately, it appears there's been another linux-raspi release (1007.9) without the patch (presumably one that was already in the pipeline?), so my fan is back to spinning madly. I'll see if I can upload another patched version to the PPA shortly. I'll also push my branch to Launchpad so you can see the patch I'm building from (as Launchpad will probably diff against the wrong base in the PPA again).

@juergh -- is 1008 the one scheduled to have the fix, and am I right in thinking the first week of December is when it's due (or am I reading kernel.ubuntu.com wrong)?

Dave Jones (waveform) wrote :

Okay, new version pushed to the PPA. I need sleep, so I'm afraid I'm not going to be around to test it for another 10 hours or so, but if anyone wants to see the source I'm building from, see the "fanfix" branch in https://code.launchpad.net/~waveform/ubuntu/+source/linux-raspi/+git/mantic. This can be compared to the repo at https://code.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux-raspi/+git/mantic which was my base, specifically the Ubuntu-raspi-6.5.0-1007.9 tag.

Given the kernel's slightly bizarre package format (1.0!), there will be further differences in the sources uploaded to the PPA. If you want to replicate the source build yourself (either to verify there's nothing iffy going on, or that I have a clue what I'm doing :), the following procedure should replicate the build of the source package, assuming you have a mantic chroot set up for sbuild:

  # Get the pre-requisites
  sudo apt install ubuntu-dev-tools sbuild fakeroot

  # Grab the current linux-raspi source package to get the orig tar-ball
  pull-lp-source --download-only linux-raspi mantic

  # Grab my branch (this will take a while)
  git clone https://git.launchpad.net/~waveform/ubuntu/+source/linux-raspi/+git/mantic linux-raspi
  cd linux-raspi
  git checkout fanfix

  # Build the source package
  fakeroot ./debian/rules clean
  sbuild --no-arch-all --no-arch-any --source --dist mantic

At this point you should have a linux-raspi_6.5.0-1007.10~fanfix1_source.changes in the parent directory (with accompanying files, including a .diff.gz) which you could upload to a PPA (set for arm64/armhf builds) to build your own version. This *should* match what I'm building in my fan-fix PPA.

Jürgen Kreileder (jk) wrote :

I don't currently have a working build environment, but something like

diff --git a/drivers/thermal/gov_step_wise.c b/drivers/thermal/gov_step_wise.c
index eefeb6407d0f..904a66c9e499 100644
--- a/drivers/thermal/gov_step_wise.c
+++ b/drivers/thermal/gov_step_wise.c
@@ -99,12 +99,15 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip_id
 	}
 
 	hyst_temp = trip_temp = trip.temperature;
-	if (tz->ops->get_trip_hyst) {
-		tz->ops->get_trip_hyst(tz, trip_id, &hyst_temp);
-		hyst_temp = trip_temp - hyst_temp;
-	}
 	trip_type = trip.type;
 
+	if (trip.hysteresis)
+		hyst_temp = trip_temp - trip.hysteresis;
+	else
+		dev_info_once(&tz->device,
+			      "Zero hysteresis value for Trip%d[type=%d]\n",
+			      trip_id, trip_type);
+
 	dev_dbg(&tz->device,
 		"Trip%d[type=%d,temp=%d,hyst=%d]:trend=%d,throttle=%d\n",
 		trip_id, trip_type, trip.temperature, hyst_temp, trend, throttle);
--

will probably fix the hysteresis issue. It should read the hysteresis values directly from the trip table, just as, e.g., gov_bang_bang.c does.
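The core of the proposed change, computing the hysteresis temperature from the trip table instead of through the optional get_trip_hyst callback, reduces to the arithmetic below. This is a hypothetical C sketch with a simplified trip struct, not the kernel's struct thermal_trip; the sample values are the trip temperatures and 5°C hysteresis from the debug log in the follow-up comment.

```c
/* Hypothetical, simplified trip table entry (millidegrees Celsius). */
struct trip {
    int temperature;
    int hysteresis; /* 0 means no hysteresis configured */
};

/* Temperature below which an active trip is released: the trip temperature
 * minus its hysteresis, or the trip temperature itself when none is set
 * (the patch logs a one-time message in that case). */
static int trip_hyst_temp(const struct trip *trip)
{
    int hyst_temp = trip->temperature;
    if (trip->hysteresis)
        hyst_temp = trip->temperature - trip->hysteresis;
    return hyst_temp;
}
```

For the trip at 50000 with a hysteresis of 5000, this yields 45000, matching hyst=45000 in the Trip1 log line.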

Jürgen Kreileder (jk) wrote :

Yes, this fixes the problem with hysteresis:

[ 825.785812] thermal thermal_zone0: Trip1[type=0,temp=50000,hyst=45000]:trend=0,throttle=0
[ 825.785829] thermal thermal_zone0: Trip2[type=0,temp=60000,hyst=55000]:trend=0,throttle=0
[ 825.785838] thermal thermal_zone0: Trip3[type=0,temp=67500,hyst=62500]:trend=0,throttle=0
[ 825.785846] thermal thermal_zone0: Trip4[type=0,temp=75000,hyst=70000]:trend=0,throttle=0

https://launchpad.net/~jk/+archive/ubuntu/linux-raspi
https://github.com/raspberrypi/linux/pull/5736/commits/a57989416faf34a4d7358f3786c0de52e5af81cc

Juerg Haefliger (juergh) wrote :

I've created a new bug for this issue: bug 2044341

We're past the patch submission date for the current cycle, so this will land in a later cycle.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-raspi/6.5.0-1008.11 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux-raspi' to 'verification-done-mantic-linux-raspi'. If the problem still exists, change the tag 'verification-needed-mantic-linux-raspi' to 'verification-failed-mantic-linux-raspi'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-raspi-v2 verification-needed-mantic-linux-raspi
Dave Jones (waveform) wrote :

Verified on a Pi 5 with an official fan, and on a Pi 4B with a GPIO-based fan.

tags: added: verification-done-mantic-linux-raspi
removed: verification-needed-mantic-linux-raspi
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-raspi - 6.5.0-1008.11

---------------
linux-raspi (6.5.0-1008.11) mantic; urgency=medium

  * mantic/linux-raspi: 6.5.0-1008.11 -proposed tracker (LP: #2041533)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync update-dkms-versions helper
    - debian/dkms-versions -- update from kernel-versions (main/2023.10.30)

  * drop all references to is_rust_module.sh in kernels >= 6.5 (LP: #2038611)
    - [Packaging] raspi: drop references to is_rust_module.sh

  * Unnecessary armhf DTB (LP: #2039431)
    - [Config] raspi: Set ARCH_BRCMSTB=n for armhf

  * disable shiftfs (LP: #2038522)
    - [Config] raspi: disable shiftfs

  * openvswitch fails on raspberry pi 4 (LP: #2040524)
    - [Packaging] raspi: Include openvswitch in linux-modules

  * Fan speed control not working on Pi 5 under Ubuntu 23.10 (LP: #2041741)
    - driver: thermal: step_wise: Fix uninitialized variable

  * Raspberry Pi 3B+ doesnt boot from USB on 23.10 Mantic (LP: #2039786)
    - SAUCE: Revert "usb: misc: onboard-hub: add support for Microchip USB2514B
      USB 2.0 hub"

  [ Ubuntu: 6.5.0-14.14 ]

  * mantic/linux: 6.5.0-14.14 -proposed tracker (LP: #2042660)
  * Boot log print hang on screen, no login prompt on Aspeed 2600 rev 52 BMC
    (LP: #2042850)
    - drm/ast: Add BMC virtual connector
  * arm64 atomic issues cause disk corruption (LP: #2042573)
    - locking/atomic: scripts: fix fallback ifdeffery
  * Packaging resync (LP: #1786013)
    - [Packaging] update annotations scripts

  [ Ubuntu: 6.5.0-12.12 ]

  * mantic/linux: 6.5.0-12.12 -proposed tracker (LP: #2041536)
  * Packaging resync (LP: #1786013)
    - [Packaging] update annotations scripts
    - [Packaging] update helper scripts
    - debian/dkms-versions -- update from kernel-versions (main/2023.10.30)
  * CVE-2023-5633
    - drm/vmwgfx: Keep a gem reference to user bos in surfaces
  * CVE-2023-5345
    - fs/smb/client: Reset password pointer to NULL
  * CVE-2023-39189
    - netfilter: nfnetlink_osf: avoid OOB read
  * CVE-2023-4244
    - netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
  * apparmor restricts read access of user namespace mediation sysctls to root
    (LP: #2040194)
    - SAUCE: apparmor: open userns related sysctl so lxc can check if restriction
      are in place
  * AppArmor spams kernel log with assert when auditing (LP: #2040192)
    - SAUCE: apparmor: fix request field from a prompt reply that denies all
      access
  * apparmor notification files verification (LP: #2040250)
    - SAUCE: apparmor: fix notification header size
  * apparmor oops when racing to retrieve a notification (LP: #2040245)
    - SAUCE: apparmor: fix oops when racing to retrieve notification
  * SMC stats: Wrong bucket calculation for payload of exactly 4096 bytes
    (LP: #2039575)
    - net/smc: Fix pos miscalculation in statistics
  * Support mipi camera on Intel Meteor Lake platform (LP: #2031412)
    - SAUCE: iommu: intel-ipu: use IOMMU passthrough mode for Intel IPUs on Meteor
      Lake
    - SAUCE: platform/x86: int3472: Add handshake GPIO function
  * CVE-2023-45898
    - ext4: fix slab-use-after-free in ext4_es_insert_extent(...


Changed in linux-raspi (Ubuntu Mantic):
status: Fix Committed → Fix Released
Dave Jones (waveform) wrote :

Okay, new kernel is released! If you've been using my PPA for the fix in the meantime, just "sudo apt update && sudo apt upgrade" as usual, reboot, and you'll be on the officially fixed kernel. After that you can "sudo add-apt-repository --remove ppa:waveform/fan-fix" and your apt sources will be back to normal.

Steve Pringle (stevepringle) wrote : Re: [Bug 2041741] Re: Fan speed control not working on Pi 5 under Ubuntu 23.10

Brilliant, thanks to all those concerned for getting this bug fixed :-)

Regards,

Steve

> On 5 Dec 2023, at 17:40, Dave Jones <email address hidden> wrote:
>
> sudo add-apt-repository --remove ppa:waveform/fan-fix

Juerg Haefliger (juergh) wrote :

Note that this release doesn't contain the hysteresis fix; that patch landed too late for the cycle.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-raspi - 6.7.0-1001.1

---------------
linux-raspi (6.7.0-1001.1) noble; urgency=medium

  * noble/linux-raspi: 6.7.0-1001.1 -proposed tracker (LP: #2051136)

  * Packaging resync (LP: #1786013)
    - [Packaging] update Ubuntu.md
    - [Packaging] update update.conf
    - debian/dkms-versions -- update from kernel-versions (main/d2024.01.02)

  * Remove linux-modules-extra (LP: #2048862)
    - [Packaging] raspi: Remove linux-modules-extra package

  * Make dwc2 the default (LP: #2048861)
    - SAUCE: ARM: dts: bcm27xx: Make dwc2 the default

  * Raspberry Pi 3B+ doesnt boot from USB on 23.10 Mantic (LP: #2039786)
    - SAUCE: Revert "usb: misc: onboard-hub: add support for Microchip USB2514B
      USB 2.0 hub"

  * Missing overlays/README (LP: #1954757)
    - SAUCE: (no-up) Install overlays/README

  * [Raspberry Pi/lunar] systemd-oomd fails with
    "ConditionControlGroupController=memory was not met" (LP: #2017209)
    - SAUCE: Revert "cgroup: Disable cgroup "memory" by default"

  * Remove armhf support (LP: #2048864)
    - [Packaging] raspi: Remove armhf packages
    - [Packaging] raspi: Remove armhf ABI files

  * Miscellaneous Ubuntu changes
    - [Packaging] raspi: Initial import of debian.raspi from mantic:linux-raspi
      (6.5.0-1010.13)
    - [Packaging] raspi: Sync packaging files from debian.master
    - [Packaging] raspi: Initial version of linux-raspi for Noble
    - [Config] raspi: updateconfigs after rebase to Ubuntu-6.7.0-2.2
    - [Config] raspi: Include master annotations
    - [Packaging] raspi: Disable all ABI checks
    - SAUCE: Revert "iommu: Retire map/unmap ops"
    - [Packaging] raspi: Import of upstream raspberrypi patchset
    - [Config] raspi: updateconfigs after import of rpi-6.6.y patchset
    - [Config] raspi: Set SWIOTLB_DYNAMIC=n
    - SAUCE: arm64: dts: broadcom: Remove downstream dt overlay support
    - SAUCE: (no-up) ARM: dts: Disable unsupported Raspberry Pi DTBs
    - SAUCE: ARM: dts: Fix broken symlinks
    - SAUCE: ARM: dts: overlays: Fix file permissions
    - [Packaging] raspi: Update reconstruct script

  * Miscellaneous upstream changes
    - raspberrypi-firmware: Update mailbox commands
    - drm/vc4: Add FKMS as an acceptable node for dma ranges.
    - drm/atomic: Don't fixup modes that haven't been reset
    - drm/vc4: Allow setting the TV norm via module parameter
    - drm/vc4: Add firmware-kms mode
    - drm/vc4: Add support for gamma on BCM2711
    - drm/vc4: Add debugfs node that dumps the vc5 gamma PWL entries
    - drm/vc4: hvs: Force modeset on gamma lut change
    - drm/vc4: Relax VEC modeline requirements and add progressive mode support
    - drm/vc4: Make VEC progressive modes readily accessible
    - drm: Check whether the gamma lut has changed before updating
    - drm/vc4: Enable gamma block only when required.
    - drm/vc4: Only add gamma properties once.
    - drm/vc4: Validate the size of the gamma_lut
    - drm/vc4: Disable Gamma control on HVS5 due to issues writing the table
    - drm/dsi: Document the meaning and spec references for MIPI_DSI_MODE_*
    - drm/bridge: tc358762: Ignore EPROBE_DEFER when logging errors
    - vc4/drm: vc...

Changed in linux-raspi (Ubuntu):
status: Invalid → Fix Released