Interesting case regarding temperature limit and FurMark

Started by RavenRidge, June 06, 2019, 01:27:33 PM

RavenRidge

Summary:
During a FurMark stress test, the temperature limit is not effective.
After running a FurMark stress test, the temperature limit may remain ineffective in subsequent games or other GPU-intensive applications.

Environment:
FurMark 1.20.4
Windows 10 LTSC 2019 amd64

Ryzen 5 2600X
RTX 2060 (Colorful)

Steps to reproduce: (Not 100% reproducible)

  • Boot Windows
  • Set temp limit = 65 degrees C via MSI AB or NvidiaInspector

  • Play some game and notice that temp limit does work, i.e. Voltage and Clock are being actively throttled back once "Perfcap reason" turns to "Thrm" in GPU-Z

  • Launch FurMark stress test with the following setting: 1024x768 windowed mode, 0xMSAA

  • End the FurMark stress test.

  • Play some game and notice that the temp limit no longer works, i.e. Voltage and Clock are not actively throttled back once "Perfcap reason" turns to "Thrm" in GPU-Z. Temp rises all the way up to 67C~68C with no throttling observed. (Normally it would throttle back to ~1650MHz@0.85V.)

  • Try re-applying the 65C temp limit in MSI AB; it still does not work. Only a reboot fixes it.
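For anyone trying to reproduce these steps without GPU-Z, the readings could in principle be logged from a script. The following is only a minimal sketch, assuming `nvidia-smi` is installed and on PATH; the choice of fields and the helper names (`build_query_command`, `parse_sample`, `read_sample`) are assumptions of this example and were not part of the original test.

```python
import subprocess

# Fields passed to nvidia-smi's --query-gpu option. Picking these three is an
# assumption of this sketch; the thread itself relied on GPU-Z and MSI AB.
FIELDS = ["temperature.gpu", "clocks.gr", "power.draw"]


def build_query_command(fields=FIELDS):
    """Return the nvidia-smi command line that prints one CSV row per GPU."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]


def parse_sample(line, fields=FIELDS):
    """Turn one CSV row such as '67, 1650, 118.52' into a dict of floats.

    A field the driver reports as unsupported is not numeric, so float()
    raises ValueError for it.
    """
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return {name: float(value) for name, value in zip(fields, values)}


def read_sample():
    """Query the first GPU once (hypothetical helper, needs a real GPU)."""
    out = subprocess.run(
        build_query_command(), capture_output=True, text=True, check=True
    ).stdout
    return parse_sample(out.splitlines()[0])
```

Calling `read_sample()` once per second while a game or FurMark is running would give a temperature/clock trace for both the working and the broken state.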

Notes:
1. Please note that "temp limit does not work" means that Voltage and Clock are not actively throttled back once the temperature hits 65C, which can easily be observed with various hardware monitoring tools.

2. It is natural to suspect FurMark, since I also observed that FurMark itself does not respect the temp limit during the stress test, as if it had overridden any previously set temp limit. So it might be that it fails to restore the temp limit and somehow borks the GPU driver's temp limit algorithm.

3. After the FurMark stress test, the temp limit failure shows different symptoms. Most of the time it is as described above: "Perfcap Reason" includes "Thrm" but there is no actual throttling. At other times, Perfcap Reason does not include "Thrm" at all, even when temp has already reached 68C.
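The definition in note 1 can be turned into a rough check on a logged trace. This is a sketch under stated assumptions: the samples are (temperature in C, core clock in MHz) pairs, the 30 MHz threshold is an arbitrary choice, and the sample values below are made up for illustration (only the 65C limit and the ~1650MHz throttled clock come from the report above).

```python
def over_limit_runs(samples, limit_c):
    """Split samples into contiguous runs where temperature >= limit_c."""
    runs, current = [], []
    for temp_c, clock_mhz in samples:
        if temp_c >= limit_c:
            current.append((temp_c, clock_mhz))
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def clock_drop_per_run(samples, limit_c):
    """For each over-limit run, return first clock minus lowest clock (MHz)."""
    drops = []
    for run in over_limit_runs(samples, limit_c):
        clocks = [clock for _, clock in run]
        drops.append(clocks[0] - min(clocks))
    return drops


def limit_looks_effective(samples, limit_c, min_drop_mhz=30):
    """True if every over-limit run shows a clock drop of min_drop_mhz or more.

    min_drop_mhz is an arbitrary threshold chosen for this sketch.
    """
    drops = clock_drop_per_run(samples, limit_c)
    return bool(drops) and all(d >= min_drop_mhz for d in drops)


# Made-up traces: one where clocks fall once 65C is reached, one where they
# stay flat while the temperature keeps climbing.
working = [(63, 1900), (65, 1900), (66, 1830), (66, 1740), (65, 1650), (64, 1650)]
broken = [(64, 1900), (65, 1900), (67, 1900), (68, 1895), (68, 1900)]

print(limit_looks_effective(working, 65))  # True
print(limit_looks_effective(broken, 65))   # False
```

The check deliberately looks only at clocks. A real trace would also need to rule out drops caused by the power limit or by a change in workload.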

JeGX

I did a quick test and set the temperature target to 66°C. I left the power target linked, and it was automatically set to 73% TDP. I ran FurMark for 5 minutes:

FurMark + MSI Afterburner + GeForce RTX 2070

It looks like the NVIDIA driver properly limits the GPU temperature.

FurMark does not set, tweak, or play with driver settings. Never. Its only interaction with the driver is reading some sensors, such as GPU temperature or usage. FurMark never updates driver settings.

I currently have no explanation for your issue. FurMark is a 32-bit app. Did you notice this issue with another 32-bit app? Did you test on a non-LTSC Win10? Did you set the temperature target only, or was it also linked with the power target?

I did a test with temp target=66°C and power target=100%TDP, and in that case the GPU temperature during the FurMark stress test went well above 66°C. FurMark does not care about the temperature target or power target; it's the driver's job to throttle the GPU back to respect the user's settings.

Hope that helps  ;) 

RavenRidge

Quote from: JeGX on June 06, 2019, 03:07:00 PM
FurMark does not set nor tweak nor play with driver settings. Never. ..... I did a test with temp target=66°C and power target=100%TDP and in that case, the GPU temp during FurMark stress test has widely exceeded 66°C.
Thanks for the quick response! If this is the case, then it is highly likely that the problem lies in either the NVIDIA card firmware or the NVIDIA driver: the temperature limit would be ineffective under particular workloads, such as one cleverly designed to max out TDP.

This is very likely, since I've observed other quirks with NVIDIA temp limits. For example, a negative clock offset decreases the "responsiveness" of temp throttling, and a negative clock offset beyond -200MHz practically renders the temp limit ineffective. So it does seem that the NVIDIA firmware or driver is not very solid when it comes to temp limit operation.

Also, I've found this thread: https://forums.guru3d.com/threads/temp-limit-not-a-limit.407793/. I am pretty sure that thread's OP has experienced the same problem. (He refers to "Furmark actually has its own settings ", which I believe is the temperature alarm feature, not a limit.)

Thanks for taking the time to look into this issue. I will modify the thread title to reflect the possible case.

Quote from: JeGX on June 06, 2019, 03:07:00 PM
...... temperature target to 66°C. I let the power target linked and it has been automatically set to 73 % TDP. Looks like NVIDIA driver properly limits the GPU temp.
If you are still interested in this case, maybe you can try monitoring the "perfcap reason" with the MSI AB monitoring page or GPU-Z, to see whether this 66 degrees C maximum is really the effect of the temp limit, or whether the 2070 firmware is simply designed such that, when running at 73% TDP, the sustained temperature has difficulty rising above 66 deg C.

The temperature limit is very easy to observe: as soon as "Perfcap reason" turns to "Thrm" (or temp limit turns to 1 in MSI AB), clock speed and voltage keep dropping until the temperature drops below the target value.

Thank you very much for taking the time to look into this thread! I might ask related questions in the NVIDIA support forum if I have time.

Quote
Did you notice this issue with another 32-bit app? Did you test on a non-LTSC Win10? Did you set the temperature target only or was it also linked with the power target?
I've tested with the power target both linked and unlinked, and in both cases the temperature limit did not engage at all during FurMark burn-in. Could 32-bit be the culprit? I'll test with Windows 7 and see if the result is different.

The following screenshots were captured in an RDP session (FurMark or GTA5 was running in the local session).
Ineffective temp limit with FurMark:
Notice that the temp limit does not engage at all. All of the throttling is due to the power limit.


Effective temp limit with GTA5:
Notice the clear correlation between the temp limit flag and clock/voltage.
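The difference between the two screenshots can also be expressed as a tally of which limiter was flagged in each monitoring sample. This is an illustrative sketch only: the flag sets below are invented, "Thrm" is the label quoted earlier in the thread, and "Pwr" is assumed here to be the monitoring tool's label for the power limit.

```python
from collections import Counter


def limiter_share(flag_samples):
    """Fraction of samples in which each perfcap flag was raised."""
    total = len(flag_samples)
    if total == 0:
        return {}
    counts = Counter(flag for flags in flag_samples for flag in set(flags))
    return {flag: count / total for flag, count in counts.items()}


# Invented samples: every FurMark sample is power-limited, while the game
# samples are mostly thermally limited.
furmark_flags = [{"Pwr"}, {"Pwr"}, {"Pwr"}, {"Pwr"}]
game_flags = [set(), {"Thrm"}, {"Thrm"}, {"Thrm", "Pwr"}]

print(limiter_share(furmark_flags))  # {'Pwr': 1.0}
print(limiter_share(game_flags))
```

A trace in which "Thrm" never appears, as in the FurMark case, would suggest that the temperature is being held down by the power limit rather than by the temp limit itself.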

indefix123

Just registered on the forum to say that I had a similar problem with a GTX 560 Ti, also from MSI, on 32-bit Win7. I was stress testing the card, and it never throttled down or crashed the PC during the test. The card now produces screen flickering  >:(