Frequent TDRs with GTX 760

Started by Skylark, July 05, 2015, 04:24:53 AM

Skylark

Hello,

I've been getting pretty frequent TDRs lately. My GPU is a GTX 760 2GB, and I'm currently using driver 353.06 (I tend to stay up to date).

The symptoms are that once in a while, the game crashes with a "driver stopped responding and has been restarted" message, or the game just crashes (with a "debug / close" dialog) or the game/window just disappears. I have had this happen with many games, from Skyrim to Assassin's Creed IV / AC Unity to Witcher 3 to even Minecraft! It seems random, because sometimes it will crash within the first 5 minutes of playing, while other times it will happen after 45 minutes to an hour.

I have left MSI Afterburner's monitoring window in the background while playing (which is helpful because it graphs various data over the last few minutes) and I cannot correlate the TDRs to a spike in GPU utilization, temperature, memory use, voltage or anything like that. At high load, my GPU temp is around 72 degrees C, and the fan speed is at 75% tops.
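A graph that only covers the last few minutes is lost when the machine or the monitoring tool restarts, so a log on disk can make the comparison easier after the fact. Below is a rough sketch, assuming `nvidia-smi` is on the PATH and that the card reports these query fields (some GeForce cards return "[Not Supported]" for a few of them). The file name `gpu_log.csv` and the one-second interval are arbitrary choices for illustration.

```python
import csv
import os
import subprocess
import time

# Field names passed to `nvidia-smi --query-gpu`; not every card reports all of them.
FIELDS = ["timestamp", "temperature.gpu", "utilization.gpu",
          "memory.used", "clocks.sm", "fan.speed"]


def parse_sample(line):
    """Turn one CSV line printed by nvidia-smi into a {field: value} dict."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))


def read_sample():
    """Ask nvidia-smi for one reading of the first GPU."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_sample(out.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append one reading per interval until interrupted (Ctrl+C)."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        while True:
            writer.writerow(read_sample())
            f.flush()  # keep the file current in case the system goes down
            time.sleep(interval_s)
```

Calling `log_forever()` in a console before starting a game leaves a CSV whose last rows show the state of the card right before a crash.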

I've been able to run a few of the MSI Kombustor 3 stress tests for a long time without problems (the GPU memory burner 2GB, the Lakes of Titan 32X, etc.)

One test that gave me TDRs twice is the PhysX GPU one. But it isn't a reliable reproduction: it happened in only 2 of maybe 5 attempts, and the other 3 times the test ran for 10 minutes with no TDR. That seems pretty consistent with what I'm seeing in games. However, I had previously set PhysX to use only the CPU in the NVIDIA Control Panel, and it didn't have any impact on TDRs in actual games. So I doubt PhysX itself is the culprit.

Can someone suggest something I haven't already tried? Some troubleshooting I may have missed?

This video card is already the result of an RMA (the first GTX 760 I got crashed Windows even when nothing was running!) and it's out of warranty anyway. I am thinking I may borrow a known-good video card from a friend to see if the TDRs go away before going out and buying another one, but if there is another way to confirm that it's my video card (and that changing it will fix these issues), I'd like to try it.

Thanks in advance,

Skylark

Stefan

Looks like a widespread issue. The official GeForce forum is full of complaints.

Try NVIDIA GeForce Hotfix driver 353.38 or roll back to older drivers; R350.12 / R348.17 aren't affected.

JeGX

Depending on the driver version, the app that shows me a lot of TDRs is MadShaders. As Stefan says, try the latest hotfix, R353.38.

Skylark

Thanks a lot for the pointers to the hotfix driver. Trying it now.

I knew there was bound to be some info on the GeForce forums somewhere, but honestly, wading through that many posts a day is just too much for me :-) Hence my presence here: I've followed Geeks3D for years and knew someone would have good info.

I'll post again with my findings after a while using the hotfix driver. Thanks again.

Skylark

After a few evenings of gaming, I don't think the new drivers have improved things substantially.

I do seem to see the "driver stopped responding and has been restarted" popup (had it 2-3 times), but games still crash randomly (either with the Windows "debug / close" dialog or the window just disappears). So I do not think the cause of my problems was what this hotfix driver tried to solve.
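To tell real TDRs apart from ordinary application crashes, the Windows System event log can be checked for driver-recovery entries. The sketch below is only an illustration: it assumes the recovery is logged with Event ID 4101 (commonly reported for "display driver stopped responding and has recovered") and uses the built-in `wevtutil` tool. The function names are made up for this example.

```python
import subprocess

# Assumption: Windows logs a recovered display-driver timeout under this ID.
TDR_EVENT_ID = 4101


def tdr_query_command(max_events=20):
    """Build a wevtutil command listing the newest matching System-log events."""
    xpath = "*[System[(EventID={})]]".format(TDR_EVENT_ID)
    return ["wevtutil", "qe", "System",
            "/q:" + xpath,                 # XPath filter on the event ID
            "/c:{}".format(max_events),    # limit the number of events
            "/rd:true",                    # newest first
            "/f:text"]                     # human-readable output


def recent_tdr_events(max_events=20):
    """Run the query (Windows only) and return the raw text output."""
    return subprocess.check_output(tdr_query_command(max_events), text=True)
```

If a crash leaves no matching entry at its timestamp, it was probably not handled as a TDR, which points toward the game or another component rather than the driver timeout.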

I am about to try a memtest86+ run overnight to see if it might be my RAM. It's a stab in the dark really.

Any other suggestions? I'd really appreciate any suggestions of things to try that I haven't tried already.

nuninho1980

Do you use Display Driver Uninstaller (DDU)?

- uninstall the driver (reboot required);
- boot into safe mode and run DDU, choosing the "clean and reboot" button;
- boot into normal mode and re-install the driver.

Good luck! ;)

JeGX

NVIDIA has published two hotfix drivers in the same day, the latest being R353.49:

http://www.geeks3d.com/forums/index.php/topic,4090.0.html

Try it...

Skylark

@nuninho1980: Yes I have used DDU in my last few driver installations to make sure everything was clean before installing. Didn't seem to help.
@JeGX: I'm installing 353.49 now, will see if it helps.

I'm still not sure it's caused by my graphics card or driver, so I'll also be doing some memtest86+ tests and other stuff. Hope I find the cause soon, it's really annoying.

Stefan

Quote from: JeGX on July 10, 2015, 10:55:43 AM
NVIDIA has published two hotfix drivers in the same day, the latest being R353.49:

Try it...

Tried that with Firefox at www.shadertoy.com.
Instead of producing TDRs, Firefox takes a little nap and then continues compiling.
I don't know if I should call that an improvement.


I think NVIDIA lost a bit of control as they have to maintain a second kernel driver for Windows 10 now.
Check this Microsoft site

Quote:
The display driver model from Windows 8.1 and Windows Phone have converged into a unified model for Windows 10 Insider Preview. A new memory model is implemented that gives each GPU a per-process virtual address space. Direct addressing of video memory is still supported by WDDMv2 for graphics hardware that requires it, but that is considered a legacy case. IHVs are expected to develop new hardware that supports virtual addressing. Significant changes have been made to the DDI to enable this new memory model.

Skylark

So I've tested all my memory sticks in every combination of a pair of sticks (I have 4). No improvement, always just as random.

I've also tested the newer drivers released (I'm on 353.62 now), again no change.

I think I'll try to swap my video card. It's too old to still be under warranty, so if I need to change it, I really want to be sure that will solve my problems and not just be a waste of money.

Still, if anyone has other troubleshooting steps they can recommend so I can further narrow down the source of the problem, I'd appreciate any suggestions.

Stefan

Personally for me the issue seems to be gone.

I read in some forums (NVIDIA, Guru3D, TPU) that users helped themselves by either changing the power management in the NV control panel or by constantly activating Kepler boost.

Make sure your VBIOS is newer than 80.04.63.00.00.
NVIDIA provides some low-level docs about badly running mobile Keplers.
ftp://download.nvidia.com/open-gpu-doc/gk104-disable-graphics-power-gating/1/gk104-disable-graphics-power-gating.txt
ftp://download.nvidia.com/open-gpu-doc/gk104-disable-underflow-reporting/1/gk104-disable-underflow-reporting.txt

I won't be surprised if some desktop Keplers are affected as well. That's just speculation though.


Another thing: driver branch R355 is around the corner. Hopefully they worked out the issues with that.





Skylark

Quote from: Stefan on August 04, 2015, 08:10:38 AM
I read in some forums (NVIDIA, Guru3D, TPU) that users helped themselves by either changing the power management in the NV control panel or by constantly activating Kepler boost.

Can you give me more information on this? I haven't seen any power management settings in the Nvidia Control Panel.

Quote from: Stefan on August 04, 2015, 08:10:38 AM
Make sure your VBIOS is newer than 80.04.63.00.00.

GPU-Z says I have 80.04.BF.00.30 (P2004-0010)

I guess that's newer. Nevertheless, how would I find if there's a newer one available and update it?
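The comparison is easy to get wrong by eye because the fields look hexadecimal (BF is 191, 63 is 99). Here is a small sketch of the check, under the assumption that each dot-separated field is a hex number and that versions are ordered field by field from left to right. Vendors may not strictly follow that ordering, so treat the result as a hint only.

```python
def vbios_key(version):
    """Convert a VBIOS string such as '80.04.BF.00.30 (P2004-0010)' to a tuple of ints."""
    core = version.split()[0]  # drop a trailing board ID like "(P2004-0010)"
    return tuple(int(part, 16) for part in core.split("."))


def is_newer(installed, baseline):
    """True if `installed` sorts after `baseline` under the field-wise hex assumption."""
    return vbios_key(installed) > vbios_key(baseline)


print(is_newer("80.04.BF.00.30 (P2004-0010)", "80.04.63.00.00"))  # True
```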

Thanks!

Stefan

In 3D settings you can switch between adaptive and maximum power.
Kepler boost must be activated by your vendor's tweak tool - MSI Afterburner, EVGA Precision etc.

Getting BIOS updates depends on the manufacturer, e.g. EVGA has an e-mail address for that. MSI, Gigabyte and ASUS have them on their driver site.
There are also BIOS collections on 3rd party websites. But you never know if/how these BIOS are modified.


Skylark

Thanks for the info. I've set the power management mode to prefer maximum performance, will see how it affects things.

Stefan

R355.60 looks pretty stable.
Select "clean install" during installation to rebuild the shader caches; they seem to change from branch to branch.