Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Messages - Stefan

Pages: 1 ... 3 4 [5] 6 7 ... 140
3D-Tech News Around The Web / (WebGL) Ô Green by SPECIAL.T
« on: March 08, 2014, 03:15:29 PM »
Now you can dive right into the magical 3D world of Ô Green, the new Limited Edition green tea by SPECIAL.T.

Usually I ignore adverts, but cucumber-flavored tea is bizarre enough to post here :P

Depending on your device or browser, you will see either a school of fish or mashed geometry.

SiSoftware Sandra 2014 Released:
Updated Device Performance Certification, New Benchmarks, Windows 8.1 support
Updated February 17th, 2014: SP1a released, adding support for nV CUDA 5.x devices - aka "Maxwell".

New CPU Scientific Analysis benchmark
3 algorithms, 2 precisions (FP32/FP64), 3 instruction sets

New GP / HC (GPU/APU/CPU) Scientific Analysis benchmark
3 algorithms, 2 precisions (FP32/FP64), 2 interfaces

Updated GP / HC (GPU/APU/CPU) Financial Analysis benchmark
3 models, 2 precisions (FP32/FP64), 2 interfaces

Quote from: Device query.exe
devicequery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 750 Ti"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2048 MBytes (2147483648 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Clock rate:                                1268 MHz (1.27 GHz)
  Memory Clock rate:                             2700 MHz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = GeForce GTX 750 Ti
Result = PASS

Quote from: NBody.exe
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.

> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GTX 750 Ti" with compute capability 5.0

> Compute 5.0 CUDA device: [GeForce GTX 750 Ti]
5120 bodies, total time for 10 iterations: 8.208 ms
= 31.936 billion interactions per second
= 638.723 single-precision GFLOP/s at 20 flops per interaction

Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
   -fp64             (use double precision floating point values for simulation)

> Double precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GTX 750 Ti" with compute capability 5.0

> Compute 5.0 CUDA device: [GeForce GTX 750 Ti]
5120 bodies, total time for 10 iterations: 220.679 ms
= 1.188 billion interactions per second
= 35.637 double-precision GFLOP/s at 30 flops per interaction

CUDA 6, Available as Free Download, Makes Parallel Programming Easier, Faster

We’re always striving to make parallel programming better, faster and easier for developers creating next-gen scientific, engineering, enterprise and other applications.

With the latest release of the CUDA parallel programming model, we’ve made improvements in all these areas.

Available now to all developers on the CUDA website, the CUDA 6 Release Candidate is packed with several new features that are sure to please developers.

A few highlights:

    Unified Memory – This major new feature lets CUDA applications access CPU and GPU memory without the need to manually copy data from one to the other. This is a major time saver that simplifies the programming process, and makes it easier for programmers to add GPU acceleration in a wider range of applications.
    Drop-in Libraries – Want to instantly accelerate your application by up to 8X? The new drop-in libraries can automatically accelerate your BLAS and FFTW calculations by simply replacing the existing CPU-only BLAS or FFTW library with the new, GPU-accelerated equivalent.
    Multi-GPU Scaling – Re-designed BLAS and FFT GPU libraries automatically scale performance across up to eight GPUs in a single node. This provides over nine teraflops of double-precision performance per node, supporting larger workloads than ever before (up to 512GB).

And there’s more.
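As a sketch of what Unified Memory removes: one managed allocation replaces the usual host-buffer/device-buffer pair and the explicit cudaMemcpy calls in both directions. The kernel and sizes below are illustrative only; building it needs the CUDA 6 toolkit and a compute capability 3.0+ GPU on a 64-bit OS.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}

int main() {
    const int n = 1 << 20;
    float *data = NULL;
    // Single allocation, visible to both host and device -
    // no separate device buffer, no cudaMemcpy in either direction.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // host writes directly
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // device reads/writes
    cudaDeviceSynchronize();  // must sync before the host touches managed data
    printf("data[0] = %.1f\n", data[0]);
    cudaFree(data);
    return 0;
}
```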

The following are known issues with the CUDA 6.0 Release Candidate that will be resolved in the production release:
‣ The minBlocksPerMultiprocessor parameter for the launch_bounds() qualifier only accepts values up to 16 when used in compiling for sm_50, even
though values up to 32 are possible on that architecture.
‣ There is a performance issue with the new SIMD video intrinsics __v*2() and __v*4() when used in compiling for the sm_50 architecture.
‣ The sm_50 architecture supports 48 KB of shared memory per block; however, the check for this limit is not functioning properly in the compiler. This can allow
programs that use more than 48 KB of shared memory per block to compile successfully, although they will fail to run because the driver component does check
the limit properly.
‣ The MT19937 random number generator in the cuRAND library generates non-deterministic results for curandGenerateUniformDouble().
‣ The NPP library function nppiAlphaComp_8u_AC4R() generates incorrect results when used with the NPPI_OP_ALPHA_ATOP_PREMUL option.
‣ The NPP library functions FilterSobelHorizSecondBorder() and FilterSobelVertSecondBorder() may generate incorrect results.

Thanks to the GTX 750 Ti I don't need 10 GB of RAM. However, NVIDIA OptiX is not yet Maxwell-compatible :P

General Discussion / Re: EVGA GeForce GTX 750 Ti FTW hands-on review
« on: March 05, 2014, 05:15:21 PM »
Stefan, have you tried CUDAMiner?

No, I don't believe in cryptocurrencies.
However, the developer might share his CUDA optimisation tricks in the NVIDIA forum mentioned above.

Mobile Maxwell heads-up:
A happy user at Notebook Review introduces his GTX 860M-powered rig.

Attention - the GTX 860M comes in two flavors:
Kepler - NVIDIA_DEV.119A = "NVIDIA GeForce GTX 860M"
Maxwell - NVIDIA_DEV.1392 = "NVIDIA GeForce GTX 860M"

3D-Tech News Around The Web / Introducing NVIDIA GameWorks
« on: March 04, 2014, 07:23:42 PM »
NVIDIA GameWorks™ pushes the limits of gaming by providing a more interactive and cinematic game experience, enabling next-gen gaming for current games. We provide technologies such as PhysX and VisualFX, which are easy to integrate into games, as well as tutorials and tools to quickly generate game content. In addition, we provide tools to debug, profile and optimize your code.

Read more

General Discussion / Re: EVGA GeForce GTX 750 Ti FTW hands-on review
« on: March 04, 2014, 06:50:17 PM »
As soon as the new GpuTest is online, try the new fp32/fp64 OpenGL test...

...or something is wrong with recent NVIDIA drivers.

I made some tests in 1280x720 windowed mode.

Julia fp64 - 31 fps
Julia fp32 - 395 fps

FurMark and Volplosion performed at exactly 60 FPS.
At first I thought it was a weird VSync bug, but it's just a coincidence.

Regarding NVIDIA's driver: with the R334 drivers I can't quit any fullscreen program without freezing the desktop.
Fooling around with TDR didn't help.
Judging from their forum, a lot of people have issues with recent drivers.

Little heads-up from F@H:

"I have high hopes for Maxwell. We'll be optimizing OpenMM for it."

Vijay Pande
Director, Folding@home Distributed Computing Project

Another heads-up from Arion renderer
We’re adapting the Arion Core so it runs on CUDA 6 Release Candidate, which was released some days ago. Our setups carry the run-time CUDA .dll, but your drivers may need to be upgraded when Arion for 3ds Max v2.7.0 is released.
Update: Fortunately, this time upgrading the build system from CUDA 5 to CUDA 6 has been easy!
Update: I spoke too fast. CUDA 6 has broken instancing… Been working on that for the past 8 hours…

EDIT: Arion 2.5 built against CUDA 5.0 takes 7:28 min; I'll recheck with Arion 2.7 when it's available.

Beta Intel® Iris™ and HD Graphics Beta Driver for Windows* 7/8/8.1 for TITANFALL* and Thief*

In an effort to keep Intel HD Graphics compatible with the latest games and applications, Intel will occasionally post a "Beta" driver for user feedback on compatibility and performance. This beta driver benefits users playing Titanfall and Thief. We strive for the best possible experience for users of Intel HD Graphics and would greatly appreciate your feedback on these beta drivers. Download the 32-bit or 64-bit beta drivers to play Titanfall and Thief.

Link: Iris™ and HD Graphics Driver for Windows* 7/8/8.1

    Added support for the following GPUs:
        GeForce GTX 750 Ti
        GeForce GTX 750
        GeForce GTX 745
        GeForce GTX TITAN Black
    Fixed a regression in the NVIDIA kernel module which caused it to improperly dereference a userspace pointer. This potential security issue was initially reported to the public at:
    The regression did not affect NVIDIA GPU drivers before release 334.
    Fixed a bug that could cause OpenGL programs to hang after calling fork(2).
    Added support for GPUs with VDPAU Feature Set E. See the README for details.
    On GPUs with VDPAU Feature Set E, VDPAU now supports more robust decode error handling at the cost of a minor performance impact.
    This can be disabled by setting the VDPAU_NVIDIA_DISABLE_ERROR_CONCEALMENT environment variable to 1.
    Added support for application profile rule patterns which are logical operations of subpatterns. See the README for details.
    Added support for a "findfile" application profile feature which allows the driver to apply profiles based on matching files in the same directory as the process executable. See the README for details.
    Improved performance of OpenGL applications when used in conjunction with the X driver's composition pipeline. The composition pipeline may be explicitly enabled by using the ForceCompositionPipeline or ForceFullCompositionPipeline MetaMode options, or implicitly enabled when certain features such as some XRandR transformations, rotation, Warp & Blend, PRIME, and NvFBC are used.
    Fixed a bug that could cause nvidia-settings to compute incorrect gamma ramps when adjusting the color correction sliders.
    Updated the nvidia-settings control panel to allow the selection of display devices using RandR and target ID names when making queries targeted towards specific display devices.
    Fixed a bug that prevented some dropdown menus in the nvidia-settings control panel from working correctly on older versions of GTK+ (e.g. 2.10.x).
    Updated the nvidia-settings control panel to provide help text for application profile keys and suggestions for valid key names when configuring application profiles.
    Updated the nvidia-settings control panel to populate the dropdown menu of stereo modes with only those modes which are available.
    Fixed a bug that could cause applications using the OpenGL extension ARB_query_buffer_object to crash under Xinerama.
    Fixed a bug that caused high pixelclock HDMI modes (e.g. as used with 4K resolutions) to be erroneously reported as dual-link in the nvidia-settings control panel.
    Fixed a bug that prevented some DisplayPort 1.2 displays from being properly restored after a VT switch.
    Renamed per-GPU proc directories in /proc/driver/nvidia/gpus/ so that each GPU's bus location is represented in "domain:bus:device.function" format.
    Added 64-bit EGL and OpenGL ES libraries to 64-bit driver packages.
    Changed the format of the "Bus Location" field reported in the /proc/driver/nvidia/gpus/0..N/information files from "domain:bus.device.function" to "domain:bus:device.function" to match the lspci format.
    Fixed a bug in the GLX_EXT_buffer_age extension where incorrect ages would be returned unless triple buffering was enabled.
    Changed the driver's default behavior to stop deleting RandR 1.2 outputs corresponding to unused DisplayPort 1.2 devices. Deleting these outputs can confuse some applications. Added a new option, DeleteUnusedDP12Displays, which can be used to turn this behavior back on. This option can be enabled by running sudo nvidia-xconfig --delete-unused-dp12-displays
    Improved support for the __GL_SYNC_DISPLAY_DEVICE and VDPAU_NVIDIA_SYNC_DISPLAY_DEVICE environment variables in certain configurations. Both environment variables will now recognize all supported display device names. See "Appendix C. Display Device Names" and "Appendix G. VDPAU Support" in the README for more details.
    Improved performance of the X driver when handling large numbers of surface allocations.
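For reference, the composition pipeline mentioned in the changelog is toggled through a MetaMode token, either on the fly or persistently; the resolution and offset below are placeholders for your own mode:

```
# on the fly, via nvidia-settings:
nvidia-settings --assign CurrentMetaMode="1920x1080 +0+0 {ForceFullCompositionPipeline=On}"

# persistently, as an Option in the Screen section of xorg.conf:
Option "metamodes" "1920x1080 +0+0 {ForceFullCompositionPipeline=On}"
```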

NVIDIA Driver Downloads

BlenderArtist Rolf compiled Blender with CUDA 6.0 RC SDK

Blender 2.69.11 r61310 Hash eb4f2b4
Added support for the new Maxwell architecture.
Compile with VS2008, Scons
Has Cuda Kernels 2.0, 2.1, 3.0, 3.5, 5.0
I'm unable to build kernels for 1.0, 1.1, 1.2, 1.3, sorry

Reminder: you must enable GPU Rendering manually in Blender as seen in the screenshot below.

The Barcelona Pavilion benchmark takes 2:27 minutes, slightly faster than Rolf's result thanks to factory overclocking.

Broadcom is releasing the full source of the OpenGL ES 1.1 and 2.0 driver stack for the Broadcom VideoCore® IV 3D graphics subsystem used in the BCM21553 3G integrated baseband SoC. VideoCore IV is used in many Broadcom products, including the BCM2835 application processor, which runs the popular Raspberry Pi microcomputer.


3D-Tech News Around The Web / OpenGL Extensions Viewer 4.15 released
« on: February 28, 2014, 05:54:14 PM »
You can download the latest version of OpenGL Extensions Viewer:

Release 4.15 2014-02-28
No changelog yet.
Reminder: unselect the forward context option to display OpenGL 4.4 info

3D-Tech News Around The Web / NVIDIA GeForce Linecards 2014
« on: February 28, 2014, 05:15:08 PM »

New in Release 331.01

Graphics driver updated for Mac OS X Mavericks 10.9.2 (13C64)
Graphics driver updated for Mac OS X Mountain Lion 10.8.5 (12F45). 
Contains performance improvements and bug fixes for a wide range of applications.
Includes NVIDIA Driver Manager preference pane.

libclh.dylib supports:
gt206 gt200 g98 g96 g94 g92 g86 g84 g80 cudaMoneyInTheBananaStand gf119 gf117 gf108 gf106 gf116 gf104 gf114 gf110 gf100 gk208 gk110 gk107 gk106 gk104


General Discussion / Re: EVGA GeForce GTX 750 Ti FTW hands-on review
« on: February 27, 2014, 05:51:34 PM »
It's all about thermal power draw and compute power.
Study the article "5 Things You Should Know About the New Maxwell GPU Architecture" and the thread "So what's new about Maxwell?" in NVIDIA's developer forum, where people are pretty excited.

A friend of mine will be getting 10 (Maxwells). His first mining farm.
After recompiling my image processing code, instruction count dropped by 12% and kernel time by 22%!
Integer logical performance indeed got a huge boost in Maxwell (compared to Kepler). The GTX 750 can push almost the same SHA-1 digesting performance as the GK110-based GTX 780 (!). I'm really impressed, especially if we compare Maxwell against GK10x.

Announcing Intel® Graphics Performance Analyzers 2014 R1

Intel® Graphics Performance Analyzers (Intel® GPA) is a powerful, agile developer tool suite for analyzing and optimizing games, media, and other graphics-intensive applications. The product supports applications intended for the Windows* OS platforms or Intel® Atom™ based phones running the Android* OS. The toolset is a free download from the Intel GPA Home Page.

General Discussion / EVGA GeForce GTX 750 Ti FTW hands-on review
« on: February 26, 2014, 06:55:18 PM »
Just upgraded my old rig with EVGA GeForce GTX 750 Ti FTW w/ EVGA ACX Cooling

I won't waste your time by repeating benchmarks already shown on major news sites.

Fan-speed detection in Geeks3D programs needs to be updated.
Maxwell withstands FurMark easily: only 55°C with 8x MSAA.

AMD LEO, FluidMark, TessMark, NVIDIA Alien vs. Triangles and NVIDIA Endless City produce at most 46°C.

GPU-Z always reads DirectX 11.1, even under Windows XP.

NVFLASH support for GM107 begins with v5.142

If you'd like me to test some CUDA-accelerated apps, let me know.
I suggest compiling them against the CUDA 6.0 RC SDK to exploit Maxwell's capabilities.

3D-Tech News Around The Web / Re: NVIDIA GeForce driver 335.04 certified
« on: February 25, 2014, 10:01:15 PM »
"This has one critical bug fix. Otherwise the same as 334.89 driver."

