
Nvidia releases a preview of its toolkit update and sheds some light on new functionality.
The CUDA Toolkit 3.0 Beta is now available to registered GPU Computing
developers.
Highlights for this release include:
* CUDA Driver / Runtime Buffer Interoperability, which allows
applications using the CUDA Driver API to also use libraries
implemented using the CUDA C Runtime (see the sketch after this list).
* A new, separate version of the CUDA C Runtime (CUDART) for debugging
in emulation-mode.
* C++ Class Inheritance and Template Inheritance support for increased
programmer productivity
* A new unified interoperability API for Direct3D and OpenGL, with
support for:
  * OpenGL texture interop
  * Direct3D 11 interop support
* cuda-gdb hardware debugging support for applications that use the CUDA
Driver API
* New CUDA Memory Checker reports misalignment and out of bounds errors,
available as a debugging mode within cuda-gdb and also as a
stand-alone utility.
* CUDA Toolkit libraries are now versioned, enabling applications to
require a specific version, support multiple versions explicitly, etc.
* CUDA C/C++ kernels are now compiled to standard ELF format
* Support for all the OpenCL features in the latest R195.39 beta driver:
  * Double Precision
  * OpenGL Interoperability, for interactive high performance
    visualization
  * Query for Compute Capability, so you can target optimizations for
    GPU architectures (cl_nv_device_attribute_query)
  * Ability to control compiler optimization settings, etc. via support
    for NVIDIA Compiler Flags (cl_nv_compiler_options)
  * OpenCL Images support, for better/faster image filtering
  * 32-bit Atomics for fast, convenient data manipulation
  * Byte Addressable Stores, for faster video/image processing and
    compression algorithms
  * Support for the latest OpenCL spec revision 48 and latest official
    Khronos OpenCL headers as of 11/1/2009
* Early support for the Fermi architecture, including:
  * Native 64-bit GPU support
  * Multiple Copy Engine support
  * ECC reporting
  * Concurrent Kernel Execution
  * Fermi HW debugging support in cuda-gdb
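The buffer interoperability item at the top of that list is the one worth a concrete picture. Here's a minimal sketch of what I understand it enables: a driver-API application allocates a buffer with cuMemAlloc and hands that same allocation to CUBLAS, a library built on the CUDA C Runtime. Treat it as an assumption-laden illustration rather than official sample code - the calls (cuInit, cuMemAlloc, cublasSscal, etc.) are the standard driver-API and legacy CUBLAS entry points, and all error checking is omitted.

#include <stdio.h>
#include <stdint.h>
#include <cuda.h>      /* CUDA Driver API */
#include <cublas.h>    /* legacy CUBLAS, implemented on the CUDA C Runtime */

int main(void)
{
    /* Driver-API setup: the context created here is the one the
       runtime-based library is expected to pick up under 3.0-style interop. */
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    const int n = 1024;
    float host[1024];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    /* Allocate and fill the buffer through the driver API... */
    CUdeviceptr dptr;
    cuMemAlloc(&dptr, n * sizeof(float));
    cuMemcpyHtoD(dptr, host, n * sizeof(float));

    /* ...then hand the very same allocation to a CUDART-based library. */
    cublasInit();
    cublasSscal(n, 2.0f, (float *)(uintptr_t)dptr, 1);  /* scale by 2 */
    cublasShutdown();

    cuMemcpyDtoH(host, dptr, n * sizeof(float));
    printf("host[1] = %f (expected 2.0)\n", host[1]);

    cuMemFree(dptr);
    cuCtxDestroy(ctx);
    return 0;
}

Before this, the way I read it, a driver-API codebase had to stay away from CUBLAS/CUFFT or duplicate its allocations; with 3.0 the two API families can share buffers directly.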
Well, they must have some samples up and running just fine, or the software wouldn't be this advanced two whole months (at least) before release. Despite the faked "Fermi" boards, the actual goal of having a very fast HPC accelerator seems well underway.
Since I'm not a C++ developer, and I can link C kernels to C++ code either way, the most exciting features are definitely the ones provided by Fermi. If you care about PhysX, you should also be curious about the performance benefits of Concurrent Kernel Execution, as I believe it will bring gains with the new chip. How? Current cards can't process more than one kernel at a time, and PhysX is just that: a kernel running on the GPU that can't execute at the same time as graphics work. If you aren't using a separate card, you'll likely notice a drop in graphics performance bigger than the extra processing load alone would account for, with the rest down to context switching - an overhead that has also seen improvement in "Fermi", where it's 20x faster.
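To make that concrete, here's a small hypothetical sketch using CUDA streams. The two kernels are stand-ins for "physics" and "graphics-support" work (the names are mine, nothing to do with actual PhysX code), and error checking is omitted. On today's cards the two launches still execute one after the other; a Fermi-class GPU with Concurrent Kernel Execution can overlap them.

#include <cstdio>
#include <cuda_runtime.h>

// Stand-ins for two independent workloads; names are illustrative only.
__global__ void physicsKernel(float *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = p[i] * 0.5f + 1.0f;
}

__global__ void graphicsKernel(float *g, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) g[i] = g[i] * 2.0f - 1.0f;
}

int main()
{
    const int n = 1 << 20;
    float *p = 0, *g = 0;
    cudaMalloc((void **)&p, n * sizeof(float));
    cudaMalloc((void **)&g, n * sizeof(float));

    // Separate streams tell the driver the two kernels have no dependency
    // on each other; whether they actually overlap is up to the hardware.
    cudaStream_t sPhysics, sGraphics;
    cudaStreamCreate(&sPhysics);
    cudaStreamCreate(&sGraphics);

    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);
    physicsKernel<<<grid, block, 0, sPhysics>>>(p, n);
    graphicsKernel<<<grid, block, 0, sGraphics>>>(g, n);

    // CUDA 3.0-era name; later toolkits call this cudaDeviceSynchronize().
    cudaThreadSynchronize();

    cudaStreamDestroy(sPhysics);
    cudaStreamDestroy(sGraphics);
    cudaFree(p);
    cudaFree(g);
    printf("done\n");
    return 0;
}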
Moore's law states that the transistor count that can be placed inexpensively on a chip doubles every two years. Lately it has been every 18 months, as GPU manufacturers have come to prove, and it was even less than that at the start of this decade.
With that in mind, I'll tell you the 7800GTX was released in June 2005, the 8800GTX in November 2006 and the GTX 280 in... wait for it... June 2008. Eighteen months in between, with the 8800GTX a month early and the GTX 280 a month late - not bad execution. Keeping to that schedule would put "Fermi" in January 2010, a date that is still manageable by all means. Being the very advanced HPC multicore processor that it is, it's not something that can be manufactured sooner than what has been the norm for the last four years. Nvidia's problem right now is that AMD managed to undercut them by three months - at least.
While the 5870 missed the optimal die size for a $300 GPU that the 4870 set before it, the lack of competition from Nvidia made it very appealing and, despite what I might think about the hastily executed, memory-bandwidth-starved card, it's still the fastest single-GPU card around and one that suffered no serious consequences from being the first DX11 GPU to go on sale.
Hastiness implied some sloppiness in the new card's design, but it is certainly paying off for AMD. If AMD hadn't released the new cards, I would've said Nvidia had everything on track.
1 comment:
"Nvidia's problem right now is that AMD managed to undercut them by three months - at least."
BUT who cares about that now....
AMD/ATI can't even be bothered to release their UVD ASIC data sheets and related documentation,
whereas Nvidia go from strength to strength with their CUDA ASIC....
Bridgeman, the OSS AMD/ATI executive, said elsewhere on 10-29-2009, 09:41 PM: "One more time, the open source graphics plan does *not* include UVD programming information. This is *not* a "delivery problem"."