In fact, as we review later in this story, Nvidia announced two Fermi-based parallel processors for the high-performance computing market, the Tesla C2070 and C2050. We presume that these specialized products, which do not yet appear to be shipping, flew below the Journal's radar.
Tomorrow's announcement most likely involves the formal debut of the long-delayed GF100, a 40nm implementation of the Fermi architecture. According to various reports, the DirectX 11-capable GF100 will be built into two new GPUs, the GeForce GTX 470 and GTX 480. It's said OEMs such as Asus, MSI, and Zotac will ship graphics cards based on these GPUs next month.
ATI released DirectX 11-compatible GPUs in its Radeon product line as long ago as September, garnering a technological advantage. However, manufacturing delays forced both ATI and Nvidia to keep selling last-generation products, giving Nvidia something of a reprieve, according to Dicolo. In the fourth quarter of 2009, Nvidia increased its position in the desktop GPU market, raising its share to 64.8 percent from 62.1 percent a year ago, Jon Peddie Research is quoted as saying.
Background on Fermi
Fermi, announced last October at Nvidia's inaugural GPU Technology Conference in San Jose, amounts to a third generation of products embodying the company's "GPU computing" model. The first generation was the G80 unified graphics/computing architecture, introduced in November 2006 and later embodied in the GeForce 8800, Quadro FX 5600, and Tesla C870 GPU products.
The G80 was the first GPU to replace separate vertex and pixel pipelines with a single unified processor, the first to utilize a scalar thread processor, and the first to support C, according to the company.
The second generation was the GT200, introduced in the GeForce GTX 280, Quadro FX 5800, and Tesla T10 GPUs. GT200 increased the number of streaming processor cores -- subsequently referred to as "Cuda" cores -- from 128 to 240. It also added "hardware memory access coalescing," improving memory access efficiency, along with double-precision floating point support, Nvidia says.
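To illustrate why coalescing improves memory access efficiency, here is a minimal sketch of a simplified model, not Nvidia's actual hardware logic. It counts how many fixed-size memory segments a group of threads touches when each thread reads one word. The function name, the group size of 32 threads, the 4-byte word, and the 128-byte segment are all assumptions chosen for illustration, not figures taken from Nvidia's GT200 documentation.

```python
def segments_touched(num_threads=32, stride_words=1, word_bytes=4, segment_bytes=128):
    """Count distinct memory segments touched when thread i reads word i * stride_words.

    Hypothetical model: fewer segments means fewer memory transactions.
    All default sizes are illustrative assumptions.
    """
    addresses = (i * stride_words * word_bytes for i in range(num_threads))
    return len({addr // segment_bytes for addr in addresses})


# Adjacent threads reading adjacent words fall into one segment;
# widely strided reads scatter across many segments.
for stride in (1, 2, 32):
    print(f"stride {stride}: {segments_touched(stride_words=stride)} segment(s)")
```

In this model, neighboring threads that read neighboring words can be served by a single combined access, while the same reads spread out by a large stride each need their own.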

Fermi, implemented in a GPU containing more than three billion transistors, more than doubles the number of Cuda cores, organizing them into 16 SMs (streaming multiprocessors) with 32 cores apiece. Sporting up to 6GB of GDDR5 RAM, Fermi is the first product of its type to support ECC (error correcting code), the company says.
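The core counts quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the constant and function names are illustrative, and the figures are simply the ones cited in this story.

```python
# Core counts as stated in the article; names are illustrative only.
GT200_CORES = 240   # Cuda cores in the second-generation GT200
FERMI_SMS = 16      # streaming multiprocessors in Fermi
CORES_PER_SM = 32   # Cuda cores per SM


def fermi_core_count(sms: int = FERMI_SMS, cores_per_sm: int = CORES_PER_SM) -> int:
    """Total Cuda cores = number of SMs x cores per SM."""
    return sms * cores_per_sm


fermi_cores = fermi_core_count()
ratio = fermi_cores / GT200_CORES
print(f"Fermi cores: {fermi_cores}")         # 512
print(f"Increase over GT200: {ratio:.2f}x")  # 2.13x
```

Sixteen SMs of 32 cores each give 512 Cuda cores, a little over twice GT200's 240.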
In October, Nvidia cited the following additional features for Fermi:
According to Nvidia, Fermi-based products are so powerful that they can now be termed CGPUs (computational graphics processing units), and are suitable for high-performance computing (HPC) applications such as linear algebra, numerical simulation, and quantum chemistry.
At Nvidia's GPU Technology Conference, Oak Ridge National Laboratory (ORNL) announced plans to build a new supercomputer that will employ the Fermi architecture, and also announced it will be creating the Hybrid Multicore Consortium, whose goals "are to work with the developers of major scientific codes to prepare those applications to run on the next generation of supercomputers built using GPUs."
According to Nvidia, Fermi is the first product of its type that supports C++, complementing existing support for C, Fortran, Java, Python, OpenCL and DirectCompute. Fermi also supports Nexus (below), touted as "the world's first fully integrated heterogeneous computing application development environment within Microsoft Visual Studio."

Nvidia's Nexus
Source: Nvidia
Nvidia's C2070 and C2050
Nvidia's C2070 and C2050 are PCI Express x16 cards touted as "transforming a [Windows or Linux] workstation to perform like a small cluster," with up to 640 Gigaflops of performance. They employ the Fermi GPUs to run C, C++, OpenCL, DirectCompute, or Fortran while a workstation's CPU performs other tasks, according to the company.

The C2070 and C2050 are PCI Express Gen2 cards that occupy two slots in a workstation, and include 6GB and 3GB of onboard GDDR5 memory, respectively. Nvidia claims the cards' onboard GPUs offer performance that's equivalent to the latest quad-core CPUs, but with 1/20th the power consumption and 1/10th the cost.
According to Nvidia, the C2070 and C2050 offer from 520 to 640 Gigaflops of double-precision performance, allowing applications such as ray tracing, 3D cloud computing, video encoding, database search, data analytics, computer-aided engineering, and virus scanning to be performed "dramatically faster." Four of the boards may be placed into a 1U enclosure that quadruples performance for data center deployments, the company adds.
Nvidia says the boards support the next-generation IEEE 754-2008 double-precision floating point standard. They provide ECC (error correcting code) protection for their DRAM, shared memory, and L1/L2 caches, and they support PCI Express 2.0 for "fast and high-bandwidth communication between CPU and GPU," the company adds.
Specifications provided for the Tesla C2070 and C2050 by Nvidia include:
Availability
According to Nvidia, the Tesla C2070 and C2050 will be available during the second quarter, retailing for approximately $4,000 and $2,500, respectively. More information on the boards may be found on the company's website, here.
More information on Nexus may be found on the Nvidia website, here. Meanwhile, overall background on Fermi, including a downloadable white paper, may be found here.
Jerry Dicolo's The Wall Street Journal article on tomorrow's product launches from Nvidia may be found here.