The most prominent part of the XBox's architecture is its use of a chipset built on a SIMD architecture. SIMD stands for Single Instruction stream, Multiple Data stream. This architecture is essential to parallel computing. Its ability to manipulate large vectors and matrices in minimal time has created strong demand in areas such as weather data processing and cancer radiation research. Below is a generic model of a vector architecture:



The architecture consists of a set of identical processing elements (PEs) capable of performing the same operation on different data sets simultaneously. SIMD allows graphics to be created and rendered faster and with better visual quality. Both properties make it ideal for use in the XBox, a medium through which gamers can experience the best, faster. SIMD architectures fall into two categories, True SIMD and Pipelined SIMD:
True SIMD:
The two types of true SIMD organization, distributed memory and shared memory, differ only in how the memory modules, M, are connected to the arithmetic units, D. The arithmetic units, D, are the processing elements (PEs) described above. In distributed memory, each memory module is uniquely associated with a particular arithmetic unit. The synchronized PEs are controlled by one control unit (CU). Each PE is basically an arithmetic logic unit with attached working registers and local memory for storage of distributed data. The CU decodes the instructions and determines where they should be executed. Scalar and control instructions are executed in the CU, whereas vector instructions are broadcast to the PEs. In shared memory SIMD machines, the local memories attached to the PEs are replaced by memory modules shared by all PEs through an alignment network. This configuration allows the individual PEs to share memory without going through the CU.
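The distributed memory organization described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real chip: the class names ProcessingElement and ControlUnit, the loop_counter field, and the memory contents are all assumptions made for the example. Each PE owns its local memory, the control unit executes scalar work itself, and a vector instruction is broadcast so that every PE applies the same operation to its own data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProcessingElement:
    """One PE: an ALU with a working register and its own local memory."""
    local_memory: List[float]
    register: float = 0.0

    def execute(self, op: Callable[[float, float], float], address: int) -> None:
        # Same operation as every other PE, but on this PE's own data.
        self.register = op(self.register, self.local_memory[address])


class ControlUnit:
    """Hypothetical CU: runs scalar work itself, broadcasts vector work."""

    def __init__(self, pes: List[ProcessingElement]) -> None:
        self.pes = pes
        self.loop_counter = 0  # scalar state that lives only in the CU

    def run_scalar_increment(self) -> None:
        # A scalar/control instruction: executed in the CU, PEs are idle.
        self.loop_counter += 1

    def broadcast(self, op: Callable[[float, float], float], address: int) -> None:
        # A vector instruction: conceptually all PEs execute it in lockstep.
        # The Python loop only stands in for that simultaneous execution.
        for pe in self.pes:
            pe.execute(op, address)


def add(a: float, b: float) -> float:
    return a + b


# Four PEs, each holding two values in local memory: [i, 10 * i].
pes = [ProcessingElement(local_memory=[float(i), 10.0 * i]) for i in range(4)]
cu = ControlUnit(pes)

# Two broadcast ADD instructions sum each PE's local memory into its register.
for address in range(2):
    cu.broadcast(add, address)
    cu.run_scalar_increment()

registers = [pe.register for pe in pes]
print(registers)        # [0.0, 11.0, 22.0, 33.0]
print(cu.loop_counter)  # 2
```

Two instructions were issued, yet eight additions were performed, which is the point of the SIMD model: instruction count is independent of the number of PEs.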

A graphical representation of a True, Distributed Memory SIMD architecture:



Pipelined SIMD:
The pipelined SIMD consists of pipelined arithmetic units with shared memory. Pipelining divides one arithmetic operation into many smaller subfunctions and executes these subfunctions in parallel on different data. Pipelining keeps the parallel activity high and reduces the hardware requirement significantly. Pipelined SIMD computers can be distinguished by the number and types of pipelines, the processor/memory interaction, and the implementation of arithmetic.
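To make the idea of pipelined arithmetic concrete, here is a small cycle-by-cycle simulation in Python. It is a sketch under simplifying assumptions: every stage takes exactly one cycle, there are no stalls, and the three stage functions are arbitrary placeholders rather than real floating-point subfunctions. The function name simulate_pipeline is made up for the example. With k stages and n data items, the pipeline finishes in n + k - 1 cycles instead of the n * k cycles needed when each item must complete before the next begins.

```python
_EMPTY = object()  # marks a pipeline stage that holds no item this cycle


def simulate_pipeline(data, stages):
    """Push each item through every stage, one stage per cycle.

    Returns (results, cycles). Different items occupy different stages
    during the same cycle, which is where the parallelism comes from.
    """
    slots = [_EMPTY] * len(stages)  # slots[i] = item waiting at stage i
    pending = list(data)
    results = []
    cycles = 0
    while pending or any(s is not _EMPTY for s in slots):
        if pending:
            slots[0] = pending.pop(0)  # a new item enters every cycle
        next_slots = [_EMPTY] * len(stages)
        for i, item in enumerate(slots):
            if item is _EMPTY:
                continue
            out = stages[i](item)
            if i == len(stages) - 1:
                results.append(out)      # item leaves the pipeline
            else:
                next_slots[i + 1] = out  # item advances one stage
        slots = next_slots
        cycles += 1
    return results, cycles


# Placeholder three-stage operation: ((x + 1) * 2) - 3.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
results, cycles = simulate_pipeline([1, 2, 3, 4], stages)

print(results)  # [1, 3, 5, 7]
print(cycles)   # 6, i.e. 4 items + 3 stages - 1
print(4 * 3)    # 12 cycles if each item had to finish before the next started
```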



According to a press release from Intel, the manufacturer of the XBox CPU, "In addition to higher performance, Intel® Pentium® III and Pentium® III Xeon™ processors bring application developers a powerful set of tools: Streaming SIMD (Single Instruction Multiple Data) Extensions. Combined with higher clock rates, generous cache memory, and a high-speed system bus, Streaming SIMD Extensions make the Pentium III Xeon processor, in particular, a powerful workstation platform. Streaming SIMD Extensions enhance performance by increasing the data and computational throughput of the Pentium III Xeon processor. They do this in two ways: SIMD operations can process four floating-point or integer values in a single instruction, and streaming memory instructions can be used to avoid processor stalls by optimizing cache memory utilization. The net result is better performance—in some instances up to twice as fast.

Users don't just want faster workstations, they also want to improve the quality of their work. With Streaming SIMD Extensions, application providers aren't limited to merely creating faster applications—they also have the opportunity to craft richer, more effective tools. Higher performance permits more accurate simulations and realistic animations, the modeling of more sophisticated and complex systems, and the chance to analyze data more completely. In short, like any good tool, Streaming SIMD Extensions remove constraints and enable innovation. Like Intel® MMX™ technology before it, Streaming SIMD Extensions will free users to improve their work in ways they once could only imagine.

While new uses for Streaming SIMD Extensions are found constantly, a number of application areas have already demonstrated potential for compelling performance improvements, including:
It is also important to note that the new extensions accelerate the 3D geometry pipeline by nearly 2x that of the previous-generation processor while enabling new applications, such as real-time MPEG-2 encode. The Pentium III processor implementations achieved the desired goal at a modest 10% increase in die size.
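The claim above that one instruction can process four floating-point values follows from the register layout: four 32-bit single-precision floats fill one 128-bit register. Python cannot issue SSE instructions directly, so the sketch below only models the data layout and the instruction count. The helper names pack_register, unpack_register, packed_add and add_arrays are invented for this illustration, and packed_add merely stands in for a packed single-precision add.

```python
import struct

LANES = 4  # four 32-bit floats fill one 128-bit register


def pack_register(values):
    """Pack four floats into 16 bytes, like one 128-bit SIMD register."""
    if len(values) != LANES:
        raise ValueError("a register holds exactly four single-precision values")
    return struct.pack("<4f", *values)


def unpack_register(register):
    return list(struct.unpack("<4f", register))


def packed_add(a, b):
    """Model of one packed add: all four lanes are summed by one instruction."""
    lanes = [x + y for x, y in zip(unpack_register(a), unpack_register(b))]
    return pack_register(lanes)


def add_arrays(xs, ys):
    """Add two arrays four elements at a time; returns (sums, instruction count)."""
    if len(xs) != len(ys) or len(xs) % LANES != 0:
        raise ValueError("arrays must have equal length, a multiple of four")
    sums = []
    instructions = 0
    for i in range(0, len(xs), LANES):
        reg = packed_add(pack_register(xs[i:i + LANES]),
                         pack_register(ys[i:i + LANES]))
        sums.extend(unpack_register(reg))
        instructions += 1
    return sums, instructions


sums, instructions = add_arrays([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [0.5] * 8)
print(len(pack_register([0.0] * 4)) * 8)  # 128 bits per register
print(sums)          # [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]
print(instructions)  # 2 packed adds instead of 8 scalar adds
```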



The next most important hardware component of the XBox is the graphics chip provided by NVidia Corporation. This chip allows developers to create lifelike graphics and to use many of the advanced features of PC graphics hardware. The main graphics processing unit, or XGPU, developed by NVidia is a member of the newly introduced NVidia GeForce 3 family of graphics chips for PCs and Macintoshes. For starters, the XGPU is built using a 0.13 micron process, which is more advanced than the 0.15 micron process of the GeForce 3. The chip itself packs over 57 million transistors, nearly 20 million more than Intel's latest line of Pentium 4 processors. The finer 0.13 micron process lowers power consumption, which allows the core clock to be set higher while keeping the chip stable.

The graphics chip provides an nfiniteFX engine, which includes programmable vertex and pixel shading capabilities. These capabilities give game developers the freedom to translate their creative visions into a digital world. One reason developers are so excited about the programmable pixel and vertex shaders is that they allow a much more realistic look overall while incurring a negligible performance hit, an enhancement that gamers will welcome. The CPU and XGPU share the same 64MB pool of 200MHz double-data-rate (DDR) memory. That memory is twice as fast as the memory normally used in PCs, and there is no bus between the graphics chip and the CPU, which should yield performance well beyond the capabilities of the PC. The XGPU will be able to animate 125 million polygons per second. In contrast, NVidia's current top chip, the GeForce 2 Ultra, maxes out at 31 million polygons per second. Since the Xbox has a fixed memory subsystem, NVIDIA equipped the XGPU with 6.4GB/s of memory bandwidth.
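The 6.4GB/s figure can be reproduced from the 200MHz DDR clock quoted above, provided one assumes a 128-bit memory bus. The bus width is not stated in this article, so treat it as an assumption of the sketch; the function name is likewise made up for the example.

```python
def peak_bandwidth_bytes_per_s(clock_hz, transfers_per_clock, bus_width_bits):
    """Peak bandwidth = clock * transfers per clock * bytes per transfer."""
    return clock_hz * transfers_per_clock * bus_width_bits // 8


MEMORY_CLOCK_HZ = 200_000_000  # 200MHz, from the text
DDR_TRANSFERS_PER_CLOCK = 2    # double data rate: two transfers per clock
ASSUMED_BUS_WIDTH_BITS = 128   # assumption, not given in the article

bandwidth = peak_bandwidth_bytes_per_s(
    MEMORY_CLOCK_HZ, DDR_TRANSFERS_PER_CLOCK, ASSUMED_BUS_WIDTH_BITS
)
print(bandwidth / 1e9)  # 6.4 (GB/s)
```

This is a theoretical peak. Because the CPU and XGPU share the same memory, the bandwidth available to either one in practice would be lower.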

How the XGPU Works:

Geometry processing is implemented as a pipeline. In the case of the XGPU there are two separate pipeline stages: VertexShader and TriangleSetup.

VertexShader:
VertexShader is able to perform "2 dot product operations per clock cycle" and "4 or even six instructions per clock". How this relates to the performance of current chips is shown below. The declared performance figures depart from the DirectX 8 model, which states that any operation in a Vertex or Pixel Shader is executed in one clock cycle.
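To see why dot products are the natural unit for measuring a vertex shader, consider the most common vertex operation: transforming a position by a 4x4 matrix, which is exactly four 4-component dot products. The sketch below is a rough back-of-the-envelope model, not a description of the actual shader hardware. It takes the quoted rate of 2 dot products per clock at face value, ignores every other instruction a real shader program would execute, and uses invented function names.

```python
def dot4(a, b):
    """4-component dot product, the basic vertex shader operation."""
    return sum(x * y for x, y in zip(a, b))


def transform_vertex(matrix, vertex):
    """A 4x4 matrix times a 4-component vertex is four dot products."""
    return [dot4(row, vertex) for row in matrix]


DOT_PRODUCTS_PER_TRANSFORM = 4


def clocks_per_transform(dot_products_per_clock):
    # Simplification: only the dot products themselves are counted.
    return DOT_PRODUCTS_PER_TRANSFORM / dot_products_per_clock


# Translation by (1, 2, 3) applied to the point (1, 1, 1).
translate = [
    [1, 0, 0, 1],
    [0, 1, 0, 2],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
]
moved = transform_vertex(translate, [1, 1, 1, 1])
print(moved)                    # [2, 3, 4, 1]
print(clocks_per_transform(2))  # 2.0 clocks at 2 dot products per clock
```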

TriangleSetup:
TriangleSetup can process one triangle every 2 clock cycles, which gives 125 million triangles per second at a 250 MHz core frequency. This is a ceiling that XBox applications cannot exceed. Compared with current chips, whose triangle setup handles approximately 30 million triangles per second, the XGPU's T&L offers a huge performance increase.
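The arithmetic behind these figures is short enough to check directly. The function name below is made up for the example; the numbers are the ones quoted in the paragraph above.

```python
def triangles_per_second(core_clock_hz, clocks_per_triangle):
    """Upper bound on triangle setup throughput."""
    return core_clock_hz // clocks_per_triangle


CORE_CLOCK_HZ = 250_000_000    # 250 MHz core frequency
CLOCKS_PER_TRIANGLE = 2        # one triangle every 2 clock cycles
CURRENT_CHIP_RATE = 30_000_000 # approximate figure for current chips

xgpu_rate = triangles_per_second(CORE_CLOCK_HZ, CLOCKS_PER_TRIANGLE)
speedup = xgpu_rate / CURRENT_CHIP_RATE

print(xgpu_rate)          # 125000000
print(round(speedup, 2))  # 4.17
```

So the quoted peak corresponds to roughly a fourfold increase in triangle setup rate over the 30 million figure.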

The basic architecture used in the XBox looks like the following:



The performance gain of a system where memory can be accessed without a go-between bus is significant. The Xbox allows for animation of up to 125 million polygons per second and a raw fill rate of 1 gigapixel per second, because it has a 250 MHz chip with four data pipelines. The GeForce 3 core also has a new feature that detects occluded geometry, meaning anything in the background that is blocked by something in the foreground, and does not draw it. The true power behind the Xbox lies not in the brute computational force of its processors, but in the successful combination of technology, innovation and creativity.
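Two points in the paragraph above lend themselves to a short sketch. The first is the fill rate, which follows from the clock and pipeline count if each pipeline is assumed to produce one pixel per clock. The second is occlusion: the code below uses a plain depth-buffer test on a one-dimensional scanline to show the general idea of skipping hidden fragments. It is a generic illustration, not NVidia's actual mechanism, and the function names and fragment data are invented.

```python
def raw_fill_rate(core_clock_hz, pipelines):
    """Pixels per second, assuming one pixel per pipeline per clock."""
    return core_clock_hz * pipelines


def draw_with_depth_test(fragments, width):
    """Draw (x, depth, colour) fragments; smaller depth means nearer.

    A fragment that lies behind what is already stored at its pixel
    is occluded and is skipped instead of being drawn.
    """
    depth = [float("inf")] * width
    colour = [None] * width
    drawn = skipped = 0
    for x, z, c in fragments:
        if z < depth[x]:
            depth[x], colour[x] = z, c
            drawn += 1
        else:
            skipped += 1  # hidden behind nearer geometry
    return colour, drawn, skipped


fill_rate = raw_fill_rate(250_000_000, 4)
print(fill_rate)  # 1000000000, i.e. 1 gigapixel per second

# A crate in the foreground (depth 1.0) partly hides a hill (depth 5.0).
fragments = [
    (0, 1.0, "crate"), (0, 5.0, "hill"),
    (1, 5.0, "hill"),
    (2, 1.0, "crate"), (2, 5.0, "hill"),
]
colour, drawn, skipped = draw_with_depth_test(fragments, width=3)
print(colour)          # ['crate', 'hill', 'crate']
print(drawn, skipped)  # 3 2
```

In this sketch the skipped fragments save only a buffer write. The benefit grows when each drawn fragment would otherwise need expensive texturing and shading.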