NetBurst Architecture
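Execution Time = IC * CPI * CCT
Here IC is the instruction count, CPI is the average number of clock cycles per instruction, and CCT is the clock cycle time. There is not much that hardware can do about the instruction count, which depends on the compiler. Intel's Pentium 4 processor uses several different techniques to improve CPI and CCT.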
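As a rough illustration of the formula above, the following Python sketch computes execution time from the three factors. The instruction count, CPI values and clock rates are made-up numbers chosen only for illustration; they are not measurements of any real processor.

```python
def execution_time(instruction_count, cpi, clock_cycle_time):
    """Execution time in seconds: IC * CPI * CCT."""
    return instruction_count * cpi * clock_cycle_time


# Hypothetical workload of one billion instructions (assumed value).
IC = 1_000_000_000

# Two hypothetical designs: a shorter cycle time can outweigh a worse CPI.
slow_clock = execution_time(IC, cpi=1.0, clock_cycle_time=1 / 1.0e9)  # 1.0 GHz
fast_clock = execution_time(IC, cpi=1.2, clock_cycle_time=1 / 1.5e9)  # 1.5 GHz

print(f"1.0 GHz, CPI 1.0: {slow_clock:.2f} s")
print(f"1.5 GHz, CPI 1.2: {fast_clock:.2f} s")
```

In this made-up case the faster clock wins even though each instruction needs more cycles on average, which is the kind of trade-off the rest of this page is about.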
Intel calls the new architecture of the Pentium 4 'NetBurst'. There are several main parts of the NetBurst architecture, which are described below.
The following is a diagram of the Pentium 4 from tomshardware.com. This diagram will help you understand the architecture of the Pentium 4.
Hyper Pipeline
The hyper pipeline is a new 20-stage branch prediction/recovery pipeline. By comparison, the pipeline of the P6 micro-architecture has 10 stages and that of the Athlon has 11. The reason for the longer pipeline is Intel's wish for the Pentium 4 to deliver the highest clock rates. With more stages, the CPU does less work per clock cycle, which lets the clock rate rise and leaves more processing headroom. If more stages raise the clock rate, why not use 50 stages instead of 20? The reason is that as soon as it turns out at the end of the pipeline that the software will branch to an address that was not predicted, the whole pipeline needs to be flushed and refilled. The longer the pipeline, the more 'in-flight' instructions are lost and the longer it takes until the pipeline is filled again.
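The trade-off between clock rate and flush cost can be sketched with a toy model. The Python below is a simplification under stated assumptions, not a description of the real Pentium 4: it assumes a fixed amount of logic delay split evenly across the stages, a fixed per-stage latch overhead, and a misprediction penalty equal to the pipeline depth. All numeric parameters are invented for illustration.

```python
def time_per_instruction_ns(depth,
                            logic_delay_ns=10.0,     # assumed total logic delay
                            latch_overhead_ns=0.25,  # assumed per-stage overhead
                            base_cpi=1.0,            # assumed CPI with no flushes
                            branch_fraction=0.2,     # assumed share of branches
                            mispredict_rate=0.1):    # assumed misprediction rate
    """Average time per instruction for a pipeline with `depth` stages."""
    # More stages mean less work per stage, so the clock cycle gets shorter.
    cycle_time_ns = logic_delay_ns / depth + latch_overhead_ns
    # Assumption: a mispredicted branch costs one full pipeline refill.
    flush_penalty_cycles = depth
    cpi = base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles
    return cpi * cycle_time_ns


for depth in (10, 20, 50, 100):
    print(f"{depth:3d} stages: {time_per_instruction_ns(depth):.2f} ns per instruction")
```

Under these assumed parameters, going from 10 to 20 stages helps a lot, going from 20 to 50 helps less, and going from 50 to 100 makes things worse, because the growing flush penalty outweighs the shrinking cycle time.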
Execution Trace Cache
Because of the higher clock rate, the cache design had to be improved to let the Pentium 4 reach its best performance. One special thing about the Pentium 4's L1 cache is that its size has been reduced to 8 KB, which is half the size of the Pentium III's L1 cache and only an eighth of the Athlon's. The reason for the small cache is to enable its extremely low latency of only 2 clock cycles. This latency is less than half that of the Pentium III's L1 cache.
While the Pentium 4's L1 cache is 4-way set associative, the L2 cache is 8-way set associative. The L2 cache, also called the Advanced Transfer Cache, is 256 KB in size (the same as the Pentium III's L2 cache) and delivers a much higher-throughput data channel between the Level 2 cache and the processor core. The Advanced Transfer Cache has a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.4 GHz Pentium 4 processor can deliver a data transfer rate of 44.8 GB/s. This rate is almost 3 times as fast as the transfer rate of the Pentium III at 1 GHz.
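The 44.8 GB/s figure follows directly from the interface width and the core clock. A minimal Python sketch of that arithmetic is shown here; the function name is hypothetical, and it assumes one transfer per core clock, as the text states.

```python
def cache_bandwidth_gb_per_s(interface_bits, core_clock_ghz):
    """Peak bandwidth, assuming one full-width transfer per core clock."""
    bytes_per_transfer = interface_bits / 8
    return bytes_per_transfer * core_clock_ghz


# 256-bit (32-byte) interface on a 1.4 GHz Pentium 4.
p4_l2_bandwidth = cache_bandwidth_gb_per_s(256, 1.4)
print(f"{p4_l2_bandwidth:.1f} GB/s")
```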
Advanced Dynamic Execution
The Pentium 4 processor has an extremely efficient out-of-order speculative execution engine that keeps the execution units busy. Also new is an enhanced branch prediction capability that keeps the processor executing along the correct program flow and reduces the misprediction penalty associated with deeper pipelines.
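To give a feel for what branch prediction does, here is a small Python sketch of a textbook 2-bit saturating counter predictor. This is a generic illustration only; it is not the Pentium 4's actual predictor, whose design is not described here. The class name, function name and the branch patterns are hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 2  # start in "weakly taken" (arbitrary choice)

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move toward 3 on a taken branch, toward 0 otherwise, without overflow.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Run one predictor over a sequence of branch outcomes and count misses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch: taken nine times, then not taken once on exit, three times over.
loop_branch = ([True] * 9 + [False]) * 3
# A branch that flips every time, which this simple scheme handles badly.
alternating = [True, False] * 15

loop_misses = count_mispredictions(loop_branch)
alt_misses = count_mispredictions(alternating)
print(f"loop branch: {loop_misses} of {len(loop_branch)} mispredicted")
print(f"alternating: {alt_misses} of {len(alternating)} mispredicted")
```

Regular patterns such as loops are predicted well, while irregular ones are not, and on a deep pipeline every miss costs a full flush.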
Rapid Execution Engine
The Pentium 4's two ALUs (Arithmetic Logic Units) run at twice the frequency of the processor core, so effectively at 3 GHz for a 1.5 GHz Pentium 4. This allows for two things. First, the processor can execute certain instructions in half a core clock cycle, which allows for higher execution speeds and reduced latency. Second, the processor can re-calculate bad cycles in a timely fashion without losing execution time: if it miscalculates during a cycle, it still has the second half of the cycle to retrace its steps and correct the error.
However, this doesn't help much in integer-based applications, such as business applications, even with all these new features to recover from miscalculated cycles and keep processing flowing at a pace higher than that of the Athlon. Branches in integer applications generally can't be predicted well, meaning the Pentium 4's pipeline will have to start over often. Even with a Rapid Execution Engine to speed up the processing of instructions, the Pentium 4 pays a very high penalty for every misprediction. Nothing is perfect, and you must pay a price for success: in the Pentium 4's case, a longer pipeline to achieve a higher clock speed.
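To put the doubled ALU clock described at the start of this section into numbers, the short Python sketch below works out the effective ALU frequency and cycle time for a 1.5 GHz core. The function names are hypothetical and the calculation is plain arithmetic, not a model of the real execution units.

```python
def alu_clock_ghz(core_clock_ghz, alu_multiplier=2):
    """Effective ALU clock when the ALUs run at a multiple of the core clock."""
    return core_clock_ghz * alu_multiplier


def cycle_time_ns(clock_ghz):
    """Length of one clock cycle in nanoseconds."""
    return 1.0 / clock_ghz


core_ghz = 1.5
alu_ghz = alu_clock_ghz(core_ghz)
core_cycle = cycle_time_ns(core_ghz)
alu_cycle = cycle_time_ns(alu_ghz)

print(f"core: {core_ghz:.1f} GHz, {core_cycle:.3f} ns per cycle")
print(f"ALU:  {alu_ghz:.1f} GHz, {alu_cycle:.3f} ns per cycle")
```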
Intel i850 Chipset
As with any new processor, you must have a new chipset to go along with it. The Pentium 4 features a 400 MHz NetBurst system bus, which provides 3.2 GB/s of system bandwidth. That is nearly three times the system bandwidth of other platforms, such as the 1.06 GB/s of the Pentium III running on a 133 MHz system bus. This gives the processor plenty of headroom for complex, bandwidth-hungry applications such as e-commerce.
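Both bandwidth figures can be reproduced with simple arithmetic. The Python sketch below assumes a 64-bit (8-byte) wide data bus for both processors, which is not stated in the text above, and treats the quoted clock rates as effective transfer rates. The function name is hypothetical.

```python
def bus_bandwidth_gb_per_s(effective_clock_mhz, bus_width_bits=64):
    """Peak bus bandwidth in GB/s; the 64-bit default width is an assumption."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_clock_mhz * 1e6 * bytes_per_transfer / 1e9


p4_bus = bus_bandwidth_gb_per_s(400)  # Pentium 4, 400 MHz system bus
p3_bus = bus_bandwidth_gb_per_s(133)  # Pentium III, 133 MHz system bus

print(f"Pentium 4:   {p4_bus:.2f} GB/s")
print(f"Pentium III: {p3_bus:.2f} GB/s")
print(f"ratio:       {p4_bus / p3_bus:.2f}x")
```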
Furthermore, the 850 chipset features an enhanced I/O Controller Hub (ICH2), which adds an additional USB controller for four ports and 24 Mbps of USB bandwidth, twice as much as any other bridge architecture. The 850 chipset also supports ATA/100 for the best hard drive performance, offering a cost-effective way to use the latest HDD technology without losing performance.