ARMed solutions to the DSP war.
Digital Signal Processing Made Easy by ARM Architecture
What does ARM have to do with Digital Signal Processing (DSP)?
ARM appears to be leading the way in this field of processing. DSP has become
one of the processor's strongest niche markets, mainly because of the steps the
company has taken to fit into the embedded market and the architecture it has
adopted.
DSP is prevalent in the embedded processors found in cell phones, cordless
phones, base stations, pagers, modems, smartphones and PDAs (Personal Digital
Assistants).
Other embedded applications that take advantage of such processors include
disc drive controllers, automotive engine control and management systems,
digital auto surround sound, TV set-top boxes and internet appliances. Still
other products, such as toys and watches, are being modified to take advantage
of them. The possible applications are almost endless.
But that still doesn't answer the question: why the ARM processor for this
job?
- The answer is rather simple. ARM can offer low cost, high performance and
low power consumption, each of which is required to make a portable embedded
item marketable in today's world. In addition, a whole sub-group of the ARM
architecture has been dedicated to functioning strictly as a signal processor.
- This adapted processor has been named the "Piccolo". The Piccolo functions
as an integrated co-processor to a standard ARM microprocessor, allowing a
second DSP-oriented data path and its associated DSP instruction set to be
integrated into a standard ARM 32-bit RISC/16-bit Thumb system. In this
configuration the co-processor shares the single system bus with the ARM core
and can reuse the same data. Such a system is cost effective and power
efficient.
How is this co-processor situation with the Piccolo better for DSP?
- It helps in several ways. To start with, integrating the ARM microprocessor
with the Piccolo reduces total silicon area by minimizing the amount of on-chip
code storage and by using on-chip memory efficiently. These savings are not
typically available when two independent processors are used.
- Instruction set integration improves performance through a combination of
single-cycle arithmetic operations and the data throughput necessary to sustain
them.
- Another advantage of the Piccolo solution is that processors which operate
independently are generally based on "legacy" technology, which was not
necessarily the best implementation. The ARM integration has no such dependence
on an inadequate standard.
- Other important advantages are power efficiency, which helps lengthen
battery life and reduce heat generation, and of course the cost savings that
integration brings. Both support the strong trend toward small, portable,
wireless products.
So, all this builds up to something. What does this type of system look like,
you ask?
Well, here it is -- the Piccolo Architecture
Shown here at top-left is the set of general-purpose registers, all of which
are programmer accessible. The set can be used as thirty-two 16-bit registers
or as sixteen 32-bit registers, which maximizes data storage local to the
Piccolo processor, and it comes along with four extended-precision 48-bit
registers. At the bottom are input and output buffers, which minimize memory
accesses as well as stalls due to structural hazards encountered at the ARM
co-processor interface.
Other notable hardware includes a 32-bit barrel shifter for fast scaling of
data, a single-cycle 16 x 16 multiplier with built-in support for
extended-precision arithmetic, and a split ALU that performs, in a single cycle
and from one instruction word, either two 16-bit arithmetic or logical
operations or one 32-bit arithmetic or logical operation.
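
To make the split ALU and the extended-precision multiplier more concrete, here
is a minimal C sketch of what such hardware computes. It is only a software
model: the function names dual_add16 and mac16 are hypothetical, and an int64_t
stands in for a 48-bit register, so nothing here reflects actual Piccolo
instructions or their overflow behaviour.

```c
#include <stdint.h>

/* Hypothetical model of a split-ALU add: two independent 16-bit lanes are
 * packed into one 32-bit word, and a carry out of the low lane must NOT
 * spill into the high lane. */
uint32_t dual_add16(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;                    /* low lane, carry dropped  */
    uint32_t hi = (uint32_t)((a >> 16) + (b >> 16)) << 16;  /* high lane, wraps at 16 b */
    return hi | lo;
}

/* Hypothetical model of a 16 x 16 multiply-accumulate. The product of two
 * 16-bit values fits in 32 bits, but a running sum of such products does
 * not, which is why a wider accumulator is useful. Assumption: int64_t
 * stands in for the 48-bit register; no wrapping or saturation at 48 bits
 * is modelled. */
int64_t mac16(int64_t acc, int16_t x, int16_t y)
{
    return acc + (int32_t)x * (int32_t)y;
}
```

For example, dual_add16(0x0001FFFF, 0x00010001) gives 0x00020000: the low lane
wraps to zero without disturbing the high lane, where an ordinary 32-bit add
would have produced 0x00030000. Two calls to mac16 with both operands at -32768
already produce a sum that no longer fits in a signed 32-bit register.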
The registers have a re-mapping scheme for code optimization and flexibility,
and there are four nestable zero-overhead hardware loop constructs for
executing DSP algorithms.
Looks rather simple, doesn't it? But a good question you might ask is: how
does the ARM co-processor interface work? And wouldn't there be a lot of
contention between processors sharing data?
Let's start by describing the co-processor architecture support itself. ARM
supports general-purpose extension of its instruction set through the addition
of hardware co-processors.
- The architecture's interface supports up to 16 logical co-processors.
- Each co-processor can have up to 16 private registers, which are not limited
to 32 bits in width.
- Co-processors use a load/store architecture.
- For performance, most new ARMs restrict the co-processor interface to
on-chip use for cache and memory management.
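
The limit of 16 logical co-processors follows from the instruction format: in
the classic 32-bit ARM encoding, co-processor instructions carry a 4-bit
co-processor number in bits [11:8]. The sketch below pulls that field out of an
instruction word. The function names are hypothetical, and the opcode test
covers only the classic encoding (bits [27:24] equal to 1100, 1101 or 1110); it
is an illustration, not a full decoder.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if the word falls in the classic ARM
 * co-processor instruction space. Bits [27:24] are 1100 or 1101 for
 * co-processor loads and stores, and 1110 for co-processor data
 * operations and register transfers. */
int is_coprocessor_insn(uint32_t insn)
{
    uint32_t op = (insn >> 24) & 0xFu;
    return op == 0xCu || op == 0xDu || op == 0xEu;
}

/* Hypothetical helper: the 4-bit co-processor number in bits [11:8].
 * Four bits give 16 possible values, hence 16 logical co-processors. */
unsigned cp_number(uint32_t insn)
{
    return (unsigned)((insn >> 8) & 0xFu);
}
```

As a usage example, the word 0xEE110F10 (a register transfer from co-processor
15) is recognized as a co-processor instruction and yields co-processor number
15, while 0xE1A00000 (an ordinary data-processing instruction) is not
recognized as one.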
Now let's look at the interface.
- The ARM co-processor interface is a "bus watching" system.
- The co-processor is attached to the ARM processor through a bus. As the
co-processor receives instructions, it moves them through the input buffer
into its own internal instruction pipeline.
- As a co-processor instruction begins execution, there is a "hand-shake"
between the ARM and the co-processor to confirm that both are ready to execute
the instruction. This protocol includes three signals:
  - Cpi (from ARM to all co-processors).
    A signal for "Co-Processor Instruction," which indicates that ARM has
    identified a co-processor instruction and wants it executed.
  - Cpa (from co-processor to ARM).
    A signal for "Co-Processor Absent," which indicates to the ARM that there
    is no co-processor available to execute the current instruction.
  - Cpb (from co-processor to ARM).
    A signal for "Co-Processor Busy," which tells ARM that the co-processor
    cannot begin execution of the instruction yet.
What results from the hand-shaking? This is the interesting part!
Once the co-processor has received the instruction and is waiting to execute
it, there are four possible outcomes, depending on how the hand-shaking goes.
- ARM may choose not to execute the instruction (does not assert cpi),
possibly because it fell within a branch shadow or because its condition test
failed (all ARM instructions are conditionally executed). Result: all
co-processors discard the instruction.
- ARM decides to execute (asserts cpi), but no co-processor can take the
instruction, so cpa stays active. ARM takes the undefined instruction trap and
uses software to recover.
- ARM decides to execute and a co-processor accepts, but cannot execute yet.
The co-processor takes cpa low but leaves cpb high. Meanwhile, ARM "busy-waits"
until the co-processor takes cpb low, stalling the instruction stream. ARM
will, however, break off to service interrupts.
- ARM decides to execute and a co-processor accepts for immediate execution.
Cpi, cpa and cpb are all taken low and both sides commit to completing the
instruction.
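
The four outcomes above amount to a small decision table over the three
signals, which can be sketched in C as follows. The enum and function names are
hypothetical, and the arguments are logical flags (nonzero meaning "ARM wants
the instruction executed", "co-processor absent" and "co-processor busy")
rather than electrical levels, so signal polarity is deliberately left out of
the model.

```c
/* Hypothetical labels for the four hand-shake outcomes described above. */
typedef enum {
    CP_DISCARD,         /* ARM did not commit to the instruction           */
    CP_UNDEFINED_TRAP,  /* no co-processor claimed it; software recovers   */
    CP_BUSY_WAIT,       /* claimed, but the co-processor is not ready yet  */
    CP_EXECUTE          /* claimed and ready; both sides commit            */
} cp_outcome;

/* Assumption: each argument is a logical flag, not a pin level.
 *   arm_wants_exec - ARM has signalled the instruction via cpi
 *   cp_absent      - cpa still reports "no co-processor available"
 *   cp_busy        - cpb still reports "cannot begin execution yet"      */
cp_outcome cp_handshake(int arm_wants_exec, int cp_absent, int cp_busy)
{
    if (!arm_wants_exec)
        return CP_DISCARD;
    if (cp_absent)
        return CP_UNDEFINED_TRAP;
    if (cp_busy)
        return CP_BUSY_WAIT;
    return CP_EXECUTE;
}
```

The checks are ordered by priority: the busy flag only matters once some
co-processor has claimed the instruction, and neither flag matters if ARM never
asked for execution in the first place.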
Special note: Pre-emptive execution.
A co-processor may begin executing an instruction as soon as it enters the
pipeline, as long as it can recover its state if the hand-shaking does not
complete.
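
One way to picture this recovery requirement is a co-processor that computes
its result into a shadow copy and only commits it once the hand-shake
completes. The C sketch below is purely illustrative: the cp_state structure
and all function names are hypothetical, and a single accumulator stands in for
the whole of the co-processor's state.

```c
#include <stdint.h>

/* Hypothetical co-processor state: one committed value plus a shadow
 * copy that holds a speculative result. */
typedef struct {
    int32_t acc;      /* committed state, visible to software          */
    int32_t shadow;   /* speculative result, not yet committed         */
    int     pending;  /* nonzero while a speculative result is waiting */
} cp_state;

/* Start work early: compute into the shadow copy, leave acc untouched. */
void cp_speculate(cp_state *s, int32_t operand)
{
    s->shadow = s->acc + operand;
    s->pending = 1;
}

/* Commit the shadow result if the hand-shake completed, else drop it. */
void cp_resolve(cp_state *s, int handshake_completed)
{
    if (s->pending && handshake_completed)
        s->acc = s->shadow;
    s->pending = 0;
}

/* Small driver: run one speculative add and return the committed value. */
int32_t cp_run(int32_t start, int32_t operand, int handshake_completed)
{
    cp_state s = { start, 0, 0 };
    cp_speculate(&s, operand);
    cp_resolve(&s, handshake_completed);
    return s.acc;
}
```

With this model, cp_run(10, 5, 1) returns 15 because the hand-shake completed,
while cp_run(10, 5, 0) returns 10 because the speculative result was thrown
away and the committed state was never touched.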
After all that, what can we say?
In conclusion, this type of digital signal processing is not just a trend; it
has become a way of life. As anyone sitting in a college course can tell you,
it seems as if a cell phone goes off at least once per lecture. People, turn
the damn thing off for an hour!
Cell phones are everywhere. And why? Because they are so cheap, and rather
handy to have around. But that is not the end of DSP. Cars, televisions,
microwaves, stereos, watches, PDAs -- the list goes on -- all use this
technology. ARM's Piccolo and its co-processor ideology are a move in the right
direction. Together they provide an architecture that balances the trade-off
between performance, cost, and power consumption. ARM has emerged as the
current leader in this category, a fleeting achievement in the computer world.
But there is more yet to come.