AltiVec is a cool-sounding buzzword for a group of units that together execute vector instructions. (It's kind of like buying a truck: you don't just buy a "truck with an engine," you buy an "F-150 SuperCab with a Triton V-8!" The buzzwords make it sound so much better.) The AltiVec unit handles arithmetic, logical, comparison, rotate and shift instructions on vectors, vector load and store instructions, and vector permutation and formatting instructions.

The two main units within the AltiVec unit are the vector permute unit (VPU) and the vector ALU (VALU). It also has its own 32-entry 128-bit vector register file (VRF). Connecting these units are full 128-bit data paths. Finally, it is fully pipelined. Having a separate unit dedicated to vector operations enhances both vector operation performance and overall performance since any instruction that uses vectors can be handed off to the AltiVec while subsequent (non-vector) instructions are handled by the other main execution units.
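To make the "32-entry 128-bit" register file a little more concrete, here is a minimal C sketch of how one such register can be viewed. The type names (vreg_t, vrf_t) are made up for illustration; this is a plain-C model, not the actual AltiVec programming interface.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register.
   The same 16 bytes can be viewed as different element types. */
typedef union {
    uint8_t  u8[16];   /* sixteen 8-bit elements   */
    uint16_t u16[8];   /* eight 16-bit elements    */
    uint32_t u32[4];   /* four 32-bit elements     */
    float    f32[4];   /* four single-precision floats */
} vreg_t;

/* Hypothetical model of the vector register file: 32 registers. */
typedef struct {
    vreg_t v[32];
} vrf_t;
```

One instruction operating on a vreg_t touches all 16 bytes at once, which is why the data paths between the units are a full 128 bits wide.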

Vector Permute Unit (VPU)

Let's first look at the vector permute unit. What it does is permute vectors, that is, it rearranges and reformats their elements, and a lot more efficiently than the non-AltiVec MPC750, for example. It can pack and unpack vectors, meaning it truncates an operand's elements to a narrower width or expands them to a wider one via sign extension. The VPU also handles merges, taking the first or last n bytes of each of two operands and merging them into a result 2n bytes long. Vector splat takes a single constant and makes a vector in which every element has that constant's value. Splat is useful for multiplying a vector by a constant: it "splats" the constant, then the result is multiplied with the vector. There are many other vector permutations the VPU performs (too many to explain them all here), but that should give you an idea of what kind of functionality it has.
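Here is a rough scalar model in C of what splat, merge and unpack do to the bytes of a vector. The function names are hypothetical, and the interleaving order used for the merge is an assumption about how the two halves are combined; real AltiVec code would use the vector instructions instead of loops.

```c
#include <stdint.h>

/* Splat: copy one value into all 16 byte elements. */
void splat_u8(uint8_t value, uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = value;
    }
}

/* Merge (first halves): take the first 8 bytes of a and of b
   and interleave them into a 16-byte result: a0 b0 a1 b1 ... */
void merge_high_u8(const uint8_t a[16], const uint8_t b[16],
                   uint8_t out[16]) {
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Unpack (first half): sign-extend the first 8 signed bytes
   into eight 16-bit elements. */
void unpack_high_s8(const int8_t a[16], int16_t out[8]) {
    for (int i = 0; i < 8; i++) {
        out[i] = (int16_t)a[i];
    }
}
```

For example, splat_u8(7, v) fills v with sixteen 7s, which could then be multiplied element by element with another vector.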

Vector Arithmetic Logic Unit (VALU)

The VALU has the same functionality as the regular ALU, except that it performs its operations on vectors rather than on individual integer or floating-point numbers. The VALU actually consists of three independent subunits: the vector simple integer unit (VSIU), the vector complex integer unit (VCIU) and the vector floating-point unit (you guessed it, the VFPU).

The VSIU executes the so-called "simple" vector integer computational instructions. These include addition, subtraction, maximum and minimum, averaging, rotates and shifts, comparisons and boolean operations.
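As a sketch of what "simple" means here, the following C functions model three of these operations on sixteen unsigned bytes. The function names are hypothetical, and the wrap-around addition and round-up averaging are assumptions about the exact arithmetic, not something taken from the processor documentation.

```c
#include <stdint.h>

/* Element-wise add; each byte wraps around on overflow (assumed). */
void vadd_u8(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (uint8_t)(a[i] + b[i]);
    }
}

/* Element-wise average, rounding halves up (assumed). */
void vavg_u8(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
    }
}

/* Element-wise maximum. */
void vmax_u8(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = a[i] > b[i] ? a[i] : b[i];
    }
}
```

Each loop body is a one-step operation per element, which is the kind of work a short-latency unit can finish quickly for all sixteen elements at once.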

The VCIU, on the other hand, handles the more complex integer instructions. These are instructions that have a longer latency than the simple instructions, such as multiplication, multiplication/addition and sum-across with saturation.
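A scalar C model of two of these operations might look like the sketch below. The function names, element widths and the extra accumulator argument are illustrative assumptions; the point is only to show what multiply/add and a saturating sum-across compute.

```c
#include <stdint.h>

/* Multiply/add on eight 16-bit elements: out = a * b + c,
   keeping only the low 16 bits of each result (assumed). */
void vmladd_u16(const uint16_t a[8], const uint16_t b[8],
                const uint16_t c[8], uint16_t out[8]) {
    for (int i = 0; i < 8; i++) {
        uint32_t wide = (uint32_t)a[i] * b[i] + c[i];
        out[i] = (uint16_t)wide;
    }
}

/* Sum-across with saturation: add up all four 32-bit elements
   plus an accumulator, clamping to the int32_t range instead of
   wrapping around. */
int32_t sum_across_sat_s32(const int32_t a[4], int32_t acc) {
    int64_t sum = acc;
    for (int i = 0; i < 4; i++) {
        sum += a[i];
    }
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

Saturation means that an overflowing result sticks at the largest (or smallest) representable value, which is usually what you want for audio and image data.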

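The VFPU executes the floating-point counterparts of the VSIU and VCIU instructions, such as addition, subtraction, multiplication/addition, comparisons, and maximum and minimum, on vectors of floating-point numbers.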
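For completeness, here is the floating-point version of the same idea: a hypothetical C model of a multiply/add over four single-precision floats. It assumes a 128-bit vector holds four 32-bit floats, and it computes the product and sum as two separate C operations, which may round differently from the hardware.

```c
/* Multiply/add on four floats: out = a * b + c, element by element. */
void vmadd_f32(const float a[4], const float b[4],
               const float c[4], float out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

Scaling a vector by a constant then amounts to passing a splatted constant as b and zeros as c.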