Carry lookahead: group
How much have we reduced the delay?
We would like to have O(1) time, but
note that computing the i-th carry ci takes i OR operations,
and its largest product term takes i AND operations.
There is a practical limit to the number of inputs to a single gate (fan-in).
We could build everything out of 2-input AND gates and OR gates, and the
delay would be only O(lg n) for n bits, which is still much better than O(n).
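The O(lg n) figure comes from arranging the gates as a balanced tree. The sketch below is only an illustration of that counting argument: the function name `tree_depth` and its parameters are made up for this example, and it counts gate levels rather than modelling real gate delays.

```python
import math

def tree_depth(num_inputs: int, fan_in: int = 2) -> int:
    """Gate levels needed to combine num_inputs signals into one output
    using gates with at most fan_in inputs, arranged as a balanced tree."""
    depth = 0
    remaining = num_inputs
    while remaining > 1:
        # Each level merges up to fan_in signals into one.
        remaining = math.ceil(remaining / fan_in)
        depth += 1
    return depth

if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, "inputs:", tree_depth(n, 2), "levels with 2-input gates,",
              tree_depth(n, 4), "levels with 4-input gates")
```

With 2-input gates a 16-input AND needs 4 levels; allowing a fan-in of 4 cuts that to 2, which is the trade-off the group approach exploits.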
Another approach:
Build 4-bit carry-lookahead units, then cascade them together in groups of
4 to get a 16-bit adder.
This can be done with a maximum fan-in of only 4.
This is called group carry-lookahead (GCLA).
Need to deal with propagates and generates between 4-bit blocks.
"Super" propagate:
A propagate will occur from one group of 4 to the next
if every propagate in the first group is true.
P0 = p3 * p2 * p1 * p0
Similarly:
P1 = p7 * p6 * p5 * p4
P2 = p11 * p10 * p9 * p8
P3 = p15 * p14 * p13 * p12
"Super" generate:
A generate will occur between 4-bit groups
if there is a carry out from the most significant bit in the 4-bit group.
This occurs when:
Generate occurs for the most significant bit OR
Generate occurs for a lower bit and all intermediate propagates are true
G0 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)
G1 = g7 + (p7 * g6) + (p7 * p6 * g5) + (p7 * p6 * p5 * g4)
G2 = g11 + (p11 * g10) + (p11 * p10 * g9) + (p11 * p10 * p9 * g8)
G3 = g15 + (p15 * g14) + (p15 * p14 * g13) + (p15 * p14 * p13 * g12)
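The super propagate and super generate equations can be checked with a short Python sketch. It assumes the usual per-bit definitions, generate = a AND b and propagate = a OR b, which this section does not restate; the helper names `bit_pg` and `group_pg` are hypothetical.

```python
def bit_pg(a: int, b: int, width: int = 16):
    """Per-bit signals, assuming g_i = a_i AND b_i and p_i = a_i OR b_i."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    return p, g

def group_pg(p, g, k: int):
    """Super propagate P_k and super generate G_k for 4-bit group k."""
    p0, p1, p2, p3 = p[4 * k:4 * k + 4]
    g0, g1, g2, g3 = g[4 * k:4 * k + 4]
    P = p3 & p2 & p1 & p0
    G = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
    return P, G

if __name__ == "__main__":
    # G must equal the carry out of the 4-bit block when its carry in is 0.
    for a in range(16):
        for b in range(16):
            p, g = bit_pg(a, b, width=4)
            P, G = group_pg(p, g, 0)
            assert G == (a + b) >> 4
    print("group generate matches the 4-bit carry out for all inputs")
```

The loop at the bottom is an exhaustive check over all 4-bit operand pairs, so it confirms the G formula for one block rather than for every 16-bit input.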
Carry out:
Carry out for the 4-bit group is similar to the carry out for each bit:
C1 = G0 + (P0 * c0)
C2 = G1 + (P1 * G0) + (P1 * P0 * c0)
C3 = G2 + (P2 * G1) + (P2 * P1 * G0) + (P2 * P1 * P0 * c0)
C4 = G3 + (P3 * G2) + (P3 * P2 * G1) + (P3 * P2 * P1 * G0) + (P3 * P2 * P1 * P0 * c0)
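Putting the pieces together, the following Python sketch models a 16-bit group carry-lookahead adder and compares it with ordinary integer addition. It is a behavioural model under stated assumptions, not a gate-level design: per-bit generate is taken as a AND b and propagate as a OR b, the function name `gcla_add16` is invented for this example, and the carries inside each 4-bit block are computed with the recurrence c(i+1) = g(i) + p(i) * c(i) for brevity, where hardware would expand them into lookahead form.

```python
def gcla_add16(a: int, b: int, c0: int = 0):
    """Add two 16-bit values with group carry-lookahead; returns (sum, C4)."""
    # Per-bit signals (assumed definitions): g = a AND b, p = a OR b.
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(16)]

    # Super propagate and super generate for each 4-bit group.
    P, G = [], []
    for k in range(4):
        p0, p1, p2, p3 = p[4 * k:4 * k + 4]
        g0, g1, g2, g3 = g[4 * k:4 * k + 4]
        P.append(p3 & p2 & p1 & p0)
        G.append(g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0))

    # Group carries, written as flat sum-of-products expressions.
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))

    # Sum bits: each block starts from its own group carry-in.
    total = 0
    for k, cin in enumerate((c0, C1, C2, C3)):
        c = cin
        for i in range(4 * k, 4 * k + 4):
            total |= (((a >> i) ^ (b >> i) ^ c) & 1) << i
            c = g[i] | (p[i] & c)  # carry into the next bit of this block
    return total, C4

if __name__ == "__main__":
    s, cout = gcla_add16(0x1234, 0x4321)
    print(hex(s), cout)
```

Note that no group carry depends on another group carry: C1 through C4 are all functions of the G, P, and c0 signals only, so they can be produced in parallel once the group signals are ready.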