Part of the improvement from MMX technology is due to modifications that increase the Clock Rate.
As we know, Execution Time = (# of instructions) * (CPI) * (Clock Cycle Time)
Increasing the Clock Rate decreases the Clock Cycle Time, which in turn decreases Execution Time, and that's good.
Another reason they decided to increase the Clock Rate is that consumers who didn't take CMSC 411 don't know about the Execution Time equation. They wouldn't be able to comprehend that a chip with a 300 MHz Clock Rate could actually be faster than one running at 333 MHz. So basically, Intel would never be able to sell the new chip unless they increased the Clock Rate.
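To see how a 300 MHz chip could beat a 333 MHz one, here is a small Python sketch of the Execution Time equation. The instruction count and both CPI values are made-up numbers chosen for illustration only; they are not measurements of any real processor.

```python
def execution_time(instructions, cpi, clock_rate_hz):
    """Execution Time = (# of instructions) * (CPI) * (Clock Cycle Time)."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instructions * cpi * clock_cycle_time

INSTRUCTIONS = 1_000_000  # hypothetical program size

# Hypothetical chip A: lower Clock Rate, but fewer cycles per instruction.
chip_a = execution_time(INSTRUCTIONS, cpi=1.2, clock_rate_hz=300e6)
# Hypothetical chip B: higher Clock Rate, but more cycles per instruction.
chip_b = execution_time(INSTRUCTIONS, cpi=1.5, clock_rate_hz=333e6)

print(f"300 MHz chip: {chip_a * 1e3:.2f} ms")  # 4.00 ms
print(f"333 MHz chip: {chip_b * 1e3:.2f} ms")  # 4.50 ms
```

With these assumed CPI values the 300 MHz chip finishes first, which is exactly the kind of result the Clock Rate number alone would never tell you.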
So, in order to increase Clock Rate, the MMX Pentium designers had to find and eliminate some bottlenecks. The two major bottlenecks were the instruction decoder and the data cache access. These bottlenecks are somewhat dependent on each other, so speeding one up helps speed the other up, too. So they tried to fix the decoder bottleneck first. Here's what an instruction used to look like in the old 5-stage pipe:
Fetch, Decode1, Decode2, Execute, WriteBack
To speed things up, a 6th stage was added to the pipe - Prefetch. A queue was also added between Fetch and Decode1 to decouple freezes, but that's way beyond the scope of this page. So now an instruction looks like this:
Prefetch, Fetch, Decode1, Decode2, Execute, WriteBack
After adding this new stage, machine timing was rebalanced to take advantage of the extra clock cycle.
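Why would a longer pipe allow a faster clock? Roughly, the Clock Cycle Time can't be shorter than the slowest stage, so spreading the same work over more stages lets every stage finish sooner. The sketch below shows the idea. All stage delays are invented numbers (the real Pentium figures aren't given here), picked so the result lands near the Clock Rate gain reported for the chip, and latch overhead is ignored.

```python
# Hypothetical stage delays in nanoseconds; NOT real Pentium timings.
old_pipe = {
    "Fetch": 4.0, "Decode1": 6.0, "Decode2": 5.0,
    "Execute": 5.0, "WriteBack": 4.0,
}
# Same front-end work (15 ns in this made-up example) rebalanced over 4 stages.
new_pipe = {
    "Prefetch": 3.0, "Fetch": 3.5, "Decode1": 4.5, "Decode2": 4.0,
    "Execute": 5.0, "WriteBack": 4.0,
}

def cycle_time(pipe):
    """The clock can tick no faster than the slowest stage allows."""
    return max(pipe.values())

old_cycle = cycle_time(old_pipe)  # limited by Decode1
new_cycle = cycle_time(new_pipe)  # now limited by Execute
clock_rate_ratio = old_cycle / new_cycle

print(f"Cycle time: {old_cycle} ns -> {new_cycle} ns")
print(f"Clock Rate ratio: {clock_rate_ratio:.2f}")
```

Note that a single instruction now passes through 6 stages instead of 5, so its own latency doesn't shrink much; the win is that a new instruction can start every (shorter) clock cycle.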
The data cache was also made larger and faster than that of the original Pentium. Now the cache supports a single-clock read and write on each port.
These improvements led to a 20% increase in Clock Rate.
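One thing to watch out for: a 20% higher Clock Rate does not mean 20% less Execution Time. Plugging the figure into the Execution Time equation gives a quick check. This sketch assumes the instruction count and CPI stay exactly the same, which is a simplification, since a deeper pipe can change CPI.

```python
clock_rate_gain = 0.20  # the 20% Clock Rate increase quoted above

# Clock Cycle Time is 1 / Clock Rate, so it shrinks by a factor of 1 / 1.2.
cycle_time_ratio = 1.0 / (1.0 + clock_rate_gain)

# Assumption: # of instructions and CPI are unchanged, so Execution Time
# scales by the same factor as Clock Cycle Time.
time_saved = 1.0 - cycle_time_ratio

print(f"New Execution Time: {cycle_time_ratio:.1%} of the old one")
print(f"Time saved: {time_saved:.1%}")
```

So under that assumption the program runs in about five sixths of the original time, a saving of roughly 17%.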