Slide 3 of 39
Notes:
Sorting problems “solutions”:
Merge efficiency: forecasting (use the largest key in each input buffer to predict which run will need its next read first), large cluster size, and using the maximum fan-in (# of runs that can be merged at once).
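A minimal sketch of how fan-in limits merging, assuming runs are in-memory Python lists rather than files on disk. The name merge_runs and its parameters are hypothetical and not from the slides; forecasting and cluster size are not modeled here.

```python
import heapq


def merge_runs(runs, fan_in):
    """Merge sorted runs, at most `fan_in` at a time, until one run remains.

    Returns the fully merged run and the number of merge passes used.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    runs = [list(r) for r in runs]
    passes = 0
    while len(runs) > 1:
        # One pass: merge each group of up to `fan_in` runs into a longer run.
        runs = [
            list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)
        ]
        passes += 1
    return (runs[0] if runs else []), passes
```

With four runs, a fan-in of 2 needs two passes over the data while a fan-in of 4 needs only one, which is why a larger fan-in helps merge efficiency.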
Hashing problems “solutions”:
Overflow: avoidance (partition the data up front according to the fan-out (# of partition files made)), resolution (assume overflow won’t occur and fall back to partitioning only when it does).
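A rough sketch of the two overflow strategies, assuming integer keys, in-memory lists standing in for partition files, and a record count standing in for memory size. The names partition, resolve, capacity, and max_level are hypothetical and not from the slides.

```python
def partition(records, fan_out, level=0):
    """Avoidance: split records into `fan_out` partitions by hash value.

    `level` selects a different part of the hash value at each recursion
    depth so that repartitioning actually separates the records.
    """
    parts = [[] for _ in range(fan_out)]
    for r in records:
        parts[(hash(r) // fan_out ** level) % fan_out].append(r)
    return parts


def resolve(records, capacity, fan_out, level=0, max_level=8):
    """Resolution: assume the input fits; partition only if it does not."""
    if len(records) <= capacity or level >= max_level:
        return [records]  # fits in memory (or give up splitting further)
    result = []
    for part in partition(records, fan_out, level):
        if part:
            result.extend(resolve(part, capacity, fan_out, level + 1, max_level))
    return result
```

For example, partition(list(range(8)), 4) yields [[0, 4], [1, 5], [2, 6], [3, 7]], while resolve([1, 2, 3], 4, 4) leaves the input as a single partition because it already fits.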