A Machine Learning / Algorithms coursework project implementing and benchmarking three advanced sorting algorithms (TimSort, Radix Sort, Bucket Sort) across three dataset types (random, nearly sorted, reversed) at sizes from 500 to 100,000 elements. Produces performance comparison visualizations.
Implemented TimSort from scratch: divides the array into runs sorted by insertion sort, then merges the runs as in merge sort, mirroring Python's built-in sort.
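A minimal sketch of that structure (simplified relative to CPython's adaptive TimSort, which detects natural runs and uses galloping; helper names here are illustrative):

```python
RUN = 32  # minimum run size, as in the project notes

def insertion_sort(arr, left, right):
    # Sort arr[left..right] in place by insertion.
    for i in range(left + 1, right + 1):
        key = arr[i]
        j = i - 1
        while j >= left and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

def merge(arr, l, m, r):
    # Merge the sorted halves arr[l..m] and arr[m+1..r].
    left, right = arr[l:m + 1], arr[m + 1:r + 1]
    i = j = 0
    k = l
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            arr[k] = left[i]; i += 1
        else:
            arr[k] = right[j]; j += 1
        k += 1
    while i < len(left):
        arr[k] = left[i]; i += 1; k += 1
    while j < len(right):
        arr[k] = right[j]; j += 1; k += 1

def tim_sort(arr):
    n = len(arr)
    # Phase 1: insertion-sort fixed-size runs.
    for start in range(0, n, RUN):
        insertion_sort(arr, start, min(start + RUN - 1, n - 1))
    # Phase 2: merge runs pairwise, doubling the run size each pass.
    size = RUN
    while size < n:
        for l in range(0, n, 2 * size):
            m = min(l + size - 1, n - 1)
            r = min(l + 2 * size - 1, n - 1)
            if m < r:
                merge(arr, l, m, r)
        size *= 2
    return arr
```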
Implemented Radix Sort: processes integers from least-significant to most-significant digit using counting sort as a subroutine, achieving O(nk) time, which is linear in n when the digit count k is bounded.
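A sketch of the digit-by-digit approach, assuming non-negative integers:

```python
def counting_sort_by_digit(arr, exp):
    # Stable counting sort on the digit at place value `exp` (1, 10, 100, ...).
    output = [0] * len(arr)
    count = [0] * 10
    for v in arr:
        count[(v // exp) % 10] += 1
    for d in range(1, 10):  # prefix sums give final positions
        count[d] += count[d - 1]
    for v in reversed(arr):  # reverse pass keeps the sort stable
        d = (v // exp) % 10
        count[d] -= 1
        output[count[d]] = v
    return output

def radix_sort(arr):
    # Assumes non-negative integers; one counting-sort pass per digit.
    if not arr:
        return arr
    exp = 1
    while max(arr) // exp > 0:
        arr = counting_sort_by_digit(arr, exp)
        exp *= 10
    return arr
```

Stability of each counting-sort pass is what lets later (more significant) digit passes preserve the ordering established by earlier ones.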
Implemented Bucket Sort: distributes elements into value-range buckets, sorts each bucket independently, then concatenates results.
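A sketch of that scheme, assuming numeric input and one bucket per element (matching the n_buckets = len(arr) note below):

```python
def bucket_sort(arr):
    # Distribute values into len(arr) buckets, each covering an equal
    # (max - min) / n slice of the value range, then sort each bucket.
    n = len(arr)
    if n == 0:
        return []
    lo, hi = min(arr), max(arr)
    if lo == hi:  # all values equal: nothing to distribute
        return list(arr)
    width = (hi - lo) / n
    buckets = [[] for _ in range(n)]
    for v in arr:
        idx = min(int((v - lo) / width), n - 1)  # clamp max value into last bucket
        buckets[idx].append(v)
    result = []
    for b in buckets:
        result.extend(sorted(b))  # per-bucket sort via the built-in sorted()
    return result
```

This is where the skew sensitivity comes from: a heavily skewed or reversed-then-clustered distribution piles most elements into a few buckets, pushing the per-bucket sorts toward O(n log n) or worse.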
Generated 3 dataset types with NumPy: random integers, nearly-sorted (minor displacements), and reversed arrays — at sizes 500 to 100,000.
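One way the three generators might look (the seed and the ~1% displacement rate are assumptions, not the project's exact parameters):

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility (assumed)

def make_dataset(kind, n):
    if kind == "random":
        return rng.integers(0, n * 10, size=n)
    if kind == "nearly_sorted":
        arr = np.arange(n)
        # Displace a small fraction (~1%) of elements by swapping random pairs.
        for _ in range(max(1, n // 100)):
            i, j = rng.integers(0, n, size=2)
            arr[i], arr[j] = arr[j], arr[i]
        return arr
    if kind == "reversed":
        return np.arange(n)[::-1]
    raise ValueError(f"unknown dataset kind: {kind}")
```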
Benchmarked all 9 combinations (3 algorithms × 3 dataset types) with timed execution, then visualized results with Matplotlib.
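The timing harness for the 9 combinations could be as simple as the following (function and dict names are illustrative; the real project presumably feeds these results into Matplotlib):

```python
import time

def benchmark(algorithms, datasets):
    # Time each (algorithm, dataset) pair on a fresh copy of the input,
    # so no run benefits from a previous run's sorting.
    results = {}
    for algo_name, algo in algorithms.items():
        for data_name, data in datasets.items():
            arr = list(data)
            start = time.perf_counter()
            algo(arr)
            results[(algo_name, data_name)] = time.perf_counter() - start
    return results
```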
Key findings: Radix Sort most consistent overall; TimSort dominates on nearly-sorted data; Bucket Sort degrades on skewed/reversed distributions.
Built for Machine Learning / Algorithms coursework at Lawrence Technological University. Goal: empirically validate theoretical time complexity claims through real benchmarking.
RUN = 32: the standard TimSort minimum run size; runs shorter than 32 elements are insertion-sorted, then merged, mirroring CPython's internal sort.
n_buckets = len(arr): each bucket covers a (max - min)/n slice of the value range; individual buckets are sorted via Python's sorted().