09Algorithms & ML

Advanced Sorting Algorithms Benchmark

A Machine Learning / Algorithms coursework project implementing and benchmarking three advanced sorting algorithms (TimSort, Radix Sort, Bucket Sort) across three dataset types (random, nearly sorted, reversed) at sizes from 500 to 100,000 elements. Produces performance comparison visualizations.

PythonNumPyMatplotlibPandasTimSortRadix SortBucket Sort
View on GitHub
3
Algorithms
100K
Max Dataset
9
Test Scenarios
type
Algorithms & ML
status
Completed
year
2024
role
Solo Developer
01

System Architecture · 3D View

02

Architecture Diagram

Dataset Generator
NumPy · 500–100K
Random
np.random.randint
Nearly Sorted
minor displacements
Reversed
descending order
TimSort
InsertSort + MergeSort
Radix Sort
O(nk) · LSD approach
Bucket Sort
Value-range buckets
Benchmarker
time · 9 combinations
Matplotlib Plots
Per-dataset graphs
Results Table
Pandas DataFrame
03

Screenshots & Output

terminal
$ python3 sorting_benchmark.py
Dataset: Random Size: 100,000
Radix Sort: 0.041s fastest
TimSort: 0.089s
Bucket Sort: 0.134s
Dataset: Nearly Sorted Size: 100,000
TimSort: 0.012s fastest (run detection)
Bucket Sort: 0.490s worst case
Plots saved → results/
Benchmark Run
Timed execution output per algorithm
Performance Scores
Radix (Random)96%
TimSort (Sorted)99%
Radix (Reversed)95%
Bucket (Random)78%
Bucket (Reversed)32%
Performance Scores
Relative speed per dataset type
Data Output
{
# Benchmark config
sizes: [500, 1000, 5000, 10000, 50000, 100000],
datasets: [random, nearly_sorted, reversed],
algorithms: [TimSort, RadixSort, BucketSort],
best_overall: RadixSort
}
Results Config
Dataset sizes and algorithm config
Project Structure
📄 sorting_benchmark.py single script
def timsort(arr): RUN=32 · insertion+merge
def radix_sort(arr): LSD · counting_sort
def bucket_sort(arr): value-range buckets
def benchmark(): time · 9 combos
numpy · matplotlib · pandas
Source Structure
Single-script implementation
04

What I Built

Implemented TimSort from scratch: divides array into runs sorted by insertion sort, then merges runs using merge sort — mirrors Python built-in sort.

Implemented Radix Sort: processes integers least-significant to most-significant digit using counting sort as a subroutine — achieves O(nk) linear time.

Implemented Bucket Sort: distributes elements into value-range buckets, sorts each bucket independently, then concatenates results.

Generated 3 dataset types with NumPy: random integers, nearly-sorted (minor displacements), and reversed arrays — at sizes 500 to 100,000.

Benchmarked all 9 combinations (3 algorithms × 3 dataset types) with timed execution, then visualized results with Matplotlib.

Key findings: Radix Sort most consistent overall; TimSort dominates on nearly-sorted data; Bucket Sort degrades on skewed/reversed distributions.

05

Project Insights

Personal Notes & Learnings
Markdown Editor
Live Preview

Course Context

Built for Machine Learning / Algorithms coursework at Lawrence Technological University. Goal: empirically validate theoretical time complexity claims through real benchmarking.

Key Results

  • Radix Sort: Most consistent across all 9 scenarios — linear O(nk) complexity holds empirically
  • TimSort: Best on nearly-sorted data — run-detection heuristic eliminates most work when data is partially ordered
  • Bucket Sort: Excellent on uniform random data, degrades severely on reversed/skewed distributions (uneven bucket filling)

Implementation Details

  • TimSort: RUN = 32 — standard Timsort run size. Insertion sort on runs < 32 elements, then merge. Mirrors CPython internal sort.
  • Radix Sort: LSD (Least Significant Digit) approach using counting sort as subroutine. Stable sort — preserves relative order.
  • Bucket Sort: n_buckets = len(arr) — each bucket covers (max-min)/n range. Individual bucket sort via Python sorted().
  • Dataset sizes: 500, 1K, 5K, 10K, 50K, 100K — exponential scaling to reveal asymptotic behavior

What I Learned

  • Theoretical complexity and empirical performance diverge significantly based on data distribution
  • TimSort adaptive nature makes it practically superior to theoretically "faster" algorithms on real-world data
✓ Insights saved locally