Advanced Sorting Algorithms Benchmark — TimSort · Radix Sort · Bucket Sort · Python

type

Algorithms & ML

status

Completed

year

2024

role

Solo Developer

01 —

System Architecture · 3D View

02 —

Architecture Diagram

Dataset Generator

NumPy · 500–100K

↓

Random

np.random.randint

Nearly Sorted

minor displacements

Reversed

descending order

↓

TimSort

InsertSort + MergeSort

Radix Sort

O(nk) · LSD approach

Bucket Sort

Value-range buckets

↓

Benchmarker

time · 9 combinations

↓

Matplotlib Plots

Per-dataset graphs

Results Table

Pandas DataFrame

03 —

Screenshots & Output

terminal

$ python3 sorting_benchmark.py

Dataset: Random Size: 100,000

Radix Sort: 0.041s fastest

TimSort: 0.089s

Bucket Sort: 0.134s

Dataset: Nearly Sorted Size: 100,000

TimSort: 0.012s fastest (run detection)

Bucket Sort: 0.490s worst case

Plots saved → results/

Benchmark Run

Timed execution output per algorithm

Performance Scores

Radix (Random)96%

TimSort (Sorted)99%

Radix (Reversed)95%

Bucket (Random)78%

Bucket (Reversed)32%

Performance Scores

Relative speed per dataset type

Data Output

{
# Benchmark config
sizes: [500, 1000, 5000, 10000, 50000, 100000],
datasets: [random, nearly_sorted, reversed],
algorithms: [TimSort, RadixSort, BucketSort],
best_overall: RadixSort
}

Results Config

Dataset sizes and algorithm config

Project Structure

📄 sorting_benchmark.py single script

def timsort(arr): RUN=32 · insertion+merge

def radix_sort(arr): LSD · counting_sort

def bucket_sort(arr): value-range buckets

def benchmark(): time · 9 combos

numpy · matplotlib · pandas

Source Structure

Single-script implementation

04 —

What I Built

›

Implemented TimSort from scratch: divides array into runs sorted by insertion sort, then merges runs using merge sort — mirrors Python built-in sort.

›

Implemented Radix Sort: processes integers least-significant to most-significant digit using counting sort as a subroutine — achieves O(nk) linear time.

›

Implemented Bucket Sort: distributes elements into value-range buckets, sorts each bucket independently, then concatenates results.

›

Generated 3 dataset types with NumPy: random integers, nearly-sorted (minor displacements), and reversed arrays — at sizes 500 to 100,000.

›

Benchmarked all 9 combinations (3 algorithms × 3 dataset types) with timed execution, then visualized results with Matplotlib.

›

Key findings: Radix Sort most consistent overall; TimSort dominates on nearly-sorted data; Bucket Sort degrades on skewed/reversed distributions.

05 —

Project Insights

✎Personal Notes & Learnings

Markdown Editor

### Course Context

Built for **Machine Learning / Algorithms** coursework at Lawrence Technological University. Goal: empirically validate theoretical time complexity claims through real benchmarking.

### Key Results

- **Radix Sort**: Most consistent across all 9 scenarios — linear O(nk) complexity holds empirically
- **TimSort**: Best on nearly-sorted data — run-detection heuristic eliminates most work when data is partially ordered
- **Bucket Sort**: Excellent on uniform random data, degrades severely on reversed/skewed distributions (uneven bucket filling)

### Implementation Details

- TimSort: `RUN = 32` — standard Timsort run size. Insertion sort on runs < 32 elements, then merge. Mirrors CPython internal sort.
- Radix Sort: LSD (Least Significant Digit) approach using counting sort as subroutine. Stable sort — preserves relative order.
- Bucket Sort: `n_buckets = len(arr)` — each bucket covers `(max-min)/n` range. Individual bucket sort via Python `sorted()`.
- Dataset sizes: 500, 1K, 5K, 10K, 50K, 100K — exponential scaling to reveal asymptotic behavior

### What I Learned

- Theoretical complexity and empirical performance diverge significantly based on data distribution
- TimSort adaptive nature makes it practically superior to theoretically "faster" algorithms on real-world data

Live Preview

Course Context

Built for Machine Learning / Algorithms coursework at Lawrence Technological University. Goal: empirically validate theoretical time complexity claims through real benchmarking.

Key Results

Radix Sort: Most consistent across all 9 scenarios — linear O(nk) complexity holds empirically
TimSort: Best on nearly-sorted data — run-detection heuristic eliminates most work when data is partially ordered
Bucket Sort: Excellent on uniform random data, degrades severely on reversed/skewed distributions (uneven bucket filling)

Implementation Details

TimSort: RUN = 32 — standard Timsort run size. Insertion sort on runs < 32 elements, then merge. Mirrors CPython internal sort.
Radix Sort: LSD (Least Significant Digit) approach using counting sort as subroutine. Stable sort — preserves relative order.
Bucket Sort: n_buckets = len(arr) — each bucket covers (max-min)/n range. Individual bucket sort via Python sorted().
Dataset sizes: 500, 1K, 5K, 10K, 50K, 100K — exponential scaling to reveal asymptotic behavior

What I Learned

Theoretical complexity and empirical performance diverge significantly based on data distribution
TimSort adaptive nature makes it practically superior to theoretically "faster" algorithms on real-world data

✓ Insights saved locally

← PreviousAI Requirements Generator

All Projects