R tip: Use Radix Sort

R tip: use radix sort.

The “method = "radix"” option can greatly speed up sorting and ordering tables in R.

For a 1 million row table the speedup is already as much as 35 times (around 9.6 seconds versus 3 tenths of a second). Below is an excerpt from an experiment sorting showing default settings and showing radix sort (full code here).

1
2
3
4
5
6
7
8
9
10
11

timings <- microbenchmark(
 order_default = d[order(d$col_a, d$col_b, d$col_c, d$col_x), , 
 drop = FALSE],
 order_radix = d[order(d$col_a, d$col_b, d$col_c, d$col_x,
 method = "radix"), ,
 drop = FALSE],
 check = my_check,
 times = 10L)

print(timings)
1
2
3
4
5
6
7
## Unit: milliseconds
## expr min lq mean median uq
## order_default 9531.2865 9653.6827 9759.8929 9690.6702 9833.2170
## order_radix 262.1377 263.3226 278.2547 265.1452 274.2476
## max neval
## 10329.3520 10
## 382.2544 10