In R, we often need to get values or perform calculations using information that is not on the same row. Either we need to retrieve specific values from other rows, or we need to produce some sort of aggregation across them. This post explores some of the options and explains the weird (to me at least!) behaviours around rolling calculations and alignments.
We can retrieve earlier values by using the lag() function from dplyr [1]. By default, this looks one value earlier in the sequence.
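As a minimal sketch, assuming the example vector is `v <- 1:10` and that dplyr is installed, the default one-step lag can be put next to the original values like this (the column name `l` is just an illustrative choice):

```r
library(dplyr)

# Assumed example data: the integers 1 to 10
v <- 1:10

# lag() shifts the values down by one position; the first row has
# no earlier value, so it is filled with NA
lagged <- data.frame(v, l = lag(v))
lagged
```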
    v  l
1   1 NA
2   2  1
3   3  2
4   4  3
5   5  4
6   6  5
7   7  6
8   8  7
9   9  8
10 10  9
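The offset does not have to be one. As a sketch, again assuming `v <- 1:10`, the `n` argument of lag() sets how many positions to look back, and dplyr's lead() is the forward-looking counterpart. The object names `back_two` and `ahead_two` are made up for illustration.

```r
library(dplyr)

v <- 1:10  # assumed example data

# Look two values back: the first two rows have no earlier value
back_two <- data.frame(v, l = lag(v, n = 2))

# Look two values ahead: the last two rows have no later value
ahead_two <- data.frame(v, l = lead(v, n = 2))

back_two
ahead_two
```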
    v  l
1   1 NA
2   2 NA
3   3  1
4   4  2
5   5  3
6   6  4
7   7  5
8   8  6
9   9  7
10 10  8
    v  l
1   1  3
2   2  4
3   3  5
4   4  6
5   5  7
6   6  8
7   7  9
8   8 10
9   9 NA
10 10 NA
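For aggregations over everything seen so far, base R's cumsum() gives a running total. The sketch below assumes `v <- 1:10`. The `c_1` column is one plausible way (an assumption, not necessarily the code originally used here) to get the running total as it stood before the current row: lag the cumulative sum and use 0 instead of NA for the first row.

```r
library(dplyr)

v <- 1:10  # assumed example data

# Running total of v up to and including the current row
running <- data.frame(v, c = cumsum(v))

# Running total excluding the current row: lag the cumulative sum
# and fill the first row with 0L rather than NA
running_prev <- data.frame(
  v,
  c   = cumsum(v),
  c_1 = lag(cumsum(v), default = 0L)
)

running
running_prev
```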
    v  c
1   1  1
2   2  3
3   3  6
4   4 10
5   5 15
6   6 21
7   7 28
8   8 36
9   9 45
10 10 55
    v  c c_1
1   1  1   0
2   2  3   1
3   3  6   3
4   4 10   6
5   5 15  10
6   6 21  15
7   7 28  21
8   8 36  28
9   9 45  36
10 10 55  45
    v  c
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
Error in data.frame(v, c2 = RcppRoll::roll_sum(v, 2), c3 = RcppRoll::roll_sum(v, : arguments imply differing number of rows: 10, 9, 8
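The error above comes from the lengths of the results: by default roll_sum() returns only the complete windows, so a window of 2 over 10 values yields 9 results and a window of 3 yields 8, which data.frame() cannot line up with the 10 values of v. A sketch of one fix, assuming `v <- 1:10` and a window of 3 for `c3` (the call is truncated in the error message), is to pad the results with `fill = NA`:

```r
v <- 1:10  # assumed example data

# Without fill, only complete windows are returned
length(RcppRoll::roll_sum(v, 2))
length(RcppRoll::roll_sum(v, 3))

# fill = NA pads each result back to the length of v, so the
# columns line up; no align is given, so the default is used
padded <- data.frame(
  v,
  c2 = RcppRoll::roll_sum(v, 2, fill = NA),
  c3 = RcppRoll::roll_sum(v, 3, fill = NA)
)
padded
```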
    v c2 c3
1   1  3 NA
2   2  5  6
3   3  7  9
4   4  9 12
5   5 11 15
6   6 13 18
7   7 15 21
8   8 17 24
9   9 19 27
10 10 NA NA
data.frame(v, c_l = RcppRoll::roll_sum(v, 2, fill = NA, align = "left"), c_r = RcppRoll::roll_sum(v, 2, fill = NA, align = "right"))
If, like me, you’d expect left align to be the option for looking at prior values, you’d be very wrong. The convention for these calculations is that left align extends into future values, because the window starts with the current value on its left. Right align covers past values, because the window ends with the current value on its right.
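One way to check this convention, sketched here under the assumption that `v <- 1:10` and that both dplyr and RcppRoll are installed, is to compare each alignment of a window of 2 against dplyr's lag() and lead(): right align should match the current value plus the previous one, and left align the current value plus the next one. The names `left_matches_lead` and `right_matches_lag` are made up for illustration.

```r
library(dplyr)

v <- 1:10  # assumed example data

c_l <- RcppRoll::roll_sum(v, 2, fill = NA, align = "left")
c_r <- RcppRoll::roll_sum(v, 2, fill = NA, align = "right")

# Left align: the window starts at the current value and extends forward
left_matches_lead <- isTRUE(all.equal(as.numeric(c_l), as.numeric(v + lead(v))))

# Right align: the window ends at the current value and extends backward
right_matches_lag <- isTRUE(all.equal(as.numeric(c_r), as.numeric(v + lag(v))))

left_matches_lead
right_matches_lag
```

If the convention described above holds, both comparisons should come out TRUE, with the NA sitting in the last row for left align and in the first row for right align.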