I like both Python and R, and teach them both, but for data science R is the clear choice. When asked why, I always note (a) written by statisticians for statisticians, (b) built-in matrix type and matrix manipulations, (c) great graphics, both base and CRAN, (d) excellent parallelization facilities, etc. I also like to say that R is “more CS-ish than Python,� just to provoke my fellow computer scientists.
But one aspect that I think is huge but probably gets lost when I cite it is R’s row/column/element-name feature. I’ll give an example here.
Today I was dealing with a problem of ID numbers that are nonconsecutive. My solution was to set up a lookup table. Say we have ID data (5, 12, 13, 9, 12, 5, 6). There are 5 distinct ID values, so we’d like to map these into new IDs 1,2,3,4,5. Here is a simple solution:
x <- c(5,12,13,9,12,5,6)> xuc <- as.character(unique(x))> xuc[1] “5� “12� “13� “9� “6� > xLookup <- 1:length(xuc)> names(xLookup) <- xuc> xLookup5 12 13 9 6 1 2 3 4 5
So, from now on, to do the lookup, I just use as subscript the character from of the original ID, e.g.
xLookup[’12’]12 2
Of course, I did all this within program code. So to change a column of IDs to the new ones, I wrote
ul[as.character(testSet[,1])])
Lots of other ways to do this, of course, but it shows how handy the names can be.
Related