The JapanR Conference 2018 Round-Up!

This past weekend was the 9th JapanR Conference hosted at LINECorporation in Tokyo, Japan!

I’ve been back in Japan for nearly a year now and I’ve been going tonearly every one of the R user meetups here,TokyoR, and it’s been a great experienceto learn about R and its wide variety of uses by Japanese practitionersand academics. Besides the near-monthly meetup of TokyoR there aresmaller gatherings spread throughout Japan such asFukuokaR andTsukubaR but the meeting that gathers thebiggest crowd is the JapanR Conference held everyDecember since 2010. Of course, there are outliers such as the specialTokyoR session this past Julywhen Joe Rickert and HadleyWickham visited Japan!

  • Want to learn more about TokyoR and when the next meeting is?Check this link! (Next meeting isJanuary 19th, 2019)

  • Want to learn more about JapanR? Check thislink!

This time around I took the time to take notes on the presentations andwrite up a little round-up blog post about it. As much as I would liketo write about every single presentation there were a number of topicswhere I really wouldn’t have been able to explain well even if thepresentation were done in English! You can watch most of thepresentations on the JapanR YouTubechannel.Although the talks are in Japanese maybe you’ll still find somethinguseful in their slides… or you can read on as I give a summary on around9 (out of 22) presentations that I found interesting!

NOTE: Some people presented using their Twitter/online name only,it’s just a cultural thing I’ve found here relating to privacy.

NOTE 2: There are still several presentations/slides that haven’tbeen uploaded yet but I will put more screenshots in as they becomeavailable so please check back in the coming days!

Creating your own RMarkdown template! – Kazuhiro Maeda

Kazuhiro Maeda is well known in the Japanese R community mainlythrough his online avatar (an elephant plushie) and his love for RMarkdown. For this conference he presented about creating your owncustomized R Markdown template. Maeda-san noted that knowledge of CSS,JavaScript, and Pandoc are crucial for this task as he first explainedhow the render() function works to create a document of your choice.Within this explanation he highlighted how the output from a render()call depends on templates and options set through Pandoc, therefore itis important to create a template that has options that can be utilizedby Pandoc. As making a template from scratch is extremely difficult,Maeda-san recommended that you find an existing template and play aroundwith it to get used to the process involved.

An example template Maeda-san worked on was: having an image pop outwhen you click on it in your R markdown document. To do this you need touse “lightbox” (a JavaScript library) and implement a script in yourPandoc template that calls on this library at the appropriate time (whenyou click on an image). Following a very thorough and technicalexplanation he showed us the fruits of his labor in a live-demo that youcan see here, where he knits theR Markdown document and clicks on a plot image, et Voila! It pops upvery nicely!

Since I wasn’t able to translate the template creation process wellenough (live-translating technical stuff is hard!), I will leave somegood links to creating your own R Markdown template in English below:

Easy and modern data analysis with “R AnalyticFlow”! – Ryota Suzuki

Ryota Suzuki, CEO ofef-prime and author of thepvclust package, gave atalk on R AnalyticFlow which is a freesoftware that his company built that utilizes the R environment forstatistical computing in a GUI format. R AnalyticFlow was created inJava and is compatible with Windows, Mac, and Linux OSs as well as beingavailable in English, Japanese, Chinese, and many more languages.

As you can see in the picture above, R AnalyticFlow allows you torepresent your data analysis workflow through nodes and edges in adescriptive flow chart. In previous versions of the GUI, the goal was touse as much of base R functions as possible but more recently R dataanalyses including predictive modeling have been relying heavily onexternal packages such as the tidyverse, glmnet, xgboost, etc. Sonow the new direction Suzuki-san wants to take is to implement thesepackages into R AnalyticFlow and provide support to users who wantto install their own packages to use in the GUI. Lastly, in a livedemonstration he showed us adevelopment version of the GUI as he made some simple ggplot2 plotswith a simple mouse-and-click. Due to the new direction RAnalyticFlow is taking, Suzuki-san is looking for Java developers tohelp contribute to the development of the new versions of the GUI. Ifyou know your way around Java and want to help, let him know!

I have completely understood Shiny! – Med_KU

In what was a very lively and fun presentation, @Med_KU took usthrough a very comprehensive tour of Shiny apps. First he talked abouthow to create a Shiny app via R Studio, working with the app.R and ui.Rfiles, and publishing through R Studio Connect. Afterwards, he wentthrough many examples with plotly and googleVis showing all theinteractive/reactive capabilities that Shiny apps are known for.Personally, I’m more of a ggiraph fan myself (I use it at work forflexdashboards and Shiny apps) but @Med_KU’s presentation has gottenme interested in trying googleVis out as well!

I recommend watching therecordingof the presentation as @Med_KU goes through a lot of differentexamples!

DID Analysis with R! – Yuki Yagi

University student Yuki Yagi presented on DID(Difference-in-differences) analysis and how he utilized it in one ofhis research papers. For those unfamiliar, DID is a statisticaltechnique that observes that differential effect of a treatment(training program, medication intake, etc.) on a treatment group vs. acontrol group. A quick overview of DID can be foundhere.

The main question that Yagi-san investigated in his research paper was:“What would be the impact on the number of patents produced whenresearch subsidies were given to companies that were already highlyskilled and had a track record of producing many patents.”

DID is really easy to understand given the above diagram. On the lefthand side is the measurement of the outcome variable, in this case thenumber of patents before the treatment while on the right is themeasurement of the number of patents after the treatment (researchsubsidies) were given to the treatment group (blue dot is the controland red dot is the treatment group).

Above is how the model looked like for the research question with theuse of dummy variables for B (subsidy), C (post-treatment), and D(subsidy + post-treatment).

Rugby Analytics with R! – Koichi Kinoshita

Koichi Kinoshita, a rugby performance analyst for theHITO-Communications Sunwolves and theNorthland Rugby Union (in New Zealand), gave a presentation on how heapplied his nascent R skills to his favorite sport. After giving a briefexplanation about the state of sports analytics in rugby and his resolveto improve his data analytics skills in R, he showed us a number ofplots from data he gathered from the Japanese national rugby leaguewebsite.

Throughout the presentation Kinoshita-san tried to answer severalquestions such as “Is tackling percentage related to lost tries?”,“Do a higher number of tackles help stop line-breaks?”, “Can you stopline-breaks if you have a higher tackle success percentage?” amongothers as he explained his results in thorough detail using plots as avisual aid. Ultimately, his data showed that those Japanese rugby teamswith over 86% tackle success rate were able to limit line-breaks to 10or less and were very likely to win matches while those teams with atackle success rate below 78% mostly wound up losing.

Armed with this knowledge, Kinoshita-san investigated further and foundthat across the entire season around half the teams totaled around100~150 tackles in any single game. Assuming an 80% tackle successrate, a team with 100 tackles will have 20 mistackles while a team with150 tackles will have 30 mistackles. So, the big question was: “Howmuch will this 10 mistackle difference cost a team?

Consequently, Kinoshita-san ran a regression analysis on line-breaksagainst a mistackles and found that on average you concede 10line-breaks from 30 mistackles. Coupling this with data presented thatif a team concedes 10 line-breaks or over a team is ~70% likely to losea game, a higher number of attempted tackles isn’t necessarily a goodthing, what matters is preventing line-breaks with successfultackles!

In his conclusion, Kinoshita-san brought up a really good point in thatit’s not enough to look at pure success/fail percentages and he broughtup pass completion rate in soccer as an example. I concurred with hisstatement as in soccer you could naively assume a team being “dominant”or “good” if they have a really high pass completion rate, but if mostof those successful passes came from the defenders and goalkeeperpassing among themselves you can’t really say that that is a good thing.In soccer (and in other sports) it’s important to dig a little bitdeeper, for example it might be more insightful to look at a soccerteam’s pass completion rate in the opposition’s third of the pitch!

Using linear regression to find a new home in Tokyo! – Kaori Sawamura

This presentation by Kaori Sawamura showed off a fun real life casestudy using R. One of Sawamura-san’s co-workers wanted to move to Tokyoand become a “city boy”, so she set out to use some of her newly learnedR skills to take try to find a dream home for him!

Here are the 3 basic requirements that Sawamura-san was given:

  • Somewhere with a gym close by (preferably Tokyo MetropolitanGymnasium)

  • Somewhere close to Sendagaya Station

  • Somewhere with a monthly rent below 200,000 Yen

After filtering out houses above 200,000 Yen monthly rent Sawamura-sanfitted a multiple linear regression as seen below:

After looking at the diagnostic plots for the model she took out a fewoutliers that she confirmed was mainly due to incorrect data on thehousing website and was able to cut her list down to around 70 houses!Then using leaflet Sawamura-san mapped out all the potential houses thatfit her criteria and labelled them with details about thehouse/apartment.

After doing the analysis, she showed it to her co-worker and got him totake a look around. Unfortunately, the information provided by thewebsite didn’t account for things such as construction work, cleanlinessand safety of the neighborhood, along with the added bonus of having tolive with the landlord. So this case study was also a great reminderabout the necessity of doing some field work in addition to analytics!

Using external C/C++ libraries with R! – Wataru Iwasaki

Wataru Iwasaki loves using C++ and R, he has given talks on thesubject before and thistime was no different as he talked about incorporating and using C/C++libraries with R. First, he introduced a number of great onlineresources for developing Rcpp packages including stuff from his ownwebsite as well as the free “Rcpp for Everyone (Englishversion)” written by fellowJapanese R user Masaki Tsuda.

Next, he talked about a couple of basic steps needed to incorporateC/C++ libraries as well as the advantages and disadvantages of thevarious styles of doing so.

In Iwasaki-san’s concluding remarks he called on the community to sharetheir knowledge of handling external C++ libraries via social media orthrough blog posts.

Build an R compiler…with R! – igjit

For this presentation @igjit told us about his attempt at creating anR compiler written in R! You can see the fruits of his labor in thenrc package that he created here.Note: currently it only works on Linux so you should use somethinglike Docker if you want to try it out on other OSs.

Here are some examples:

Simple addition:

You can even assign and use variables! (the function that got cut off isexecute):

I unfortunately don’t have much experience with compilers so it wastough for me to understand the technical details but it was anotherpretty cool example of how you can use R for just about anything!

Tennis Player Ratings with R! – flaty13

In the second use of R in sports analysis we had @flaty13 take a lookat tennis player ELO ratings. After a brief introduction relating totennis, ELO ratings, and his views on the difficulty of rating/rankingtennis players, he used a data set from Kaggle and the PlayerRatingspackage to conjure up some visualizations on tennis player rankings overtime.

As a matter of course, he also looked at Kei Nishikori’s performanceand highlighted his rapid rise in the rankings during 2014 where hebecame the first Asian to reach a Grand Slam tournamentfinal!

Overall, it was nice to see how sports analytics have grown these pastfew years in Japan as besides this tennis presentation and the rugbypresentation earlier there have been presentations relating to soccerand baseball analytics at past TokyoR meetups.

Following both the main talks and the LTs, sushi and drinks were servedat the after-party as R users from all over Japan shared their storiesof success and struggle alike!

The main organizer, Atsushi Hayakawamentioned that he eventually wants JapanR to grow even bigger in thecoming years and to have every participant give a LT! Whether as a jokeor if that is actually feasible it would be cool if we set the GuinessWorld Record for Most Presentations at an R Conference!

As you saw the quality of presentations at JapanR was very high.Unfortunately, most of the content was only in Japanese which I thoughtwas a shame. That’s why I thought of doing this to share the knowledgeof Japanese R Users to those around the world! This is my first timewriting up one of these and I hope to contribute more and improve in theyears to come!

If you’re ever in Japan, come join us for some R&R … and R!

Related