SunJackson Blog

How to perform merges (joins) on two or more data frames with base R, tidyverse and data.table

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/ppR--WPprxA/

Jozef's Rblog

Posted on 2018-10-27

In this post in the R:case4base series we will look at one of the most common operations on multiple data frames – merge, also known as JOIN in SQL terms.

Can we do better than using averaged measurements?

Reposted from: https://andrewgelman.com/2018/10/26/can-better-using-averaged-measurements/

Andrew

Posted on 2018-10-26

Angus Reynolds writes:

RConsortium — Building an R Certification

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/wpKVcaokI8w/

Colin Fay

Posted on 2018-10-26

For the past few months, ThinkR has been involved (with Mango, Procogia and the Linux Foundation) in a working group for an RConsortium R Certification.

Because it's Friday: Parable of the Polygons

Reposted from: https://blog.revolutionanalytics.com/2018/10/because-its-friday-parable-of-the-polygons.html

David Smith

Posted on 2018-10-26

What if we lived in a society where everyone was really happy about living in a diverse neighborhood? What if people only wanted to move when the disparity was really extreme: say, when fewer than 33% of people nearby looked like them? Well, we’d end up with a society like this:

R Packages worth a look

Reposted from: https://advanceddataanalytics.net/2018/10/26/r-packages-worth-a-look-1314/

Michael Laux

Posted on 2018-10-26

Fetch Sections of XML Scholarly Articles (pubchunks): Get chunks of XML scholarly articles without having to know how to work with XML. Custom mappers for each publisher and for each article section pull o …

Whats new on arXiv

Reposted from: https://advanceddataanalytics.net/2018/10/26/whats-new-on-arxiv-798/

Michael Laux

Posted on 2018-10-26

Topic representation: finding more representative words in topic models

Document worth reading: “Causal inference and the data-fusion problem”

Reposted from: https://advanceddataanalytics.net/2018/10/26/document-worth-reading-causal-inference-and-the-data-fusion-problem/

Michael Laux

Posted on 2018-10-26

We review concepts, principles, and tools that unify current approaches to causal analysis and attend to new challenges presented by big data. In particular, we address the problem of data fusion – piecing together multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to big data analysts, because the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. We here present a general, nonparametric framework for handling these biases and, ultimately, a theoretical solution to the problem of data fusion in causal inference tasks.

Whats new on arXiv

Reposted from: https://advanceddataanalytics.net/2018/10/27/whats-new-on-arxiv-799/

Michael Laux

Posted on 2018-10-26

Overoptimization Failures and Specification Gaming in Multi-agent Systems

The Final Data Science Roadshow is Just the Beginning

Reposted from: https://blog.dataiku.com/the-final-data-science-roadshow-is-just-the-beginning

Julia Günther

Posted on 2018-10-26

The Dataiku Data Science Roadshow wrapped up its 12-country tour last week, and while they say all good things must come to an end, we’re happy to say that we still have more on the docket to satisfy the worldwide data community.

CRAN’s New Missing Data Task View

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/TkNVlvn3m1o/

R Views

Posted on 2018-10-26

It is a relatively rare event, and cause for celebration, when CRAN gets a new Task View. This week the r-miss-tastic team (Julie Josse, Nicholas Tierney and Nathalie Vialaneix) launched the Missing Data Task View. Even though I did some research on R packages for a post on missing values a couple of years ago, I was dumbfounded by the number of packages included in the new Task View.
