SunJackson Blog

R Packages worth a look

Reposted from: https://advanceddataanalytics.net/2018/10/13/r-packages-worth-a-look-1301/

Michael Laux

Posted on 2018-10-13

Approximate Inclusion Probabilities for Survey Sampling (jipApprox): Approximate joint-inclusion probabilities in Unequal Probability Sampling, or compute Monte Carlo approximations of the first and second-order inclusio …

Read more »

Document worth reading: “Data Curation with Deep Learning [Vision]: Towards Self Driving Data Curation”

Reposted from: https://advanceddataanalytics.net/2018/10/13/document-worth-reading-data-curation-with-deep-learning-vision-towards-self-driving-data-curation/

Michael Laux

Posted on 2018-10-13

Past. Data curation – the process of discovering, integrating, and cleaning data – is one of the oldest data management problems. Unfortunately, it is still the most time-consuming and least enjoyable work of data scientists. So far, successful data curation stories are mainly ad-hoc solutions that are either domain-specific (for example, ETL rules) or task-specific (for example, entity resolution).

Present. The power of current data curation solutions is not keeping up with the ever-changing data ecosystem in terms of volume, velocity, variety and veracity, mainly due to the high human cost, rather than machine cost, of providing the ad-hoc solutions mentioned above. Meanwhile, deep learning is achieving remarkable successes in areas such as image recognition, natural language processing, and speech recognition. This is largely due to its ability to understand features that are neither domain-specific nor task-specific.

Future. Data curation solutions need to keep pace with the fast-changing data ecosystem, where the main hope is to devise domain-agnostic and task-agnostic solutions. To this end, we start a new research project, called AutoDC, to unleash the potential of deep learning towards self-driving data curation. We will discuss how different deep learning concepts can be adapted and extended to solve various data curation problems. We showcase some low-hanging fruit from the early encounters between deep learning and data curation happening in AutoDC. We believe that the directions pointed out by this work will not only drive AutoDC towards democratizing data curation, but also serve as a cornerstone for researchers and practitioners to move to a new realm of data curation solutions.

Data Curation with Deep Learning [Vision]: Towards Self Driving Data Curation

Read more »

Understanding Chicago’s homicide spike; comparisons to other cities

Reposted from: https://andrewgelman.com/2018/10/13/39006/

Andrew

Posted on 2018-10-13

Michael Masinter writes:

Read more »

What's new on arXiv

Reposted from: https://advanceddataanalytics.net/2018/10/13/whats-new-on-arxiv-787/

Michael Laux

Posted on 2018-10-13

Equality Constrained Decision Trees: For the Algorithmic Enforcement of Group Fairness

Read more »

How to import a directory of csvs at once with base R and data.table. Can you guess which way is the fastest?

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/o0-DMjj1yO8/

Jozef's Rblog

Posted on 2018-10-13

Inspired by a recent post by Garrick on how to import a directory of csv files at once using purrr and readr, in this post we will try achieving the same using base R with no extra packages, and with data.table, another very popular package. As an added bonus, we will play a bit with benchmarking to see which of the methods is the fastest, including the tidyverse approach in the benchmark.
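As a rough illustration of the idea (not the code from the original post), a minimal sketch of reading every csv file in a directory and stacking the rows could look like the following. The directory name `csv_demo`, the file names, and the sample data are made up for the example, and the data.table variant only runs if that package happens to be installed.

```r
# Create a throwaway directory with three small csv files (illustrative data).
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(
    data.frame(id = 1:2, value = i * c(10, 20)),
    file.path(csv_dir, sprintf("part_%d.csv", i)),
    row.names = FALSE
  )
}

# Base R: list the csv files, read each one, and bind the rows together.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
base_result <- do.call(rbind, lapply(files, read.csv))

# data.table: the same idea, guarded so the sketch still runs without the package.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt_result <- data.table::rbindlist(lapply(files, data.table::fread))
}
```

Both variants assume the files share the same columns; files with differing columns would need extra handling before binding.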

Read more »

GitHub Streak: Round Five

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/mVynW3PGZ4Y/

Thinking inside the box

Posted on 2018-10-13

Four years ago I referenced the Seinfeld Streak used in an earlier post of regular updates to the Rcpp Gallery:

Read more »

Open Workshop: Deep Learning in R and Keras, November 14th in Frankfurt

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/0eBlxGnn8GY/

STATWORX

Posted on 2018-10-13

Read more »

Piping into ggplot2

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/ZR0sPf1AIcs/

John Mount

Posted on 2018-10-13

In our wrapr pipe RJournal article we used piping into ggplot2 layers/geoms/items as an example.

Read more »

RcppNLoptExample 0.0.1: Use NLopt from C/C++

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/nnpKyxwAe70/

Thinking inside the box

Posted on 2018-10-13

A new package of ours, RcppNLoptExample, arrived on CRAN yesterday after a somewhat longer-than-usual wait for new packages as CRAN seems really busy these days. As always, a big and very grateful Thank You! for all they do to keep this community humming.

Read more »

Prophets of gloom: Using NLP to analyze Radiohead lyrics

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/xQCp_pzUYVE/

Lesley Lathrop

Posted on 2018-10-13

From the first time I listened to Radiohead’s The Bends, the band has been my favorite. I was a grad student in England at the time, and I recall listening to “Fake Plastic Trees” on repeat as I made my way to and from the library each day. By the time OK Computer came out, I was hooked. I remain hooked to this day.

Read more »