SunJackson Blog

A quick look at GHCN version 4

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/hsFM_WAPsk4/

Steven Mosher

Posted on 2018-11-03

GHCN version 4 beta is available. Using the GHS population dataset, the ~27,000 GHCN v4 sites were filtered to keep only rural stations. GHS (https://ghsl.jrc.ec.europa.eu/) combines two datasets: a 10-meter built-surface satellite dataset and a human population dataset. Using site locations, the population within 10 km of each site was extracted. Two cases were considered: one where rural was defined as fewer than 16 people per sq km, and one where it was defined as fewer than 7 people per sq km. This filtering led to two subsets of the ~27K GHCN v4 stations: one with 15K stations and a second with 12K stations. The temperature data used was the unadjusted monthly T Avg.
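The two rural cutoffs can be illustrated with a minimal Python sketch. It is not the author's code: the `Station` record, the sample population counts, and the choice to convert the 10 km population count into a density by dividing by the area of a 10 km circle are all assumptions made for illustration, since the excerpt does not say how the density was computed.

```python
import math
from dataclasses import dataclass


@dataclass
class Station:
    # Hypothetical record; the field names are not taken from GHCN or GHS.
    station_id: str
    population_within_10km: float


# Assumption: density = population inside the buffer / area of a 10 km circle.
BUFFER_AREA_SQ_KM = math.pi * 10.0 ** 2


def density(station):
    """People per square kilometre inside the 10 km buffer."""
    return station.population_within_10km / BUFFER_AREA_SQ_KM


def rural_subset(stations, max_density):
    """Keep stations whose density is strictly below max_density."""
    return [s for s in stations if density(s) < max_density]


# Made-up example stations, not real data.
stations = [
    Station("A", 1000.0),
    Station("B", 3000.0),
    Station("C", 10000.0),
]

rural_16 = rural_subset(stations, 16.0)  # looser definition of rural
rural_7 = rural_subset(stations, 7.0)    # stricter definition of rural
```

Because the stricter cutoff keeps a subset of what the looser one keeps, `rural_7` can never contain more stations than `rural_16`, which matches the 12K versus 15K counts reported in the post.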

Read more »

Visualize the Business Value of your Predictive Models with modelplotr

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/zBQNVsUqIWA/

Jurriaan Nagelkerke

Posted on 2018-11-03

Why ROC curves are a bad idea to explain your model to business people

Read more »

coalesce with wrapr

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/EPrbVRcA_bc/

John Mount

Posted on 2018-11-03

coalesce is a classic useful SQL operator that picks the first non-NULL value in a sequence of values.
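The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the coalesce idea, not the wrapr API: wrapr is an R package, and here Python's `None` stands in for SQL `NULL`. The function names `coalesce` and `coalesce_columns` are hypothetical.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if every value is None."""
    for value in values:
        if value is not None:
            return value
    return None


def coalesce_columns(*columns):
    """Apply coalesce row by row across equally long columns."""
    return [coalesce(*row) for row in zip(*columns)]


# A falsy value such as 0 still counts as present; only None is skipped.
first = coalesce(None, 0, 5)

# Fill gaps in a primary column from a fallback column.
primary = [1, None, None]
fallback = [10, 20, None]
filled = coalesce_columns(primary, fallback)
```

Testing with `is not None` rather than truthiness is the important design choice: a plain `a or b` would wrongly discard legitimate values such as `0` or an empty string.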

Read more »

Book Review – Sound Analysis and Synthesis with R

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/268g2mp-NGs/

Eryk Walczak

Posted on 2018-11-03

R might not be the most obvious tool when it comes to analysing audio data. However, an increasing number of packages allow analysing and synthesising sounds. One such package is seewave. Jerome Sueur, one of the authors of seewave, has now written a book about working with audio data in R. The book is entitled Sound Analysis and Synthesis with R and was published by Springer in 2018. I highly recommend it to anyone working with audio data.

Read more »

If you did not already know

Reposted from: https://advanceddataanalytics.net/2018/11/04/if-you-did-not-already-know-534/

Michael Laux

Posted on 2018-11-03

Accumulated Gradient Normalization: This work addresses the instability in asynchronous data parallel optimization. It does so by introducing a novel distributed optimizer which is able to efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. This implies that the magnitude of a worker delta is smaller compared to an accumulated gradient, and provides a better direction towards a minimum compared to first-order gradients, which in turn also forces possible implicit momentum fluctuations to be more aligned since we make the assumption that all workers contribute towards a single minimum. As a result, our approach mitigates the parameter staleness problem more effectively since staleness in asynchrony induces (implicit) momentum, and achieves a better convergence rate compared to other optimizers such as asynchronous EASGD and DynSGD, which we show empirically. …

Read more »

What's new on arXiv

Reposted from: https://advanceddataanalytics.net/2018/11/04/whats-new-on-arxiv-804/

Michael Laux

Posted on 2018-11-03

Everything you always wanted to know about a dataset: studies in data summarisation

Read more »

Document worth reading: “A User’s Guide to Support Vector Machines”

Reposted from: https://advanceddataanalytics.net/2018/11/03/document-worth-reading-a-users-guide-to-support-vector-machines/

Michael Laux

Posted on 2018-11-03

The Support Vector Machine (SVM) is a widely used classifier. And yet, obtaining the best results with SVMs requires an understanding of their workings and the various ways a user can influence their accuracy. We provide the user with a basic understanding of the theory behind SVMs and focus on their use in practice. We describe the effect of the SVM parameters on the resulting classifier, how to select good values for those parameters, data normalization, factors that affect training time, and software for training SVMs. A User’s Guide to Support Vector Machines

Read more »

“We are reluctant to engage in post hoc speculation about this unexpected result, but it does not clearly support our hypothesis”

Reposted from: https://andrewgelman.com/2018/11/03/reluctant-engage-post-hoc-speculation-unexpected-result-not-clearly-support-hypothesis/

Andrew

Posted on 2018-11-03

Brendan Nyhan and Thomas Zeitzoff write:

Read more »

7 Awesome Things You Can Do in Dataiku Without Coding

Reposted from: https://blog.dataiku.com/7-awesome-things-you-can-do-without-coding-in-dataiku

Vincent de Stoecklin

Posted on 2018-11-02

As declared in Forbes just last month, businesses are starting to really wake up to the promise of what we call Enterprise AI. But what does that mean for the average non-coding analyst?

Read more »

RcppAnnoy 0.0.11

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/SWF-bGZl25g/

Thinking inside the box

Posted on 2018-11-02

A new release of RcppAnnoy is now on CRAN.

Read more »