GHCN version 4 beta is available. Using the GHS population dataset, the ~27,000 GHCNv4 sites were filtered to collect only rural stations. GHS combines two datasets: a 10-meter built-surface satellite dataset and a human population dataset (https://ghsl.jrc.ec.europa.eu/). Using the site locations, the population within 10 km of each site was extracted. Two cases were considered: one where rural was defined as fewer than 16 people per sq km, and one where it was defined as fewer than 7 people per sq km. This filtering led to two subsets of the ~27K GHCNv4 stations: one with 15K stations and a second with 12K stations. The temperature data used was the unadjusted monthly T Avg.
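The two rural cuts described above can be sketched roughly as follows. This is only an illustration: the station ids and population counts are made up, and it assumes the density is the extracted population divided by the area of a 10 km circle, which the post does not state explicitly.

```python
import math

RADIUS_KM = 10.0
# Assumption: density is taken over the full 10 km circle (~314.16 sq km).
AREA_SQ_KM = math.pi * RADIUS_KM ** 2


def rural_subset(stations, max_density):
    """Return ids of stations whose density is strictly below max_density."""
    return [sid for sid, pop in stations if pop / AREA_SQ_KM < max_density]


# Hypothetical (station_id, population within 10 km) pairs, for illustration only.
stations = [("A", 500.0), ("B", 3000.0), ("C", 6000.0)]

rural_16 = rural_subset(stations, 16.0)  # looser definition of rural
rural_7 = rural_subset(stations, 7.0)    # stricter definition of rural
print(rural_16, rural_7)
```

Because the stricter threshold is lower, its subset is always contained in the looser one, which matches the smaller 12K subset versus the 15K subset reported above.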
Visualize the Business Value of your Predictive Models with modelplotr
Why ROC curves are a bad idea to explain your model to business people
coalesce with wrapr
coalesce is a classic useful SQL operator that picks the first non-NULL value in a sequence of values.
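As a language-neutral illustration of that behaviour, here is a minimal Python sketch. It is not the wrapr implementation; the function name is reused only for clarity, and Python's None stands in for NULL as an assumption of this example.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if every value is None."""
    for value in values:
        if value is not None:
            return value
    return None


# Falsy values such as 0 or "" are still picked; only None is skipped.
first = coalesce(None, 0, 5)
fallback = coalesce(None, None, "default")
print(first, fallback)
```

Testing with `is not None` rather than truthiness is the important design choice here, since a coalesce should skip only missing values, not zeros or empty strings.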
Book Review – Sound Analysis and Synthesis with R
R might not be the most obvious tool when it comes to analysing audio data. However, an increasing number of packages allow analysing and synthesising sounds. One such package is seewave. Jerome Sueur, one of the authors of seewave, has now written a book about working with audio data in R. The book is entitled Sound Analysis and Synthesis with R and was published by Springer in 2018. I highly recommend it to anyone working with audio data.
If you did not already know
Accumulated Gradient Normalization
This work addresses the instability in asynchronous data parallel optimization. It does so by introducing a novel distributed optimizer which is able to efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. This implies that the magnitude of a worker delta is smaller compared to an accumulated gradient, and provides a better direction towards a minimum compared to first-order gradients, which in turn also forces possible implicit momentum fluctuations to be more aligned since we make the assumption that all workers contribute towards a single minimum. As a result, our approach mitigates the parameter staleness problem more effectively since staleness in asynchrony induces (implicit) momentum, and achieves a better convergence rate compared to other optimizers such as asynchronous EASGD and DynSGD, which we show empirically. …
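One way to read "a normalized sequence of first-order gradients" is sketched below. This is an interpretation of the abstract, not the authors' code: the function names, the toy quadratic loss, and the choice to normalize by the number of local steps are all assumptions made for illustration.

```python
def agn_worker_delta(theta, grad_fn, lr, num_steps):
    """Take num_steps local gradient steps starting from theta and return
    the accumulated update divided by num_steps (assumed normalization)."""
    local = list(theta)
    accumulated = [0.0] * len(theta)
    for _ in range(num_steps):
        grad = grad_fn(local)
        for i, g in enumerate(grad):
            step = -lr * g
            accumulated[i] += step  # sum of the local updates
            local[i] += step        # the worker keeps exploring locally
    return [a / num_steps for a in accumulated]


def quadratic_grad(theta):
    """Gradient of the toy loss 0.5 * ||theta||^2 (hypothetical example loss)."""
    return list(theta)


# The "parameter server" is just a list here; a single worker pushes one delta.
center = [1.0, -2.0]
delta = agn_worker_delta(center, quadratic_grad, lr=0.1, num_steps=4)
center = [c + d for c, d in zip(center, delta)]
print(delta, center)
```

In this sketch the pushed delta is num_steps times smaller than the raw accumulated update, which is the property the abstract highlights when it says the worker delta has a smaller magnitude than an accumulated gradient.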
What's new on arXiv
Everything you always wanted to know about a dataset: studies in data summarisation
Document worth reading: “A User’s Guide to Support Vector Machines”
The Support Vector Machine (SVM) is a widely used classifier. And yet, obtaining the best results with SVMs requires an understanding of their workings and the various ways a user can influence their accuracy. We provide the user with a basic understanding of the theory behind SVMs and focus on their use in practice. We describe the effect of the SVM parameters on the resulting classifier, how to select good values for those parameters, data normalization, factors that affect training time, and software for training SVMs. A User’s Guide to Support Vector Machines
“We are reluctant to engage in post hoc speculation about this unexpected result, but it does not clearly support our hypothesis”
Brendan Nyhan and Thomas Zeitzoff write:
7 Awesome Things You Can Do in Dataiku Without Coding
vincent.destoecklin@dataiku.com (Vincent de Stoecklin)
As declared in Forbes just last month, businesses are starting to really wake up to the promise of what we call Enterprise AI. But what does that mean for the average non-coding analyst?
RcppAnnoy 0.0.11
A new release of RcppAnnoy is now on CRAN.