In this paper we argue that the data management community should devote far more effort to building data integration (DI) systems, in order to truly advance the field. Toward this goal, we make three contributions. First, we draw on our recent industrial experience to discuss the limitations of current DI systems. Second, we propose an agenda to build a new kind of DI system that addresses these limitations. These systems guide users through the DI workflow step by step, providing tools that address the ‘pain points’ of each step; the tools are built on top of the Python data science and Big Data ecosystem (PyData). We discuss how to foster an ecosystem of such tools within PyData, then use it to build DI systems for collaborative/cloud/crowd/lay user settings. Finally, we discuss ongoing work at Wisconsin, which suggests that these DI systems are highly promising and that building them raises many interesting research challenges. Toward a System Building Agenda for Data Integration
If you did not already know
Robust Sparse Principal Component Analysis (ROSPCA)
A new sparse PCA algorithm is presented, which is robust against outliers. The approach is based on the ROBPCA algorithm that generates robust but nonsparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately finding the sparse structure of datasets, even when challenging outliers are present. In comparison with a projection pursuit-based algorithm, ROSPCA demonstrates superior robustness properties and comparable sparsity estimation capability, as well as significantly faster computation time. …
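A minimal sketch of trying this in R, assuming the CRAN package rospca exposes a rospca() function taking the data matrix, the number of components k, and a sparsity parameter lambda (the argument names and return fields used here are assumptions; check the package documentation):

# Hedged sketch: assumes rospca::rospca(X, k, lambda) and a $loadings field.
library(rospca)

set.seed(1)
X <- matrix(rnorm(100 * 10), nrow = 100)   # 100 observations, 10 variables
X[1:5, ] <- X[1:5, ] + 8                   # contaminate a few rows with outliers

fit <- rospca(X, k = 2, lambda = 1)        # robust, sparse loadings for 2 components
fit$loadings                               # many exact zeros; not driven by the outliers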
Document worth reading: “Lectures on Statistics in Theory: Prelude to Statistics in Practice”
This is a writeup of lectures on ‘statistics’ that have evolved from the 2009 Hadron Collider Physics Summer School at CERN to the forthcoming 2018 school at Fermilab. The emphasis is on foundations, using simple examples to illustrate the points that are still debated in the professional statistics literature. The three main approaches to interval estimation (Neyman confidence, Bayesian, likelihood ratio) are discussed and compared in detail, with and without nuisance parameters. Hypothesis testing is discussed mainly from the frequentist point of view, with pointers to the Bayesian literature. Various foundational issues are emphasized, including the conditionality principle and the likelihood principle. Lectures on Statistics in Theory: Prelude to Statistics in Practice
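Not from the lectures, but a small R illustration of how two of the interval-estimation approaches mentioned above can differ on the same data, a single Poisson count (the exact frequentist interval versus a Bayesian credible interval under a Jeffreys prior):

# Illustration (not from the lectures): interval estimates for a Poisson mean.
x <- 3                                              # observed count

# Exact frequentist 95% confidence interval (Neyman construction)
poisson.test(x)$conf.int

# Bayesian 95% credible interval with a Jeffreys prior: posterior is Gamma(x + 1/2, 1)
qgamma(c(0.025, 0.975), shape = x + 0.5, rate = 1)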
More on sigr
If you’ve read our previous R Tip on using sigr with linear models, you might have noticed that the lm() summary object does in fact carry the R-squared and F statistics, both in the printed form:
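A minimal sketch (not from the original tip) of where summary.lm() stores those quantities; the commented sigr call is how the earlier tip formatted them, assuming sigr::wrapFTest() as used there:

# Sketch: the R-squared and F statistic inside a summary.lm() object.
set.seed(2018)
d <- data.frame(x = 1:20)
d$y <- 2 * d$x + rnorm(20)

fit <- lm(y ~ x, data = d)
s <- summary(fit)

s$r.squared    # R-squared
s$fstatistic   # F statistic plus numerator/denominator degrees of freedom

# From the earlier sigr tip (assumed interface):
# sigr::wrapFTest(fit)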
Data Feminism
Data is growing more intertwined with everyday life and more involved in important decisions. Yet data is biased in many ways, from collection to analysis to the conclusions drawn, which is a problem when it is so often presented as providing an objective point of view. In their recently released manuscript for Data Feminism, Catherine D’Ignazio and Lauren Klein discuss the importance of varied points of view:
xts 0.11-2 on CRAN
xts version 0.11-2 was published to CRAN yesterday. xts provides data structures and functions to work with time-indexed data. This is a bug-fix release, with notable changes below:
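For readers new to the package, a minimal sketch of the core xts idea (a vector or matrix ordered by a time index, subsettable with ISO-8601 date strings); this is generic usage, not something specific to the 0.11-2 release:

library(xts)

# A small series indexed by dates
dates <- as.Date("2018-12-01") + 0:4
x <- xts(c(10, 12, 11, 13, 12), order.by = dates)

head(x)
x["2018-12-02/2018-12-04"]   # subset by an ISO-8601 date range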
Happy 10th Bday, Rcpp – and welcome release 1.0 !!
Ten years ago today I wrote the NEWS.Rd entry in this screenshot for the very first Rcpp release:
R Packages worth a look
Unifying Estimation Results with Binary Dependent Variables (urbin): Calculate unified measures that quantify the effect of a covariate on a binary dependent variable (e.g., for meta-analyses). This can be particularly i …
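The urbin interface itself is not shown here; as a stand-in, a base-R sketch of the kind of raw estimate such unified measures are derived from, a covariate's average marginal effect on a binary outcome in a logit model:

# Stand-in illustration (base R only, not the urbin API).
set.seed(42)
n <- 500
d <- data.frame(x = rnorm(n), z = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.5 + 1.2 * d$x - 0.4 * d$z))

fit <- glm(y ~ x + z, data = d, family = binomial())

# Average marginal effect of x: mean of dP(y = 1)/dx over the sample
p <- predict(fit, type = "response")
mean(coef(fit)["x"] * p * (1 - p))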
“Statistical and Machine Learning forecasting methods: Concerns and ways forward”
Roy Mendelssohn points us to this paper by Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos, which begins:
What’s new on arXiv
On Meta-Learning for Dynamic Ensemble Selection