SunJackson Blog


Document worth reading: “Toward a System Building Agenda for Data Integration”

Reposted from: https://analytixon.com/2018/11/06/document-worth-reading-toward-a-system-building-agenda-for-data-integration/

Michael Laux

Posted on 2018-11-06

In this paper we argue that the data management community should devote far more effort to building data integration (DI) systems, in order to truly advance the field. Toward this goal, we make three contributions. First, we draw on our recent industrial experience to discuss the limitations of current DI systems. Second, we propose an agenda to build a new kind of DI system that addresses these limitations. These systems guide users through the DI workflow, step by step. They provide tools to address the ‘pain points’ of each step, and the tools are built on top of the Python data science and Big Data ecosystem (PyData). We discuss how to foster an ecosystem of such tools within PyData, and then how to use it to build DI systems for collaborative, cloud, crowd, and lay-user settings. Finally, we discuss ongoing work at Wisconsin, which suggests that these DI systems are highly promising and that building them raises many interesting research challenges.

Toward a System Building Agenda for Data Integration

If you did not already know

Reposted from: https://analytixon.com/2018/11/06/if-you-did-not-already-know-536/

Michael Laux

Posted on 2018-11-06

Robust Sparse Principal Component Analysis (ROSPCA): A new sparse PCA algorithm is presented that is robust against outliers. The approach is based on the ROBPCA algorithm, which generates robust but non-sparse loadings. The construction of the new ROSPCA method is detailed, along with a selection criterion for the sparsity parameter. An extensive simulation study and a real-data example show that the method accurately finds the sparse structure of datasets, even when challenging outliers are present. Compared with a projection-pursuit-based algorithm, ROSPCA demonstrates superior robustness, comparable sparsity estimation, and significantly faster computation. …

Document worth reading: “Lectures on Statistics in Theory: Prelude to Statistics in Practice”

Reposted from: https://analytixon.com/2018/11/06/document-worth-reading-lectures-on-statistics-in-theory-prelude-to-statistics-in-practice/

Michael Laux

Posted on 2018-11-06

This is a write-up of lectures on ‘statistics’ that have evolved from the 2009 Hadron Collider Physics Summer School at CERN to the forthcoming 2018 school at Fermilab. The emphasis is on foundations, using simple examples to illustrate points that are still debated in the professional statistics literature. The three main approaches to interval estimation (Neyman confidence, Bayesian, likelihood ratio) are discussed and compared in detail, with and without nuisance parameters. Hypothesis testing is discussed mainly from the frequentist point of view, with pointers to the Bayesian literature. Various foundational issues are emphasized, including the conditionality principle and the likelihood principle.

Lectures on Statistics in Theory: Prelude to Statistics in Practice

More on sigr

Reposted from: http://www.win-vector.com/blog/2018/11/more-on-sigr/

Nina Zumel

Posted on 2018-11-06

If you’ve read our previous R Tip on using sigr with linear models, you might have noticed that the lm() summary object does in fact carry the R-squared and F statistics, both in the printed form:

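The excerpt notes that R's lm() summary reports both R-squared and the F statistic. As a rough illustration of what those two numbers measure, here is a minimal pure-Python sketch for a least-squares fit with one predictor and an intercept. The helper name `r2_and_f` is hypothetical; this is not sigr's or R's implementation, and it assumes at least three points, non-constant x, and an imperfect fit (otherwise a division by zero occurs).

```python
def r2_and_f(x, y):
    """Fit y = a + b*x by ordinary least squares and return (R^2, F).

    Hypothetical helper for illustration only. Assumes one predictor,
    len(x) == len(y) >= 3, non-constant x, and nonzero residuals.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums used by the closed-form OLS slope.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    # Total, residual, and regression (explained) sums of squares.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_reg = ss_tot - ss_res
    r2 = ss_reg / ss_tot
    p = 1  # number of predictors, not counting the intercept
    # F compares explained variance per predictor to residual variance.
    f_stat = (ss_reg / p) / (ss_res / (n - p - 1))
    return r2, f_stat


r2, f_stat = r2_and_f([1, 2, 3, 4], [1, 3, 2, 4])
print(round(r2, 4), round(f_stat, 4))  # prints 0.64 3.5556
```

For a simple regression with an intercept, F equals R² / (1 - R²) · (n - 2), which is one way to see that the two reported numbers carry closely related information.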

Data Feminism

Reposted from: https://flowingdata.com/2018/11/06/data-feminism/

Nathan Yau

Posted on 2018-11-06

Data is becoming more intertwined with everyday life and more involved in important decisions. However, data is biased in many ways, from collection to analysis to the conclusions drawn, which is a problem given that it is often intended to provide an objective point of view. In their recently released manuscript for Data Feminism, Catherine D’Ignazio and Lauren Klein discuss the importance of varied points of view:

xts 0.11-2 on CRAN

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/v-Eikg39BLo/

Joshua Ulrich

Posted on 2018-11-06

xts version 0.11-2 was published to CRAN yesterday. xts provides data structures and functions to work with time-indexed data. This is a bug-fix release, with notable changes below:

Happy 10th Bday, Rcpp – and welcome release 1.0 !!

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/BRLeHhbr66Y/

Thinking inside the box

Posted on 2018-11-06

Ten years ago today I wrote the NEWS.Rd entry in this screenshot for the very first Rcpp release:

R Packages worth a look

Reposted from: https://analytixon.com/2018/11/06/r-packages-worth-a-look-1325/

Michael Laux

Posted on 2018-11-06

Unifying Estimation Results with Binary Dependent Variables (urbin): Calculate unified measures that quantify the effect of a covariate on a binary dependent variable (e.g., for meta-analyses). This can be particularly i …

“Statistical and Machine Learning forecasting methods: Concerns and ways forward”

Reposted from: https://andrewgelman.com/2018/11/06/statistical-machine-learning-forecasting-methods-concerns-ways-forward/

Andrew

Posted on 2018-11-06

Roy Mendelssohn points us to this paper by Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos, which begins:

What's new on arXiv

Reposted from: https://analytixon.com/2018/11/06/whats-new-on-arxiv-808/

Michael Laux

Posted on 2018-11-06

On Meta-Learning for Dynamic Ensemble Selection

© 2018 - 2019 SunJackson