SunJackson Blog

R Packages worth a look

Reposted from: https://advanceddataanalytics.net/2018/09/03/r-packages-worth-a-look-1262/

Michael Laux

Published 2018-09-03

Set of Functions to Use in Survival Analysis and in Data Science (loose.rock): Collection of functions to improve workflow in survival analysis and data science. Package features include the generation of balanced datasets, …

阅读全文 »

Human Fuel Consumption

Reposted from: http://datagenetics.com/blog/september12018/index.html

Author unknown

Published 2018-09-02

How many miles to the gallon does a person get? How does this compare to a car or truck? OK, it’s a silly question, but let’s take a first-order look at the energy involved (because you can’t, after all, drink a gallon of gasoline and then see how far you can walk!). Seriously, don’t try to drink gasoline! Ever! Don’t even think about it. It’s horrible, and potentially fatal in anything other than small quantities. (Technically there is no such thing as a poison, only a poisonous concentration of a substance: “Sola dosis facit venenum”, the dose makes the poison.)
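The first-order energy comparison comes down to one division: the energy in a gallon of gasoline divided by the energy a person burns per mile. The sketch below makes that arithmetic explicit. The 33.7 kWh-per-gallon figure is the EPA's conversion used for MPGe ratings; the 100 kcal-per-mile walking cost is a hypothetical round number chosen for illustration, not a value from the article, and the function names are ours.

```python
# Rough, first-order "human miles per gallon" estimate.
# ASSUMPTIONS (not taken from the article):
#   - one US gallon of gasoline holds about 33.7 kWh (the EPA's MPGe basis)
#   - walking costs about 100 kcal per mile (hypothetical round figure;
#     real values depend on body mass, speed and terrain)

KWH_TO_JOULES = 3.6e6     # 1 kWh = 3.6 MJ
KCAL_TO_JOULES = 4184.0   # 1 food Calorie (kcal) = 4184 J


def gallon_energy_kcal(kwh_per_gallon: float = 33.7) -> float:
    """Energy content of one gallon of gasoline, expressed in food Calories (kcal)."""
    return kwh_per_gallon * KWH_TO_JOULES / KCAL_TO_JOULES


def human_mpg(kcal_per_mile: float = 100.0, kwh_per_gallon: float = 33.7) -> float:
    """Miles a person could walk on the energy content of one gallon."""
    return gallon_energy_kcal(kwh_per_gallon) / kcal_per_mile


if __name__ == "__main__":
    print(f"Energy per gallon: {gallon_energy_kcal():,.0f} kcal")    # 28,996 kcal
    print(f"Walking 'fuel economy': {human_mpg():.0f} miles/gallon")  # 290 miles/gallon
```

Because the result is a simple ratio, doubling the assumed cost per mile halves the "fuel economy"; the conclusion is only as good as the per-mile figure plugged in.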

How to set up a voting system for a Hall of Fame?

Reposted from: https://andrewgelman.com/2018/09/02/set-voting-system-hall-fame/

Andrew

Published 2018-09-02

Micah Cohen writes:

If you did not already know

Reposted from: https://advanceddataanalytics.net/2018/09/03/if-you-did-not-already-know-472/

Michael Laux

Published 2018-09-02

PixelSNAIL: Autoregressive generative models consistently achieve the best results in density estimation tasks involving high dimensional data, such as images or audio. They pose density estimation as a sequence modeling task, where a recurrent neural network (RNN) models the conditional distribution over the next element conditioned on all previous elements. In this paradigm, the bottleneck is the extent to which the RNN can model long-range dependencies, and the most successful approaches rely on causal convolutions, which offer better access to earlier parts of the sequence than conventional RNNs. Taking inspiration from recent work in meta reinforcement learning, where dealing with long-range dependencies is also essential, we introduce a new generative model architecture that combines causal convolutions with self attention. In this note, we describe the resulting model and present state-of-the-art log-likelihood results on CIFAR-10 (2.85 bits per dim) and 32×32 ImageNet (3.80 bits per dim). Our implementation is available at https://…/pixelsnail-public. …
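The ingredient the abstract adds to causal convolutions, self-attention restricted so that each element attends only to itself and earlier elements, can be sketched in plain Python. This is a minimal, hypothetical illustration (our function names, a single head, no learned query/key/value projections, and a 1-D sequence rather than a 2-D pixel ordering), not the PixelSNAIL implementation. The small helper shows how a "bits per dim" score relates to a total negative log-likelihood measured in nats.

```python
import math
from typing import List

Vector = List[float]


def causal_self_attention(xs: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with a causal mask.

    Output t is a weighted average of xs[0..t] only, so no position ever
    "sees" the future. The raw inputs serve as queries, keys and values.
    """
    d = len(xs[0])
    scale = 1.0 / math.sqrt(d)
    outputs: List[Vector] = []
    for t, query in enumerate(xs):
        # Causal mask: only positions 0..t are scored; later ones are never looked at.
        scores = [scale * sum(q * k for q, k in zip(query, xs[s])) for s in range(t + 1)]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        exps = [math.exp(sc - top) for sc in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted average of the visible inputs.
        outputs.append([sum(w * xs[s][i] for s, w in enumerate(weights)) for i in range(d)])
    return outputs


def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a total negative log-likelihood in nats into bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(causal_self_attention(seq)[0])  # [1.0, 0.0]: the first element can only attend to itself
```

Changing a later element of the sequence leaves every earlier output unchanged, which is exactly the property an autoregressive density model needs. For a 32×32 RGB image there are 32 × 32 × 3 = 3072 dimensions, so `bits_per_dim(nll, 3072)` puts an image-level log-likelihood on the per-dimension scale quoted in the abstract.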

Document worth reading: “A Survey on Influence Maximization in a Social Network”

Reposted from: https://advanceddataanalytics.net/2018/09/02/document-worth-reading-a-survey-on-influence-maximization-in-a-social-network/

Michael Laux

Published 2018-09-02

Given a social network with diffusion probabilities as edge weights and an integer k, which k nodes should be chosen for the initial injection of information so as to maximize influence in the network? This problem is known as the Target Set Selection (TSS) Problem in a social network and, more popularly, as the Social Influence Maximization (SIM) Problem. It has been an active area of research in computational social network analysis for roughly a decade and a half. Because of its practical importance in domains such as viral marketing, targeted advertising and personalized recommendation, the problem has been studied in different variants, and different solution methodologies have been proposed over the years. Hence there is a need for an organized and comprehensive review of the topic. This paper presents a survey of the progress in and around the TSS Problem. Finally, it discusses current research trends and future research directions.
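To make the problem statement concrete, here is a sketch of one classic baseline, the greedy hill-climbing approach of Kempe, Kleinberg and Tardos under the independent cascade model, with spread estimated by Monte Carlo simulation. It is one of many methods in this area and is not presented as the survey's own algorithm; the graph encoding, function names and default number of simulation runs are assumptions made for illustration.

```python
import random
from typing import Dict, List, Set, Tuple

# Directed graph: node -> list of (neighbour, diffusion probability on that edge).
Graph = Dict[str, List[Tuple[str, float]]]


def simulate_cascade(graph: Graph, seeds: Set[str], rng: random.Random) -> int:
    """Run the independent cascade model once and return the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active: List[str] = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly activated node gets one chance to activate each inactive neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def estimate_spread(graph: Graph, seeds: Set[str], runs: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the expected cascade size for a seed set."""
    return sum(simulate_cascade(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(graph: Graph, k: int, runs: int = 200, seed: int = 0) -> List[str]:
    """Pick k seeds one at a time, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for edges in graph.values() for v, _ in edges}
    chosen: List[str] = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))  # sorted for reproducible tie-breaking
        best = max(candidates, key=lambda n: estimate_spread(graph, set(chosen) | {n}, runs, rng))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Toy network with certain (p = 1.0) edges: "a" reaches three nodes, "e" reaches one.
    g = {"a": [("b", 1.0), ("c", 1.0), ("d", 1.0)], "e": [("f", 1.0)]}
    print(greedy_seeds(g, 2))  # ['a', 'e']
```

The greedy step is expensive because every candidate needs its own batch of simulations; much of the literature the survey covers is about cheaper estimates of the same marginal gain.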

R Packages worth a look

Reposted from: https://advanceddataanalytics.net/2018/09/02/r-packages-worth-a-look-1261/

Michael Laux

Published 2018-09-02

Analysis of Two-Way Tables (twoway): Carries out analyses of two-way tables with one observation per cell, together with graphical displays for an additive fit and a diagnostic plot for re …
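For readers unfamiliar with the term, an "additive fit" to a two-way table models each cell as an overall level plus a row effect plus a column effect. The sketch below fits that model with row and column means and returns the residuals that diagnostic plots are built from. It is written in Python with hypothetical names and is not the twoway package's API or necessarily its fitting method.

```python
from typing import List, Tuple

Table = List[List[float]]


def additive_fit(y: Table) -> Tuple[float, List[float], List[float], Table]:
    """Fit y[i][j] ~ overall + row_eff[i] + col_eff[j] using row and column means.

    With one observation per cell there is no replication, so the residuals of
    this additive fit are all that is left for judging non-additivity.
    """
    n_rows, n_cols = len(y), len(y[0])
    overall = sum(sum(row) for row in y) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - overall for row in y]
    col_eff = [sum(y[i][j] for i in range(n_rows)) / n_rows - overall for j in range(n_cols)]
    residuals = [
        [y[i][j] - overall - row_eff[i] - col_eff[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return overall, row_eff, col_eff, residuals


if __name__ == "__main__":
    # A perfectly additive table: every residual is zero.
    print(additive_fit([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

By construction the residuals sum to zero along every row and every column, so any systematic pattern left in them points to interaction between the row and column factors.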

Hey—take this psychological science replication quiz!

Reposted from: https://andrewgelman.com/2018/09/02/hey-take-psychological-science-replication-quiz/

Andrew

Published 2018-09-02

Rob Wiblin writes:

Unfolding Naïve Bayes From Scratch!

Reposted from: https://www.codementor.io/aishajaved/unfolding-naive-bayes-from-scratch-mzrnpwr0c

Aisha Javed

Published 2018-09-02

Link to the Medium version of this post: https://towardsdatascience.com/unfolding-na%C3%AFve-bayes-from-scratch-2e86dcae4b01

Magister Dixit

Reposted from: https://advanceddataanalytics.net/2018/09/02/magister-dixit-1334/

Michael Laux

Published 2018-09-02

“The component of prediction tasks that can be easily automated is the one that does not involve any expert knowledge. Prediction tasks require expert knowledge to specify the scientific question (what input and what outputs) and to identify/generate relevant data sources. (The extent of expert knowledge varies across different prediction tasks.) However, no expert knowledge is required for prediction after the inputs and outputs are specified and measured in a particular dataset. At this point, a machine learning algorithm can take over the data analysis to deliver a mapping and quantify its performance. The resulting mapping may be opaque, as in many deep learning applications, but its ability to map the inputs to the outputs with a known accuracy is not in question.” Miguel A. Hernán, John Hsu, Brian Healy (July 12, 2018)

If you did not already know

Reposted from: https://advanceddataanalytics.net/2018/09/02/if-you-did-not-already-know-471/

Michael Laux

Published 2018-09-01

Restricted Maximum Likelihood (REML): In statistics, the restricted (or residual, or reduced) maximum likelihood (REML) approach is a particular form of maximum likelihood estimation which does not base estimates on a maximum likelihood fit of all the information, but instead uses a likelihood function calculated from a transformed set of data, so that nuisance parameters have no effect. In the case of variance component estimation, the original data set is replaced by a set of contrasts calculated from the data, and the likelihood function is calculated from the probability distribution of these contrasts, according to the model for the complete data set. In particular, REML is used as a method for fitting linear mixed models. In contrast to the earlier maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters. The idea underlying REML estimation was put forward by M. S. Bartlett in 1937. The first description of the approach applied to estimating components of variance in unbalanced data was by Desmond Patterson and Robin Thompson of the University of Edinburgh, although they did not use the term REML. A review of the early literature was given by Harville. REML estimation is available in a number of general-purpose statistical software packages, including Genstat (the REML directive), SAS (the MIXED procedure), SPSS (the MIXED command), Stata (the mixed command), and R (the lme4 and older nlme packages), as well as in more specialist packages such as MLwiN, HLM, ASReml, Statistical Parametric Mapping and CropStat. …
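The simplest case of the idea described above is i.i.d. normal data, y_i ~ N(mu, sigma^2), where the mean mu is the nuisance parameter. Plain maximum likelihood divides the sum of squares by n and is biased downward; REML works with the contrasts y_i - ybar (only n - 1 of which are linearly independent), which do not involve mu, and for this model it yields the familiar unbiased divisor n - 1. The sketch below only illustrates this textbook special case with a hypothetical function name; fitting real mixed models is what the packages listed above are for.

```python
from typing import List, Tuple


def ml_and_reml_variance(y: List[float]) -> Tuple[float, float]:
    """Variance estimates for y_i ~ N(mu, sigma^2) with mu unknown.

    ML plugs in the estimated mean and divides the sum of squares by n.
    REML maximises the likelihood of the mean-free contrasts y_i - ybar and,
    for this model, divides by n - 1, which removes the bias.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)
    return ss / n, ss / (n - 1)


if __name__ == "__main__":
    ml, reml = ml_and_reml_variance([1.0, 2.0, 3.0, 4.0])
    print(ml, reml)  # 1.25 1.6666666666666667
```

The ML estimate is smaller by the factor (n - 1)/n, so its bias matters most for small samples, or, in mixed models, when many fixed effects are estimated relative to the amount of data.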