Longitudinal Concordance Correlation (lcc)
Estimates the longitudinal concordance correlation to assess the longitudinal agreement profile. The estimation approach implemented is variance compon …
The evolution of pace in popular movies
James Cutting writes:
EARL conference recap: Seattle 2018
I had the pleasure of attending the EARL (Enterprise Applications of the R Language) Conference held in Seattle on 2018-11-07, and the honour of being one of the speakers. The EARL conferences occupy a unique niche in the R conference universe, bringing together the I-use-it-at-work contingent of the R community. The Seattle event was, from my perspective (I use R at work and lead a team of data scientists that uses R), a fantastic conference. Full marks to the folks from Mango Solutions for organizing it!
Document worth reading: “Learning From Positive and Unlabeled Data: A Survey”
Learning from positive and unlabeled data or PU learning is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them. Learning From Positive and Unlabeled Data: A Survey
R Packages worth a look
Lindley Power Series Distribution (LindleyPowerSeries)
Computes the probability density function, the cumulative distribution function, the hazard rate function, the quantile function, and random generation …
Magister Dixit
“Data science, surprisingly perhaps, is not about designing the most advanced machine learning algorithms and training them on all of the data (and then having Skynet). It’s about finding the right data, becoming a quasi-expert on the process, system, or event you are trying to model, and crafting features that will help quirky and sometimes frail statistical algorithms make accurate predictions. Very little time is actually spent on the algorithm itself.” Scott W. Strong (April 10, 2018)
Top 5 domains Big Data analytics helps to transform
By Tetiana Boichenko, n-ix.com.
If you did not already know
Data Lineage Analysis
“Data lineage is defined as a data life cycle that includes the data’s origins and where it moves over time.” It describes what happens to data as it goes through diverse processes. It helps provide visibility into the analytics pipeline and simplifies tracing errors back to their sources. It also enables replaying specific portions or inputs of the dataflow for step-wise debugging or regenerating lost output. In fact, database systems have long used such information, called data provenance, to address similar validation and debugging challenges. Data provenance documents the inputs, entities, systems, and processes that influence data of interest, in effect providing a historical record of the data and its origins. The generated evidence supports essential forensic activities such as data-dependency analysis, error/compromise detection and recovery, and auditing and compliance analysis. “Lineage is a simple type of why provenance.” …
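As a toy illustration of the idea above (a hypothetical sketch, not any particular lineage system's API): provenance is just a record of which inputs and which processing step produced each artifact, and tracing errors back to their sources is a walk over those records. All names here are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    """Minimal provenance: which step and which inputs produced an output."""
    output: str
    step: str
    inputs: list = field(default_factory=list)

# Hypothetical pipeline: record lineage as each artifact is derived.
lineage = {}

def derive(output, step, inputs):
    lineage[output] = ProvenanceRecord(output, step, list(inputs))
    return output

derive("clean.csv", "deduplicate", ["raw.csv"])
derive("model.pkl", "train", ["clean.csv", "params.json"])

def trace(artifact):
    """Walk the lineage records back to an artifact's original sources."""
    rec = lineage.get(artifact)
    if rec is None:
        return {artifact}          # not derived: an original input
    sources = set()
    for inp in rec.inputs:
        sources |= trace(inp)
    return sources
```

Here `trace("model.pkl")` recovers the original sources `raw.csv` and `params.json`, which is exactly the data-dependency analysis the excerpt describes.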
If you did not already know
Double Path Networks for Sequence to Sequence Learning (DPN-S2S)
Encoder-decoder based Sequence to Sequence learning (S2S) has made remarkable progress in recent years. Different network architectures have been used in the encoder/decoder. Among them, Convolutional Neural Networks (CNN) and Self Attention Networks (SAN) are the prominent ones. The two architectures achieve similar performance but encode and decode context in very different ways: CNNs use convolutional layers to focus on the local connectivity of the sequence, while SANs use self-attention layers to focus on global semantics. In this work, we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverages the advantages of both models by using double path information fusion. During the encoding step, we develop a double path architecture to maintain the information coming from different paths with convolutional layers and self-attention layers separately. To effectively use the encoded context, we develop a cross attention module with gating and use it to automatically pick up the information needed during the decoding step. By deeply integrating the two paths with cross attention, both types of information are combined and well exploited. Experiments show that our proposed method can significantly improve the performance of sequence to sequence learning over state-of-the-art systems. …
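A toy sketch of the gated fusion idea described in the abstract (illustrative shapes and random, untrained parameters; this is not the paper's actual architecture): a sigmoid gate decides, per position and per feature, how much to take from the convolutional (local) path versus the self-attention (global) path.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one sequence, shape (length, hidden).
# In DPN-S2S these would come from the CNN and self-attention paths.
seq_len, hidden = 5, 8
cnn_path = rng.normal(size=(seq_len, hidden))   # local-context features
san_path = rng.normal(size=(seq_len, hidden))   # global-context features

# Illustrative gate parameters (learned in the real model).
W = rng.normal(size=(2 * hidden, hidden))
b = np.zeros(hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The gate looks at both paths and mixes them elementwise.
gate = sigmoid(np.concatenate([cnn_path, san_path], axis=-1) @ W + b)
fused = gate * cnn_path + (1.0 - gate) * san_path
```

Because the gate lies strictly between 0 and 1, each fused feature is a convex combination of the two paths' features, so neither information source is ever fully discarded.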
Counting digits by @ellis2013nz
Counting digits appearing in page numbers
The other day in a training session, the facilitators warmed people up into intellectual work with this group exercise:
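Assuming the exercise matches the post's title, counting how often each digit appears among page numbers 1..N, the core tally can be sketched in a few lines (a Python sketch; the function name is illustrative):

```python
from collections import Counter

def digit_counts(n_pages):
    """Count how often each digit 0-9 appears in page numbers 1..n_pages."""
    counts = Counter()
    for page in range(1, n_pages + 1):
        counts.update(str(page))
    return {d: counts[str(d)] for d in range(10)}
```

For a 20-page booklet, for example, the digit 1 appears 12 times (once in 1, ten times as the tens digit of 10-19, and once more in 11's units place), while 0 appears only twice (in 10 and 20).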