I just watched this video on the value of theory in applied fields (like statistics). It really resonated with my previous research experiences in statistical physics and on the interplay between randomised perfect sampling algorithms and Markov chain mixing, as well as my current perspective on the status quo of deep learning…
Data Science “Paint by the Numbers” with the Hypothesis Development Canvas
When I was a kid, I used to love “Paint by the Numbers” sets. They make anyone who can paint or color between the lines into a Rembrandt or Leonardo da Vinci (we can talk later about the long-term impact of forcing kids to “stay between the lines”).
Data Representation for Natural Language Processing Tasks
We have previously had a long look at a number of introductory natural language processing (NLP) topics, from approaching such tasks, to preprocessing text data, to getting started with a pair of popular Python libraries, and beyond. I was hoping to move on to exploring some different types of NLP tasks, but it was pointed out to me that I had neglected to touch on a hugely important aspect: data representation for natural language processing.
Quick overview on the new Bioconductor 3.8 release
Every six months the Bioconductor project releases its new version of packages. This gives developers a time window to try out new methods and test them rigorously before releasing them to the community at large. It also means that this is an exciting time. With every release there are dozens of new software packages. Bioconductor version 3.8 was just released on Halloween: October 31st, 2018. Thus, this is the perfect time to browse through the package descriptions and find out what’s new that could be of use to your research.
Document worth reading: “Transfer Metric Learning: Algorithms, Applications and Outlooks”
Distance metric learning (DML) aims to find an appropriate way to reveal the underlying data relationship. It is critical in many machine learning, pattern recognition and data mining algorithms, and usually requires a large amount of label information (class labels or pair/triplet constraints) to achieve satisfactory performance. However, the label information may be insufficient in real-world applications due to the high labeling cost, and DML may fail in this case. Transfer metric learning (TML) is able to mitigate this issue for DML in the domain of interest (target domain) by leveraging knowledge/information from other related domains (source domains). Although it has achieved a certain level of development, TML has had limited success in various aspects such as selective transfer, theoretical understanding, and the handling of complex data, big data and extreme cases. In this survey, we present a systematic review of the TML literature. In particular, we group TML into different categories according to different settings and metric transfer strategies, such as direct metric approximation, subspace approximation, distance approximation, and distribution approximation. A summarization and insightful discussion of the various TML approaches and their applications is presented. Finally, we provide some challenges and possible future directions. Transfer Metric Learning: Algorithms, Applications and Outlooks
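To make the central object of the abstract concrete: most DML methods parameterize a Mahalanobis-style distance with a positive semidefinite matrix M that is learned from label or pair/triplet constraints. A minimal sketch, assuming fixed illustrative matrices and points (none of these values come from the paper; a real DML or TML algorithm would fit M from data):

```python
import numpy as np

def mahalanobis_distance(x, y, M):
    """Distance between x and y under the (PSD) metric matrix M:
    d_M(x, y) = sqrt((x - y)^T M (x - y))."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# With M = identity, this reduces to ordinary Euclidean distance.
M_euclid = np.eye(2)
print(mahalanobis_distance(x, y, M_euclid))      # sqrt(2) ~ 1.414

# A diagonal M that down-weights the second feature shrinks the
# distance along that axis -- this reweighting is what DML learns.
M_weighted = np.diag([1.0, 0.1])
print(mahalanobis_distance(x, y, M_weighted))    # sqrt(1.1) ~ 1.049
```

In the transfer setting the survey describes, the strategies it lists (direct metric, subspace, distance, and distribution approximation) differ in what is carried over from the source domains to help estimate M in the target domain.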
The blocks and rows theory of data shaping
We have our latest note on the theory of data wrangling up here. It discusses the roles of “block records” and “row records” in the cdata
data transform tool. With that and the theory of how to design transforms, we think we have a pretty complete description of the system.
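The block/row distinction is essentially the long/wide distinction: a “row record” keeps one record per row, while a “block record” spreads one record over a block of rows. A sketch of the round trip in Python/pandas (not the R cdata API itself; the `id`/`score` columns are hypothetical example data):

```python
import pandas as pd

# "Row records": one record per row (wide form).
row_records = pd.DataFrame({
    "id": [1, 2],
    "test_score": [90, 80],
    "retest_score": [95, 85],
})

# Rows -> blocks: each record becomes a block of rows (long form),
# one row per measurement. This mirrors cdata's rowrecs_to_blocks step.
block_records = (
    row_records
    .melt(id_vars="id", var_name="measurement", value_name="score")
    .sort_values(["id", "measurement"])
    .reset_index(drop=True)
)

# Blocks -> rows: pivot back to one row per record,
# mirroring cdata's blocks_to_rowrecs step.
back = (
    block_records
    .pivot(index="id", columns="measurement", values="score")
    .reset_index()
)
```

The design point of the cdata theory is that both directions are instances of one transform specification, so a round trip like the one above recovers the original row records.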
What’s new on arXiv
Change Surfaces for Expressive Multidimensional Changepoints and Counterfactual Prediction
My two talks in Austria next week, on two of your favorite topics!
Innsbruck, 7 Nov 2018:
Data Notes: Chinese Tourism's Impact on Taiwan
Chinese tourism, US elections, and PyTorch: Enjoy these new, intriguing, and overlooked datasets and kernels
How Data Science Is Improving Higher Education
By Kayla Matthews, Productivity Bytes