Watch the premiere episode of The Dr. Data Show, which answers the question, “What makes machine learning the coolest science?”
Document worth reading: “Generative Adversarial Nets for Information Retrieval: Fundamentals and Advances”
Generative adversarial nets (GANs) have been widely studied during the recent development of deep learning and unsupervised learning. With an adversarial training mechanism, a GAN trains a generative model to fit the unknown underlying distribution of real data, guided by a discriminative model that estimates whether a data instance is real or generated. The framework was originally proposed for fitting continuous data distributions such as images, so it is not straightforward to apply it directly to information retrieval scenarios, where the data are mostly discrete, such as IDs, text and graphs. In this tutorial, we focus on GAN techniques and their variants for fitting discrete data in various information retrieval scenarios. (i) We introduce the fundamentals of the GAN framework and its theoretical properties; (ii) we carefully study promising solutions for extending GANs to discrete data generation; (iii) we introduce IRGAN, the fundamental GAN framework for fitting single-ID data distributions, and its direct application to information retrieval; (iv) we further discuss sequential discrete data generation tasks, e.g., text generation, and the corresponding GAN solutions; (v) we present the most recent work on fitting graph/network data with node embedding techniques using GANs. We also introduce relevant open-source platforms such as IRGAN and Texygen to help the audience conduct research experiments on GANs in information retrieval. Finally, we conclude the tutorial with a comprehensive summary and an outlook on future research directions for GANs in information retrieval.
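Why discrete data is awkward for GANs: sampling an item (an ID or a token) is not differentiable, so the discriminator's gradient cannot flow back through the sample into the generator. A common workaround, used by IRGAN and SeqGAN, is the policy-gradient (REINFORCE) estimator. The sketch below is a toy illustration under assumptions of our own, not the tutorial's code: a softmax generator over four hypothetical item IDs, made-up fixed discriminator outputs `d_probs`, and a reward of log D(d) (IRGAN and SeqGAN each define their own rewards). It compares a Monte Carlo REINFORCE estimate with the exact gradient of the expected reward.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, rewards):
    """Exact gradient of J(theta) = sum_i p_theta(i) * r_i w.r.t. the logits.

    For a softmax policy, dJ/dtheta_k = p_k * (r_k - sum_i p_i * r_i).
    """
    p = softmax(logits)
    baseline = sum(pi * ri for pi, ri in zip(p, rewards))
    return [pk * (rk - baseline) for pk, rk in zip(p, rewards)]


def reinforce_gradient(logits, rewards, num_samples, rng):
    """Score-function (REINFORCE) estimate of the same gradient.

    Samples discrete items d ~ p_theta and averages r_d * grad log p_theta(d),
    where grad log p_theta(d) = onehot(d) - p for a softmax policy.
    """
    p = softmax(logits)
    k = len(logits)
    grad = [0.0] * k
    for _ in range(num_samples):
        d = rng.choices(range(k), weights=p)[0]  # non-differentiable sampling step
        r = rewards[d]
        for j in range(k):
            indicator = 1.0 if j == d else 0.0
            grad[j] += r * (indicator - p[j])
    return [g / num_samples for g in grad]


if __name__ == "__main__":
    d_probs = [0.9, 0.5, 0.1, 0.3]  # hypothetical discriminator outputs D(d)
    rewards = [math.log(x) for x in d_probs]  # assumed reward: log D(d)
    logits = [0.0, 0.0, 0.0, 0.0]  # generator starts out uniform
    rng = random.Random(0)
    print("exact:    ", [round(g, 3) for g in exact_gradient(logits, rewards)])
    print("REINFORCE:", [round(g, 3) for g in reinforce_gradient(logits, rewards, 100_000, rng)])
```

Items the discriminator rates as more "real" receive a positive gradient, so the generator shifts probability toward them without ever differentiating through the sample. Each single-sample estimate sums to zero across the logits, matching the exact gradient; in practice the estimate is noisy, which is why baselines and other variance-reduction tricks are commonly added.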
PyImageConf 2018 Recap
Robust Quality – Powerful Integration of Data Science and Process Engineering
Distilled News
Setting up your first Kafka development environment in Google Cloud in 15 minutes
My talk tomorrow (Tues) 4pm in the Biomedical Informatics department (at 168th St)
The talk is 4-5pm in Room 200 on the 20th floor of the Presbyterian Hospital Building, Columbia University Medical Center.
A Three Month Data Analysis in Excel Could Have Taken Me One Day
This is the story of my literature thesis, a data analysis of short stories that I completed to graduate with a B.A. in English this past May. It took me 10 laborious months of spreadsheet hell to finish the data acquisition and analysis. So after discovering Dataiku, I wanted to see what it could do with my data and how it might have changed my process; the result was nothing short of magical.
R Packages worth a look
Bayesian Dynamic Factor Analysis (DFA) with ‘Stan’ (bayesdfa)
Implements Bayesian dynamic factor analysis with ‘Stan’. Dynamic factor analysis is a dimension reduction tool for multivariate time series. ‘bayesdfa’ …
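To make "dimension reduction for multivariate time series" concrete, here is the standard DFA formulation as it commonly appears in the literature (for example, in MARSS-style treatments): n observed series are explained by m < n latent random-walk trends. This is a sketch of the general model under that assumption, not necessarily the exact parameterization ‘bayesdfa’ uses; its priors and optional extensions are not shown.

```latex
\begin{aligned}
\mathbf{y}_t &= \mathbf{Z}\,\mathbf{x}_t + \mathbf{e}_t, & \mathbf{e}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{R}), \\
\mathbf{x}_t &= \mathbf{x}_{t-1} + \mathbf{w}_t, & \mathbf{w}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{Q}),
\end{aligned}
\qquad
\mathbf{y}_t \in \mathbb{R}^{n},\;
\mathbf{x}_t \in \mathbb{R}^{m},\;
\mathbf{Z} \in \mathbb{R}^{n \times m},\;
m < n.
```

Because m is much smaller than n, the loading matrix Z summarizes the shared dynamics of many series with a few trends. In many DFA treatments Q is fixed (for example, to the identity) so that the scale of Z and the trends is identifiable.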
What do you do when someone says, “The quote is, this is the exact quote”—and then misquotes you?
Ezra Klein, editor of the news/opinion website Vox, reports on a recent debate that sits in the center of the Venn diagram of science, journalism, and politics:
R Packages worth a look
Collection of Utility Functions (hgutils)
A handy collection of utility functions designed to aid in package development, plotting and scientific research. Package development functionalities i …