Understanding Logistic Regression in Python
R Tip: Give data.table a Try
If your R or dplyr work is taking what you consider to be too long (seconds instead of instant, or minutes instead of seconds, or hours instead of minutes, or a day instead of an hour), then try data.table.
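As a rough sketch of what the switch looks like (not from the post; the toy data frame and column names here are made up for illustration), the same grouped aggregation in dplyr and in data.table syntax:

```r
library(dplyr)
library(data.table)

# Hypothetical data: one million rows grouped by a small categorical key
set.seed(1)
d <- data.frame(
  carrier = sample(LETTERS[1:5], 1e6, replace = TRUE),
  delay   = rnorm(1e6)
)

# dplyr version
d %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(delay))

# data.table version: convert once, then aggregate with the [i, j, by] form
DT <- as.data.table(d)
DT[, .(mean_delay = mean(delay)), by = carrier]
```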
If you did not already know
Unsupervised Semantic Deep Hashing (USDH)
In recent years, deep hashing methods have been shown to be efficient because they employ convolutional neural networks to learn features and hashing codes simultaneously. However, these methods are mostly supervised. In real-world applications, annotating a large number of images is a time-consuming and burdensome task. In this paper, we propose a novel unsupervised deep hashing method for large-scale image retrieval. Our method, namely unsupervised semantic deep hashing (USDH), uses semantic information preserved in the CNN feature layer to guide the training of the network. We enforce four criteria on hashing-code learning based on the VGG-19 model: 1) preserving relevant information of the feature space in the hashing space; 2) minimizing quantization loss between binary-like codes and hashing codes; 3) improving the usage of each bit in the hashing codes by maximizing information entropy; and 4) invariance to image rotation. Extensive experiments on CIFAR-10 and NUS-WIDE demonstrate that USDH outperforms several state-of-the-art unsupervised hashing methods for image retrieval. We also conduct experiments on the Oxford 17 dataset for fine-grained classification to verify its efficiency for other computer vision tasks. …
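Two of the listed criteria are easy to state concretely. As a hypothetical sketch (not the paper's code), with a matrix H of relaxed, binary-like codes (rows are images, columns are bits, values in (0, 1)):

```r
set.seed(42)
H <- matrix(runif(8 * 16), nrow = 8)  # hypothetical relaxed codes: 8 images x 16 bits
B <- round(H)                         # hard binary codes in {0, 1}

# Criterion 2: quantization loss -- keep the relaxed codes close to binary
quantization_loss <- mean((H - B)^2)

# Criterion 3 (a common proxy): each bit should fire about half the time,
# which is when its entropy is maximal
bit_balance_penalty <- mean((colMeans(H) - 0.5)^2)
```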
R Packages worth a look
R Interface to Google Slides (rgoogleslides) Previously, when one is working within the Google ecosystem (using Google Drive etc.), there is hardly any good workflow for getting the values calculat …
What's new on arXiv
Eigenvalue analogy for confidence estimation in item-based recommender systems
“It’s Always Sunny in Correlationville: Stories in Science,” or, Science should not be a game of Botticelli
There often seems to be an attitude among scientists and journal editors that, if a research team has gone to the trouble of ensuring rigor in some part of their study (whether in the design, the data collection, or the analysis, though typically rigor is associated with “p less than .05” and some random assignment or regression analysis somewhere in the paper), then they are allowed to speculate for free. Story time can take over.
Naive Bayes Classifier: A Geometric Analysis of the Naivete. Part 1
The curse of dimensionality is the bane of all classification problems. What is the curse of dimensionality? As the number of features (dimensions) increases linearly, the amount of training data required for classification increases exponentially. If the classification is determined by a single feature, we need a priori classification data over a range of values for this feature, so we can predict the class of a new data point. For a feature x with 100 possible values, the required training data is of order O(100). But if there is a second feature y that is also needed to determine the class, and y has 50 possible values, then we will need training data of order O(5000), i.e. over the grid of possible values for the pair (x, y). Thus the measure of the required data is the volume of the feature space, and it increases exponentially as more features are added.
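A quick back-of-the-envelope calculation makes the growth explicit; the per-feature value counts for x and y are the ones from the excerpt, while the third feature z is a hypothetical addition:

```r
values_per_feature <- c(x = 100, y = 50)

prod(values_per_feature["x"])        # one feature:  100 grid cells to cover
prod(values_per_feature)             # two features: 100 * 50 = 5000 cells
prod(c(values_per_feature, z = 20))  # hypothetical third feature with 20 values: 100000 cells
```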
Being at the Center
Roger Peng, 2018/09/07
What's new on arXiv
t-Exponential Memory Networks for Question-Answering Machines
R Packages worth a look
Easy to Make (Lazy) Tables (ltable) Constructs tables of counts and proportions out of data sets. It has a simplified syntax appealing to novice and even to advanced users under time press …