Understanding Logistic Regression in Python
R Tip: Give data.table a Try
If your R or dplyr work is taking what you consider to be too long (seconds instead of instant, or minutes instead of seconds, or hours instead of minutes, or a day instead of an hour), then try data.table.
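As a rough sketch of what the switch looks like (not from the post; the toy data frame and column names here are made up for illustration), the same grouped aggregation in dplyr and in data.table syntax:

```r
library(dplyr)
library(data.table)

# Hypothetical data: one million rows grouped by a small categorical key
set.seed(1)
d <- data.frame(
  carrier = sample(LETTERS[1:5], 1e6, replace = TRUE),
  delay   = rnorm(1e6)
)

# dplyr version
d %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(delay))

# data.table version: convert once, then aggregate with the [i, j, by] form
DT <- as.data.table(d)
DT[, .(mean_delay = mean(delay)), by = carrier]
```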
If you did not already know
Unsupervised Semantic Deep Hashing (USDH)
In recent years, deep hashing methods have been shown to be efficient because they employ convolutional neural networks to learn features and hashing codes simultaneously. However, these methods are mostly supervised. In real-world applications, annotating a large number of images is a time-consuming and burdensome task. In this paper, we propose a novel unsupervised deep hashing method for large-scale image retrieval. Our method, namely unsupervised semantic deep hashing (USDH), uses semantic information preserved in the CNN feature layer to guide the training of the network. We enforce four criteria on hashing-code learning based on the VGG-19 model: 1) preserving relevant information of the feature space in the hashing space; 2) minimizing quantization loss between binary-like codes and hashing codes; 3) improving the usage of each bit in the hashing codes by maximizing information entropy; and 4) invariance to image rotation. Extensive experiments on CIFAR-10 and NUS-WIDE demonstrate that USDH outperforms several state-of-the-art unsupervised hashing methods for image retrieval. We also conduct experiments on the Oxford 17 dataset for fine-grained classification to verify its efficiency for other computer vision tasks. …
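Two of the listed criteria are easy to state concretely. As a hypothetical sketch (not the paper's code), with a matrix H of relaxed, binary-like codes (rows are images, columns are bits, values in (0, 1)):

```r
set.seed(42)
H <- matrix(runif(8 * 16), nrow = 8)  # hypothetical relaxed codes: 8 images x 16 bits
B <- round(H)                         # hard binary codes in {0, 1}

# Criterion 2: quantization loss -- keep the relaxed codes close to binary
quantization_loss <- mean((H - B)^2)

# Criterion 3 (a common proxy): each bit should fire about half the time,
# which is when its entropy is maximal
bit_balance_penalty <- mean((colMeans(H) - 0.5)^2)
```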
R Packages worth a look
R Interface to Google Slides (rgoogleslides) Previously, when one is working within the Google ecosystem (using Google Drive etc.), there is hardly any good workflow for getting the values calculat …
What's new on arXiv
Eigenvalue analogy for confidence estimation in item-based recommender systems
“It’s Always Sunny in Correlationville: Stories in Science,” or, Science should not be a game of Botticelli
There often seems to be an attitude among scientists and journal editors that, if a research team has gone to the trouble of ensuring rigor in some part of their study (whether in the design, the data collection, or the analysis, though typically rigor is associated with “p less than .05” and some random assignment or regression analysis somewhere in the paper), then they are allowed to speculate for free. Story time can take over.
Naive Bayes Classifier: A Geometric Analysis of the Naivete. Part 1
The curse of dimensionality is the bane of all classification problems. What is the curse of dimensionality? As the number of features (dimensions) increases linearly, the amount of training data required for classification increases exponentially. If the classification is determined by a single feature, we need a priori classification data over a range of values for this feature, so we can predict the class of a new data point. For a feature x with 100 possible values, the required training data is of order O(100). But if there is a second feature y that is also needed to determine the class, and y has 50 possible values, then we will need training data of order O(5000), i.e. over the grid of possible values for the pair (x, y). Thus the measure of the required data is the volume of the feature space, and it increases exponentially as more features are added.
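A quick back-of-the-envelope calculation makes the growth explicit; the per-feature value counts for x and y are the ones from the excerpt, while the third feature z is a hypothetical addition:

```r
values_per_feature <- c(x = 100, y = 50)

prod(values_per_feature["x"])        # one feature:  100 grid cells to cover
prod(values_per_feature)             # two features: 100 * 50 = 5000 cells
prod(c(values_per_feature, z = 20))  # hypothetical third feature with 20 values: 100000 cells
```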
Being at the Center
Roger Peng, 2018/09/07
What's new on arXiv
t-Exponential Memory Networks for Question-Answering Machines
R Packages worth a look
Easy to Make (Lazy) Tables (ltable) Constructs tables of counts and proportions out of data sets. It has a simplified syntax appealing to novice and even to advanced users under time press …