As we continue to bring KDnuggets readers year-end roundups and predictions for 2019, we reached out to a number of influential industry companies for their takes, posing this question:
Highlights of 2018
We end 2018 with a round-up of some of the research, talks, sci-fi, visualizations/art, and a grab bag of other stuff we found particularly interesting, enjoyable, or influential this year (and we’re going to be a bit fuzzy about the definition of “this year”)!
Document worth reading: “Are screening methods useful in feature selection? An empirical study”
Filter or screening methods are often used as a preprocessing step to reduce the number of variables a learning algorithm uses when fitting a classification or regression model. While many such filter methods exist, an objective evaluation of them is needed, both to compare them with each other and to answer whether they are useful at all, or whether a learning algorithm would do a better job without them. For this purpose, many popular screening methods are partnered in this paper with three regression learners and five classification learners and evaluated on ten real datasets using accuracy criteria such as R-square and area under the ROC curve (AUC). The results are compared through curve plots and comparison tables to find out whether screening methods improve the performance of learning algorithms and how they fare against each other. Our findings revealed that the screening methods were only useful in one regression and three classification datasets out of the ten evaluated. Are screening methods useful in feature selection? An empirical study
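The setup is easy to sketch. Below is a minimal illustration (not the paper's code) of one common screening approach, a correlation-based filter, paired with a linear regression learner on synthetic data; the variable names and the 0.1 correlation cutoff are arbitrary choices for the example.

```r
# Synthetic data: 20 candidate predictors, only two of which matter.
set.seed(1)
n <- 200
d <- data.frame(matrix(rnorm(n * 20), n, 20))
d$y <- d$X1 + 0.5 * d$X2 + rnorm(n)

# Screening step: keep predictors whose absolute correlation with y
# passes the (illustrative) cutoff.
cors <- sapply(d[, 1:20], function(x) abs(cor(x, d$y)))
keep <- names(cors)[cors > 0.1]

# Learner with and without the screening step.
full_model     <- lm(y ~ ., data = d)
screened_model <- lm(y ~ ., data = d[, c(keep, "y")])

# Compare in-sample R-squared (the paper uses held-out accuracy criteria).
summary(full_model)$r.squared
summary(screened_model)$r.squared
```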
So you want to play a pRank in R…?
So… you want to play a pRank with R? This short post gives you a fun function to help you out!
Document worth reading: “A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions”
Deep neural networks (DNNs) achieve excellent performance on standard classification tasks. However, under image quality distortions such as blur and noise, classification accuracy becomes poor. In this work, we compare the performance of DNNs with human subjects on distorted images. We show that, although DNNs perform better than or on par with humans on good quality images, DNN performance is still much lower than human performance on distorted images. We additionally find that there is little correlation in errors between DNNs and human subjects. This could be an indication that the internal representations of images differ between DNNs and the human visual system. These comparisons with human performance could be used to guide future development of more robust DNNs. A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions
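To make the two distortion types concrete, here is a minimal sketch (not the paper's code) of additive Gaussian noise and blur (approximated with a simple 3x3 mean filter), applied to a grayscale image stored as a matrix of values in [0, 1].

```r
# Additive Gaussian noise, clamped back into [0, 1].
add_noise <- function(img, sd = 0.1) {
  pmin(pmax(img + rnorm(length(img), sd = sd), 0), 1)
}

# Crude blur: replace each interior pixel with the mean of its 3x3 neighborhood.
mean_blur <- function(img) {
  out <- img
  nr <- nrow(img); nc <- ncol(img)
  for (i in 2:(nr - 1)) {
    for (j in 2:(nc - 1)) {
      out[i, j] <- mean(img[(i - 1):(i + 1), (j - 1):(j + 1)])
    }
  }
  out
}

img <- matrix(runif(64 * 64), 64, 64)  # stand-in for a real image
distorted <- mean_blur(add_noise(img)) # feed this to a classifier to compare
```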
vtreat Variable Importance
vtreat's purpose is to produce pure numeric R data.frames that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing values re-encoded with indicators, and high-degree categorical variables re-encoded with effects codes or impact codes).
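As a minimal sketch of that workflow on synthetic data: designTreatmentsN() and prepare() are vtreat's documented functions for numeric outcomes, and the per-variable significances in the plan's scoreFrame are where variable importance is read from. The toy variables below are made up for the example.

```r
library(vtreat)

# Toy data with a missing-value problem and a categorical variable.
set.seed(2)
d <- data.frame(
  x_num = rnorm(100),
  x_cat = sample(letters[1:5], 100, replace = TRUE),
  stringsAsFactors = FALSE
)
d$x_num[sample(100, 10)] <- NA
d$y <- ifelse(is.na(d$x_num), 0, d$x_num) + rnorm(100)

# Design a treatment plan for a numeric outcome, then apply it.
plan <- designTreatmentsN(d, c("x_num", "x_cat"), "y", verbose = FALSE)
d_treated <- prepare(plan, d)   # purely numeric, no NAs

# Per-variable quality estimates used for variable importance.
plan$scoreFrame[, c("varName", "rsq", "sig")]
```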
AzureStor: an R package for working with Azure storage
Storage endpoints
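A storage endpoint is AzureStor's entry point to an account; containers and file operations hang off it. A minimal sketch using the package's documented generics (the account URL, key, container name, and file paths below are placeholders):

```r
library(AzureStor)

# Connect to a storage account; storage_endpoint() infers the service
# (blob, file, ADLS) from the URL.
endp <- storage_endpoint("https://myaccount.blob.core.windows.net",
                         key = "myAccessKey")
cont <- storage_container(endp, "mycontainer")

list_storage_files(cont)                            # inspect contents
storage_upload(cont, "local/data.csv", "data.csv")  # local -> Azure
storage_download(cont, "data.csv", "copy.csv")      # Azure -> local
```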
Classifying yin and yang using MRI
Zad Chow writes:
University of Rhode Island: Data Scientist, DataSpark (2 Positions) [Kingston, RI]
At: University of Rhode Island
Location: Kingston, RI
Web: www.uri.edu
Position: Data Scientist, DataSpark (2 Positions)
How will automation tools change data science?
By Dr. Ryohei Fujimaki, CEO and Founder of dotData