By Jeff Hale.
A deep dive into glmnet: standardize
Unless otherwise stated, n will denote the number of observations, p will denote the number of features, and fit will denote the output/result of the glmnet call. The data matrix is denoted by X and the response is denoted by y.
Introducing Drexel’s new online MS in Data Science
Drexel’s new online MS in Data Science is the degree that launched a thousand opportunities. With an emphasis on skills like data mining and algorithm creation, you’ll graduate workplace-ready by having experience with some of the industry’s leading technology. And, because the online program takes a comprehensive approach to working with data, you will be able to take your skills and apply them to the field of your choice.
Quoting in R
Many R users appear to be big fans of “code capturing” or “non-standard evaluation” (NSE) interfaces. In this note we will discuss quoting and non-quoting interfaces in R.
R Packages worth a look
Generates Multivariate Nonnormal Data and Determines How Many Factors to Retain (RGenData): The GenDataSample() and GenDataPopulation() functions create, respectively, a sample or population of multivariate nonnormal data using methods describ …
In case you missed it: October 2018 roundup
In case you missed them, here are some articles from October of particular interest to R users.
What’s new on arXiv
Deep Item-based Collaborative Filtering for Top-N Recommendation
Magister Dixit
“There is a dizzying array of algorithms from which to choose, and just making the choice between them presupposes that you have sufficiently advanced mathematical background to understand the alternatives and make a rational choice. The options are also changing, evolving constantly as a result of the work of some very bright, very dedicated researchers who are continually refining existing algorithms and coming up with new ones.” Ted Dunning, Ellen Friedman (2014)
Rcpp now used by 1500 CRAN packages
If you did not already know
automated CLAUse DETectEr (Claudette)
Machine Learning Powered Analysis of Consumer Contracts and Privacy Policies. CLAUDETTE – ‘automated CLAUse DETectEr’ – is an interdisciplinary research project hosted at the Law Department of the European University Institute, led by professors Giovanni Sartor and Hans-W. Micklitz, in cooperation with engineers from the University of Bologna and the University of Modena and Reggio Emilia. The research objective is to test to what extent it is possible to automate the reading and legal assessment of online consumer contracts and privacy policies, and to evaluate their compliance with the EU’s unfair contractual terms law and personal data protection law (GDPR), using machine learning and grammar-based approaches. The idea arose out of bewilderment. Having read dozens of terms of service and privacy policies of online platforms, we came to the conclusion that despite the substantive law in place, and despite enforcers’ competence for abstract control, providers of online services still tend to use unfair and unlawful clauses in these documents. Hence the idea to automate parts of the enforcement process by delegating certain tasks to machines. On the one hand, we believe that relying on automation can increase the quality and effectiveness of the legal work of enforcers. On the other, we want to empower consumers themselves by giving them tools to quickly assess whether what they agree to online is fair and/or lawful. …