It’s the dystopian backdrop to every science fiction novel involving artificial intelligence: a dark and stormy police state in which cold, unfeeling robots patrol for nonconforming humans. While we at Dataiku are optimistic that we’re still some way from that bleak future, 2018 has been a year riddled with ethical questions about the effect artificial intelligence will have on our lives.
Whether it be Cambridge Analytica’s use of Facebook data to influence elections or Google’s involvement with the Department of Defense’s drone project, data has had its fair share of scrutiny in the media lately. AI is the trendy buzzword of 2018, with the unique ability to inspire wonder and to generate intense discomfort among people.
R Packages worth a look
Estimation, Comparison and Selection of Transformations (trafo): Estimation, selection and comparison of several families of transformations. The families of transformations included in the package are the following: …
What’s new on arXiv
Reconfigurable Inverted Index
Cool tennis-tracking app
Announcing the Artificial Intelligence (AI) Hackathon: Build Intelligent Applications using machine learning APIs and serverless
Amazon Web Services (AWS) brings image and video analysis, natural language processing, speech recognition, text-to-speech, and machine translation within the reach of every developer. With machine learning (ML) services by AWS, you can plug in prebuilt AI functionality into your apps without having to worry about ML models.
Aella Credit empowers underbanked individuals by using Amazon Rekognition for identity verification
Aella Credit is a financial services company based in West Africa that provides instant loans to individuals with a verifiable source of income in emerging markets by using biometric and employer data.
Announcing Practical Data Science with R, 2nd Edition
We are pleased and excited to announce that we are working on a second edition of Practical Data Science with R!
If you did not already know
Genetic Programming Relevance Vector Machine (GP-RVM)
This paper proposes a hybrid basis function construction method (GP-RVM) for the Symbolic Regression (SR) problem, which combines an extended version of Genetic Programming called Kaizen Programming with the Relevance Vector Machine to evolve an optimal set of basis functions. Unlike traditional evolutionary algorithms, where a single individual is a complete solution, our method proposes a solution based on a linear combination of basis functions built from individuals during the evolving process. RVM, which is a sparse Bayesian kernel method, selects suitable functions to constitute the basis. RVM determines the posterior weight of a function by evaluating its quality and sparsity. The solution produced by GP-RVM is a sparse Bayesian linear model of the coefficients of many non-linear functions. Our hybrid approach is focused on nonlinear white-box models, selecting the right combination of functions to build robust predictions without prior knowledge about the data. Experimental results show that GP-RVM outperforms conventional methods, which suggests that it is an efficient and accurate technique for solving SR. The computational complexity of GP-RVM scales as $O(M^{3})$, where $M$ is the number of functions in the basis set and is typically much smaller than the number $N$ of training patterns. …
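As a rough sketch of the model form the abstract describes, and assuming the standard RVM formulation rather than anything specific to this paper, the prediction is a weighted sum of the $M$ evolved basis functions with a separate zero-mean Gaussian prior on each weight. The symbols $\phi_i$, $w_i$, $\alpha_i$, $\beta$, $\Phi$, $A$ and $t$ are notation introduced here for illustration and do not come from the abstract.

$$\hat{y}(x) = \sum_{i=1}^{M} w_i\,\phi_i(x), \qquad p(w_i \mid \alpha_i) = \mathcal{N}\!\left(w_i \mid 0,\ \alpha_i^{-1}\right)$$

$$\Sigma = \left(A + \beta\,\Phi^{\top}\Phi\right)^{-1}, \qquad \mu = \beta\,\Sigma\,\Phi^{\top} t, \qquad A = \mathrm{diag}(\alpha_1, \dots, \alpha_M)$$

Here $\Phi$ is the $N \times M$ matrix of basis function values on the training patterns, $t$ is the vector of targets, and $\beta$ is the noise precision. Under this formulation, a weight whose precision $\alpha_i$ grows very large is effectively pruned, which is where the sparsity comes from, and inverting the $M \times M$ matrix for $\Sigma$ is consistent with the quoted $O(M^{3})$ cost.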
If you did not already know
Brain2Text
Nowadays, the Internet represents a vast, exponentially growing informational space, and the problem of searching for relevant data is more pressing than ever. The algorithm proposed in the article makes it possible to run natural language queries against the content of a document and get comprehensive, meaningful answers. The problem is partially solved for English, as SQuAD contains enough data to learn from, but there is no such dataset in Russian, so the methods currently used by researchers are not applicable to Russian. The Brain2 framework addresses this problem: it stands out for its ability to be applied to small datasets and does not require impressive computing power. The algorithm is illustrated on the text of the Sberbank of Russia Strategy and assumes the use of a neuromodel consisting of 65 million synapses. The trained model is able to construct word-by-word answers to questions based on a given text. Its existing limitations are a current inability to identify synonyms, pronoun relations and allegories. Nevertheless, the results of the experiments conducted showed the high capacity and generalisation ability of the suggested approach. …
Data Science Portfolio Project: Is Fandango Still Inflating Ratings?
At Dataquest, we strongly advocate portfolio projects as a means of getting your first data science job. In this blog post, we’ll walk you through an example portfolio project.