Are novel, complex, and specialized neural network architectures always better for language modeling? Recent papers have shown otherwise. Language models predict the next token given the preceding tokens. Most operate at the word level or the character level. Word-level models have large vocabularies (how many words are there in the English language?) compared to character-level models (the English alphabet has 26 letters). This means that character-level models require less memory. On the other hand, when processing a sentence, character-level models see many more tokens (each character is a token) than word-level models. A long sequence of tokens is harder for neural networks to learn from because of the vanishing gradient problem.
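The trade-off between vocabulary size and sequence length can be checked on a toy sentence. This is only a minimal sketch: the helper names (`tokenize_words`, `tokenize_chars`, `vocab_and_length`) and the sample text are made up for illustration, and whitespace splitting stands in for a real tokenizer. On a sample this small the word vocabulary is still tiny; the point is that the character vocabulary stays capped by the alphabet however much text is added, while the same sentence becomes a much longer sequence.

```python
def tokenize_words(text):
    # Naive word-level tokenization: lowercase, split on whitespace.
    return text.lower().split()


def tokenize_chars(text):
    # Character-level tokenization: every character (including spaces) is a token.
    return list(text.lower())


def vocab_and_length(tokens):
    # Returns (number of distinct tokens, sequence length).
    return len(set(tokens)), len(tokens)


# Hypothetical sample text, chosen only for illustration.
sample = "the cat sat on the mat and the dog sat on the log"

word_vocab, word_len = vocab_and_length(tokenize_words(sample))
char_vocab, char_len = vocab_and_length(tokenize_chars(sample))

print("word-level: vocab", word_vocab, "sequence length", word_len)
print("char-level: vocab", char_vocab, "sequence length", char_len)
```

For lowercase English text the character vocabulary here can never exceed 27 symbols (26 letters plus the space), whereas the word vocabulary keeps growing as the corpus grows.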
Using Natural Language Processing to Combat Filter Bubbles and Fake News – 360° Stance Detection
One of the most alarming cultural developments in the internet age has been due to the way that we access news content. The creation of filter bubbles, where people only consume media that expresses views that they are likely to agree with, has led to entrenchment of existing biases in society.
AHL Python Data Hackathon
Yesterday I got to attend Man AHL’s first London Python Data hackathon (21-22 April – photos online). I went with the goal of publishing my ipython_memory_usage tool from GitHub to PyPI (success!), updating the docs (success!) and starting to work on the YellowBrick project (partial success).
Some web API package development lessons from HIBPwned
As announced yesterday, HIBPwned version 0.1.7 has been released to CRAN! Although the release was mainly a maintenance release building on Steph’s already great code, internal changes were made to start transforming HIBPwned into a real showcase of web API package development. Let’s summarize some interesting points:
Announcing Ursa Labs: an innovation lab for open source data science
Thu 19 April 2018
Shared Autonomy via Deep Reinforcement Learning
How many CRAN package maintainers have been pwned?
The alternative title of this blog post is "HIBPwned version 0.1.7 has been released! W00t!". Steph’s HIBPwned package utilises the HaveIBeenPwned.com API to check whether email addresses and/or user names have been present in any publicly disclosed data breach. In other words, this package potentially delivers bad news, but useful bad news!
Why Start a Data Science Project?
A popular technique for learning data science is starting a project. Here are the 3 E’s for why building a data science project is a good idea.
Can a Machine Be Racist or Sexist?
I presented a talk with this title at the Applied Machine Learning Conference at Tom Tom Fest in Charlottesville (which I also helped plan) last Thursday, April 12, 2018.
Seasonalities: The Near-Term Future for the Market
Sell in May and go away. Is there any truth to this? I recently did some work on seasonalities and applied it to the stock market to quantify how true this statement is.
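One simple way to put a number on the saying is to compare average monthly returns for May through October against November through April. The sketch below is not the method from the post: the function name `seasonal_means` is hypothetical, and the returns are made-up figures chosen only to show the calculation, not market data.

```python
from statistics import mean


def seasonal_means(monthly_returns):
    """Split (month, return) pairs into May-Oct and Nov-Apr and average each.

    `monthly_returns` is an iterable of (month, return) pairs with month in 1..12.
    """
    pairs = list(monthly_returns)
    summer = [r for m, r in pairs if 5 <= m <= 10]       # May through October
    winter = [r for m, r in pairs if m < 5 or m > 10]    # November through April
    return mean(summer), mean(winter)


# Made-up illustrative returns, NOT real market data.
toy_returns = [(1, 0.02), (3, 0.01), (5, -0.01), (7, 0.00), (9, -0.02), (11, 0.03)]

summer_avg, winter_avg = seasonal_means(toy_returns)
print("May-Oct average:", summer_avg)
print("Nov-Apr average:", winter_avg)
```

With real data, the same split would be applied to many years of monthly returns, and the gap between the two averages would still need a significance test before reading anything into it.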