Examining Your Presence on Twitter with Python
My Evil The Following with absoluteBLACK’s direct mount oval ring.
Top content from two years of Data School
March 24, 2016
Sense is now part of Cloudera!
We launched Sense with the mission of helping data scientists and data engineers focus on what’s important — extracting value from data rather than managing infrastructure. As the quantity of data explodes and machine learning pushes computing to new limits, this mission has only become more important.
How tall is that tree?
Dealing with Corrupt Files in Hadoop
As I’ve been working with Hadoop a lot in the last several months, I’ve come to realize that it doesn’t deal gracefully with corrupt files (e.g., malformed gzip files). I would throw a cluster at a couple hundred thousand files (of which one or two were bad) and the job would die two hours into execution, throwing EOFException errors all over the place. If I were only processing one file, I suppose that would be a reasonably acceptable outcome. But when 99.9% of your files are fine, and the corrupt ones aren’t recoverable anyway, there’s no sense in blowing up the whole job just because a trivial portion of the data was bad.
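The excerpt stops before the author's fix, so what follows is not that approach. It is only a minimal sketch of one alternative: check each gzip file for readability before the job is submitted, so the known-bad ones can be left out of the input. The names `is_readable_gzip` and `split_good_and_bad` are made up for illustration, and the sketch assumes the files can be read from a local or mounted path.

```python
import gzip
import zlib


def is_readable_gzip(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip stream at `path` decompresses cleanly.

    Hypothetical helper: it reads the file to the end in chunks and treats
    any decompression failure as "corrupt".
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Reading to the end forces the trailer and CRC to be checked.
            while handle.read(chunk_size):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        # EOFError: truncated stream; OSError covers gzip.BadGzipFile
        # (and also unreadable or missing files); zlib.error: bad payload.
        return False


def split_good_and_bad(paths):
    """Partition `paths` into (readable, corrupt) lists, preserving order."""
    good, bad = [], []
    for path in paths:
        (good if is_readable_gzip(path) else bad).append(path)
    return good, bad
```

Decompressing every file up front costs a full extra pass over the data, so whether this beats handling the failure inside the job depends on how large the input is.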
How To Become A Machine Learning Expert In One Simple Step
The web is full of good explanations of machine learning algorithms. And every second applicant for a data science position has finished the Coursera course on machine learning. While it is important to understand the concepts behind the algorithms, one thing is even more important:
Adobe Analytics Clickstream Data Feed: Loading To Relational Database
In my previous post about the Adobe Analytics Clickstream Data Feed, I showed how it was possible to take a single day worth of data and build a dataframe in R. However, most likely your analysis will require using multiple days/weeks/months of data, and given the size and complexity of the feed, loading the files into a relational database makes a lot of sense.
Two Bingo Ball Puzzle
Image: Digby Fire Dept
There are 75 balls in an American bingo game (according to the internet). From a fully shuffled set of bingo balls, you draw one number. Without replacing this ball, you draw a second number. If you win a prize equal to the value of the highest number you draw, what is your expected return? (Meaning: if you were able to play the game an infinite number of times, what would be your average winnings?)
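For readers who want to check their answer, here is a small brute-force sketch. It assumes the balls are numbered 1 through 75 and that every unordered pair is equally likely, which is what drawing twice without replacement from a shuffled set gives you. The function name `expected_max_of_two` is just an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations


def expected_max_of_two(n):
    """Exact expected value of the larger of two balls drawn without
    replacement from balls numbered 1..n (assumed numbering)."""
    # Enumerate every unordered pair; each is equally likely.
    pairs = list(combinations(range(1, n + 1), 2))
    return Fraction(sum(max(pair) for pair in pairs), len(pairs))


result = expected_max_of_two(75)
print(result, float(result))
```

The enumeration agrees with the closed form 2(n + 1)/3 for the maximum of two draws, which for n = 75 is 152/3, or roughly 50.67.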
Avoid unsigned integers in C++ if you can
Thu 17 March 2016