Sponsored Post. By Rebecca Merrett, Instructor at Data Science Dojo. There are many blogs and tutorials that teach you how to scrape data from a bunch of web pages once and then you’re done. But one-off web scraping is not useful for many applications that require sentiment analysis on recent or timely content, capturing changing events and commentary, or analyzing trends in real time. As fun as it is to do an academic exercise of web scraping for one-off analysis on historical data, it is not useful when you want to use timely or frequently updated data.
P&G: Data Scientist – Machine Learning/NLP [Cincinnati, OH]
At: P&G Location: Cincinnati, OH Web: www.pg.com Position: Data Scientist - Machine Learning/NLP
InformationAge: Will 2019 See the Automation of Automation and Push Up Salaries of Data Scientists?
Historic Wildfire Data: Exploratory Visualization in R
In recent weeks, the devastating wildfires sweeping parts of the US state of California have featured prominently in the news.
If you did not already know
TEA-DNN Embedded deep learning platforms have witnessed two simultaneous improvements. First, the accuracy of convolutional neural networks (CNNs) has been significantly improved through the use of automated neural-architecture search (NAS) algorithms to determine CNN structure. Second, there has been increasing interest in developing application-specific platforms for CNNs that provide improved inference performance and energy consumption as compared to GPUs. Embedded deep learning platforms differ in the amount of compute resources and memory-access bandwidth, which would affect performance and energy consumption of CNNs. It is therefore critical to consider the available hardware resources in the network architecture search. To this end, we introduce TEA-DNN, a NAS algorithm targeting multi-objective optimization of execution time, energy consumption, and classification accuracy of CNN workloads on embedded architectures. TEA-DNN leverages energy and execution time measurements on embedded hardware when exploring the Pareto-optimal curves across accuracy, execution time, and energy consumption and does not require additional effort to model the underlying hardware. We apply TEA-DNN for image classification on actual embedded platforms (NVIDIA Jetson TX2 and Intel Movidius Neural Compute Stick). We highlight the Pareto-optimal operating points that emphasize the necessity to explicitly consider hardware characteristics in the search process. To the best of our knowledge, this is the most comprehensive study of Pareto-optimal models across a range of hardware platforms using actual measurements on hardware to obtain objective values. …
CBH Group: Data Scientist [Perth, Australia]
At: CBH Group Location: Perth, Australia Web: cbh.com.au Position: Data Scientist
R Packages worth a look
Tools for Tensor Analysis and Decomposition (rTensor) A set of tools for creation, manipulation, and modeling of tensors with arbitrary number of modes. A tensor in the context of data analysis is a multid …
Intuit: Staff Data Scientist [Woodland Hills, CA and Mountain View, CA]
At: Intuit Location: Woodland Hills, CA and Mountain View, CA Web: intuit.com Position: Staff Data Scientist
Sharing Modeling Pipelines in R
Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system.
When cycling is faster than driving
Deliveroo is a service that picks up and delivers food. Data from their delivery riders showed that it was faster to ride a bike than other modes of transportation in cities. Carlton Reid for Forbes: