Putting Data Science In Production
A critical challenge of data science projects is getting everyone on the same page about project challenges, responsibilities, and methodologies. More often than not, there is a disconnect between the worlds of development and production. Some teams may choose to re-code everything in an entirely different language, while others may change core elements such as testing procedures, backup plans, and programming languages. Transitioning a data product into production can become a nightmare as different opinions and methods vie for supremacy, resulting in projects that needlessly drag on for months beyond promised deadlines. Successfully building a data product and then deploying it into production is not an easy task, and it becomes twice as hard when teams are isolated and playing by their own rules.
Magister Dixit
“Unlike a pure statistician, a data scientist is also expected to write code and understand business. Data science is a multi-disciplinary practice requiring a broad range of knowledge and insight. It’s not unusual for a data scientist to explore a fresh set of data in the morning, create a model before lunch, run a series of analytics in the afternoon and brief a team of digital marketers before heading home at night.” Neera Talbert (March 16, 2015)
Welcome to Dataiku University!
alivia.smith@dataiku.com (Alivia Smith)
It’s almost back-to-school time, and we’ve got you covered with our new free online data science course.
Bothered by non-monotonicity? Here’s ONE QUICK TRICK to make you happy.
We’re often modeling non-monotonic functions. For example, performance at just about any task increases with age (babies can’t do much!) and then eventually decreases (dead people can’t do much either!). Here’s an example from a few years ago:
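The excerpt doesn’t show the trick itself, but one common way to capture a rise-and-fall relationship like the age example is to fit a quadratic term. A minimal sketch on synthetic, hypothetical data (not the post’s actual example):

```python
import numpy as np

# Hypothetical age/performance data with a mid-life peak:
# performance rises with age, then falls -- a non-monotonic relationship.
rng = np.random.default_rng(0)
age = rng.uniform(5, 90, size=200)
performance = -0.02 * (age - 45) ** 2 + 40 + rng.normal(0, 2, size=200)

# A quadratic polynomial is one simple way to model non-monotonicity.
coefs = np.polyfit(age, performance, deg=2)  # highest-degree coefficient first

# A negative leading coefficient gives a concave curve with an interior peak.
peak_age = -coefs[1] / (2 * coefs[0])
print(round(peak_age))
```

The fitted curve peaks in mid-life, as the simulated data dictates; splines or monotone-by-piece models are the usual next step when a single parabola is too rigid.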
The Blessings of Multiple Causes: Causal Inference when you Can't Measure Confounders
Happy back-to-school time everyone!
Who wrote that anonymous NYT op-ed? Text similarity analyses with R
In US politics news, the New York Times took the unusual step this week of publishing an anonymous op-ed from a current member of the White House (assumed to be a cabinet member or senior staffer). Speculation about the identity of the author is, of course, rife. Much of the attention has focused on the use of specific words in the article, but can data science provide additional clues? In the last 48 hours, several R users have employed text similarity analysis to try to identify likely culprits.
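These analyses generally reduce to representing each candidate author’s known writing and the op-ed as term-frequency vectors and comparing them. A minimal sketch of cosine similarity over word counts, with toy strings standing in for the actual corpora:

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Identical texts score (approximately) 1.0; texts sharing no words score 0.0.
print(round(cosine_similarity("the quiet resistance", "the quiet resistance"), 3))
print(cosine_similarity("steady state", "lodestar"))
```

Real analyses of this kind typically add TF-IDF weighting and focus on function words, which are harder for an author to disguise than topical vocabulary.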
Mirroring an FTP Using lftp and cron
As my Developer Advocate role leads me to do more and more sysadmin/data-engineer work, I continually find myself looking for more efficient ways of copying data folders to where I need them. While there are a lot of great GUI ETL tools out there, for me the simplest and fastest way tends to be using Linux utilities. Here’s how to mirror an FTP using lftp, with a cron repeater every five minutes.
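The core of the approach can be sketched as a small script; the host, credentials, and paths below are placeholders, not the post’s actual setup:

```shell
#!/bin/sh
# mirror-ftp.sh -- hypothetical host, user, and paths; replace with your own.
# lftp's `mirror` command copies the remote tree into the local directory;
# --only-newer skips files that haven't changed, so repeated runs
# transfer only the deltas.
lftp -u "$FTP_USER","$FTP_PASS" ftp://ftp.example.com <<'EOF'
mirror --only-newer --verbose /remote/data /local/data
quit
EOF
```

A crontab entry such as `*/5 * * * * /path/to/mirror-ftp.sh` then reruns the mirror every five minutes.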
If you did not already know
Deep Collaborative Weight-Based Classification (DeepCWC)
One of the biggest problems in deep learning is the difficulty of retaining consistent robustness when transferring a model trained on one dataset to another. To address the problem, deep transfer learning has been used to perform various vision tasks with a deep model pre-trained on a different dataset. However, the resulting robustness is often far from state-of-the-art. We propose a collaborative weight-based classification method for deep transfer learning (DeepCWC). The method performs L2-norm-based collaborative representation on the original images, as well as on the deep features extracted by pre-trained deep models. Two distance vectors are obtained from the two representation coefficients and then fused via the collaborative weight. The two feature sets are complementary: the original images provide information compensating for what is missed in the transferred deep model. A series of experiments on both small and large vision datasets demonstrated the robustness of the proposed DeepCWC in both face recognition and object recognition tasks. …
“Dynamically Rescaled Hamiltonian Monte Carlo for Bayesian Hierarchical Models”
Aki points us to this paper by Tore Selland Kleppe, which begins:
Document worth reading: “Data learning from big data”
Technology is generating a huge and growing availability of observations of diverse nature. This big data is placing data learning as a central scientific discipline. It includes collection, storage, preprocessing, visualization and, essentially, statistical analysis of enormous batches of data. In this paper, we discuss the role of statistics regarding some of the issues raised by big data in this new paradigm, and propose the name of data learning to describe all the activities that allow us to obtain relevant knowledge from this new source of information.