Data science, machine learning, and AI have clear applications for e-commerce, and given their relative ease of implementation, most online retailers are already deeply invested in strategies like recommendation engines, dynamic pricing, and supply chain optimization. But so far, aside from big players like Amazon, brick-and-mortar retail has lagged in the move to AI.
Part 2: Optimism corrected bootstrapping is definitely biased, further evidence
Some people are very fond of the technique known as ‘optimism corrected bootstrapping’; however, this method is biased, and the bias becomes apparent as the number of noise features grows large (as shown very clearly in my previous blog post). This needs exposing; I have neither the time nor the interest to write a publication on it, hence this article. I have now reproduced the bias with my own code.
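To make the effect concrete, here is a minimal sketch of the procedure on pure noise (in Python rather than the post's original R; the model, sample size, and bootstrap count are illustrative assumptions). Apparent performance is measured on the training data, optimism is estimated as the average gap between each bootstrap model's performance on its own resample and on the original data, and the corrected estimate is apparent minus mean optimism:

```python
# A minimal sketch of optimism-corrected bootstrapping on pure noise
# (illustrative assumptions: logistic regression, n=100 samples,
# p=200 noise features, 50 bootstrap iterations; not the post's R code).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.normal(size=(n, p))            # features carry no signal
y = rng.integers(0, 2, size=n)         # labels independent of X

# Apparent performance: train and evaluate on the same data.
apparent = roc_auc_score(y, LogisticRegression(max_iter=1000).fit(X, y)
                         .decision_function(X))

optimism = []
for _ in range(50):
    idx = rng.integers(0, n, size=n)   # bootstrap resample
    fit = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    boot = roc_auc_score(y[idx], fit.decision_function(X[idx]))
    test = roc_auc_score(y, fit.decision_function(X))
    optimism.append(boot - test)

corrected = apparent - np.mean(optimism)
print(f"apparent AUC:  {apparent:.3f}")   # close to 1.0 (overfitting)
print(f"corrected AUC: {corrected:.3f}")  # typically still well above 0.5
```

With many more features than samples, the model separates the training data almost perfectly, and the corrected AUC typically remains far above the 0.5 expected for pure noise, which is the bias described here.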
Following your gut, following the data
The Wall Street Journal highlighted a disagreement between the data side and the business side at Netflix. Ultimately, the business side “won.” However, maybe that’s the wrong framing. Roger Peng describes the difference between an analysis and the full truth:
If you did not already know
Apache Hadoop
Apache Hadoop is an open-source software framework for storage and large-scale processing of datasets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.
The Apache Hadoop framework is composed of the following modules:
· Hadoop Common – contains libraries and utilities needed by other Hadoop modules.
· Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
· Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users’ applications.
· Hadoop MapReduce – a programming model for large scale data processing.
Hadoop is regarded as one of the best platforms for storing and managing big data. It owes its success to its highly scalable data storage and processing, low price/performance ratio, high performance, high availability, high schema flexibility, and its ability to handle all types of data. …
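As a concrete illustration of the MapReduce model (a sketch, not part of the excerpt above): Hadoop Streaming lets any executable serve as the map or reduce step by reading stdin and writing tab-separated key/value lines to stdout. Here is the classic word count as a mapper:

```python
# mapper.py -- word-count mapper for Hadoop Streaming (illustrative).
# Emits "word<TAB>1" for every word on stdin; the framework then
# shuffles and sorts these lines by key before the reduce phase.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

Because the framework delivers the mapper output sorted by key, the reducer can sum counts over each contiguous run of identical words:

```python
# reducer.py -- word-count reducer for Hadoop Streaming (illustrative).
# Relies on the shuffle/sort guarantee: all lines for a key arrive
# contiguously, so a single running counter per key suffices.
import sys

current, count = None, 0
for line in sys.stdin:
    word, _, value = line.rstrip("\n").partition("\t")
    if word == current:
        count += int(value)
    else:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, int(value)
if current is not None:
    print(f"{current}\t{count}")
```

A typical (installation-dependent) invocation passes both scripts to the streaming jar, e.g. `hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper "python3 mapper.py" -reducer "python3 reducer.py" -input /in -output /out`.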
Statistical Assessments of AUC
In scorecard development, the area under the ROC curve, also known as AUC, has been widely used to measure the performance of a risk scorecard. All else being equal, the scorecard with a higher AUC is considered more predictive than the one with a lower AUC. However, little attention has been paid to the statistical analysis of AUC itself during scorecard development.
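The excerpt does not prescribe a method, but one simple statistical assessment is a bootstrap confidence interval for the AUC (DeLong's test is the usual analytic alternative for comparing two AUCs). A sketch, assuming numpy and scikit-learn, with purely illustrative data:

```python
# Percentile-bootstrap confidence interval for a scorecard's AUC
# (one illustrative approach; not from the excerpt above).
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_bootstrap_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Return the point-estimate AUC and a (1 - alpha) percentile CI."""
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                           # resample had a single class
        aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, scores), (lo, hi)

# Illustrative data: weakly informative scores for binary outcomes.
y = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0] * 20)
s = y * 0.6 + np.random.default_rng(1).normal(0, 0.5, size=len(y))
print(auc_bootstrap_ci(y, s))
```

When comparing two scorecards, overlapping intervals are a quick warning that the observed AUC gap may not be statistically meaningful.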
If you did not already know
Task Embedded Coordinate Update (TECU) In this paper we propose TECU, a realizable framework that embeds task-specific strategies into the update schemes of coordinate descent for optimizing multivariate non-convex problems with coupled objective functions. On one hand, TECU can improve algorithmic efficiency by embedding productive numerical algorithms for optimizing univariate sub-problems with nice properties. On the other hand, it increases the probability of obtaining desired results by embedding advanced techniques from realistic task optimization. Integrating both numerical algorithms and advanced techniques, TECU is proposed as a unified framework for solving a class of non-convex problems. Although the embedded task strategies introduce inaccuracies into the sub-problem optimizations, we provide a realizable criterion to control the errors while ensuring robust performance, backed by rigorous theoretical analysis. By embedding ADMM and a residual-type CNN, respectively, into our algorithmic framework, the experimental results verify both the efficiency and the effectiveness of embedding task-oriented strategies in coordinate descent for solving practical problems. …
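The abstract gives no implementation details, but the underlying pattern, block coordinate descent with a pluggable task-specific solver for each block, can be sketched generically. Everything below (the toy coupled objective, the gradient-step "solver", and all names) is illustrative; the paper embeds ADMM or a residual-type CNN instead:

```python
# Generic block coordinate descent with pluggable per-block solvers
# (an illustrative sketch of the pattern, not the paper's algorithm).
import numpy as np

def block_coordinate_descent(grad_blocks, x_blocks, solvers, n_iter=100):
    """Alternately update each block x_i while holding the others fixed.

    grad_blocks[i](x_blocks) -> gradient of the coupled objective w.r.t.
    block i; solvers[i](x_i, g_i) -> updated block (the embedded solver).
    """
    for _ in range(n_iter):
        for i, solve in enumerate(solvers):
            g = grad_blocks[i](x_blocks)
            x_blocks[i] = solve(x_blocks[i], g)
    return x_blocks

# Toy coupled objective: f(u, v) = ||u - a||^2 + ||v - b||^2 + (u . v)^2
a, b = np.ones(3), -np.ones(3)
grads = [
    lambda x: 2 * (x[0] - a) + 2 * (x[0] @ x[1]) * x[1],  # df/du
    lambda x: 2 * (x[1] - b) + 2 * (x[0] @ x[1]) * x[0],  # df/dv
]
step = lambda x_i, g: x_i - 0.1 * g   # embedded "solver": one gradient step
u, v = block_coordinate_descent(grads, [np.zeros(3), np.zeros(3)], [step, step])
print(u, v)
```

Per the abstract, TECU's contribution is a criterion for controlling the error introduced when these embedded sub-problem solvers are inexact, together with the accompanying convergence analysis.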
Will Julia Replace Python and R for Data Science?
For those of you who don’t know, Julia is a multi-paradigm (fully imperative, partially functional, and partially object-oriented) programming language designed for scientific and technical (read: numerical) computing. It offers significant performance gains over Python (when Python is used without optimizations such as Cython or vectorized NumPy). Development time is reduced by a factor of roughly 2x on average, and performance gains range from 10x to 30x over Python (R is even slower and was not built for speed, so we don’t include it). Industry reports in 2016 indicated that Julia was a language with high potential, with a chance of becoming the best option for data science if it received advocacy and adoption by the community. Well, two years on, Julia 1.0 was released in August 2018, and the language now has the advocacy of the programming community and the adoption by a number of companies (see https://www.juliacomputing.com) as the preferred language for many domains, including data science.
BERT: State of the Art NLP Model, Explained
By Rani Horev, Co-Founder & CTO at Snip
Miami University: Assistant Provost for Institutional Research and Effectiveness [Oxford, OH]
At: Miami University
Location: Oxford, OH
Web: www.miami.miamioh.edu
Position: Assistant Provost for Institutional Research and Effectiveness
Deep Learning in Satellite Imagery
By Damian Rodziewicz, Appsilon.