Distilled News

More on Security Data Lakes – And FAIL!

Today I read a great Gartner note on data lake failures in general (‘How to Avoid Data Lake Failures’ [Gartner access required]). Thus, I wanted to share a few bits that, in my experience, are VERY relevant to security data lake efforts I've seen in recent years.

How to Avoid Data Lake Failures

Table of Contents
Analysis
Failure Scenario 1: ‘Enterprise Data Lake’
• Governance Challenges
• Semantic Consistency Challenges
• Performance and Flexibility Challenges
• Political and Cultural Challenges
• How to Avoid This Failure Scenario
Failure Scenario 2: ‘Data Lake Is My Data and Analytics Strategy’
• Mistaken Attempts to Replace Strategy Development With Infrastructure
• Lack of Organizational Clout or Social Capital
• Underestimation of the Immaturity of Data Management Capabilities
• Misunderstanding of the Diverse Requirements of a Data and Analytics Platform for Digital Business
• How to Avoid This Failure Scenario
Failure Scenario 3: ‘Infinite Data Lake’
• Outdated or Irrelevant Data
• Continuation of Immature Data Life Cycle Management Capabilities
• Eventual Performance and Cost Challenges
• How to Avoid This Failure Scenario
Gartner Recommended Reading

AIOps Platforms

AIOps is an emerging technology and addresses something I'm a big fan of – improving IT Operations. So I asked fellow Gartner analyst Colin Fletcher for a guest blog on the topic…

Build High Performance Time Series Models using Auto ARIMA in Python and R

Picture this – You've been tasked with forecasting the price of the next iPhone and have been provided with historical data. This includes features like quarterly sales, month-on-month expenditure, and a whole host of things that come with Apple's balance sheet. As a data scientist, which kind of problem would you classify this as? Time series modeling, of course. From predicting the sales of a product to estimating the electricity usage of households, time series forecasting is one of the core skills any data scientist is expected to know, if not master. There are a plethora of different techniques out there which you can use, and we will be covering one of the most effective ones, called Auto ARIMA, in this article.
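For a sense of what that looks like in practice, here is a minimal sketch using the pmdarima package's auto_arima on a made-up univariate series; the series values and the train/test split are purely illustrative.

```python
# Minimal Auto ARIMA sketch with pmdarima (pip install pmdarima).
# The quarterly values below are made up for illustration.
import pandas as pd
import pmdarima as pm

sales = pd.Series(
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],
    index=pd.period_range("2015Q1", periods=12, freq="Q").to_timestamp(),
)

train, test = sales[:-4], sales[-4:]

# auto_arima searches over (p, d, q) orders and keeps the model with the best AIC.
model = pm.auto_arima(train, seasonal=False, stepwise=True, suppress_warnings=True)
print(model.summary())

# Forecast the held-out horizon.
forecast = model.predict(n_periods=len(test))
print(forecast)
```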

A Simple Introduction to Facial Recognition (with Python codes)

Did you know that every time you upload a photo to Facebook, the platform uses facial recognition algorithms to identify the people in that image? Or that certain governments around the world use face recognition technology to identify and catch criminals? I don't need to tell you that you can now unlock smartphones with your face! The applications of this sub-domain of computer vision are vast, and businesses around the world are already reaping the benefits. The usage of face recognition models is only going to increase in the next few years, so why not teach yourself how to build one from scratch?
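As a taste of how little code this can take, here is a minimal sketch using the face_recognition library (which wraps dlib); the image file names are placeholders.

```python
# Minimal face matching sketch with the face_recognition library
# (pip install face_recognition). File names are placeholders.
import face_recognition

# Load a reference photo and a photo to check against it.
known_image = face_recognition.load_image_file("person_a.jpg")
unknown_image = face_recognition.load_image_file("group_photo.jpg")

# Each detected face is encoded as a 128-dimensional vector.
known_encoding = face_recognition.face_encodings(known_image)[0]
unknown_encodings = face_recognition.face_encodings(unknown_image)

# Compare every face found in the group photo to the reference face.
for encoding in unknown_encodings:
    match = face_recognition.compare_faces([known_encoding], encoding)[0]
    print("Match!" if match else "No match.")
```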

Microsoft Garage releases a new data visualization tool for PC and Surface Hub

Microsoft Garage today released a new data visualization tool called Charts 3D for PC and Surface Hub. It allows users to better visualize and explain complex graphs and charts as more immersive, interactive 3D objects.

Preprocessing for deep learning: from covariance matrix to image whitening

The goal of this post/notebook is to go from the basics of data preprocessing to modern techniques used in deep learning. My point is that we can use code (Python/NumPy etc.) to better understand abstract mathematical notions! Thinking by coding!
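In that spirit, here is a small NumPy sketch of the core idea: compute the covariance matrix of centred data and apply ZCA whitening so the features come out decorrelated with roughly unit variance. The toy data and the epsilon value are assumptions for illustration.

```python
# From covariance matrix to ZCA whitening, on made-up correlated data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) @ np.array([[2.0, 0.5, 0.0],
                                           [0.0, 1.0, 0.3],
                                           [0.0, 0.0, 0.5]])  # toy correlated features

X_centred = X - X.mean(axis=0)
cov = np.cov(X_centred, rowvar=False)          # covariance matrix of the features

# Eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# ZCA whitening matrix: U diag(1/sqrt(lambda)) U^T (epsilon avoids division by zero).
epsilon = 1e-5
W_zca = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + epsilon)) @ eigvecs.T

X_white = X_centred @ W_zca
print(np.round(np.cov(X_white, rowvar=False), 2))  # approximately the identity matrix
```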

Free eBooks on Artificial Intelligence and Machine Learning

The Artificial Neural Networks handbook: Part 2

In the last part, we covered the basics of artificial intelligence and artificial neural networks. As mentioned there, this part focuses on applications of artificial neural networks. ANNs are a very broad concept, and their applications can be found almost everywhere; I have mentioned some of the major use cases here.

Solving Some Image Processing Problems with Python libraries – Part 3

In this article, a few more popular image processing problems are discussed along with their solutions, using Python image processing libraries.
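The article's exact problems are not listed here, but as one illustrative example of this kind of task, the sketch below runs edge detection with scikit-image; the file names are placeholders.

```python
# Edge detection sketch with scikit-image (pip install scikit-image).
# "photo.jpg" is a placeholder file name.
from skimage import io, color, filters

image = io.imread("photo.jpg")        # load an RGB image from disk
gray = color.rgb2gray(image)          # convert to grayscale values in [0, 1]
edges = filters.sobel(gray)           # Sobel filter highlights intensity edges

io.imsave("photo_edges.png", (edges * 255).astype("uint8"))
```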

Word Vectors in Natural Language Processing: Global Vectors (GloVe)

Another well-known model that learns vectors for words from their co-occurrence information, i.e. how frequently they appear together in large text corpora, is Global Vectors (GloVe). While word2vec is a predictive model (a feed-forward neural network that learns vectors to improve its predictive ability), GloVe is a count-based model.
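To make the "count-based" part concrete, here is a toy sketch that builds a word-word co-occurrence table from a tiny corpus; GloVe then fits word vectors so that their dot products approximate the logarithm of these counts, a step omitted here. The corpus and window size are made up.

```python
# Toy word-word co-occurrence counts with a symmetric context window.
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
window = 2

cooc = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooc[(word, tokens[j])] += 1

print(cooc[("sat", "on")])   # how often "sat" and "on" appear near each other
```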

Transfer data from R to Python with PyRserve and Bio7

Recently I discovered the PyRserve package for Python, which connects Python with R using Rserve. This is extremely useful because Bio7 already integrates Rserve and has special GUI interfaces available to transfer, e.g., data from spreadsheets, ImageJ image and selection data (also georeferenced), Java simulation data, etc. With this new Rserve connection, such data can now be transferred easily from Java and R to a Python workflow.
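A minimal sketch of what the Python side can look like with pyRserve, assuming an Rserve instance is already running locally (e.g. started from R with library(Rserve); Rserve()); the variable names are placeholders.

```python
# Minimal pyRserve sketch (pip install pyRserve). Assumes Rserve is running locally.
import pyRserve

conn = pyRserve.connect()            # defaults to localhost:6311

# Push Python data into the R session...
conn.r.measurements = [4.2, 5.1, 3.9, 6.0]

# ...and evaluate R code on it, getting the result back as a Python object.
mean_value = conn.eval("mean(measurements)")
print(mean_value)

conn.close()
```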

Playing around with RStudio Package Manager

Managing packages in production is a lot of work: you have to juggle versions, internal packages, CRAN updates, Bioconductor, GitHub sources… Let's have a look at RStudio Package Manager, one of the tools available that helps you deal with all of this.

Markov Chain Analysis in R

In this tutorial, you’ll learn what a Markov chain is and use it to analyze sales velocity data in R.
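The tutorial itself works in R, but the core idea is easy to show in Python: a transition matrix whose rows are probability distributions, applied repeatedly to a state distribution. The states and probabilities below are made up.

```python
# Toy Markov chain: evolve a state distribution with a transition matrix.
import numpy as np

states = ["prospect", "customer", "churned"]
# P[i, j] = probability of moving from state i to state j in one step.
P = np.array([
    [0.6, 0.3, 0.1],
    [0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0],
])

# Distribution after 5 steps, starting with everyone as a prospect.
start = np.array([1.0, 0.0, 0.0])
after_five = start @ np.linalg.matrix_power(P, 5)
print(dict(zip(states, np.round(after_five, 3))))
```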

Announcing Optimus v2 – Agile Data Science Workflows Made Easy

A couple of years ago, we were cleaning, processing, and applying ML clustering algorithms for a retail client project. At the time, we were looking for a tool that would let us wrangle data easily. We tried Trifacta, a beautifully crafted tool that lets you apply transformations visually with a point-and-click interface. The problem was that its scripting language was not enough to handle the data the way we wanted. We also tried the amazing pandas library, but our data was big enough to make it cry. So, almost a year ago, we launched Optimus. Powered by Spark (PySpark), Optimus lets you clean your data with your own or a set of pre-built data transformation functions, profile it, and apply machine learning, all easily and with all the power of Python available.
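This is not Optimus's own API, but a plain PySpark sketch of the kind of cleaning step such a workflow wraps: load a CSV, normalize a string column, and drop rows with missing values. The file and column names are placeholders.

```python
# Plain PySpark cleaning sketch; file and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleaning-sketch").getOrCreate()

df = spark.read.csv("sales.csv", header=True, inferSchema=True)

cleaned = (
    df.withColumn("product", F.lower(F.trim(F.col("product"))))  # normalize strings
      .dropna(subset=["product", "amount"])                      # drop incomplete rows
)
cleaned.show(5)
```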

Self-Service Data Prep Tools vs Enterprise-Level Solutions? 6 Lessons Learned

A detailed comparison between self-service data preparation tools and enterprise-level solutions, covering business strategy, accessible tools and solutions, and more.
