Distilled News

Why Knowledge Graphs Are Foundational to Artificial Intelligence

AI is poised to drive the next wave of technological disruption across industries. As with previous technology revolutions in web and mobile, there will be huge dividends for those organizations that can harness this technology for competitive advantage. I spend a lot of time working with customers, many of whom are investing significant time and effort in building AI applications for this very reason. From the outside, these applications couldn't be more diverse – fraud detection, retail recommendation engines, knowledge sharing – but I see a sweeping opportunity across the board: context. Without context (who the user is, what they are searching for, what similar users have searched for in the past, and how all these connections play together), these AI applications may never reach their full potential. Context is data, and as a data geek, I find that profoundly exciting. We're now looking at things, not strings.

How Not to Get Lost in 2018 with Knowledge Graphs: Map, Graph, Go!

Losing your way is easy. Much of the data modeling in the search, analytics, and reporting spaces has focused on the fabulous five W-words, in the hope of answering the why-question. We have been throwing technologies at this for quite a few years now: plain old data modeling, semantics, ‘hyperindexes’, ontologies, topic maps, data warehouses, operational data stores, multidimensional OLAP, mapping-intensive ETL, key/value pairs, Big Data, and now also data catalogs, data lakes, and knowledge graphs.

WTF is a knowledge graph?

I've worked in the technology industry for 20 years, and much has changed in that time. But one thing that hasn't is the firehose of unfamiliar terminology used, abused, and generally misused, usually in a range of different contexts. These days, it's easier than ever to look up the definition of new jargon, but every so often I hit a phrase that leaves me in a loop of Google searches and open browser tabs. Some phrases seem to defy clear description: I've already covered ‘ontology’ in a previous article. Here, I aim to simplify another such example. You too may have heard the phrase ‘knowledge graph’ and turned to Google to find out what it means. Did you ever work it out? If not, read on. All will be explained…

There and back again: Outlier detection between statistical reasoning and data mining algorithms

Outlier detection has been a topic in statistics for centuries. Mainly over the last two decades, there has also been increasing interest in the database and data mining community in developing scalable methods for outlier detection. Although initially based on statistical reasoning, these methods soon lost the direct probabilistic interpretability of the derived outlier scores. Here, we detail, from the joint point of view of data mining and statistics, the roots and the path of development of statistical outlier detection and of database-related data mining methods for outlier detection. We discuss their inherent meaning, review approaches that restore a statistically meaningful interpretation of outlier scores, and sketch related current research topics.
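To make the distinction concrete, here is a minimal sketch (not from the paper) contrasting the two families of scores: a z-score, directly interpretable under a Gaussian model, versus a k-nearest-neighbour distance, a typical data-mining score whose raw values carry no probabilistic meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 Gaussian inliers plus two obvious outliers
data = np.concatenate([rng.normal(0, 1, 200), [8.0, -7.5]])

# Statistical score: standard deviations from the mean, interpretable
# as a probability statement under a Gaussian model
z_scores = np.abs(data - data.mean()) / data.std()

# Data-mining score: distance to the k-th nearest neighbour; scale-dependent,
# with no direct probabilistic interpretation of the raw value
k = 5
dists = np.abs(data[:, None] - data[None, :])
knn_scores = np.sort(dists, axis=1)[:, k]  # column 0 is the distance to self

print("Top z-score outliers:  ", np.argsort(z_scores)[-2:])
print("Top kNN-score outliers:", np.argsort(knn_scores)[-2:])
```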

How to Find (and Grow) the Best Traffic Sources from Your Analytics Data

It's easy to get caught up in vanity metrics when trying to grow a website's traffic. Users, page views, and session counts are meaningless if they don't reflect your site objective; attracting thousands of visitors with the wrong profiles won't impact your bottom line as much as attracting a few hundred good leads. Finding the best traffic sources for your site means continuously testing new sources of traffic. However well you think you know your users and customers, you won't be able to tell whether a traffic source will bring qualified traffic without actually testing it.

Introduction to recommender systems

The most wonderful and most frustrating characteristic of the Internet is its excessive supply of content. As a result, many of today's commercial giants are not content providers but content distributors. The success of companies such as Amazon, Netflix, YouTube, and Spotify relies on their ability to effectively deliver relevant and novel content to users. However, with such a vast array of content at their fingertips, the search space becomes nearly impossible to navigate with traditional search methods. It is therefore essential for businesses to exploit the data at their disposal to find similarities between products and user behaviours, in order to make relevant recommendations to users.
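As an illustration of the similarity-based approach described above, here is a minimal sketch (with hypothetical ratings data) of item-based collaborative filtering: unseen items are scored by their cosine similarity to the items a user has already rated.

```python
import numpy as np

# Hypothetical user-item rating matrix: rows = users, columns = items; 0 = not rated
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Cosine similarity between item columns
norms = np.linalg.norm(ratings, axis=0)
item_sim = (ratings.T @ ratings) / np.outer(norms, norms)

# Score items for user 0 as a similarity-weighted average of their own ratings
user = ratings[0]
seen = user > 0
scores = item_sim[:, seen] @ user[seen] / item_sim[:, seen].sum(axis=1)
scores[seen] = -np.inf  # never recommend items already rated

print("Recommend item:", int(np.argmax(scores)))
```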

BooST series I: Advantage in Smooth Functions

This is the first of a series of posts on the BooST (Boosting Smooth Trees). If you missed the post introducing the model, click here, and if you want to see the full article, click here. The BooST is a model that uses smooth trees as base learners, which makes it possible to approximate the derivative of the underlying model. In this post, we will show some examples, on generated data, of how the BooST approximates derivatives, and we will also discuss why the BooST may be a good choice when dealing with smooth functions compared to the usual discrete regression trees.
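To give a feel for the mechanism (a sketch of the general idea, not the authors' implementation), the snippet below contrasts a hard regression-tree split with a smooth, logistic split whose derivative exists everywhere; the transition parameter gamma is an illustrative name.

```python
import numpy as np

def hard_split(x, c, left_val, right_val):
    # Classic regression-tree stump: piecewise constant, so the derivative
    # is zero almost everywhere and undefined at the split point c
    return np.where(x > c, right_val, left_val)

def smooth_split(x, c, left_val, right_val, gamma=2.0):
    # Logistic weighting between the two leaves; gamma controls how
    # quickly the prediction transitions from left_val to right_val
    w = 1.0 / (1.0 + np.exp(-gamma * (x - c)))
    return (1 - w) * left_val + w * right_val

def smooth_split_derivative(x, c, left_val, right_val, gamma=2.0):
    # Analytic derivative of the smooth stump with respect to x
    w = 1.0 / (1.0 + np.exp(-gamma * (x - c)))
    return (right_val - left_val) * gamma * w * (1 - w)

x = np.linspace(-3, 3, 7)
print(hard_split(x, 0.0, -1.0, 1.0))              # abrupt jump at c
print(smooth_split(x, 0.0, -1.0, 1.0))            # gradual transition
print(smooth_split_derivative(x, 0.0, -1.0, 1.0)) # well-defined everywhere
```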

Enabling reliable, secure collaboration on data science and machine learning projects

Machine learning researchers often prototype new ideas using Jupyter, Scala, or RStudio notebooks, which is a great way for individuals to experiment and share their results. But in an enterprise setting, individuals cannot work in isolation – many developers, perhaps from different departments, need to collaborate on projects simultaneously and securely. I recently spoke with IBM's Paul Taylor to find out how IBM Watson Studio is scaling machine learning to enterprise-level, collaborative projects.

How to perform consensus clustering without overfitting and reject the null hypothesis

The Monti et al. (2003) consensus clustering algorithm is one of the most widely used class discovery techniques in the genome sciences and is commonly used to cluster transcriptomic, epigenetic, proteomic, and a range of other types of data. It can automatically decide the number of classes (K) by resampling the data and, for each K (e.g. 2-10), calculating a consensus rate: how frequently each pair of samples is clustered together by a given clustering algorithm, e.g. PAM. These consensus rates form a consensus matrix. A perfect consensus matrix for any given K would consist only of 0s and 1s, because every pair of samples either always clusters together or never does across all resampling iterations, whereas values around 0.5 indicate the clustering is less clear. However, as Şenbabaoğlu et al. (2014) recently pointed out, consensus clustering is subject to false discoveries, as many clustering algorithms are, because they usually do not test the null hypothesis K=1 (no structure). We also found that the consensus clustering algorithm overfits, which is why we developed M3C, inspired by the GAP-statistic.
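For concreteness, here is a minimal sketch of computing a Monti-style consensus matrix for a single K, using scikit-learn's KMeans as the base clusterer in place of PAM; the function name and parameters are illustrative, not taken from M3C.

```python
import numpy as np
from sklearn.cluster import KMeans

def consensus_matrix(X, K, n_iter=100, frac=0.8, seed=0):
    """Entry (i, j): fraction of resampling runs in which samples i and j
    were both subsampled AND assigned to the same cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))  # times i and j clustered together
    sampled = np.zeros((n, n))   # times i and j were both in the subsample
    for _ in range(n_iter):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = KMeans(n_clusters=K, n_init=10).fit_predict(X[idx])
        same = labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += same
    return together / np.maximum(sampled, 1)  # consensus rates in [0, 1]

# Two well-separated Gaussian clusters: for K=2 the rates sit near 0 or 1
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
print(consensus_matrix(X, K=2).round(2)[:3, :3])
```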
