Advanced News API search: leveraging DBpedia entity types

When we’re searching or monitoring news content online, we specify the people, organizations, or things that interest us in order to return relevant stories. But sometimes, we can’t list every individual thing we are interested in searching for content about.

For example, think about how you difficult it would be to make a comprehensive query to search for news stories about:

  • all universities that are announcing new research projects,

  • the impact a natural disaster is having on every stock exchange in the world,

  • every company that announces C-level changes.

Technically you could search for news by listing every university or every stock exchange in the world, but this would be incredibly inefficient and time-consuming. What we need in order to make these kinds of queries is a way to look for stories about entire classes of things, in other words – stories mentioning entity types.

Luckily, the News API extracts information about exactly this. In this blog, we’re going to show you how to leverage entity types to make intelligent searches that supply you with the content to meet your needs, even when those needs aren’t clearly defined.

What is an entity type?

Every time the News API gathers and indexes a story, one of the many things that it analyzes is what entities are being talked about in the story. It does this by recognizing where people, companies, and other things are mentioned in the text and then finding the correct DBpedia resource for each of them. This enables us to not only understand the exact entity being talked about (whether a mention of “apple” refers to the fruit or the company) but it also provides us with additional information about these entities.

One of these pieces of information is the entity type – the category of things that the entity belongs to. For example, AYLIEN is an entity with entity type of company, Steve Jobs is an entity with the entity types of person and executive.

Take a look at the JSON entity results that the News API would return for a sentence in a story that reads “Tim Cook works for Apple”:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

"entities": [
 {
 "text": "Tim Cook",
 "score": 1,
 "**types**": [**"Executive", "Person", "Agent"**],
 "links": {"dbpedia": "http://dbpedia.org/resource/Tim_Cook"}
 },
 {
 "text": "Apple",
 "score": 0.980344831943512,
 "**types**": [**"Organisation", "Company"," Agent"**],
 "links": {"dbpedia": "http://dbpedia.org/resource/Apple_Inc."}
 }]

You can see that the types fields provide some useful information: when searching for news, this information allows us to search for stories that mention any company or any executive. So using entity types in our search lets us make a really broad search query, retrieving more relevant results by adding just one parameter.

What kind of results do entity types return?

When we leverage the entity type parameter, we can retrieve results that are more relevant to our use case. To take a simple example, if we are interested in news that is relevant to Facebook’s share price, adding the entity type “Index” (which refers to share indices) narrows the results to stories that reference any share index.

Take a look at the headlines for the three most recent stories in English that include “Facebook” in the title with and without the entity type parameter.

Without “Index” as an entity type parameter:  
“EU seeks regular updates from Facebook, Twitter on Russian disinformation”  
“Facebook gave preferential data access to certain companies”  
“Indoor Location-based Search and Advertising Market to Witness Huge Growth by 2025 Google, Cisco, Facebook – openPR’”

When we add the entity type parameter, the results are narrowed down to stories that mention any stock exchange. This gives us results that are still really broad in subject matter, but that all mention the performance of Facebook’s stock: |With “Index” as an entity type parameter: |“Facebook Looks Undervalued On Several Metrics”| |“Facebook, Inc. (FB) Shares Bought by Truepoint Inc.”| |“Facebook (FB) Gains But Lags Market: What You Should Know”|

Adding the entity type parameter to your News API query is really simple – it takes just one line in your script. Just add “entities.body.type:” along with a list of the types you’re interested in. So to make a search for stock indices like we did above, just add the parameter like this:

1
2
3

"entities.body.type:" ["Index"]