
#SEOisAEO: The Knowledge Graph - How Well is Google doing in 'understanding the world'?



Definitions of Knowledge Graph



"It's an encyclopedia that a machine can delve into and understand without any human help." Luke Cherno Del Coro

"Knowledge Graphs are a large network of entities with semantic types, properties and relationships between those entities." M. Krötzsch and G. Weikum. Nice and simple.

"Knowledge Graphs are particularly appropriate to the continuous flow of large amounts of data from diverse, heterogeneous sources where the ontology of the data can be extended and revised algorithmically as new data arrives." Jo Stichbury


That is brilliant. And especially with the Topic Layer just being announced, that last little part takes on extra special meaning.

Taking it further still, Jo says that Knowledge Graphs can benefit from “the application of various graph computing techniques and algorithms which add additional intelligence over the stored data. For example, allowing a bit of information to be derived from explicitly asserted data.” I love that last little bit because it's this idea of taking the information we have and extrapolating new information from it. Which is certainly something important for the future - if you can link enough entities together, with specific attributes, you can then guess an attribute of another linked entity. That will come.

In practical terms for #SEOisAEO, the Knowledge Graph can be seen as Google’s understanding of the world. And Google's understanding can be greatly helped by communication on our part. We have this great opportunity to communicate to Google who we are, what we do and what our offers are.

How well has Google done in growing an accurate Knowledge Graph since 2012? And how big is it?


Bill: So how big is the Knowledge Graph? I think the last I heard, which was a couple of years ago: 570 million entities and over 20 billion facts. Obviously, the Knowledge Graph has grown a heck of a lot since then. I don't know where it's at right now. What I think is really compelling is: “How accurate is it? How well has it done?”

And I have what is probably a semi-unpopular opinion on that. When it came out, I thought: "Wow, this is huge. This is great. Google's moving into a semantic search model. Google's doing everything the way the framework was designed to." We got the triple store: all the entities, the relationships, the attributes. Everything working together. But today, I get frustrated when seeing the limits of how it is implemented. And sometimes the accuracy is a little bit off.

Google looks at a lot of sources - everything from Schema to maps, to Google Plus, to Wikipedia, Wikidata. And those can be wrong a lot. Wikipedia can be wrong a hell of a lot. One example is of a really well-known company - the Knowledge Graph had their CEO wrong for a long time, despite the correct information being out there. The information could have been validated from several sources but wasn’t. The same thing happened with the Knowledge card for my company, Greenlane. It says that Greenlane was started in Philadelphia. In fact, it was started in a city next door called Reddick. I submitted the feedback form and I said, "Hey, this information is wrong." They wrote back and said, "No, it's right. It matches what we believe based on everything that we're finding online." I said, "But I'm the source. I'm the guy who started the company. I can tell you it's wrong." But they didn't take it. I think that's a problem.

And then, as we were saying before the webinar, I used to be a musician, and it’s got that right. But it lists a whole bunch of songs that aren’t mine - they are terrible songs and I only write good songs :) And I think it's not doing as good a job as I had hoped it would've done in terms of content extraction that isn't marked up. I think it can extract more. I want to see it extract more… and also be a little bit more accurate. Maybe Google's struggling to figure out how to gather information reliably and gauge trust.

Jason: I think the word trust is a good point. And Google's being scaredy-pussyfoot and it won't put the data out there if it isn't incredibly sure of it. I get the feeling, Microsoft's a bit more adventurous. To summarise, you were incredibly enthusiastic in 2014. You're less enthusiastic now.

Bill: Less enthusiastic. I think I'm waiting for something to really impress me, right? Google created this second brain and they're capable of serving the information specifically. I'm just waiting for a little bit more to come from that other brain.

Jason: Brilliant. Bill's disappointed.

Bill: I'm one of those glass-half-empty guys.

Jason: Haha :) The number of Knowledge Graph opportunities for brands is humongous. Paul suggested to me:

The bigger the brand, the more there is to optimize. Can you give us the lowdown on that?


Paul: The bigger the brand, the more entities there are. The more topics there are. The more stuff there is to get right. The more stuff there is to get wrong. Some businesses operate across various industries, various verticals and potentially various languages. That’s a lot of information to optimize. Take it further and think about “what are the topical entities for a brand?” That can get pretty complex.

In the case of university medical centers, there is the medical center itself, hospitals, doctors, departments etc. Add to that the decentralized and siloed nature of medical centers, and we need to look at the university as a whole. At that level, sports is a big, big driver of finance and money and status, so the football team becomes an incredibly important entity, perhaps an even stronger brand point than the university itself. So you need to expand your Knowledge Graph management to the entities around that: rosters, video, players, news, schedules, and fixtures. And there’s more! You can look into state costs, the rate of graduation, notable alumni, rankings and even test scores.

Quick point about doctors. People want to own their own Knowledge Graph entries, so there's a question about who owns the Knowledge Graph entries for these entities within the brand.

Then the weird thing is, once you start looking at the overarching understanding Google has of the brand, you start getting more mixed results. It always seems a little bit haphazard. It always seems like Google's guessing a little bit, dependent on what's important at the time. So it's really important that brands understand what their own particular knowledge graph is because it can be slightly different within verticals.

Jason: Yeah, brilliant stuff. So get your company understood... then all the constituent parts. And both will serve each other. Understanding the doctors helps in understanding the university and understanding the university helps to understand the doctors. In fact, there's a lot of barnacling and self-referencing that can go on there.

Paul: Exactly.

Jason: By the way, I have been working on pushing information about me in the Knowledge Graph for a few years. I use the Knowledge Graph API - https://kalicube.pro/knowledge-graph-explorer - to track the relevancy score. And what I've seen is, when I stop actively working on pushing my own name, the relevancy score that it throws back goes down. Is that a question of losing understanding? Losing confidence? Or just the other Jason Barnards have come more to the fore and my relative importance has gone down? So that's a question. If anyone has an answer to that out there, including you, Paul, please do give me the answer.
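
For anyone who wants to track a score like this themselves, here is a minimal sketch. It assumes the public Google Knowledge Graph Search API endpoint and the `resultScore` field it returns for each entity (an indicator of how well the entity matched the query). The helper names, the `YOUR_API_KEY` placeholder, the entity IDs and the sample response are all illustrative, and no network request is actually sent.

```python
import json
from urllib.parse import urlencode

# Public endpoint of the Google Knowledge Graph Search API.
KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_search_url(query: str, api_key: str, limit: int = 5) -> str:
    """Build the request URL for an entity search (no request is sent here)."""
    params = {"query": query, "key": api_key, "limit": limit}
    return f"{KG_ENDPOINT}?{urlencode(params)}"


def extract_scores(response_text: str) -> list[tuple[str, str, float]]:
    """Return (entity id, name, resultScore) for each entity in a response."""
    data = json.loads(response_text)
    rows = []
    for item in data.get("itemListElement", []):
        result = item.get("result", {})
        rows.append(
            (result.get("@id", ""), result.get("name", ""), item.get("resultScore", 0.0))
        )
    return rows


# Hypothetical, hand-written response in the shape the API returns.
# The ids and scores are made up for illustration.
SAMPLE_RESPONSE = json.dumps({
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "EntitySearchResult",
         "result": {"@id": "kg:/m/example1", "name": "Jason Barnard"},
         "resultScore": 120.5},
        {"@type": "EntitySearchResult",
         "result": {"@id": "kg:/m/example2", "name": "Jason Barnard"},
         "resultScore": 45.0},
    ],
})

url = build_search_url("Jason Barnard", api_key="YOUR_API_KEY")
scores = extract_scores(SAMPLE_RESPONSE)
for entity_id, name, score in scores:
    print(entity_id, name, score)
```

Logging these rows on a schedule would show whether one entity's score is falling in absolute terms or only relative to other entities with the same name, which is the distinction Jason is asking about.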

How can understanding the Knowledge Graph help improve rankings?


Bill: The statement “Google has a lot more information than we know” is very true. If you want to rank better, first you've gotta do everything you do with traditional SEO. But Google wants to get away from just ranking things based on keywords; they need to be able to understand content a little better. So you need to think about entities and the knowledge graph. Think about the information Google has, and understands, in what I call its ‘second brain’.

Let's say you're writing an article on the Philadelphia Eagles. You can look at the knowledge card, and you get a little bit of information on what Google knows. They know that Doug Pederson is the coach of the Eagles. They know that the Philadelphia Eagles are an NFL team… and, given the way this relationship-driven database works, they're able to infer that Doug Pederson is an NFL coach. That type of figuring things out is the whole magic of semantic search.

So when you're writing something about the Eagles, it's a great idea to take some of the entities that Google knows and pepper them in, as long as it makes sense. Don't spam it in like we used to with keyword stuffing, but think “if I'm writing about the Eagles, these are some things that I need to mention to give a little bit more assurance to Google that I'm talking about the Philadelphia Eagles and not just a bird.”

First, take a look at the SERPs that you want to rank for... look at the intent… make sure that you have a piece of content that fits the intent... and then go through the listings that are doing well and look for the entities. After that, use one of the tools out there that help take a look at related entities: Google Cloud Natural Language, Alchemy API… take a look at the entities they suggest and look at how you can work them in. And it works really well. When we write content with those entities in mind, we are seeing a very good increase in visibility.
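
The inference Bill describes (a coach of the Eagles, plus the Eagles being an NFL team, gives an NFL coach) can be sketched with a tiny set of triples. This is only an illustration of the idea: the predicate names `coach_of` and `instance_of` and the function name are made up here, and nothing is implied about how Google actually stores or derives facts.

```python
# Facts as (subject, predicate, object) triples, mirroring Bill's example.
# Predicate names are hypothetical labels chosen for this sketch.
TRIPLES = {
    ("Doug Pederson", "coach_of", "Philadelphia Eagles"),
    ("Philadelphia Eagles", "instance_of", "NFL team"),
}


def infer_nfl_coaches(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Derive (person, 'instance_of', 'NFL coach') from two asserted facts."""
    # Step 1: collect every entity asserted to be an NFL team.
    nfl_teams = {s for s, p, o in triples if p == "instance_of" and o == "NFL team"}
    # Step 2: anyone who coaches one of those teams is an NFL coach.
    return {
        (person, "instance_of", "NFL coach")
        for person, p, team in triples
        if p == "coach_of" and team in nfl_teams
    }


derived = infer_nfl_coaches(TRIPLES)
print(derived)
```

The derived triple was never asserted anywhere; it only exists because the two explicit facts are linked through the shared entity.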

Jason: Great! I'm going to diss Google a bit more here: I use their NLP cloud and I've been pretty disappointed with it. I plan to start looking into Diffbot and WordLift, because apparently, they've got something going on that's really interesting.

Next up - I saw this from Evan Berkovich: "Google Maps is a great example of what the future search should look like. Augmented public data powered by a knowledge graph, Google has thousands of employees on staff constantly correcting errors, machine learning, interpreting addresses. Cars driving through streets getting ground-level data, satellites taking photos, and millions of phones constantly sending in data. With this infrastructure, Google is able to maintain a real-time representation of the world and answer geospatial queries that have never been asked."

What I understand from this is that Google Maps is basically a knowledge graph that is functioning… and doing incredibly well. For example, if we type into Google Maps, "I want to go from A to B", it can tell us how to get there, even if it's never done that trip before. Another example: if I ask “where's a restaurant where I can just have a coffee?”, it's got the attributes of the café near me, so it can say, "You can just have a coffee there".

In short, we have all these entities, which are businesses, towns, roads. Each with attributes: with/without tolls for the roads, can just have a coffee / have to eat for a restaurant, disabled toilets are / are not available, et cetera, et cetera. Plus the phones are sending in shitloads of data constantly, meaning it can do real-time traffic or show how busy a local business is, because it's dealing with this data in real time.

Here’s a real-life experience: I was using Google Maps to find my hotel in Luton. And Luton has got the world's worst one-way system, and Google had all the one-way streets wrong. And I ended up getting incredibly annoyed with it. I'm so used to Google Maps getting it right that I now always expect it to get it right the first time. I've come to rely on what I now see as an active operational knowledge graph. Would you agree with that, Christine? Or am I getting over-excited about absolutely nothing?
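
To make the "entities with attributes" picture concrete, here is a small sketch of a map treated as a graph. Every name in it (the street network, the places, the `coffee_only_ok` attribute) is invented for illustration; it only shows why a route that has never been requested before can still be answered, and why getting the direction of one-way streets wrong matters.

```python
from collections import deque

# Hypothetical one-way street network: an edge A -> B means you may drive from A to B.
ONE_WAY_STREETS = {
    "station": ["high_street"],
    "high_street": ["market"],
    "market": ["hotel", "station"],
    "hotel": ["station"],
}

# Hypothetical business entities, each with attributes.
PLACES = [
    {"name": "Cafe A", "type": "cafe", "coffee_only_ok": True},
    {"name": "Bistro B", "type": "restaurant", "coffee_only_ok": False},
    {"name": "Brasserie C", "type": "restaurant", "coffee_only_ok": True},
]


def shortest_route(graph, start, goal):
    """Breadth-first search that respects edge direction; None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# A route nobody has asked for before is still computable from the graph.
route = shortest_route(ONE_WAY_STREETS, "station", "hotel")
# Because the streets are one-way, the way back is a different, longer route.
way_back = shortest_route(ONE_WAY_STREETS, "hotel", "market")
# "Where can I just have a coffee?" becomes a filter on an attribute.
coffee_spots = [p["name"] for p in PLACES if p["coffee_only_ok"]]
print(route, way_back, coffee_spots)
```

If the stored direction of a single edge is wrong, the search still returns a confident route, just not one you can legally drive, which is roughly the Luton experience.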

Google Maps as a Knowledge Graph - is that inspiring or is it a dead duck?


Christine: No, no, that makes a lot of sense. However, a lot of that kind of knowledge is going to be a lot easier than language. Because we know where the coffee shop is and where the directions are and what the streets are we can easily make a good representation of a situation. But remember that that type of fact is relatively easy to interpret... Whereas things are much harder with the knowledge graph itself because they are dealing with language and that’s much more difficult than a map result (where there's a lot of data that's already known and doesn't need to be interpreted). Language is incredibly complex, and Google can't do natural language processing. By the way, there is that misunderstanding that Google does natural language processing in the search results, but it can't. And that's why we have the Knowledge Graph. We need an interpreter. Google needs something to tell it what the entities are, and what the relationships are. In short, language is extremely complex, and Maps are probably a much simpler way for Google to apply a graph system.

Jason: Yeah, brilliant. Thank you. You've brought me down to earth. I think that's a really, really nice comment. Moving on. It's not just Google. We have Amazon, who have their Knowledge Graph and Product Graph. We have Facebook, who are obviously very big on graphs. We have Microsoft, who Martha was saying last week are coming back very strongly with their knowledge graph. We have Diffbot who claim to have 10 billion entities and one trillion facts. Aaron Bradley suggested that Google’s Knowledge Graph is not particularly good, but they're shouting loud and making a big marketing deal out of it. All mouth and no trousers.

So who is the best? Google, Facebook, Microsoft, Diffbot, Amazon?


Bill: Be wary of "are you the best?". You could be the best, you could have the best technology, you can have the most information... but it's all about how that information is going to get used. Google has the advantage. They have the platform to put their knowledge graph to good use. So, I don't know who really is the best, but I think it's really important not to brag about how great you are if people aren't using it, if it's not coming into play.

Jason: Great stuff. Now, knowledge-based trust versus PageRank. Luna Dong created this idea of knowledge-based trust and suggests that PageRank cannot be used to judge the credibility of a knowledge source. She used the example of gossip sites. A gossip site will have a very high PageRank, but the information it's providing is untrustworthy.

Knowledge-based trust. For a Knowledge Graph, do we forget about PageRank?


Christine: I don't think they'll ever forget about PageRank :) PageRank has been there since day one and I don't think it's going anywhere. And they may use something like knowledge-based trust for facts to make sure they get it right, but I don't think it's something that would replace PageRank.

Jason: But to fill their knowledge graph, Amazon are going to have some kind of knowledge-based trust because they don't have PageRank to rely on.

Christine: True, very true. When we say the knowledge graph, we often think of only Google, but you're right, they existed before Google said "the" knowledge graph. As Aaron said, that is a marketing term. Knowledge graphs have existed for a very, very long time.

Jason: So what it comes down to is: everyone building a knowledge graph from non-curated data is going to need some kind of knowledge-based trust. And Amazon, from what I understand, are doing a lot of work on that. Great!
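
As a rough illustration of the difference from a link-based score, the sketch below rates each source by how often the facts extracted from it agree with the value most sources give. It is a deliberately simplified stand-in, not the method from the knowledge-based trust research, and every source name, entity and value in it is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical facts extracted from three hypothetical sites:
# (source, subject, attribute, value).
CLAIMS = [
    ("gossip.example", "Acme", "ceo", "Pat Doe"),
    ("gossip.example", "Acme", "founded_in", "Springfield"),
    ("news.example", "Acme", "ceo", "Sam Roe"),
    ("news.example", "Acme", "founded_in", "Shelbyville"),
    ("registry.example", "Acme", "ceo", "Sam Roe"),
    ("registry.example", "Acme", "founded_in", "Shelbyville"),
]


def trust_scores(claims):
    """Score each source by the share of its claims that match the majority value."""
    # Count how many sources support each value of each (subject, attribute) pair.
    votes = defaultdict(Counter)
    for _source, subject, attribute, value in claims:
        votes[(subject, attribute)][value] += 1
    consensus = {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

    # A source's score is the fraction of its claims that agree with the consensus.
    correct, total = Counter(), Counter()
    for source, subject, attribute, value in claims:
        total[source] += 1
        if consensus[(subject, attribute)] == value:
            correct[source] += 1
    return {source: correct[source] / total[source] for source in total}


scores = trust_scores(CLAIMS)
print(scores)
```

Nothing in this score depends on how many links point at a site, so a heavily linked gossip site can still score zero. The weakness of plain majority voting is also visible: if most sources repeat the same mistake, the mistake becomes the consensus.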

Ultimately, how are they going to make money out of Knowledge Graphs?


Paul: Certain informational SERPs aren't really driving any kind of revenue, so maybe Google can try to push users from there to SERPs that have more paid results - kind of piggy-backing. Perhaps the “neighbour” model will work for them. They can start pushing paid elements in certain informative parts of the knowledge graph. Perhaps around events, appointments and things like that. But I think what they're really gunning for is a huge disruption.

One place that will see a huge disruption is the medical knowledge graph. We've seen that with Mayo Clinic taking over diagnosis of patient symptoms. Right now, that's not money-making, because it's so far up the funnel. But it has changed the game already: healthcare operations have stopped creating content around symptoms because Google's got that sewn up. They have no opportunity.

So, what are the big four playing at? Apple have health records on forms and they're starting to do life-alerts and things along those lines. Google Duplex is an interesting play that aims at appointments. Microsoft opened up their Healthcare division last year. And then Amazon are looking at insurance with Berkshire Hathaway and JP Morgan Chase. All of these are massive disruption plays. And if they all start playing at the same time, it could get quite ridiculous.

They are all amassing enormous amounts of data behind the scenes. And they are using AI to exploit it. For example, in some cases, they have a 99% success rate when using image AI to identify metastatic breast cancer, whereas human pathologists have a 62% failure rate. I can see a scenario not too far away where people can take a picture of a mole on Google Lens, send it to a telemedicine doctor, who can then have the image analyzed in the medical knowledge graph, then Duplex can set up an appointment with the local GP, sort out your insurance and order your prescription drugs from Amazon, Google or whatever. And it seems that would work extremely nicely, all told. That is a major disruption.

Jason: Okay, brilliant. That's a bit of a step up from my Google Maps example - there's actually a commercial goal at the end, in that they can push people right along the acquisition path from informational search to purchasing something. So the monetizing of the knowledge graph is being able to catch people at one point along the funnel and then push them through to purchasing.

Paul: Yes, yes.

Jason: Question: is the medical part of the knowledge graph particularly well-developed?

Paul: Yes. The data is leaps ahead of the rest.

Jason: So that's a space to watch. I can look at healthcare and just copy what they're doing.

Paul: Pretty much.

Jason: Wow. Next up. Here's my summary of the different knowledge graphs.

  1. Google wants to understand the whole world.

  2. Diffbot have great claims to the amount of knowledge they have, and they feed some big players - eBay, Yandex, and DuckDuckGo.

  3. Amazon are, arguably, concentrating on the knowledge that is on a path to purchase, but they're keeping pretty quiet about what they're doing.

  4. Microsoft is possibly very strong, but they're keeping pretty quiet.

  5. Apple have a walled garden, and they're still keeping pretty quiet, but they've got all their data locked in through iOS.

  6. IBM won Jeopardy! in 2011, and they haven't done very much since.

  7. Facebook - I have no idea.

How do you see the future of these knowledge graphs?


Bill: Remember I said I'm sad that Google is under-exploiting its knowledge graph? Well, the things I was shitting on will, at some point, be implemented a little bit better. Eventually, the real value of what a full, healthy knowledge graph can do will be finally obtained by Google users.

Paul: Yes. And it'll be very, very interesting to see how all this unfolds, and also how quickly this unfolds. As an example, in just two years, the understanding of the university I cited earlier has become more accurate and we can now nail searches with modifiers for particular topics around our entity. Google are now getting it right more often than they're getting it wrong, and I think that's encouraging. They still screw it up quite badly sometimes and are sometimes spammed. But it's definitely getting better.

Christine: In the long term, they need to move along the path towards natural language processing in order to pick up on all these entities. Then they need to use machine learning and human interaction to correct what they've got wrong. Long term, they will get to the point where the knowledge graph is feeding itself and do away with having any interpreters. Gary Illyes has said they'd like to get away from Schema in about five years. That's the direction that they're taking. I like to think about a child who is learning. Google's algorithms and everything at this point are in their infancy, maybe like a one or two-year-old child; but they are shooting for full-blown language skills within the next five to seven years.

Jason: Yeah, brilliant. In their infancy. Not quite what you were saying, but I really like the concept of educating a child. That idea of looking at knowledge graph and saying, "Okay, we've got a child here, we need to educate it." And for a child to learn, it needs simple information that it can understand, it needs drip feeding and it requires that the information is confirmed and re-confirmed and re-confirmed by trusted sources - parents, teachers, friends, grandparents, sisters, brothers. That is much the same for a knowledge graph.

Lastly, the position zero profile. An important part of this series.

How can the knowledge graph affect position zero?


Bill: Bill Clinton plays the saxophone. If you actually type in "What instrument does Bill Clinton play?", you get a whole knowledge card in position zero on the saxophone. That kind of vague result makes me think there's more opportunity for Google to get that right, and make a very valuable position zero. By extension, that is a really good opportunity for marketers to try to get that position zero by optimizing for the knowledge graph or optimizing for rich answers. That's the thing that voice search is really big on. It's getting that rich answer. So optimization isn't just about making people click the little blue links, it's about making sure that your message is passed through, even when you're using voice search. As SEOs, we need to think beyond just getting the clicks. Google has changed, and we have to change with it because it's not going to go back.

Jason: Okay. Brilliant stuff. Paul, have you got a comment about position zero or anything that knowledge graph brings to the table that can be useful to us as marketers?

Paul: I think one thing that's interesting is the new Pixel and the new Assistant routines, and the way that Google has got prescriptive in how it works people through particular workflows. We can aim to influence that. The better we do there, the better we'll be able to provide useful information to the user and give a better experience throughout the user journey.

Jason: Brilliant. What I hear there is “pull the user in at the beginning of their journey, then make sure we accompany them from the moment they asked the first informational question about what we can potentially sell them, right through to when we actually sell it to them. And not losing them along the way.”

Paul: Exactly. And I read something the other day. The Holy Grail really is to be invisible. People aren't supposed to know that you're there, with the knowledge that we have. These answers can be given and can help people and also give them what they require without you being in the way.

Christine: More concretely - right now position zero's kind of a blend of different kinds of content that's pulled directly from pages or schema… and, in truth, quite a variety of sources. The knowledge graph can help make all that extraction a little more accurate. Sometimes position zero can pull up the completely wrong result, just because it's pulled the HTML from an inaccurate page. As it gets better and more sophisticated and becomes more able to understand the factual nature of the data, the Knowledge Graph can greatly help.

Jason: Great! We've come to the end of the chat. I'd like to thank Bill, Christine, and Paul. That was really, really, really, really cool. I got a lot out of it. For the next episode, “how to get entities in the knowledge graph”, please do join me next week, same time.

Bill, Christine, Paul, you've been absolutely wonderful and I thank you.

