Google 2.0: Why MIT scientists are building a new search engine - Big Think

W. Daniel Hillis

Peter Hopkins: Among other projects, and you're doing lots of stuff, you get involved in some very heady questions about the origins of truth on the internet. And this is where we're getting to, folks, because the work that Danny's describing now in theory ultimately became a venture, right? Metaweb.

Danny Hillis: So that's right. So what I really thought is that what we need to do is have a way of representing the knowledge of the world in a way that machines can get at it and take advantage of it, and that that should be shared. Everybody should be able to get at it. That is, in some sense, if human knowledge isn't a shared resource, then what is? I mean what has civilization been doing all these years? So I created a company that built this database called Freebase. It was a free database. And the company basically took any kind of public knowledge that we could get, information about anything, and put it in machine-readable format.

We were kind of creating it with the idea that this is going to be useful to the world. We didn't really have a business model. And we started building it up, and then it became useful to lots of different people including particularly all the search engines. So eventually Google bought it, of course. And then I got Google to agree to keep it open for three years, but they kept open only the part that was already open, and they started building it up. And so now Google has something called the Knowledge Graph which is the evolution of this. And it probably has about 100 billion different entities. So everybody in this room is in that graph. This building is in that graph.

Peter Hopkins: Yes, I took a screenshot earlier of when you just Googled NeueHouse, and all of these different—

Danny Hillis: That's right. NeueHouse is obviously in the graph. So this event is, and yes. So anything like a person, a place, an event. Anything like that is in this huge knowledge base, and all the relationships between them are. So when you, for instance, print out a Google map, that is rendered from the Knowledge Graph; so the Knowledge Graph knows the bus schedules and it knows the address of the restaurant and the traffic.

Peter Hopkins: It's drawing all this information together around the thing that the searcher cares about.

Danny Hillis: That's right. So the map is just in some sense a custom rendering of a piece of the Knowledge Graph for your particular purpose. And also by the way, I don't know – this doesn't have any ads on it, but the other thing is that the ads are also like a lot of Knowledge Graph about what the products are about and whether—it probably has knowledge about you, specifically, and so on. So it's gone way beyond the kind of public knowledge, also again it probably has very particular private knowledge about people too.
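As an editorial illustration of "entities and the relationships between them," and of a map as "a custom rendering of a piece of the Knowledge Graph," here is a minimal sketch in Python. The triples, relation names, and the `view` helper are all hypothetical; nothing here reflects how Freebase or Google's Knowledge Graph is actually stored or queried.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, invented for illustration.
TRIPLES = [
    ("NeueHouse", "is_a", "venue"),
    ("NeueHouse", "located_in", "Manhattan"),
    ("Manhattan", "part_of", "New York City"),
    ("Talk", "is_a", "event"),
    ("Talk", "held_at", "NeueHouse"),
    ("Danny Hillis", "speaks_at", "Talk"),
]

def build_index(triples):
    """Index triples by every entity they mention, as subject or object."""
    index = defaultdict(list)
    for s, r, o in triples:
        index[s].append((s, r, o))
        index[o].append((s, r, o))
    return index

def view(index, entity):
    """Return the triples touching one entity: one small 'rendering' of the graph."""
    return sorted(index.get(entity, []))

index = build_index(TRIPLES)
for triple in view(index, "NeueHouse"):
    print(triple)
```

The same index can serve many such views. A map, a search result card, or an event listing would each pick a different slice of the same underlying triples.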

Peter Hopkins: Now, from Google's perspective it's safe to say that this is a quantum leap in terms of the original basis of its citation-based search model. All of a sudden it is now providing this multidimensional search that is drawing in way more richness.

Danny Hillis: It still does the old kind of search. So right now when you, let's say I put in museums of New York. You know, "museums in New York." Well, it still does the old keyword search of searching for pages that have the word "museum" and the phrase "New York," but it doesn't—if you say "an exhibition in Manhattan" or something, you might have something that's a museum in New York that actually didn't use the word "museum" and "New York" on the page. But the Knowledge Graph knows that Manhattan is in New York, and it knows that exhibitions are in museums, or may know something is a museum even if it doesn't use the word museum in its title.

And so it's actually able to pick that up even though it's not, it doesn't have the keyword. So that will play into the search results that come up. It does a search that's based on the semantics. And, of course, that's very important because that kind of knowledge is completely language independent too. So the same knowledge that informs your search in English also informs somebody's search in Mandarin or Hindi or something like that.
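The difference between the "old keyword search" and a search "based on the semantics" can be sketched as follows. This is a toy under stated assumptions: the pages, the `IMPLIES` table, and both search functions are invented for illustration and are not a description of how Google ranks results.

```python
# Hypothetical pages, invented for illustration.
PAGES = {
    "page_a": "Visit the best museum in New York this weekend",
    "page_b": "A new exhibition opens in Manhattan on Friday",
    "page_c": "Cheap flights to Boston",
}

# Tiny stand-in for a knowledge graph: a term and the broader concept it implies.
IMPLIES = {
    "manhattan": "new york",
    "exhibition": "museum",
}

def keyword_search(pages, terms):
    """Return pages whose text literally contains every query term."""
    return sorted(p for p, text in pages.items()
                  if all(t in text.lower() for t in terms))

def concepts(text):
    """Concepts a page mentions literally, plus those the graph says it implies."""
    lowered = text.lower()
    found = {term for term in IMPLIES if term in lowered}
    found |= {broad for term, broad in IMPLIES.items() if term in lowered}
    found |= {broad for broad in IMPLIES.values() if broad in lowered}
    return found

def semantic_search(pages, terms):
    """Return pages whose literal or implied concepts cover every query term."""
    return sorted(p for p, text in pages.items()
                  if all(t in concepts(text) for t in terms))

query = ["museum", "new york"]
print(keyword_search(PAGES, query))   # ['page_a']
print(semantic_search(PAGES, query))  # ['page_a', 'page_b']
```

The second page never uses the words "museum" or "New York," but it is still found because the lookup goes through what the graph knows about Manhattan and exhibitions.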

So the good news is it's turned out to be really useful. There are these big representations of knowledge. But the bad news is the whole idea of it being this free, open thing that everybody was going to use has actually become really just something that is a competitive advantage of Google, and now other search engines and other companies will make their own I'm sure. Apple is working on it, Amazon, you know. Each of the big companies – IBM, Microsoft. They'll each work on their own database. So the world could go in one of two directions: We could either have this sort of oligarchy of big companies that have giant knowledge bases that they use for proprietary advantage, or it could flip over and say it becomes a public resource, that we could say "We want knowledge to be a public resource. And we want, in particular, knowledge that's tied to who said what," because this is not, it doesn't represent truth, remember! It represents who said stuff and that becomes then a resource for doing things like sorting out what's fake news or deciding what medical treatments, what effects are in the scientific literature, things like that that really don't align very well with commercial goals.

Peter Hopkins: And this is where Underlay comes in. Underlay in many respects is your attempt to kind of reclaim this technology as the public good that you kind of initially envisioned it as.

Danny Hillis: Yes, it's my penance for having sold the other one to Google.

Peter Hopkins: So I'm actually stuck on the screen here. I thought there was a very nice paragraph on the very simple Underlay website, which basically in written terms explains kind of what it's attempting to do. And it says The Underlay aggregates statements and reported observations, along with citations of who made and who published them. For example, it would not contain the bare assertion that "Sudan's population was 39M in 2008", but rather that "Sudan's population was 'provisionally' 39M in 2008, according to the UN's statistics division in 2011, referencing Sudan's national census, as reported by its Central Bureau of Statistics, and as contested by the Southern People's Liberation Movement."

Danny Hillis: And it would do that not in those words, but in a kind of machine-readable form.
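To make "machine-readable" concrete, here is one way the Sudan example quoted above might be written down as a record. This is only a sketch: the `Assertion` class and every field name are hypothetical, chosen for this illustration, and are not the Underlay's actual schema. The values are taken from the quoted sentence.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class Assertion:
    # What is being claimed.
    subject: str
    predicate: str
    value: str
    as_of: int
    qualifier: str = ""
    # Who said it, when, and on what basis (all field names are assumptions).
    according_to: str = ""
    stated_in: Optional[int] = None
    referencing: str = ""
    reported_by: str = ""
    contested_by: List[str] = field(default_factory=list)

sudan = Assertion(
    subject="Sudan",
    predicate="population",
    value="39M",
    as_of=2008,
    qualifier="provisional",
    according_to="UN statistics division",
    stated_in=2011,
    referencing="Sudan national census",
    reported_by="Central Bureau of Statistics",
    contested_by=["Southern People's Liberation Movement"],
)

# Serialize to JSON so another program can read the claim and its citations.
record = json.dumps(asdict(sudan), sort_keys=True, indent=2)
print(record)
```

The bare number is still in the record, but it never travels without the chain of who stated it, what they referenced, and who disputes it.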

Peter Hopkins: Right. So that those could be – and ultimately this version of what you are going at becomes almost a kind of record of all of these observations over time, and then can be tracked. So if we wanted to get to the heart of, let's say, whether in one of these hearings we just watched, somebody said one or the other, we could trace it potentially back to the first recorded incidents.

Danny Hillis: Yes. And if you take a problem like that I would regard that as an application of the Underlay, just like Google Maps and say drawing a map is. But if you take sorting through fake news and recognizing when rumors are getting out of control, in order to do that you really need a very complex representation of who's saying what. So you can kind of trace whether this person said that or this person said that this person said that. Or the New York Times said that, you know, the Drudge Report said that. And so there is something that needs to be built on top of the Underlay that is essentially a network of trust for that purpose. So somebody has to say well, okay, I trust New York Times more than I trust Fox News or vice versa.
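The idea of tracing "whether this person said that or this person said that this person said that" can be sketched as a chain of reports, each optionally citing the report it repeats. The `Report` class, the helper functions, and the speaker names below are hypothetical placeholders, not part of any real system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Report:
    speaker: str
    content: str
    citing: Optional["Report"] = None  # the report this one repeats, if any

def chain(report):
    """List speakers from the most recent report back to the earliest recorded one."""
    speakers = []
    while report is not None:
        speakers.append(report.speaker)
        report = report.citing
    return speakers

def origin(report):
    """Follow citations back to the first recorded report."""
    while report.citing is not None:
        report = report.citing
    return report

# Placeholder speakers, invented for illustration.
first = Report("Person A", "X happened")
second = Report("Outlet B", "Person A said X happened", citing=first)
third = Report("Outlet C", "Outlet B reports Person A said X happened", citing=second)

print(chain(third))           # ['Outlet C', 'Outlet B', 'Person A']
print(origin(third).speaker)  # Person A
```

A network of trust would then sit on top of chains like this one, deciding how much weight each link deserves.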

Peter Hopkins: And these would be organizations or individuals with some sort of framework of analysis that would leverage the Underlay for interpretative purposes.

Danny Hillis: And it's going to be for different purposes. I mean an awful lot of the things that people argue about—I mean, is Taiwan a province of China? Well, you know, if you're doing something with the Chinese government you've got to count it as one. If you're doing something with Taiwan you're probably not going to count it. So for some purposes it "is", for some purposes it "isn't". And so what's the truth of that? Well there isn't exactly a truth. It's, you know, what's the purpose, what's the trust in it? and so on. And many of these – so I sort of feel like the Underlay is, in some sense it's a piece of the plumbing that we need to deal with the fact that the amount of information has become overwhelming, that no human can hold it all in their heads. Nobody can be sort of familiar with all the news sources or things like that. And then that lets us build these things on top of it where computers help us be smarter in sort of navigating these networks of trust.

Peter Hopkins: And so you were conceiving of this challenge in the early to mid 2000s. What were the first inklings of an approach that technology could provide to addressing this, and to kind of capturing the chain of custody, if you will, of information?

Danny Hillis: So the idea was to build something that basically established the agreed-on things that you were talking about, the entities that you were talking about. Let people make statements about the relationships between them, but then have some provenance of who made those statements, so that instead of recording that "the glass is sitting on the table," you record, "Danny said the glass is sitting on the table on such and such a day." And then once you have all that information recorded, first of all it lets you record information without worrying too much about whether it's true. It's true that I said that, which is much easier to determine than whether it's true that the glass is actually on the table. But then it also lets you apply basically your idea of trust afterwards, after you get more information about who I am, or later you find out I'm a liar, or later you find out the glass was someplace else.

Peter Hopkins: You can weigh those previous recordings against it.

Danny Hillis: Exactly. So the idea is that what we really need to do is separate out two things.

We need to separate the record of what different people said and who said it, the provenance of what was said, and then separately have in some sense a network of trust, which is going to be different for different purposes.
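The separation described here, a record of who said what on one side and a purpose-specific network of trust on the other, can be sketched with the glass-on-the-table example. Everything below is an assumption made for illustration: the `Statement` class, the second speaker, the purpose names, and the trust weights are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    speaker: str
    claim: str
    value: bool
    day: str

# The record only says who asserted what, and when. It is never edited for "truth".
RECORD = [
    Statement("Danny", "glass_on_table", True, "day 1"),
    Statement("Observer", "glass_on_table", False, "day 1"),
]

# Trust lives outside the record, one table per purpose. Weights are invented.
TRUST = {
    "purpose_a": {"Danny": 0.9, "Observer": 0.4},
    "purpose_b": {"Danny": 0.1, "Observer": 0.8},
}

def believed(record, claim, trust):
    """Weigh recorded statements about a claim by trust in each speaker.

    Returns True or False, or None if the weights cancel or nobody spoke.
    """
    score = 0.0
    for s in record:
        if s.claim == claim:
            weight = trust.get(s.speaker, 0.0)
            score += weight if s.value else -weight
    if score == 0:
        return None
    return score > 0

print(believed(RECORD, "glass_on_table", TRUST["purpose_a"]))  # True
print(believed(RECORD, "glass_on_table", TRUST["purpose_b"]))  # False
```

If you later decide a speaker is unreliable, only the trust table changes. The record of what was said stays exactly as it was and can be re-weighed at any time.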

Ultimately there's lots of kinds of knowledge that I think really are fundamentally part of the public commons, the public good. And I hope that those will end up in it, and I think it's not as complicated as copyright law where you're taking the expression of the individual artist and things like that. A fact is a fact. It's not copyrightable, to own truth. If somebody figures out the geographical location of this building, that's just a truth. Nobody owns that. And, really, it's to everybody's advantage to share that.

© Copyright 2007-2026 & BIG THINK, BIG THINK PLUS, SMARTER FASTER trademarks owned by Freethink Media, Inc. All rights reserved.