Searching the World Wide Web reminds me of my old science lessons on the universe. I learned that the entire universe was composed of exactly two things: matter and energy. Over the last two decades, we have found out that the matter and energy we can see add up to only about 5% of the universe.
It's the same with the Internet. (I know you were wondering where I was going with this (smile).) Google is a fantastic tool, but Google, Yahoo, Bing, and most of your other primary search tools can only search roughly 10 to 30 percent of the Web. That's right: about 70 to 90 percent of the Internet is currently unsearchable. This is known as - wait for it - The Dark Web. If you want to search a site that has dynamic content, or that requires you to fill out forms to get access, well, you're probably out of luck.
Wowd is a new player in the search space that promises to search the Deep Internet where others cannot tread, with a different cloud model and performance that defies explanation. Wowd founder Borislav Agapiev is combining about $50K in hardware and 25 engineers in Serbia to turn search on its head. Rather than thousands of servers in the cloud in a centralized location or two being accessed via Google, Wowd is using what CEO Mark Drummond calls a "write symmetrical content distribution network".
Essentially, Wowd is attempting to do for search and discovery (and the dark web in particular) what Skype did for internet phone service.
Google's algorithms process how different pages are related to each other, and how relevant that data is to what you type in on the search page. Most of the time, many of us find this pretty effective, but a Craig's List won't appear, and neither will many publicly available government websites, because you need to fill in form data and submit it to reach their pages. Since these pages are not indexed, searching them is next to impossible.
Wowd takes a different approach, by asking you to download their tracking software onto your computer (you are asked to allocate 2GB on your hard drive). Combine that with hundreds of thousands of users - perhaps millions - and search will work a bit like SETI@home, where the aggregate of users produces powerful, relevant searches that go places other search engines cannot. 70,000 users have enrolled on Wowd since October. At this point, they aren't returning images, but they believe that discovery will be a big selling point.
I spoke to Mark Drummond recently on a wide range of topics, and we discussed the theory of a user-driven, distributed cloud, how they overcame performance concerns, and what they are looking to do in the future:
GB: What distinguishes Wowd from its competitors?
MD: We ask three questions. One, "Where do [the search results] come from?" Google, for example, gets pages and tweets. We search all sites, everywhere. Two, "How does it get ranked?" We rank through blogs, tweets, anything. We index the web in real time, and while there is a bias towards newer information, we believe relevance isn't always time-dependent. Third, we ask the question "How does a person experience the web?" We call it "discovery and search". As [one of our people] calls it, we swing from vine to vine through the jungle of internet information. We have a feature called the Hot List, which contains things that are popular at that moment and relevant to your search.
The smart thing about Google when it was developed is that the scoring is based on how pages link to other pages. A page result was good as it related to other pages, which was fine when creating websites was difficult. Now, you find grad students who can, with the click of a mouse, create millions of pages that point to each other and create a crafted topography that can give your site a high page ranking, and that pollutes things. We believe that our page scores are always real because we have human beings with limited time hitting these sites.
GB: How long have you been building this reference indexing?
MD: Real time indexing began in October. We're not bad at reference search, and we're getting better all the time. The focus right now is being kick-ass at being real-time and relevant, but over time we'll be better at search. As 'now' becomes the past, we become increasingly good at searching the past. But it will take time to become very good.
MD: Interesting that you would say that, because we like to say "This is not your Dad's Napster or BitTorrent." It is a bit like BitTorrent or Skype, but there is a difference. Peer 2 Peer allows people to share data from their local computer - we're exactly NOT like that. We're a system that finds publicly available web pages.
GB: So how does it work?
GB: That is comforting to privacy advocates, but what about a true bad guy?
MD: The government can legally wiretap the ISP where the PC originates; we're the 3rd party and do not believe that we should even have that information.
GB: So what is actually stored on my PC when I perform a search?
MD: Your local Wowd client actually stores nothing when you do a search in Wowd. We've been asked to store and be able to retrieve the search query, and we're looking into that, but even if the search query is stored, it will be stored purely locally on your own computer and will be inaccessible to Wowd as a company.
GB: So is it accurate to call this "Group Indexing"?
GB: Why thank you (mental note to self, must trademark.) So why Group Index? What is going on from the time I install to the time I run a search?
MD: Here's a Thought Experiment for you: Imagine that you could build a crawler that would hit every page on every web site every second of every day. That would be the best real-time search, but you would have this undifferentiated glut of data.
We build an "Attention Frontier", which is the set of things on the network that the user is interested in at that time. What we get by building the Attention Frontier is that it allows us to differentiate between what we are looking at and what is of actual interest to people. When you download our software, you only interact with your own local machine. The local client will allow you to set up a localized topic cloud, as well as a few other features that get turned on, such as our "Hot Topics" cloud.
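The interview does not describe how the Attention Frontier is computed. As a rough illustration only, the idea of "what people are interested in right now" can be sketched as a sliding time window over page visits, with the most-visited URLs in the window forming something like a hot list. The class name `AttentionFrontier`, the `window` parameter, and the `hot_list` method are all hypothetical and are not taken from Wowd's software.

```python
from collections import Counter, deque


class AttentionFrontier:
    """Toy model (an assumption, not Wowd's design): URLs visited in the last `window` seconds."""

    def __init__(self, window=300.0):
        self.window = window
        self.events = deque()    # (timestamp, url) pairs, oldest first
        self.counts = Counter()  # url -> number of visits still inside the window

    def record_visit(self, url, now):
        """Register one human visit; timestamps are assumed to be non-decreasing."""
        self.events.append((now, url))
        self.counts[url] += 1
        self._expire(now)

    def _expire(self, now):
        # Drop visits that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window:
            _, old_url = self.events.popleft()
            self.counts[old_url] -= 1
            if self.counts[old_url] == 0:
                del self.counts[old_url]

    def hot_list(self, now, k=3):
        """Return up to k URLs with the most visits inside the current window."""
        self._expire(now)
        return [url for url, _ in self.counts.most_common(k)]
```

Because every entry is tied to an actual visit that eventually expires, a page nobody reads drops out on its own, which is one way to picture the difference between crawling everything and tracking what people actually look at.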
GB: What is the Topic Cloud?
MD: These tell you what correlated with the search results you generated. We've gotten great feedback from users, because it allows them to search the Real Time Web in a way they have never searched before.
GB: So Wowd doesn't "stash" search results?
MD: Not on your computer. You vote, you index, but the information is stored in the Wowd compute cloud.
GB: Does that take a lot of space?
MD: We don't store that much on your computer, up to 2GB for hash table related storage. It is configurable, but most of our users don't change the settings. Just a few megabytes (of indexing info) are stored on the PC.
GB: So you store most of the data on your system?
MD: We have a backbone and datacenter which we use to "prime the pump", if you will, when new users first come in. It is not needed after that as our user network grows. Two months ago, our ISP had a major failure and the backbone went down. Total failure, but no one noticed. We didn't miss a beat because it is all distributed.
GB: Let's talk performance. What does Wowd bring with its approach?
MD: When it comes to Search and Discovery, the key constraint is on sustained query serving. How many queries per second can you do? Why do content delivery networks (CDNs) make everything faster? It's because the machines that hold the content that has been requested are closer in IP hop terms. If you ask for data, it will be closer to you if you're using a CDN than if you're not. That is what we are doing. We are a 'write symmetrical content distribution network'. That means that the operation of "writing" or "publishing" information in the content distribution network is part of the operation of the CDN. Normal CDNs require the content to be published to the distribution system in a way that's not symmetrical with the normal operation of the CDN itself.
With Wowd, the content distribution (and redistribution) is part of the normal operation of the network.
In a centralized setup, every request you make is going to a big honking central data center. In our system, you ask your neighbor, and then they ask their neighbor - if necessary. Chances are, they have the information you need, especially as our network of users gets larger. This is faster than a centralized architecture.
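The "ask your neighbor, and then they ask their neighbor" pattern can be sketched as a hop-limited breadth-first lookup across peers. This is a minimal illustration under stated assumptions, not Wowd's actual protocol: the `Peer` class, the `lookup` function, the `max_hops` limit, and the example URL are all hypothetical, and a production peer-to-peer system would more likely route queries by key than flood its neighbors.

```python
from collections import deque


class Peer:
    """A hypothetical node holding a small slice of the shared index."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # search term -> set of URLs this peer holds
        self.neighbors = []  # peers this node can ask directly

    def add(self, term, url):
        self.index.setdefault(term, set()).add(url)


def lookup(start, term, max_hops=3):
    """Search outward from `start`, nearest peers first.

    Returns (sorted URLs, hops taken) from the closest peer that holds
    `term`, or ([], None) if nothing is found within `max_hops`.
    """
    seen = {start.name}
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if term in peer.index:
            return sorted(peer.index[term]), hops
        if hops < max_hops:
            for neighbor in peer.neighbors:
                if neighbor.name not in seen:
                    seen.add(neighbor.name)
                    queue.append((neighbor, hops + 1))
    return [], None
```

In this toy model, the more peers hold a given term, the fewer hops a query needs, which is the intuition behind the claim that the network gets faster as it grows.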
GB: How will browser performance affect Wowd and vice versa?
MD: Not such a big role for us. Browsers differ a lot in their ability to execute JavaScript, but that's not so important for Wowd.
GB: I understand that attempting to run searches using a Skype-esque model, to turn another phrase, presented a major performance issue.
MD: For search engines in general, storing index information is relatively easy -- it's delivering speedy response to a search query that's hard. Wowd's focus has been on delivering information that's only a few seconds old, and this means that legacy "batch architectures" don't work well. The technical focus for Wowd has been on handling information -- literally at the scale of trillions of URLs -- while still being able to deliver fast response to search queries, and where the results are just a few seconds old. Delivering ultra-fresh information, quickly, and at scale -- that's what Wowd has been focused on.
The search guys turned nauseous when we suggested that their performance problem could be solved with a distributed architecture. We would talk to Skype and they would look ill when we said that we would take their architecture to solve our search problem. Boris [founder Borislav Agapiev] was asking me, "Why are they acting like this?"
Good news for us: one of our patents that is in process has been approved, but not issued. It explains how to tie together the resources of a bunch of dinky, underpowered machines and turn them into a search, recovery, and recommendation supercomputer.
GB: How does performance compare to Google, Yahoo, or Bing?
MD: At scale, the performance of a distributed system like Wowd can be much, much better than that of the centralized architectures used by the existing search providers. Centralized architectures are like skyscrapers. The more clients you add to a fixed-sized central server, the slower the system gets. Wowd is distributed, and is more like a geodesic dome in structure, in that it gets stronger as it gets larger. Since all the people who join the network bring a little bit of computing power to the table, the more people that join, the faster and more robust the overall system gets.
GB: How are you able to search The Dark Web? Why is it not possible for many search engines to search a Craig's List, for example?
MD: "Terms of Service" sites are hard to search for multiple reasons. Craig's List does not allow robots, because their content changes so quickly. You can understand that: their content changes so fast that if it were indexed, they would need two to three times the server infrastructure to handle real-time indexing. Sites such as the USPTO (U.S. Patent and Trademark Office) require forms to be filled out to get to the info.
We can get to the Craig's List info in a different way that doesn't violate the Terms of Service on Craig's List. We're indexing the page in the "wake" of the human being that went there. That also allows us to go to a form site such as the PTO.
GB: What do you want to leave a user with in their experience?
MD: Wowd is about control - the ranking we do is citizen-powered, and the freshness is defined by human beings. All this math takes into account human beings, unlike other systems that count links as votes and are publisher-controlled. We put the person reading the web site in control. It is amazing how some of our customers have used Wowd search to fuel discovery, which drives more search.
[The interview was conducted and edited for readability by G.H. Brooks.]