Listen "Flat World Strategies: Google and Search Wikia, Search Technology Explained [23:10]"
Episode Synopsis
Intro: Right before the 2006 holidays, Jimmy Wales, creator of the online encyclopedia Wikipedia, announced the Search Wikia project. The project will rely on search results ranked by the future site's community of users. In this podcast we take a look at popular search engine technologies and discuss the Search Wikia project concept.
Question: I know this project was really just announced. Before we get into the technology involved - can you tell us what phase the project is in?
According to the BBC, Jimmy Wales is currently recruiting people to work for the company and buying hardware to get the site up and running.
Question: What makes this concept fundamentally different from what Google or Yahoo! are doing?
When Wales announced the project, he came right out and said it was needed because the existing search systems for the net were "broken". They were broken, he said, because they lacked freedom, community, accountability and transparency.
Question: This sounds a lot like Digg - am I on the right track?
Yes, you are - what you end up with is a Digg-like application, or what Wales calls a "people-powered" search site.
Question: Can you provide a bit more detail on how Google works?
Googlebot is Google's web-crawling robot. Googlebot finds pages in two ways: through an add URL form, www.google.com/addurl.html, and by following links as it crawls the web.
Source: www.google.com
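To make those two discovery paths concrete, here is a minimal sketch of the link-following half in Java. Googlebot's actual code is not public, so the seed URL (standing in for the add URL form), the regex-based link extraction, and the page budget are all illustrative assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative breadth-first crawler: start from seed URLs (the "add URL"
// side) and discover new pages by following links (the crawling side).
public class TinyCrawler {
    private static final Pattern HREF = Pattern.compile("href=\"(http[^\"]+)\"");

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        Deque<String> frontier = new ArrayDeque<>(List.of("https://example.com/")); // seed URL, an assumption
        Set<String> seen = new HashSet<>(frontier);
        int budget = 20; // hard page limit so the sketch terminates quickly

        while (!frontier.isEmpty() && budget-- > 0) {
            String url = frontier.poll();
            HttpResponse<String> resp = client.send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println("fetched " + url + " (" + resp.body().length() + " chars)");

            // Naive link extraction; a real crawler parses HTML properly.
            Matcher m = HREF.matcher(resp.body());
            while (m.find()) {
                String link = m.group(1);
                if (seen.add(link)) frontier.add(link);
            }
        }
    }
}
```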
Question: That's Googlebot - how does the indexer work?
Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google's index database. This index is sorted alphabetically by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.
Source: www.google.com
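What is being described is essentially a positional inverted index: for each term, a posting list of documents plus the positions where the term occurs. Here is a minimal sketch of that data structure in plain Java; the document IDs and the crude tokenizer are assumptions for illustration, not Google's implementation.

```java
import java.util.*;

// Minimal positional inverted index: term -> (docId -> positions of the term).
public class InvertedIndex {
    // TreeMap keeps terms sorted, like an alphabetical index in the back of a book.
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    public void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase().split("\\W+"); // crude tokenizer, assumption for the sketch
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) continue;
            postings.computeIfAbsent(tokens[pos], t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Which documents contain this term, and at which positions?
    public Map<Integer, List<Integer>> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Map.of());
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.addDocument(1, "Wikipedia is a free encyclopedia");
        index.addDocument(2, "Search Wikia is a people powered search project");
        System.out.println(index.lookup("search")); // prints {2=[0, 6]}
    }
}
```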
Question: So now that everything is indexed, can you describe the search query?
The query processor has several parts, including the user interface (the search box), the "engine" that evaluates queries and matches them to relevant documents, and the results formatter.
PageRank
is Google's system for ranking web pages. A page with a higher PageRank is
deemed more important and is more likely to be listed above a page with a lower
PageRank.
Source: www.google.com
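PageRank is commonly computed by power iteration over the link graph: each page repeatedly passes a share of its rank to the pages it links to. The sketch below illustrates that idea; the tiny four-page graph, the 0.85 damping factor and the fixed iteration count are illustrative assumptions, not Google's production values.

```java
import java.util.Arrays;

// Power-iteration PageRank over a tiny hand-made link graph.
public class PageRankSketch {
    public static void main(String[] args) {
        // outLinks[i] = pages that page i links to (illustrative graph).
        int[][] outLinks = { {1, 2}, {2}, {0}, {2} };
        int n = outLinks.length;
        double damping = 0.85;          // commonly cited damping factor
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);     // start from a uniform distribution

        for (int iter = 0; iter < 50; iter++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n);
            for (int page = 0; page < n; page++) {
                // Each page splits its rank evenly among its outgoing links.
                double share = damping * rank[page] / outLinks[page].length;
                for (int target : outLinks[page]) next[target] += share;
            }
            rank = next;
        }
        System.out.println(Arrays.toString(rank)); // higher rank = deemed more important
    }
}
```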
Question: Can you run us through, step by step, a Google search query?
Sure - this is also from Google's site. Here are the steps in a typical query process:
1. The user accesses the Google web server at google.com and submits a query.
2. The web server sends the query
to the index servers. The content inside the index servers is similar to the
index in the back of a book--it tells which pages contain the words that match
any particular query term.
3. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result (see the snippet sketch below).
4. The search results are returned
to the user in a fraction of a second.
Source: www.google.com
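Step 3 mentions snippet generation: the doc servers pull the stored page text and cut a short excerpt around the query terms. Here is a hedged sketch of that idea; the window size, the handling of missing terms, and the sample text are assumptions for illustration.

```java
// Illustrative snippet generator: find the first query term in the document
// text and return a short window of text around it.
public class SnippetSketch {
    static String snippet(String text, String term, int window) {
        int hit = text.toLowerCase().indexOf(term.toLowerCase());
        if (hit < 0) return text.substring(0, Math.min(window, text.length())) + "...";
        int start = Math.max(0, hit - window / 2);
        int end = Math.min(text.length(), hit + term.length() + window / 2);
        return "..." + text.substring(start, end) + "...";
    }

    public static void main(String[] args) {
        String page = "Search Wikia is a project announced by Jimmy Wales to build "
                + "a people-powered search engine on top of open source tools.";
        System.out.println(snippet(page, "people-powered", 40));
    }
}
```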
Question: OK, so now we know how Google and Yahoo! work. How will these new Search Wikia-type search engines work?
I can give some details based on what I've looked at so far. As we've said, the Search Wikia project will not rely on computer algorithms to determine how relevant web pages are to keywords. Instead, the results generated by the search engine will be decided and edited by the users.
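Search Wikia's implementation details were not public at the time of this episode, so the following is purely a hypothetical sketch of what "results decided and edited by users" could look like: take an algorithmic result list and re-rank it with community votes. The Result record, the vote counts, and the blending rule are all assumptions for illustration (Java 16+ for records).

```java
import java.util.*;

// Hypothetical "people-powered" re-ranking: blend an algorithmic relevance
// score with votes cast by the community on each result.
public class CommunityRerankSketch {
    record Result(String url, double algorithmScore, int userVotes) {}

    public static void main(String[] args) {
        List<Result> results = new ArrayList<>(List.of(
                new Result("http://example.com/a", 0.92, 3),
                new Result("http://example.com/b", 0.75, 40),
                new Result("http://example.com/c", 0.60, 12)));

        // Illustrative blend: community votes can promote a result the algorithm undervalued.
        results.sort(Comparator.comparingDouble(
                (Result r) -> r.algorithmScore() + 0.01 * r.userVotes()).reversed());

        results.forEach(r -> System.out.println(r.url())); // b is promoted above a
    }
}
```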
There are a couple of projects called Nutch and Lucene, along with some others, that can now provide the background infrastructure needed to build a new kind of search engine, one that relies on human intelligence to do what algorithms cannot. Let's take a quick look at these projects.
Lucene: Lucene is a free and
open source information retrieval API, originally implemented in Java by Doug
Cutting. It is supported by the Apache Software Foundation and is released
under the Apache Software License.
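Since Lucene is the building block here, a small indexing-and-search example helps show what the library provides out of the box. This is a sketch against a recent Lucene release; class names and constructors have changed between Lucene versions, so treat the exact calls as assumptions, and note that lucene-core and lucene-queryparser are needed on the classpath.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Index two small documents with Lucene, then run a keyword query against them.
public class LuceneSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory(); // in-memory index, fine for a sketch

        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String body : new String[] {
                    "Wikipedia is a community-edited encyclopedia",
                    "Search Wikia aims to be a community-edited search engine" }) {
                Document doc = new Document();
                doc.add(new TextField("body", body, Field.Store.YES));
                writer.addDocument(doc);
            }
        }

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(
                    new QueryParser("body", new StandardAnalyzer()).parse("search"), 10);
            for (ScoreDoc hit : hits.scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("body"));
            }
        }
    }
}
```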
We mentioned Nutch earlier. Nutch is a project to develop an open source search engine. Nutch is supported by the Apache Software Foundation and has been a subproject of Lucene since 2005.
With Search Wikia, Jimmy Wales hopes to build on Lucene and Nutch by adding the social component. What we'll end up with are more intelligent, socially driven search tools. Now, don't think Google, Yahoo!, Microsoft and all the rest are not working on these kinds of technologies. It will be interesting to watch how these new technologies and methods are implemented.
Sources:
http://search.wikia.com
http://search.wikia.com/wiki/Nutch
http://lucene.apache.org/java/docs/
http://wikipedia.org/
References:
Wikipedia creator turns to search: http://news.bbc.co.uk/2/hi/technology/6216619.stm
How Google Works: http://www.googleguide.com/google_works.html
Search Wikia website: http://search.wikia.com
Search Wikia Nutch website: http://search.wikia.com/wiki/Nutch
Lucene website: http://lucene.apache.org/java/docs/
Wikipedia website: http://wikipedia.org/