
Quest for perfect Search Engine --Part1
Fri Jan 09 2009, by Abhishek Mehta

Google indexes billions of pages per year and by now has a trillion unique URLs at your service for searching. Even leaving aside those of us who cannot read, for digital or biological reasons, this is far more information than the world's 6.7 billion human beings could ever read. Search engines are able to keep pace with the information explosion thanks to advances in processing speed and cheaper memory. On the flip side, the time an individual spends finding relevant and authentic information on the web keeps going up. Web searches are not done for the quantity of HTML returned or for speed, but for relevance. And relevance lies in the eyes of the needy seeker, as the meaning of a result unfolds deep inside his neurons.

A search for "Bill Gates in Europe" makes sense in terms of all the European countries {Germany, England...} rather than the 6 characters 'E u r o p e', and if a document is in Danish and tells you about 'Bill Gates in Europa', you miss it. Do not even think of adding "before 1978", as that will only waste your valuable time. Because of these limitations in today's search engines, users have come to believe that the top search results are the relevant ones; if they are not, either the keywords were inadequate or it is a case of 'let's start reading it all till I hit the wall'. But the real issue lies with the search engines, which treat keywords as collections of characters, just the way they look (Europe).
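
To make the point concrete, here is a toy sketch in Python. It is not how any real search engine works: the documents are invented, and the `EXPANSIONS` table is a hand-written assumption standing in for the knowledge of meaning that a semantic engine would need. It only contrasts matching the characters of 'Europe' with matching what the word stands for.

```python
# Toy comparison: literal keyword matching vs. a hand-written expansion table.
# EXPANSIONS and DOCUMENTS are illustrative assumptions, not real engine data.
EXPANSIONS = {
    "europe": {"europe", "europa", "germany", "england", "denmark"},
}

DOCUMENTS = {
    "doc_en": "Bill Gates spoke in Europe last week",
    "doc_da": "Bill Gates i Europa",
    "doc_de": "Bill Gates visited Germany",
    "doc_us": "Bill Gates spoke in Seattle",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def literal_search(query, documents):
    """Return ids of documents that contain every query word verbatim."""
    terms = tokenize(query)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if all(term in tokenize(text) for term in terms)
    )


def expanded_search(query, documents):
    """Like literal_search, but a query word may match any of its expansions."""
    term_sets = [EXPANSIONS.get(term, {term}) for term in tokenize(query)]
    hits = []
    for doc_id, text in documents.items():
        words = set(tokenize(text))
        # Every query word needs at least one alternative present in the document.
        if all(words & alternatives for alternatives in term_sets):
            hits.append(doc_id)
    return sorted(hits)


print(literal_search("Bill Gates Europe", DOCUMENTS))
print(expanded_search("Bill Gates Europe", DOCUMENTS))
```

The literal search finds only the English document that spells out 'Europe', while the expanded one also picks up the Danish 'Europa' page and the page about Germany. The hard part, of course, is building that table of meanings automatically instead of by hand.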

Finding results based on the actual meaning of words, phrases or paragraphs is still a distant goal, but there is a silver lining. Many efforts are underway to help all the troubled researchers out there and add some value for everyone involved. Nothing currently satisfies the ever-demanding quest for the perfect search genie, but some initiatives are worth a look.


Powerset is an engine for searching Wikipedia articles, and was acquired by Microsoft for a whopping $100 million in August of 2008. The site answers simple questions whenever possible, explores simple phrases using natural language processing and relates your search to newer (but relevant) facts and concepts across the results. It compiles an interesting dossier, which can be navigated with comparative ease.

A simple question, "When did Mahatma Gandhi died?", returned the exact answer. Exploring the very first result article (if you want to dig deep) brings up an interesting navigational structure for the document under scrutiny. But the key part is that this navigational structure can be turned into a fact-based one ("Show Factz"). This helps the user navigate through the article sentence by sentence, based on the facts understood from each sentence.

If that was interesting, the best is still to come: a simple search on "Mahatma Gandhi" produced a set of results and a fact bar. This "fact bar" lists Gandhi's activities, grouped by action verb, across all the Wikipedia articles found. The user can jump directly to any article through the relation between Gandhi and his actions. For things, places and people, Wikipedia can be trusted (relatively) for focused information, and Powerset has the ability to summarize that information for the user. Thumbs up for this application; it is a worthwhile buy for Microsoft, if used constructively.
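
The idea behind such a fact bar can be sketched in a few lines of Python. This is only an illustration of the grouping step, not Powerset's implementation: the (subject, verb, object, article) tuples in `FACTS` are written by hand here, whereas a real system would have to extract them from the article text with natural language processing, and the function name `build_fact_bar` is made up for this example.

```python
from collections import defaultdict

# Hand-written (subject, verb, object, article) tuples, used as illustrative
# input only. Extracting such tuples from text is the hard part and is not
# shown here.
FACTS = [
    ("Gandhi", "led", "Salt March", "Salt March"),
    ("Gandhi", "led", "Quit India Movement", "Quit India Movement"),
    ("Gandhi", "founded", "Natal Indian Congress", "Natal Indian Congress"),
    ("Gandhi", "studied", "law", "Inner Temple"),
]


def build_fact_bar(facts, subject):
    """Group one subject's facts by verb, keeping a pointer to the source article."""
    bar = defaultdict(list)
    for subj, verb, obj, article in facts:
        if subj == subject:
            bar[verb].append((obj, article))
    return dict(bar)


for verb, entries in sorted(build_fact_bar(FACTS, "Gandhi").items()):
    print(verb, entries)
```

Each verb becomes one entry in the bar, and every object listed under it carries the article it came from, which is what lets the user click straight through to the source.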

By the way, do try Microsoft's "Live Search" option offered on each Powerset result page, and the value of this tool will become crystal clear. The web community will be waiting for Powerset to move beyond Wikipedia and onto open ground.


Cluuz is another promising engine. It extracts people, things, companies, phone numbers, emails, addresses and domains from the result pages of your search and interrelates them to form a denser semantic graph. This lets the user correlate the search keywords with visible clues and extend the search with a click, rather than racking the brain for newer keywords. It also picks up images from the result pages and gives you a kind of preview of what you can expect on each page.

The best part of the website is something called the Semantic Graph of a cluster. This graphical component displays which result page led to the extraction of which relations, and how many result pages led to the generation of each concept. A future enhancement that lets you extend the search inside the "Semantic Graph" itself, without leaving it, would be a good feature to have.
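
A very rough sketch of that page-to-entity bookkeeping, in Python, might look like the following. It is an assumption-laden toy, not Cluuz's method: it extracts only email addresses with a simple regular expression, the page texts in `PAGES` are invented, and the names `build_entity_graph` and `support_counts` are made up for this example.

```python
import re
from collections import defaultdict

# Simplified email pattern, good enough for this toy. Real extraction of
# people, companies and addresses needs far more than a regular expression.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Invented result pages, used as illustrative input only.
PAGES = {
    "page1": "Contact alice@example.com or bob@example.org for details.",
    "page2": "Press enquiries: alice@example.com",
    "page3": "No contact details here.",
}


def build_entity_graph(pages):
    """Map each extracted entity to the set of result pages it was found on."""
    graph = defaultdict(set)
    for page_id, text in pages.items():
        for email in EMAIL_RE.findall(text):
            graph[email].add(page_id)
    return dict(graph)


def support_counts(graph):
    """Count how many result pages led to each entity."""
    return {entity: len(page_ids) for entity, page_ids in graph.items()}


graph = build_entity_graph(PAGES)
print(graph)
print(support_counts(graph))
```

The mapping answers both questions the graph is meant to show: which pages produced a given entity, and how many pages back it up. An entity found on several pages is a stronger clue than one that appears only once.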


If you are not interested in all these options, clustering and fact-based grouping, then try Hakia. This is a vanilla semantic search engine, which relies on the power of its semantic algorithms to show you the results. It is in beta and shows relevant results for basic questions and simple phrases. One limiting factor, or blessing in disguise, is that it shows results only from pages its internal mechanisms have deemed credible (from credible Web sites recommended by librarians).

It also provides APIs for building your own semantic search engine (for ya geeky lads). But when it comes to being truly semantic, this site still has some way to go, and that is the case with all of the others too.


Do you remember the guys who founded Junglee, one of the first shopping search engines, and later sold it to Amazon? Ok, so a bunch of them are back with Kosmix, which keeps getting funded one round after another. Kosmix offers a much better organized view of the result world. Search is based on its own indexes (and those of its partners), but the best part is the organized and categorized results. The result page is categorized by media (audio/video), search engine, blogs and much more.

News, blogs and media (published, audio, video) are among the major forms of data available on the web, and also the ones that can contaminate your results unintentionally. Kosmix certainly adds value to searches, since the results are already segregated and one can go directly to the category of choice. Kosmix is also working on machine-generated related topics for your keywords, but it still has a lot to do in this area.



In Part 2 of this series, "Quest for perfect Search Engine", I will explore a different stream of search engines called clustering engines. There is a good number of them in the market, and they certainly show more organized results than plain keyword-stemming engines.



©2008-2009 Abhishek Mehta All Rights Reserved

All content on this website and in whitepapers released by AbhishekMehta.com is proprietary, reproduction in any form without permission is prohibited.