Quest for the Perfect Search Engine: Part 2
Mon Feb 09 00:00:00 IST 2009, by Abhishek Mehta

Type "apple" into a search engine and you will be rewarded with millions of web fruits. There are many kinds of apples, in many places in the world. Or rather, all kinds of apples exist in all parts of the world, differentiated by color, role, age, usability, taste, performance and, of course, context. That sentence is not as weird as it sounds, since we all know Apple is in electronics, you love to eat an apple, and Adam's apple isn't named after every individual. So should we read the results for every kind of apple to get to the right one, add a word such as "microchip" to our query, or try something that automatically clusters the web results into visually distinct result sets?

The quest for the perfect search engine continues from Part 1. In the previous blog post of this series, I looked into interesting engines like Powerset, Hakia, Kosmix and Cluuz. In this post I will discuss another branch of engines, broadly classified as "clustering engines". So, let's see what exactly is a

Clustering Engine

"Clustering is the act of grouping into clusters. A cluster is a grouping of things that can occur together. A clustering engine is a software application that can classify a heterogeneous result set into smaller but more coherent, homogeneous subsets."

Clustering engines group search results into categories based on the similarity of the text, word frequency and the proximity of the phrases and words found in the resulting documents. Rather than trying to map the user's keywords to the results semantically, or just literally, clustering engines employ the statistical methods of text mining. They use predefined taxonomies and vocabularies to group results into clusters, and name those clusters for the user's understanding based on weighted graphs (mathematically speaking) generated internally. Here are a few good clustering engines to try.
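Before the tour, here is a rough Python sketch of the grouping idea described above. It is only an illustration under my own assumptions, not how any of the engines below actually work: it ignores taxonomies and weighted graphs, compares result snippets by plain word overlap (Jaccard similarity), and names each cluster after its most frequent word. The function names, the stopword list and the 0.2 threshold are all made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real engine's list).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "with"}


def tokenize(text):
    """Return the set of lowercase alphabetic words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_snippets(snippets, query, threshold=0.2):
    """Greedy single-pass clustering of result snippets.

    Each snippet joins the first cluster whose pooled vocabulary is similar
    enough, otherwise it starts a new cluster. Query words are dropped
    because every result contains them.
    """
    query_terms = tokenize(query)
    clusters = []  # each cluster: {"vocab": set of words, "members": list of snippets}
    for snippet in snippets:
        terms = tokenize(snippet) - query_terms
        for cluster in clusters:
            if jaccard(terms, cluster["vocab"]) >= threshold:
                cluster["members"].append(snippet)
                cluster["vocab"] |= terms
                break
        else:
            clusters.append({"vocab": set(terms), "members": [snippet]})
    return clusters


def label(cluster, query):
    """Name a cluster after the word found in most of its snippets (ties: alphabetical)."""
    query_terms = tokenize(query)
    counts = Counter(w for s in cluster["members"] for w in tokenize(s) - query_terms)
    return min(counts, key=lambda w: (-counts[w], w)) if counts else query


# Made-up snippets for the query "apple".
snippets = [
    "Apple unveils iPhone and Mac computers",
    "Apple pie recipe with fresh fruit",
    "Apple Mac computers get faster chips",
    "Fruit salad recipe with apple and pear",
]
for cluster in cluster_snippets(snippets, "apple"):
    print(label(cluster, "apple"), "->", cluster["members"])
```

With these four snippets the electronics results and the food results land in separate groups, which is the whole point of clustering an ambiguous query like "apple". Real engines use far richer statistics than word overlap.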


Kartoo is my personal pick in the visual clustering engines category. Being a fan of visual search, I must say that it has a nice Flash-based layout along with a balanced use of content and display. Once you have reached some kind of conclusion after doing your research on Kartoo, you might like to preserve the steps and effort that went into finding the relevant information. So user-friendly features such as the ability to save, load and print your visual map, and to zoom in and out of the clusters with a click, come in handy. You don't have to log in or log out to use these features.

A non-visual clustering view is also available at this link: non-visual-kartoo. The non-visual clustering is impressive in its ability to include and exclude clusters for specific views of the results.

Kartoo is a meta search engine, meaning it combines results from various sources. MSN and Yahoo are among the search engines behind its nicely clustered results.
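To make "combining results from various sources" a little more concrete, here is a small Python sketch. It is not Kartoo's method, which I have no knowledge of. It uses reciprocal rank fusion, a well-known way to merge ranked lists: each URL earns 1 / (k + rank) from every engine that returned it, and the totals decide the merged order. The engine names and URLs are placeholders, and k = 60 is just the conventional default.

```python
from collections import defaultdict


def merge_results(ranked_lists, k=60):
    """Merge ranked URL lists from several engines with reciprocal rank fusion.

    ranked_lists maps a (hypothetical) engine name to its URLs, best first.
    A URL returned by several engines accumulates score from each of them.
    """
    scores = defaultdict(float)
    for urls in ranked_lists.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically to keep the order stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Placeholder data: two engines, partially overlapping results.
results = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["y.example", "w.example", "x.example"],
}
print(merge_results(results))
```

A URL that several engines agree on rises to the top, and duplicates collapse into one entry. That merged list is what a clustering step would then work on.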


Clusty, as the name suggests, is a clustering engine (meta search) from a company named Vivisimo. The company provides enterprise, federated and clustering search solutions to its customers. But since Clusty is a free initiative for the personal use of web searchers, one can use it for productivity enhancement without being sued.

The sources of Clusty's clusters are engines like live.com, ask.com, Yahoo News and the Open Directory. The website can cluster web results (obviously), and you can also cluster Wikipedia, blogs, news, jobs and images. Other interesting clusters that can be formed are based on the search engine used and on the type of website (.com, .net) from which the results were gathered.
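Grouping by type of website is the easiest of these to picture. The Python sketch below is my own guess at the idea, not Clusty's implementation: it takes the last label of each result's host name as the "type". That is a naive rule, so a host ending in .co.uk simply lands under .uk. The function name and URLs are invented for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_site_type(urls):
    """Group result URLs by the last label of their host name (.com, .net, ...)."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or ""
        # Naive rule: everything after the final dot; no public-suffix handling.
        site_type = "." + host.rsplit(".", 1)[-1] if "." in host else "(unknown)"
        groups[site_type].append(url)
    return dict(groups)


# Placeholder result URLs.
print(group_by_site_type([
    "http://example.com/a",
    "https://example.net/",
    "http://www.example.com/b",
]))
```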


"From Russia with Love" comes Quintura. This is one of the most effective clustering engines, with a very beautiful graphical user interface. Quintura does context-based search visualization and context management using neural networks, as one of its patents says. You can cluster web, images, videos and Amazon.

Quintura is also a meta search engine, relying mainly on Yahoo's index for its clusters and on its own patented technology (7,437,370) for displaying the cloud of the cluster. It has a very nice user interface with "on mouse over" cluster expansion and contraction. Saving the cluster map and reloading it are among the features provided for your results. Quintura is definitely one of the finest clustering engines available on the web, with very high ratings.

Here is a list of some other interesting clustering engines. Their order is alphabetical rather than based on features, usability or recommendation by www.abhishekmehta.com:

 

CarrotSearch

Iboogie

Kooltorch

MooTer

mnemomap

qksearch

Webclust

Grokker

Xclustering

Clustering vs. Semantics in a Nutshell:
Clustering engines fall short of semantic engines in language processing, context understanding, polysemy, synonymy, vernacular, capturing negations and ontology. They rely more on syntactic constructs of language, pattern recognition, phrase proximity and LSA (latent semantic analysis) than on venturing into the semantic aspect of context understanding. As there is currently no accurate semantic search engine in sight, we can fall back on clustering engines for some more years to come.



In Part 3 of this blog series I will take a look into the worlds of Google, Yahoo and MSN and their efforts to stay attractive in the future, plus some other catchy efforts.


Comments:

I could not find any mention of Cuil. Did you not find it worthy of a mention?

Posted by Neeraj on March 04, 2009 at 12:21 PM IST #

Hi Neeraj, Cuil certainly is a promising engine, but it cannot be categorized as a 'clustering engine'. The category suggestions that Cuil shows to the user are not created on the fly. They exist in some kind of strictly predefined ontology or taxonomy. Try the word 'Trigent' in Cuil and then in any of the suggested engines. Cuil does not understand this word and hence will not show any category suggestions, but the clustering engines will form a cluster around the word 'Trigent'. Nonetheless, it is an interesting engine, which will be covered in Part 3 of the series along with zoominfo, Google search wiki, Searchtogather, snap and some others.

Posted by Admin on March 05, 2009 at 11:03 AM IST #

Check this: http://www.wolframalpha.com/

Posted by 59.160.73.114 on May 29, 2009 at 05:30 PM IST #



©2008-2009 Abhishek Mehta All Rights Reserved

All content on this website and in whitepapers released by AbhishekMehta.com is proprietary, reproduction in any form without permission is prohibited.