Search CORE

35 research outputs found

Query Driven Conceptual Browsing : A Semi-Automated Approach for Building and Exploring Concepts on the Web

Author: Jerath Kinshuk
Padmanabhan Venu Balaji
Publication venue: ScholarlyCommons
Publication date: 01/12/2005
Field of study

The presence of communities, which are groups of highly cross referenced pages together representing a single concept, is a striking feature of the World Wide Web. Quite often a group of communities, each topically coherent within itself, may be related through a common concept manifested in each of them. Motivated by this observation, we present a method for query-driven conceptual browsing for exploring concepts on the Web starting from a userspecified query. We show how this idea is related to prior work on learning concept maps and on Web Mining, and discuss the application of conceptual browsing for user-driven exploration and discovery of new concepts on the Web

ScholarlyCommons@Penn

Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution

Author: Geraci Filippo
Maggini Marco
Pellegrini Marco
Sebastiani Fabrizio
Publication venue
Publication date: 01/01/2006
Field of study

This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Strik- ing the right balance between running time and cluster well- formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the ?y by processing only the snippets provided by the auxil- iary search engines, and use no external sources of knowl- edge. Clustering is performed by means of a fast version of the furthest-point-?rst algorithm for metric k-center cluster- ing. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering ef- fectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Di- rectory Project hierarchy. According to two widely accepted external\u27 metrics of clustering quality, Armil achieves bet- ter performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms. On a standard 1GHz ma- chine, Armil performs clustering and labelling altogether in less than one second

Archivio della Ricerca - Università degli Studi di Siena

PUblication MAnagement

Ranking clustered Keyword Search On Semi structured data

Author: Dayananda P, Dr. Rajashree Shettar
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/07/2014
Field of study

No Abstrac

International Journal on Recent and Innovation Trends in Computing and Communication

Refinement of Document Clustering by Using NMF

Author: 佐々木稔
新納浩幸
Publication venue: The Korean Society for Language and Information (KSLI)
Publication date: 01/01/2007
Field of study

PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 200

Waseda University Repository

Mobile information retrieval with search results clustering: Prototypes and evaluations

Author: Berenci
Buyukkokten
Carpineto
Carpineto
Carpineto
Chen
Cheng
Church
Church
Cutting
Di Giacomo
Ferragina
Ferragina
Ganter
Geraci
Hearst
Hersh
Jansen
Jones
Jones
Jones
Jones
Kamvar
Kamvar
Kamvar
Kummamuru
Lawrie
Lawrie
Maarek
Osinski
Osinski
Otterbacher
Priss
Schütze
Sohn
Spiliopoulou
Sweeney
Zamir
Zamir
Zeng
Zhai
Publication venue: 'Wiley'
Publication date
Field of study

Crossref

Narrative-based taxonomy distillation for effective indexing of text collections.

Author: Cataldi Mario
Sapino Maria Luisa
Sel&#231
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012
Field of study

Institutional Research Information System University of Turin

Learn from web search logs to organize search results

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2007
Field of study

Crossref

Rapid Exploitation and Analysis of Documents

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref

Recommended from our members

Diversity Maximization Under Matroid Constraints

Author: Abbassi Zeinab
Mirrokni Vahab S.
Thakur Mayur
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013
Field of study

Aggregator websites typically present documents in the form of representative clusters. In order for users to get a broader perspective,it is important to deliver a diversified set of representative documents in those clusters. One approach to diversification is to maximize the average dissimilarity among documents. Another way to capture diversity is to avoid showing several documents from the same category (e.g. from the same news channel). We model the latter approach as a (partition) matroid constraint, and study diversity maximization problems under matroid constraints. We present the first constant-factor approximation algorithm for this problem,using a new technique. Our local search 0:5-approximation algorithm is also the first constant-factor approximation for the maxdispersion problem under matroid constraints. Our combinatorial proof technique for maximizing diversity under matroid constraints uses the existence of a family of Latin squares which may also be of independent interest. In order to apply these diversity maximization algorithms in the context of aggregator websites and as a preprocessing step for our diversity maximization tool, we develop greedy clustering algorithms that maximize weighted coverage of a predefined set of topics. Our algorithms are based on computing a set of cluster centers, where clusters are formed around them. We show the better performance of our algorithms for diversity and coverage maximization by running experiments on real (Twitter) and synthetic data in the context of real-time search over micro-posts. Finally we perform a user study validating our algorithms and diversity metrics

Columbia University Academic Commons