Exploratory Analysis of Highly Heterogeneous Document Collections

Author: Blei D. M.
Bun K. K.
Maiya A. S.
Manning C. D.
Mihalcea R.
Pecina P.
Ranganathan S. R.
Wagstaff K.
Publication date: 01/01/2013

We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is revealed to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information.

Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Minin

arXiv.org e-Print Archive

CiteSeerX

Crossref

An iterative approach for lexicon characterization in juridical context

Author: Amato Flora
Mazzeo Antonino
Romano Sara
Scippacercola Sergio
Publication venue: Roma
Publication date: 01/01/2010

In the juridical context, knowledge management applications have a central role. To improve the effectiveness of document management procedures, techniques for automatic comprehension of textual content are required. In this work, a methodology for semi-automatic derivation of knowledge from document collections is proposed. To extract relevant information from document text, a process integrating both statistical and lexical approaches is applied. Moreover, we propose a system for evaluating the quality of the extracted peculiar lexicon. The system is used to process a heterogeneous corpus of documents from Italy’s juridical domain.

Archivio della ricerca - Università degli studi di Napoli Federico II

Living Knowledge

Author: Baldry Anthony
Dutta Biswanath
Giunchiglia Fausto
Maltese Vincenzo
Publication venue: Università degli Studi di Trento
Publication date: 01/03/2012

Diversity, especially as manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity an asset rather than a problem. In the project, foundational ideas that emerged from the synergic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed into concrete diversity-aware applications, such as the Future Predictor and the Media Content Analyser, which provide users with better structured information while coping with Web-scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA), which operates from a social sciences perspective; Multimodal Genre Analysis (MGA), which operates from a semiotic perspective; and Facet Analysis (FA), which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and the way they interact. In particular, the conceptual architecture has been implemented with the Media Content Analyser application. The scientific and technological results obtained are described in the following

Unitn-eprints Research

Watson: a gateway for next generation semantic web applications

Author: Angeletou Sofia
d'Aquin Mathieu
Gridinoc Laurian
Motta Enrico
Sabou Marta
Publication date: 01/01/2007

Open Research Online (The Open University)

TRECVID: benchmarking the effectiveness of information retrieval tasks on digital video

Author: Alan F. Smeaton
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2003

Many research groups worldwide are now investigating techniques which can support information retrieval on archives of digital video, and as groups move on to implement these techniques they inevitably try to evaluate their performance in practical situations. The difficulty with doing this is that there is no test collection or environment in which the effectiveness of video IR or video IR sub-tasks can be evaluated and compared. The annual series of TREC exercises has, for over a decade, been benchmarking the effectiveness of systems in carrying out various information retrieval tasks on text and audio and has contributed to a huge improvement in many of these. Two years ago, a track was introduced which covers shot boundary detection, feature extraction and searching through archives of digital video. In this paper we present a summary of the activities in the TREC Video track in 2002, in which 17 teams from across the world took part.

CiteSeerX

Irish Universities

DCU Online Research Access Service