A Formal Model for Information Selection in Multi-Sentence Text Extraction
Selecting important information while accounting for repetitions is a hard task for both summarization and question answering. We propose a formal model that represents a collection of documents in a two-dimensional space of textual and conceptual units with an associated mapping between these two dimensions. This representation is then used to describe the task of selecting textual units for a summary or answer as a formal optimization task. We provide approximation algorithms and empirically validate the performance of the proposed model when used with two very different sets of features, words and atomic events.
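The selection task the abstract formalizes can be approximated greedily: repeatedly pick the textual unit that covers the most not-yet-covered conceptual units. This is a minimal sketch of that idea, not the paper's actual algorithm; the function and variable names are illustrative.

```python
def greedy_select(units, concepts_of, budget):
    """Greedily pick textual units that cover the most uncovered conceptual units.

    units       -- list of textual-unit identifiers
    concepts_of -- maps each textual unit to its set of conceptual units
    budget      -- maximum number of textual units to select
    """
    covered, chosen = set(), []
    for _ in range(budget):
        # Pick the unit with the largest marginal coverage gain.
        best = max(units, key=lambda u: len(concepts_of[u] - covered), default=None)
        if best is None or not (concepts_of[best] - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= concepts_of[best]
        units = [u for u in units if u != best]
    return chosen
```

Because the marginal-gain criterion penalizes units whose concepts are already covered, repeated content is naturally avoided, which is the repetition-accounting behavior the abstract mentions.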
Text-based approaches for non-topical image categorization
The rapid expansion of multimedia digital collections brings to the fore the need for classifying not only text documents but their embedded non-textual parts as well. We propose a model for basing classification of multimedia on broad, non-topical features, and show how information on targeted nearby pieces of text can be used to effectively classify photographs on a first such feature, distinguishing between indoor and outdoor images. We examine several variations to a TF*IDF-based approach for this task, empirically analyze their effects, and evaluate our system on a large collection of images from current news newsgroups. In addition, we investigate alternative classification and evaluation methods, and the effects that secondary features have on indoor/outdoor classification. Using density estimation over the raw TF*IDF values, we obtain a classification accuracy of 82%, a number that outperforms baseline estimates and earlier, image-based approaches, at least in the domain of news articles, and that nears the accuracy of humans who perform the same task with access to comparable information.
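TF*IDF, the term-weighting scheme the abstract builds on, scores a term highly when it is frequent in a document but rare across the collection. A minimal sketch of the standard computation follows; it uses the plain `tf * log(N/df)` form and is not necessarily the exact variant the paper evaluates.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF*IDF weights for each term in each tokenized document.

    docs -- list of documents, each a list of token strings
    Returns a list of {term: weight} dicts, one per document.
    """
    # Document frequency: in how many documents does each term appear?
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    weights = []
    for d in docs:
        tf = Counter(d)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights
```

Terms that occur in every document get weight zero (log of 1), so weights concentrate on discriminative words such as "lamp" or "sky" near an image, which is what makes the nearby-text signal useful for indoor/outdoor classification.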
Using Density Estimation to Improve Text Categorization
This paper explores the use of a statistical technique known as density estimation to potentially improve the results of text categorization systems which label documents by computing similarities between documents and categories. In addition to potentially improving a system's overall accuracy, density estimation converts similarity scores to probabilities. These probabilities provide confidence measures for a system's predictions which are easily interpretable and could potentially help to combine results of various systems. We discuss the results of three complete experiments on three separate data sets applying density estimation to the results of a TF*IDF/Rocchio system, and we compare these results to those of many competing approaches.
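The score-to-probability conversion the abstract describes can be illustrated with a kernel (Parzen-window) density estimate: fit one density per class over held-out similarity scores, then apply Bayes' rule at a new score. This is a sketch of the general technique under assumed Gaussian kernels and a fixed bandwidth, not the paper's exact estimator.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a 1-D density function estimated from samples (Parzen window)."""
    def density(x):
        return sum(
            math.exp(-((x - s) / bandwidth) ** 2 / 2) for s in samples
        ) / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return density

def score_to_probability(score, pos_scores, neg_scores, bandwidth=0.1):
    """Convert a raw similarity score into P(positive | score) via Bayes' rule.

    pos_scores / neg_scores -- held-out scores of known positive/negative docs.
    Class priors are taken proportional to the sample counts.
    """
    p_pos = gaussian_kde(pos_scores, bandwidth)(score) * len(pos_scores)
    p_neg = gaussian_kde(neg_scores, bandwidth)(score) * len(neg_scores)
    return p_pos / (p_pos + p_neg)
```

The resulting probabilities are directly comparable across categories and systems, which is what makes them useful as the interpretable confidence measures the abstract mentions.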
Generation and Evaluation of Intraoperative Inferences for Automated Health Care Briefings on Patient Status After Bypass Surgery
The authors present a system that scans electronic records from cardiac surgery and uses inference rules to identify and classify abnormal events (e.g., hypertension) that may occur during critical surgical points (e.g., start of bypass). This vital information is used as the content of automatically generated briefings designed by MAGIC, a multimedia system that they are developing to brief intensive care unit clinicians on patient status after cardiac surgery. By recognizing patterns in the patient record, these inferences concisely summarize detailed patient data.
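A rule of the kind the abstract describes pairs a surgical phase with a threshold test over the monitored signal. The sketch below is purely illustrative: the threshold value, the readings format, and the function name are assumptions, not MAGIC's actual inference rules.

```python
def classify_events(readings, phase, threshold=100):
    """Flag abnormal events (here: hypertension) within one surgical phase.

    readings  -- list of (timestamp, mean_arterial_pressure) pairs
    phase     -- label of the surgical point being scanned (e.g. start of bypass)
    threshold -- illustrative pressure cutoff; real rules would be clinician-set
    """
    return [
        {"phase": phase, "event": "hypertension", "time": t}
        for t, pressure in readings
        if pressure > threshold
    ]
```

Emitting structured event records rather than raw numbers is what lets a downstream generator turn detailed intraoperative data into a concise briefing.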
Filling Knowledge Gaps in a Broad-Coverage Machine Translation System
Knowledge-based machine translation (KBMT) techniques yield high quality in domains with detailed semantic models, limited vocabulary, and controlled input grammar. Scaling up along these dimensions means acquiring large knowledge resources. It also means behaving reasonably when definitive knowledge is not yet available. This paper describes how we can fill various KBMT knowledge gaps, often using robust statistical techniques. We describe quantitative and qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT system.
Comment: 7 pages, compressed and uuencoded postscript. To appear: IJCAI-9
Columbia University at DUC 2004
We describe our participation in tasks 2, 4 and 5 of the DUC 2004 evaluation. For each task, we present the system(s) used, focusing on novel and newly developed aspects. We also analyze the results of the human and automatic evaluations.
Learning anchor verbs for biological interaction patterns from published text articles
- …