9 research outputs found

    A Manifold Ranking Algorithm for Automatic Multi-Document Summarization of News Stories

    Full text link
    This work addresses one of the topical problems of automatic summarization: multi-document summarization of news stories. The paper presents a novel extractive approach to this task based on manifold ranking of sentences. The manifold-ranking algorithm distinguishes intra-document and inter-document links between sentences, assigning them different weights, so that dependencies both within a single document and across the whole collection are taken into account. The applicability of the algorithm to Russian is analyzed, and a prototype summarization system is built. The paper presents sample summaries, describes the summarization-evaluation experiments, and formulates the main problems encountered in implementing the system together with possible solutions. Quality was evaluated with the ROUGE criterion, and the system's results are compared with those of DUC 2003 and DUC 2005
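    The core of manifold ranking is an iterative score propagation f ← α·S·f + (1−α)·y over a normalized sentence-similarity matrix S, with y marking seed (query-relevant) sentences. A minimal sketch, assuming precomputed similarities and using simple row normalization in place of the symmetric normalization of the original algorithm; the weights and function names here are illustrative, not taken from the paper:

```python
def manifold_rank(sim, seeds, alpha=0.85, iters=50):
    """Iterate f <- alpha * S f + (1 - alpha) * y until (approximate)
    convergence. sim is a symmetric sentence-similarity matrix; seeds
    is the set of indices of query/seed sentences."""
    n = len(sim)
    # Row-normalize the similarity matrix (a simple stand-in for the
    # symmetric normalization D^{-1/2} W D^{-1/2} used in manifold ranking).
    S = []
    for row in sim:
        total = sum(row) or 1.0
        S.append([v / total for v in row])
    y = [1.0 if i in seeds else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n))
             + (1 - alpha) * y[i]
             for i in range(n)]
    return f  # higher score = better summary candidate
```

    Sentences would then be extracted in decreasing score order (typically with a redundancy penalty); intra- versus inter-document links can be handled by scaling the corresponding entries of sim with different weights before normalization.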

    Automatic analysis of medical dialogue in the home hemodialysis domain : structure induction and summarization

    Get PDF
    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. By Ronilda Covar Lacson. Includes bibliographical references (p. 129-134).
    Spoken medical dialogue is a valuable source of information, and it forms a foundation for diagnosis, prevention, and therapeutic management. However, understanding even a perfect transcript of spoken dialogue is challenging for humans because of the lack of structure and the verbosity of dialogues. This work presents a first step towards automatic analysis of spoken medical dialogue. The backbone of our approach is an abstraction of a dialogue into a sequence of semantic categories. This abstraction uncovers structure in informal, verbose conversation between a caregiver and a patient, thereby facilitating automatic processing of dialogue content. Our method induces this structure from a range of linguistic and contextual features integrated in a supervised machine-learning framework. Our model has a classification accuracy of 73%, compared to 33% for a majority baseline (p<0.01). We demonstrate the utility of this structural abstraction by incorporating it into an automatic dialogue summarizer. Our evaluation results indicate that automatically generated summaries closely resemble summaries written by humans and significantly outperform random selections (p<0.0001) in precision and recall. In addition, task-based evaluation shows that physicians can reasonably answer questions related to patient care from the automatically generated summaries alone, in contrast to their performance when given summaries from a naive summarizer (p<0.05). This is a significant result because it spares the physician from wading through the irrelevant material that abounds in dialogue transcripts. This work demonstrates the feasibility of automatically structuring and summarizing spoken medical dialogue

    Temporal Information Extraction and Knowledge Base Population

    Full text link
    Temporal Information Extraction (TIE) from text plays an important role in many Natural Language Processing and Database applications. Many features of the world are time-dependent, and rich temporal knowledge is required for a more complete and precise understanding of the world. In this thesis we address aspects of two core tasks in TIE. First, we provide a new corpus of labeled temporal relations between events and temporal expressions, dense enough to facilitate a change in research directions from relation classification to identification, and present a system designed to address corresponding new challenges. Second, we implement a novel approach for the discovery and aggregation of temporal information about entity-centric fluent relations

    Sentence ordering in multidocument summarization

    No full text
    The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. In this paper, we describe two naive ordering techniques and show that they do not perform well. We present an integrated strategy for ordering information, combining constraints from the chronological order of events and cohesion. This strategy was derived from empirical observations based on experiments in which humans were asked to order information. Evaluation of our augmented algorithm shows a significant improvement in ordering over the two naive techniques used as baselines
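    The combined strategy can be caricatured in a few lines: group sentences that exhibit cohesion (here approximated crudely by shared vocabulary), then impose the chronological constraint by ordering groups, and members within a group, by date. The grouping test and threshold below are illustrative stand-ins, not the paper's actual cohesion model:

```python
def order_sentences(sents):
    """sents: list of (text, date) pairs from multiple documents.
    Returns the texts ordered by cohesion groups, chronologically."""
    groups = []
    for text, date in sents:
        words = set(text.lower().split())
        for g in groups:
            # Crude cohesion test: enough shared vocabulary joins a group.
            if len(words & g["words"]) >= 2:
                g["members"].append((date, text))
                g["words"] |= words
                break
        else:
            groups.append({"words": words, "members": [(date, text)]})
    # Chronological constraint: order groups by their earliest date,
    # and sentences within each group by date, keeping topics together.
    ordered = []
    for g in sorted(groups, key=lambda g: min(d for d, _ in g["members"])):
        ordered.extend(t for _, t in sorted(g["members"]))
    return ordered
```

    The point of the grouping step is exactly what the abstract argues: pure chronological ordering interleaves unrelated topics, while grouping first keeps cohesive sentences adjacent.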

    Leveraging Semantic Annotations for Event-focused Search & Summarization

    Get PDF
    In today's Big Data era, overwhelming amounts of textual information spread across different sources with a high degree of redundancy have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems:
    • We address a linking problem to connect Wikipedia excerpts to news articles by casting it as an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt.
    • We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of the information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event.
    • To estimate the temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models.
    Extensive experimental evaluations demonstrate the effectiveness and viability of the proposed approaches towards achieving the larger goal
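    The global-inference objective behind the event digest is a budgeted coverage problem: pick sentences that jointly cover as many distinct information units as possible (the thesis uses text, time, geolocations, and entities; plain words stand in for them here) within a fixed length. A real system hands this objective to an ILP solver; on this toy scale, exhaustive search over subsets plays that role. All names below are illustrative:

```python
from itertools import combinations

def best_digest(sentences, budget):
    """Return the subset of sentences maximizing the number of distinct
    covered units (here: lowercase words) within a total word budget."""
    best, best_cov = [], -1
    for r in range(len(sentences) + 1):
        for subset in combinations(sentences, r):
            # Length constraint of the fixed-length digest.
            if sum(len(s.split()) for s in subset) > budget:
                continue
            # Coverage objective: count distinct units across the subset.
            cov = len({w for s in subset for w in s.lower().split()})
            if cov > best_cov:
                best, best_cov = list(subset), cov
    return best
```

    Formulated as an ILP, the same objective uses binary selection variables per sentence and per unit, a linear budget constraint, and constraints tying a unit's variable to the sentences that contain it, which is what makes inference global rather than greedy.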
