26,141 research outputs found
Segmenting broadcast news streams using lexical chains
In this paper we propose a coarse-grained NLP approach to text segmentation based on the analysis of lexical cohesion within text. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e., distinct news stories from broadcast news programmes. Our system, SeLeCT, first builds a set of lexical chains in order to model the discourse structure of the text. A boundary detector is then used to search for breaking points in this structure, indicated by patterns of cohesive strength and weakness within the text. We evaluate this technique on a test set of concatenated CNN news story transcripts and compare it with an established statistical approach to segmentation called TextTiling
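The cohesion-based boundary detection described in this abstract can be illustrated with a minimal sketch, assuming lexical chains are already available as (first_sentence, last_sentence) spans; the function names and the fixed threshold below are illustrative and are not the actual SeLeCT implementation.

    # Score candidate boundaries by how many lexical chains span them:
    # few spanning chains means weak cohesion, i.e. a likely story boundary.
    def cohesion_score(chains, gap):
        """Number of chains spanning the gap between sentence gap and gap+1."""
        return sum(1 for start, end in chains if start <= gap < end)

    def detect_boundaries(chains, num_sentences, threshold=1):
        """Return indices of sentences that start a new story."""
        return [gap + 1 for gap in range(num_sentences - 1)
                if cohesion_score(chains, gap) < threshold]

    # Toy usage: chains covering sentences 0-3 and 5-8 leave a cohesion
    # trough between them, so boundaries are reported there.
    print(detect_boundaries([(0, 3), (5, 8)], num_sentences=9))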
Exploring lexical patterns in text: lexical cohesion analysis with WordNet
We present a system for the linguistic exploration and analysis of lexical cohesion in English texts. Using an electronic thesaurus-like resource, Princeton WordNet, and the Brown Corpus of English, we have implemented a process of annotating text with lexical chains and a graphical user interface for inspection of the annotated text. We describe the system and report on some sample linguistic analyses carried out using the combined thesaurus-corpus resource
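As a rough sketch of how such annotation might group candidate words into chains with Princeton WordNet (here via NLTK rather than the system described above), the greedy single-link grouping below is an illustrative simplification, not the authors' procedure.

    # Greedily attach each noun to the first chain containing a WordNet-related
    # word (shared synset or direct hypernym/hyponym link), else start a new chain.
    # Requires: pip install nltk; then nltk.download('wordnet').
    from nltk.corpus import wordnet as wn

    def related(w1, w2):
        for s1 in wn.synsets(w1, pos=wn.NOUN):
            for s2 in wn.synsets(w2, pos=wn.NOUN):
                if s1 == s2 or s2 in s1.hypernyms() or s1 in s2.hypernyms():
                    return True
        return False

    def build_chains(words):
        chains = []
        for w in words:
            for chain in chains:
                if any(related(w, member) for member in chain):
                    chain.append(w)
                    break
            else:
                chains.append([w])
        return chains

    print(build_chains(["dog", "canine", "puppy", "senate", "congress"]))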
Lexical Chaining and Word-Sense-Disambiguation
Lexical chaining algorithms attempt to find sequences of words in a document that are closely related semantically. Such chains have been argued to provide a good indication of the topics covered by the document without requiring a deeper analysis of the text, and have been proposed for many NLP tasks. Different underlying lexical semantic relations based on WordNet have been used for this task. Since links in WordNet connect synsets rather than words, word-sense disambiguation becomes a necessary part of any chaining algorithm, even if the intended application is not disambiguation. Previous chaining algorithms have combined the tasks of disambiguation and chaining by choosing those word senses that maximize chain connectivity, a strategy which yields poor disambiguation accuracy in practice.
We present a novel probabilistic algorithm for finding lexical chains. Our algorithm explicitly balances the requirements of maximizing chain connectivity with the choice of probable word senses. The algorithm achieves better disambiguation results than all previous ones, but under its optimal settings shifts this balance entirely in favor of probable senses, essentially ignoring the chains. This model points to an inherent conflict between chaining and word-sense disambiguation. By establishing an upper bound on the disambiguation potential of lexical chains, we show that chaining is theoretically highly unlikely to achieve accurate disambiguation.
Moreover, by defining a novel intrinsic evaluation criterion for lexical chains, we show that poor disambiguation accuracy also implies poor chain accuracy. Our results have crucial implications for chaining algorithms. At the very least, they show that disentangling disambiguation from chaining significantly improves chaining accuracy. The hardness of all-words disambiguation, however, implies that finding accurate lexical chains is harder than suggested by the literature.
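A toy objective makes the trade-off concrete. The sketch below scores a joint sense assignment as a weighted sum of chain connectivity and prior sense probability; alpha = 1 corresponds to connectivity-maximizing chaining and alpha = 0 to always taking the most probable sense, the regime the abstract reports as optimal. The scoring function and exhaustive search are illustrative only, not the paper's probabilistic model.

    from itertools import product

    def assignment_score(assignment, related, prior, alpha):
        # connectivity: number of related sense pairs; probability: sum of priors
        connectivity = sum(related(a, b) for i, a in enumerate(assignment)
                           for b in assignment[i + 1:])
        return alpha * connectivity + (1 - alpha) * sum(prior[s] for s in assignment)

    def best_assignment(candidate_senses, related, prior, alpha):
        # exhaustive search over sense combinations; fine for toy inputs only
        return max(product(*candidate_senses),
                   key=lambda a: assignment_score(a, related, prior, alpha))

    # Two ambiguous words: senses a1/b1 are related, but a2/b2 are more probable.
    senses = [["a1", "a2"], ["b1", "b2"]]
    rel = lambda x, y: 1 if {x, y} == {"a1", "b1"} else 0
    prior = {"a1": 0.2, "a2": 0.8, "b1": 0.3, "b2": 0.7}
    print(best_assignment(senses, rel, prior, alpha=1.0))  # ('a1', 'b1'): chain wins
    print(best_assignment(senses, rel, prior, alpha=0.0))  # ('a2', 'b2'): priors win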
Using lexical chains for keyword extraction
Keywords can be considered condensed versions of documents and short forms of their summaries. In this paper, the problem of automatic extraction of keywords from documents is treated as a supervised learning task. A lexical chain holds a set of semantically related words of a text, and can be said to represent the semantic content of a portion of the text. Although lexical chains have been extensively used in text summarization, their use for the keyword extraction problem has not been fully investigated. In this paper, a keyword extraction technique that uses lexical chains is described, and encouraging results are obtained
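A minimal sketch of this supervised setup, assuming each candidate word is described by chain-derived features such as chain length, chain score and first-occurrence position; the feature set and the scikit-learn classifier are illustrative and not those used in the paper.

    # Candidate words are labelled keyword / non-keyword and represented by
    # lexical-chain features: [chain_length, chain_score, first_position].
    from sklearn.tree import DecisionTreeClassifier

    X_train = [[7, 3.2, 0.05], [6, 2.8, 0.10], [1, 0.3, 0.80], [2, 0.5, 0.60]]
    y_train = [1, 1, 0, 0]  # 1 = keyword, 0 = not a keyword

    model = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)

    # candidates from a new document, described by the same features
    candidates = {"election": [8, 3.5, 0.02], "yesterday": [1, 0.2, 0.50]}
    for word, feats in candidates.items():
        print(word, "keyword" if model.predict([feats])[0] else "not a keyword")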
SeLeCT: a lexical cohesion based news story segmentation system
In this paper we compare the performance of three distinct approaches to lexical cohesion based text segmentation. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e., distinct news stories from broadcast news programmes. Our approach to news story segmentation (the SeLeCT system) is based on an analysis of lexical cohesive strength between textual units using a linguistic technique called lexical chaining. We evaluate the relative performance of SeLeCT with respect to two other cohesion based segmenters: TextTiling and C99. Using a recently introduced evaluation metric, WindowDiff, we contrast the segmentation accuracy of each system on both "spoken" (CNN news transcripts) and "written" (Reuters newswire) news story test sets extracted from the TDT1 corpus
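For reference, the WindowDiff metric mentioned above (Pevzner and Hearst, 2002) can be computed as in the minimal sketch below, with segmentations given as boundary indicator sequences; the default window size follows the usual convention of half the average reference segment length.

    def window_diff(ref, hyp, k=None):
        # ref[i] = 1 iff a segment boundary follows unit i; likewise for hyp
        n = len(ref)
        if k is None:
            k = max(1, n // (2 * max(1, sum(ref))))
        errors = sum(1 for i in range(n - k)
                     if sum(ref[i:i + k]) != sum(hyp[i:i + k]))
        return errors / (n - k)

    reference  = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    hypothesis = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # first boundary off by one unit
    print(window_diff(reference, hypothesis, k=2))  # 0.25: penalised in 2 of 8 windows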
A Novel Approach Towards Automatic Text Summarization Using Lexical Chains
Text summarization is the process of reducing a document's content while keeping its salient information intact. Techniques have been developed that use parameters such as the position, format and type of sentences in an input text, and the frequency of words in the text. However, these parameters vary with the source of the input texts, which in turn affects the performance of the algorithms. In this paper, we present a new method of automatic text summarization that makes use of lexical cohesion in the text. Lexical chains, sequences of words with semantic relations between them, have traditionally been used to model lexical cohesion. In our proposed algorithm, we use a modification of lexical chains to model the relationships that exist between words.
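A minimal sketch of chain-based extractive summarization in this spirit, assuming chains are given and scoring each sentence by the size of the chains whose members it contains; both the scoring heuristic and the naive tokenization are illustrative, not the modified chains proposed in the paper.

    def summarize(sentences, chains, n=2):
        # score a sentence by the total length of the chains it intersects
        def score(sentence):
            words = set(sentence.lower().split())
            return sum(len(chain) for chain in chains if words & set(chain))
        ranked = sorted(sentences, key=score, reverse=True)[:n]
        return [s for s in sentences if s in ranked]  # keep original order

    sents = ["The court reviewed the case.",
             "Lawyers argued before the judge.",
             "It rained heavily that day."]
    chains = [["court", "judge", "case", "lawyers"], ["rain"]]
    print(summarize(sents, chains, n=2))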
DOI: 10.17762/ijritcc2321-8169.15081
Space Efficient Breadth-First and Level Traversals of Consistent Global States of Parallel Programs
Enumerating consistent global states of a computation is a fundamental problem in parallel computing with applications to debugging, testing and runtime verification of parallel programs. Breadth-first search (BFS) enumeration is especially useful for these applications as it finds an erroneous consistent global state with the least number of events possible. The total number of executed events in a global state is called its rank. BFS also allows enumeration of all global states of a given rank or within a range of ranks. If a computation on n processes has m events per process on average, then the traditional BFS (Cooper-Marzullo and its variants) requires space exponential in the number of processes in the worst case, whereas our algorithm performs the BFS in polynomial space. Thus, we reduce the space complexity for BFS enumeration of consistent global states exponentially, and give the first polynomial-space algorithm for this task. In our experimental evaluation of seven benchmarks, traditional BFS fails in many cases by exhausting the 2 GB heap space allowed to the JVM. In contrast, our implementation uses less than 60 MB of memory and is also faster in many cases …
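To make the setting concrete, the sketch below enumerates consistent global states rank by rank in the classic Cooper-Marzullo style that the abstract contrasts against, representing a global state as a tuple of per-process event counts and the computation as event counts plus message (send, receive) dependencies; it is an illustration of the problem, not the paper's space-efficient algorithm, and its frontier keeps an entire rank in memory, which is exactly the space blow-up the paper addresses.

    # A global state is a tuple of per-process event counts; its rank is the
    # total count. It is consistent if, for every message dependency
    # ((p, i) -> (q, j)), receive event j on q is included only when send
    # event i on p is included.
    def consistent(state, deps):
        return all(state[q] < j or state[p] >= i for (p, i), (q, j) in deps)

    def bfs_states(events_per_process, deps):
        """Yield consistent global states level by level (increasing rank)."""
        frontier = {tuple(0 for _ in events_per_process)}
        rank = 0
        while frontier:
            for state in sorted(frontier):
                yield rank, state
            successors = set()
            for state in frontier:
                for p, count in enumerate(state):
                    if count < events_per_process[p]:
                        succ = state[:p] + (count + 1,) + state[p + 1:]
                        if consistent(succ, deps):
                            successors.add(succ)
            frontier = successors
            rank += 1

    # Two processes with two events each; event 1 of P0 is a send whose
    # receive is event 1 of P1, so P1 cannot advance before P0 does.
    deps = [((0, 1), (1, 1))]
    for rank, state in bfs_states([2, 2], deps):
        print(rank, state)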