
    Segmenting Lecture Videos by Topic: From Manual to Automated Methods

    More and more universities and corporations are starting to provide videotaped lectures online for knowledge sharing and learning. Segmenting lecture videos into short clips by topic can extract the hidden information structure of the videos and facilitate information searching and learning. Manual segmentation has high accuracy but is very labor-intensive. In order to develop a high-performance automated segmentation method for lecture videos, we conducted a case study to learn the segmentation process of humans and the effective segmentation features used in that process. Based on the findings from the case study, we designed an automated segmentation approach with two phases: initial segmentation and segmentation refinement. The approach combines segmentation features from three information sources of the video (speech text transcript, audio, and video) and makes use of various knowledge sources such as world knowledge and domain knowledge. Our preliminary results show that the proposed two-phase approach is promising.
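
    The abstract stops short of implementation detail, but the two-phase idea can be sketched in Python roughly as below. This is a minimal illustration only: the Sentence fields, the cue-phrase list, and the thresholds are assumptions, not the authors' actual features.

        # Minimal sketch of a two-phase pass over a lecture transcript.
        # The feature set here (lexical overlap, pause length, cue phrases)
        # is an assumed simplification of the paper's multi-source features.
        from dataclasses import dataclass

        CUE_PHRASES = ("next topic", "let's move on", "now we turn to")  # assumed cues

        @dataclass
        class Sentence:
            text: str
            start: float         # seconds into the video
            pause_before: float  # silence preceding the sentence, from the audio track

        def cohesion(a: Sentence, b: Sentence) -> float:
            """Crude lexical overlap between adjacent sentences (0..1)."""
            wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
            return len(wa & wb) / max(1, len(wa | wb))

        def initial_segmentation(sents, dip=0.05):
            """Phase 1: propose a boundary wherever lexical cohesion drops below a threshold."""
            return [i for i in range(1, len(sents)) if cohesion(sents[i - 1], sents[i]) < dip]

        def refine(sents, boundaries, min_pause=2.0):
            """Phase 2: keep a candidate only if audio or discourse evidence agrees
            (a long pause or an explicit cue phrase)."""
            kept = []
            for i in boundaries:
                s = sents[i]
                if s.pause_before >= min_pause or any(c in s.text.lower() for c in CUE_PHRASES):
                    kept.append(i)
            return kept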

    Segmentation of lecture videos based on text: A method combining multiple linguistic features

    In multimedia-based e-Learning systems, there is a strong need to segment lecture videos into topic units in order to organize the videos for browsing and to provide search capability. Automatic segmentation is highly desirable because of the high cost of manual segmentation. While a lot of research has been conducted on topic segmentation of transcribed spoken text, most attempts rely on domain-specific cues and formal presentation formats, and require extensive training; none of these features exist in lecture videos with unscripted and spontaneous speech. In addition, lecture videos usually have few scene changes, which means that the visual information most video segmentation methods rely on is not available. Furthermore, even when there are scene changes, they do not match the topic transitions. In this paper, we make use of the transcribed speech text extracted from the audio track of the video to segment lecture videos into topics. We review related research and propose a new segmentation approach. Our approach utilizes features such as noun phrases and combines multiple content-based and discourse-based features. Our preliminary results show that noun phrases are salient features and that combining multiple features is promising for improving segmentation accuracy.
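
    As one concrete reading of the content-based features described here, a rough TextTiling-style sketch is shown below. Content terms stand in for the paper's noun phrases (a real system would use a chunker or parser), and the window size, stop list, and threshold are assumed values rather than the authors' settings.

        # Compare adjacent blocks of sentences and place a topic boundary
        # where the similarity between the two blocks drops below a threshold.
        import re
        from collections import Counter
        from math import sqrt

        STOP = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "we", "this"}

        def terms(sentence: str) -> Counter:
            toks = re.findall(r"[a-z]+", sentence.lower())
            return Counter(t for t in toks if t not in STOP)

        def cosine(c1: Counter, c2: Counter) -> float:
            dot = sum(c1[t] * c2[t] for t in c1)
            n1 = sqrt(sum(v * v for v in c1.values()))
            n2 = sqrt(sum(v * v for v in c2.values()))
            return dot / (n1 * n2) if n1 and n2 else 0.0

        def boundaries(sentences, window=3, threshold=0.1):
            cuts = []
            for i in range(window, len(sentences) - window):
                left = sum((terms(s) for s in sentences[i - window:i]), Counter())
                right = sum((terms(s) for s in sentences[i:i + window]), Counter())
                if cosine(left, right) < threshold:
                    cuts.append(i)
            return cuts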

    How ontology based information retrieval systems may benefit from lexical text analysis

    The exponential growth of available electronic data is almost useless without efficient tools to retrieve the right information at the right time. It is now widely acknowledged that information retrieval systems need to take semantics into account to enhance the use of available information. However, there is still a gap between the amount of relevant information that can be accessed through optimized IRSs on the one hand, and users' ability to grasp and process only a handful of relevant items at once on the other. This chapter shows how conceptual and lexical approaches may be jointly used to enrich document description. After a survey of semantics-based methodologies designed to efficiently retrieve and exploit information, hybrid approaches are discussed. The original approach presented here benefits from both lexical and ontological document description, and combines them in a software architecture dedicated to information retrieval and rendering in specific domains.
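
    A minimal sketch of how lexical and conceptual evidence might be combined when ranking documents is given below; the linear weighting and the set-overlap scores are illustrative assumptions, not the architecture described in the chapter.

        # Rank documents by a weighted mix of term overlap (lexical evidence)
        # and ontology-concept overlap (conceptual evidence).
        def lexical_score(query_terms: set, doc_terms: set) -> float:
            return len(query_terms & doc_terms) / max(1, len(query_terms))

        def concept_score(query_concepts: set, doc_concepts: set) -> float:
            return len(query_concepts & doc_concepts) / max(1, len(query_concepts))

        def hybrid_rank(query_terms, query_concepts, docs, alpha=0.6):
            """docs: list of (doc_id, doc_terms, doc_concepts); higher score ranks first."""
            scored = [
                (alpha * lexical_score(query_terms, t)
                 + (1 - alpha) * concept_score(query_concepts, c), doc_id)
                for doc_id, t, c in docs
            ]
            return [doc_id for _, doc_id in sorted(scored, key=lambda x: x[0], reverse=True)]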

    A Probabilistic model of meetings that combines words and discourse features

    This is the author's accepted version of the article. The final published version can be found at http://dx.doi.org/10.1109/TASL.2008.92586

    Topic segmentation of TV-streams by watershed transform and vectorization

    A fine-grained segmentation of radio or TV broadcasts is an essential step for most multimedia processing tasks. Applying segmentation algorithms to the speech transcripts seems straightforward, yet most of these algorithms are not suited to short segments or noisy data. In this paper, we present a new segmentation technique inspired by the image analysis field and relying on a new way to compute similarities between candidate segments, called vectorization. Vectorization makes it possible to match text segments that do not share common words; this property is shown to be particularly useful when dealing with transcripts in which transcription errors and short segments make segmentation difficult. This new topic segmentation technique is evaluated on two corpora of transcripts from French TV broadcasts, on which it largely outperforms existing state-of-the-art approaches.
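
    The vectorization idea (matching segments through a shared reference space rather than through shared words) can be sketched as follows; the anchor texts and the bag-of-words similarity are assumptions made for illustration, and the watershed step over the resulting similarity signal is omitted.

        # Describe each candidate segment by its similarity to a fixed set of
        # anchor texts, so two segments can match even with no words in common,
        # as long as they resemble the same anchors.
        import re
        from collections import Counter
        from math import sqrt

        def bow(text: str) -> Counter:
            return Counter(re.findall(r"[a-z]+", text.lower()))

        def cos(a: Counter, b: Counter) -> float:
            dot = sum(a[t] * b[t] for t in a)
            na = sqrt(sum(v * v for v in a.values()))
            nb = sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        def vectorize(segment, anchors):
            s = bow(segment)
            return [cos(s, bow(a)) for a in anchors]

        def segment_similarity(seg1, seg2, anchors):
            v1, v2 = vectorize(seg1, anchors), vectorize(seg2, anchors)
            dot = sum(x * y for x, y in zip(v1, v2))
            n1, n2 = sqrt(sum(x * x for x in v1)), sqrt(sum(y * y for y in v2))
            return dot / (n1 * n2) if n1 and n2 else 0.0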

    Filtrage pour la construction de résumés multi-documents guidée par un profil

    In this article, we present a filtering method for selecting, from a set of documents, the text excerpts that are most significant with respect to a user-defined profile. To do so, we emphasize the joint use of structured profiles and a thematic analysis of the documents. This analysis also makes it possible to extend the vocabulary defining a profile according to the document being processed, by selecting the terms of that document most closely related to the profile terms. All of these aspects provide finer-grained filtering while still allowing the selection of document excerpts that are only loosely related to the profile but more likely to bring new, and therefore interesting, information. The value of the proposed approach is illustrated through the REDUIT system, which was evaluated on both document filtering and passage extraction.
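
    A rough sketch of profile-guided passage filtering with document-specific vocabulary expansion is given below; the co-occurrence expansion rule and the thresholds are illustrative assumptions, not the REDUIT implementation.

        # Expand the profile with document terms that co-occur with profile terms,
        # then keep the passages whose overlap with the expanded profile is high enough.
        import re
        from collections import Counter

        def tokens(text: str):
            return re.findall(r"[a-zàâçéèêëîïôûù]+", text.lower())

        def expand_profile(profile: set, passages, min_cooc=2) -> set:
            cooc = Counter()
            for p in passages:
                toks = set(tokens(p))
                if toks & profile:
                    cooc.update(toks - profile)
            return profile | {t for t, n in cooc.items() if n >= min_cooc}

        def filter_passages(profile: set, passages, threshold=0.15):
            expanded = expand_profile(profile, passages)
            kept = []
            for p in passages:
                toks = set(tokens(p))
                score = len(toks & expanded) / max(1, len(toks))
                if score >= threshold:
                    kept.append((score, p))
            return sorted(kept, key=lambda x: x[0], reverse=True)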

    News Story Segmentation in Multiple Modalities


    Automatic text summarization using lexical chains: algorithms and experiments

    Summarization is a complex task that requires understanding of the document content to determine the importance of the text. Lexical cohesion is a method to identify connected portions of a text based on the relations between the words in the text. Lexical cohesive relations can be represented using lexical chains. Lexical chains are sequences of semantically related words spread over the entire text, and they are used in a variety of Natural Language Processing (NLP) and Information Retrieval (IR) applications. In this thesis, we propose a lexical chaining method that includes glossary relations in the chaining process. These relations enable us to identify topically related concepts, for instance dormitory and student, and thereby enhance the identification of cohesive ties in the text. We then present methods that use the lexical chains to generate summaries by extracting sentences from the document(s). Headlines are generated by filtering out the portions of the extracted sentences that do not contribute to the meaning of the sentence. Generated headlines can be used in real-world applications to skim through document collections in a digital library. Multi-document summarization is in growing demand with the explosive growth of online news sources. It requires identification of the several themes present in the collection to attain good compression and avoid redundancy. In this thesis, we propose methods to group portions of the texts of a document collection into meaningful clusters. Clustering enables us to extract the various themes of the document collection. Sentences from the clusters can then be extracted to generate a summary for the multi-document collection. Clusters can also be used to generate summaries with respect to a given query. We designed a system that computes lexical chains for a given text and uses them to extract the salient portions of the document. Some specific tasks considered are headline generation, multi-document summarization, and query-based summarization. Our experimental evaluation shows that effective summaries can be extracted for these tasks.
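
    A compact sketch of lexical-chain-based sentence extraction is shown below; the tiny RELATED map stands in for the WordNet and glossary relations used in the thesis, and the chain scoring is an assumed simplification.

        # Build chains by attaching each candidate word to the first chain that
        # already contains a related word, then extract sentences covering the
        # strongest chains.
        import re
        from collections import defaultdict

        RELATED = {("dormitory", "student"), ("student", "university")}  # stand-in relations

        def related(w1: str, w2: str) -> bool:
            return w1 == w2 or (w1, w2) in RELATED or (w2, w1) in RELATED

        def build_chains(sentences):
            chains = []  # each chain: list of (word, sentence_index)
            for i, sent in enumerate(sentences):
                for word in re.findall(r"[a-z]+", sent.lower()):
                    if len(word) <= 3:          # crude content-word filter
                        continue
                    for chain in chains:
                        if any(related(word, w) for w, _ in chain):
                            chain.append((word, i))
                            break
                    else:
                        chains.append([(word, i)])
            return chains

        def summarize(sentences, n=2):
            chains = sorted(build_chains(sentences), key=len, reverse=True)
            scores = defaultdict(int)
            for chain in chains[:5]:            # strongest chains only
                for _, i in chain:
                    scores[i] += len(chain)
            best = sorted(scores, key=scores.get, reverse=True)[:n]
            return [sentences[i] for i in sorted(best)]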