
15 research outputs found

Topic Segmentation for Short Texts

Author: Chang Tao-Hsing
Lee Chia-Hoang
Publication venue: COLIPS PUBLICATIONS
Publication date: 01/01/2003

Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation

Author: Andreas Stolcke
Dilek Hakkani-Tür
Elizabeth Shriberg
Grosz B.
Gökhan Tür
Hearst Marti A
Passonneau Rebecca J
Publication date: 01/01/2000

We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources. Comment: 27 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Bilkent University Institutional Repository

Mining Hidden Markov Models in Sequences of Characters Using Recurrent Neural Networks

Author: Khan Sushmita
Publication venue: Digital Commons@Georgia Southern
Publication date: 01/01/2020

Restoring damaged historical manuscripts and making them available to the general public was of great interest to humanities researchers long before computers provided assistance for this task. Current technologies and models make this process easier, more accurate, and capable of discovering parts that were previously unknown. I use Recurrent Neural Networks to uncover hidden Markov models in sequences of characters from historic manuscripts. Such manuscripts are typically written in an archaic language, which makes the underlying machine learning problem inherently difficult, since little training data is generally available. I use bidirectional, hierarchical models for sequences of one or more characters, trained on the existing manuscript data. I test my model and present experimental results on an Old English manuscript.

Georgia Southern University: Digital Commons@Georgia Southern

Automated Monitoring of Online News Streams: Topic Detection and Tracking Considerations

Author: Polinski Patrick
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/11/2003

This paper describes the term frequency patterns found in online news summaries published over a seven-week period. The patterns are analyzed qualitatively and quantitatively to facilitate the refinement of algorithms used for the automatic detection and tracking of important topics appearing in streams of text. It is shown that a term's importance cannot be measured by raw frequency counts or significant increases in volume alone. The impact of these findings on existing algorithms is discussed, and new approaches for automated story detection and presentation are considered.

Carolina Digital Repository

Understanding musical genre preference evolution within a social network

Author: Silva João Pedro Ramos Pereira da
Publication date: 16/04/2020

Dissertation presented as partial requirement for obtaining the Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.

Music is a field that simply cannot be dissociated from the social aspects of life. Throughout human history, popular music has always been a reflection of the different aspects of society. As such, a number of studies showcase this reflection and draw multiple types of insights. In this thesis, we look to contribute to this field by assessing the evolution of musical genre preferences over time throughout a social network. Using data obtained through a social evolution experiment with around 80 individuals, we make an initial assessment of the existing data. This evaluation is then taken into consideration in the next phase of our work, where we define the principles necessary to represent and analyse the existing social network. Afterwards, we showcase a representation of this network and analyse it using various metrics and sub-structures commonly applied in Social Network Analysis. We then evaluate the homogenisation of the network as time goes on. In other words, we assess the evolution of differences in preferences between individuals who are connected in the social network, in order to understand whether these differences tend to diminish over time. A sequence-based algorithm, more specifically a Hidden Markov Model, is used to predict changes in musical genre preferences. It considers each individual's own preferences as well as the preferences of their connections within the social network, with the ultimate goal of assessing how influential the network is in the evolution of a person's musical genre preferences.
To tackle the same research question with an alternative approach, and to provide a comparison model, we also use a Support Vector Machine model. Finally, we discuss the results and the limitations that led to our model definition. Overall, this thesis seeks to build upon previous work on musical genre preferences by assessing them within the context of a network and taking into account their evolution over time.

Repositório da Universidade Nova de Lisboa

Sistema autonômico para detecção de mudanças em eventos a partir de notícias

Author: Paranhos Douglas Fonseca Alves.
Publication venue: Programa de Pos-graduacao em Ciencias Contabeis da UFRJ
Publication date: 01/09/2018

Topic Detection and Tracking (TDT) has been the subject of much research since it was defined in the late 1990s and early 2000s; its main goal is to identify real-world events from unstructured information. Autonomic Computing has likewise been growing since the early 2000s and designates systems that are capable of measuring their own performance automatically; it is applied in the most modern technologies. Many works have been developed on both topics, but only a few unite these two important concepts to minimize human intervention in the analysis of unstructured information. The present work aims to create an autonomic system for detecting changes in events from news articles.

Pantheon

A Framework for Logical Structure Extraction from Software Requirements Documents

Author: Rauf Rehan
Publication venue: University of Waterloo
Publication date: 01/01/2011

General-purpose rich-text editors, such as MS Word, are often used to author software requirements specifications. These requirements specifications contain many different logical structures, such as use cases, business rules and functional requirements. Automated recognition and extraction of these logical structures is necessary to provide useful automated requirements management features, such as automated traceability, template conformance checking, guided editing and interoperability with sophisticated requirements management tools like Requisite Pro. The variability among instances of these logical structures and their attributes poses many challenges for their accurate recognition and extraction. The thesis provides a framework for the extraction of logical structures from software requirements documents. The framework models information about style, structure, and attributes of the logical structures and uses the defined meta-model to extract instances of logical structures. The meta-model also incorporates information about the variability present in the instances. The framework includes an extraction tool, ET, that reads the meta-model and extracts instances of modelled logical structures from the documents. The framework is evaluated on a collection of real-world software requirements documents. Using the framework, different logical structures can be extracted with high precision and recall, each close to 100%. The performance of the extraction tool is acceptable for fast extraction of logical structures from documents, with extraction times ranging from a few milliseconds to a few seconds.

University of Waterloo's Institutional Repository