11 research outputs found

SOTXTSTREAM: Density-based self-organizing clustering of text streams

Author: Bryant Avory C.
Cios Krzysztof J.
Publication venue: VCU Scholars Compass
Publication date: 01/01/2017

A streaming data clustering algorithm is presented that builds upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
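The single-phase idea in the abstract above (micro-clusters maintained online, with a nearest-neighbor-based local density threshold and a self-organizing adjustment) can be illustrated with a minimal sketch. This is not the published SOSTREAM or SOTXTSTREAM procedure: the function names, the parameters `k` and `alpha`, and the exact update rules are assumptions chosen for illustration, and steps such as merging or ageing of micro-clusters, which a full algorithm would likely need, are omitted.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class MicroCluster:
    def __init__(self, point):
        self.centroid = list(point)
        self.count = 1

def local_radius(winner, clusters, k):
    """Distance from the winner to its k-th nearest other micro-cluster.
    A neighbor-based radius gives dense and sparse regions different thresholds."""
    gaps = sorted(dist(winner.centroid, c.centroid)
                  for c in clusters if c is not winner)
    return gaps[k - 1]

def update(clusters, point, k=2, alpha=0.1):
    """Process one stream point online; k and alpha are hypothetical settings."""
    if len(clusters) <= k:
        clusters.append(MicroCluster(point))   # too few clusters to estimate density
        return
    winner = min(clusters, key=lambda c: dist(c.centroid, point))
    radius = local_radius(winner, clusters, k)
    if dist(winner.centroid, point) > radius:
        clusters.append(MicroCluster(point))   # outside the local radius: new micro-cluster
        return
    # Absorb the point: incremental mean update of the winner's centroid.
    winner.count += 1
    winner.centroid = [c + (p - c) / winner.count
                       for c, p in zip(winner.centroid, point)]
    # Self-organizing step: nudge nearby micro-clusters toward the winner.
    for other in clusters:
        if other is not winner and dist(other.centroid, winner.centroid) <= radius:
            other.centroid = [x + alpha * (w - x)
                              for x, w in zip(other.centroid, winner.centroid)]
```

Calling `update(clusters, point)` once per arriving point keeps a single list of micro-clusters current, so no separate offline pass is run in this sketch.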

Crossref

Directory of Open Access Journals

VCU Scholars Compass

Document Clustering with Bursty Information

Author: Chaoji Vineet
Hoonlor Apirak
Szymanski Bolesław K.
Zaki Mohamed J.
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2013

Nowadays, almost all text corpora, such as blogs, emails and RSS feeds, are collections of text streams. The traditional vector space model (VSM), or bag-of-words representation, cannot capture the temporal aspect of these text streams. So far, only a few bursty features have been proposed to create text representations with temporal modeling for text streams. We propose bursty feature representations that perform better than VSM on various text mining tasks, such as document retrieval, topic modeling and text categorization. For text clustering, we propose a novel framework to generate a bursty distance measure. We evaluated it on the UPGMA, Star and K-Medoids clustering algorithms. The bursty distance measure not only performed equally well on various text collections, but also clustered news articles related to specific events much better than other models.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Anticipating annotations and emerging trends in biomedical literature

Author: Bernd Wachmann
Dmitriy Fradkin
Fabian Mörchen
Julien Etienne
Markus Bundschus
Mathäus Dejori
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2008

The BioJournalMonitor is a decision support system for the analysis of trends and topics in the biomedical literature. Its main goal is to identify potential diagnostic and therapeutic biomarkers for specific diseases. Several data sources are continuously integrated to provide the user with up-to-date information on current research in this field. State-of-the-art text mining technologies are deployed to provide added value on top of the original content, including named entity detection, relation extraction, classification, clustering, ranking, summarization, and visualization. We present two novel technologies that are related to the analysis of temporal dynamics of text archives and associated ontologies. Currently, the MeSH ontology is used to annotate the scientific articles entering the PubMed database with medical terms. Both the maintenance of the ontology and the annotation of new articles are performed largely manually. We describe how probabilistic topic models can be used to annotate recent articles with the most likely MeSH terms. This provides our users with a competitive advantage because, when searching for MeSH terms, articles are found long before they are manually annotated. We further present a study on how to predict the inclusion of new terms in the MeSH ontology. The results suggest that early prediction of emerging trends is possible. The trend ranking functions are deployed in our system to enable interactive searches for the hottest new trends relating to a disease.

CiteSeerX

Crossref

Modeling Anticipatory Event Transitions

Author: CHANG Kuiyu
LIM Ee Peng
HE Qi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Institutional Knowledge at Singapore Management University

Keep it simple with time: A reexamination of probabilistic topic detection models

Author: Banerjee Arindam
CHANG Kuiyu
HE Qi
LIM Ee Peng
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2010

Institutional Knowledge at Singapore Management University

Top-k term publish/subscribe for geo-textual data streams

Author: Chen Lisi
Jensen Christian S.
Kalnis Panos
Shang Shuo
Shao Ling
Xu Jianliang
Yao Bin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

VBN

An approach for measuring the software modularity based on the bursty evolution of functional dependencies

Author: Tedlapu Ajay
University of Lethbridge. Faculty of Arts and Science
Publication venue: University of Central Missouri, Department of Mathematics and Computer Science
Publication date: 01/01/2019

Modular design is one of the parameters that define the complexity of a software system. If the software is built as a single module, testing becomes a long process. Updating the software will also have a significant impact on the whole code base because of the dependencies. We propose a methodology to study and visualize the evolution of the modular structure of a network of functional dependencies in a software system. We used the Understand C++ tool to analyze the dependencies and Gephi to produce the network. Our method analyzes the modularity of the software and identifies specific periods of significant activity, which are known as evolutionary hot spots in software systems. As a case study, we analyzed the modular structure of Octave over its life cycle, from 1993 to the present.

OPUS: Open Uleth Scholarship - University of Lethbridge Research Repository

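Um Modelo de descoberta de conhecimento inerente à evolução temporal dos relacionamentos entre elementos textuais [A model for discovering knowledge inherent in the temporal evolution of relationships between textual elements]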

Author: Bovo Alessandro Botelho
Publication date: 25/10/2012

Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia e Gestão do Conhecimento, Florianópolis, 2011. For some time, a marked increase in the amount of information produced and published worldwide has been observed and discussed. While this situation creates many opportunities to use that information for decision making, it also poses many challenges in how to store, retrieve, and transform this information into knowledge. One form of knowledge discovery that has attracted the attention of researchers is the analysis of the relationships present in the available information. Moreover, because new content is created so rapidly, the time dimension becomes an intrinsic and relevant property of these information sources. The objective is therefore to develop a model for knowledge discovery from unstructured information by analyzing how the relationships between textual elements evolve over time. The proposed model is divided into phases, like traditional knowledge discovery models. The phases of this model are: configuration of the analysis themes, identification of concept occurrences, correlation and temporal correlation, association and temporal association, creation of the repository of analysis themes, and knowledge-intensive tasks, with emphasis on the direct and indirect relationships between the concepts of the domain. Feasibility is demonstrated by means of a prototype based on the proposed model and its application in a case study. A comparative analysis of the proposed model against other models of knowledge discovery in texts is also carried out.

Repositório Institucional da UFSC

Bursty feature representation for clustering text streams

Author: CHANG Kuiyu
HE Qi
LIM Ee Peng
ZHANG Jun
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2007

Text representation plays a crucial role in classical text mining, where the primary focus was on static text. Nevertheless, well-studied static text representations including TFIDF are not optimized for non-stationary streams of information such as news, discussion board messages, and blogs. We therefore introduce a new temporal representation for text streams based on bursty features. Our bursty text representation differs significantly from traditional schemes in that it 1) dynamically represents documents over time, 2) amplifies a feature in proportion to its burstiness at any point in time, and 3) is topic independent. Our bursty text representation model was evaluated against a classical bag-of-words text representation on the task of clustering TDT3 topical text streams. It was shown to consistently yield more cohesive clusters in terms of cluster purity and cluster/class entropies. This new temporal bursty text representation can be extended to most text mining tasks involving a temporal dimension, such as modeling of online blog pages.
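The idea of amplifying a feature in proportion to its burstiness at a point in time can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how burstiness is measured, so the ratio used here (a term's document fraction in one time window divided by its mean fraction over all windows), the `threshold` parameter, and both function names are hypothetical stand-ins rather than the authors' model.

```python
from collections import Counter

def burst_scores(windows):
    """For each time window, score each term by its document fraction in
    that window divided by its mean document fraction over all windows.
    (Illustrative ratio only; assumes every window holds at least one document.)"""
    fractions = []
    for docs in windows:
        counts = Counter()
        for doc in docs:
            counts.update(set(doc))          # count each term once per document
        fractions.append({t: c / len(docs) for t, c in counts.items()})
    mean = Counter()
    for frac in fractions:
        for term, value in frac.items():
            mean[term] += value / len(windows)
    return [{t: v / mean[t] for t, v in frac.items()} for frac in fractions]

def bursty_vector(doc, scores, threshold=1.5):
    """Term-frequency vector in which terms at or above the (hypothetical)
    burst threshold are multiplied by their burst score for that window."""
    vector = {}
    for term, tf in Counter(doc).items():
        score = scores.get(term, 1.0)
        vector[term] = tf * score if score >= threshold else float(tf)
    return vector
```

Because the scores are computed per window, the same document text would receive different vectors at different times, which is the sense in which such a representation is dynamic.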

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University