7,669 research outputs found
The CAITLIN Auralization System: Hierarchical Leitmotif Design as a Clue to Program Comprehension
Early experiments have suggested that program auralization can convey information about program structure [8].
Languages like Pascal contain classes of construct that are similar in nature allowing hierarchical classification of their features. This taxonomy can be reflected in the design of musical signatures which are used within the CAITLIN program auralization system. Experiments using these hierarchical leitmotifs indicate whether or not their similarities can be put to good use in communicating information about program structure and state
Inference and Evaluation of the Multinomial Mixture Model for Text Clustering
In this article, we investigate the use of a probabilistic model for
unsupervised clustering in text collections. Unsupervised clustering has become
a basic module for many intelligent text processing applications, such as
information retrieval, text classification or information extraction. The model
considered in this contribution consists of a mixture of multinomial
distributions over the word counts, each component corresponding to a different
theme. We present and contrast various estimation procedures, which apply both
in supervised and unsupervised contexts. In supervised learning, this work
suggests a criterion for evaluating the posterior odds of new documents which
is more statistically sound than the "naive Bayes" approach. In an unsupervised
context, we propose measures to set up a systematic evaluation framework and
start with examining the Expectation-Maximization (EM) algorithm as the basic
tool for inference. We discuss the importance of initialization and the
influence of other features such as the smoothing strategy or the size of the
vocabulary, thereby illustrating the difficulties incurred by the high
dimensionality of the parameter space. We also propose a heuristic algorithm
based on iterative EM with vocabulary reduction to solve this problem. Using
the fact that the latent variables can be analytically integrated out, we
finally show that Gibbs sampling algorithm is tractable and compares favorably
to the basic expectation maximization approach
Improving Search Engine Results by Query Extension and Categorization
Since its emergence, the Internet has changed the way in which information is distributed and it has strongly influenced how people communicate. Nowadays, Web search engines are widely used to locate information on the Web, and online social networks have become pervasive platforms of communication.
Retrieving relevant Web pages in response to a query is not an easy task for Web search engines due to the enormous corpus of data that the Web stores and the inherent ambiguity of search queries. We present two approaches to improve the effectiveness of Web search engines. The first approach allows us to retrieve more Web pages relevant to a user\u27s query by extending the query to include synonyms and other variations. The second, gives us the ability to retrieve Web pages that more precisely reflect the user\u27s intentions by filtering out those pages which are not related to the user-specified interests.
Discovering communities in online social networks (OSNs) has attracted much attention in recent years. We introduce the concept of subject-driven communities and propose to discover such communities by modeling a community using a posting/commenting interaction graph which is relevant to a given subject of interest, and then applying link analysis on the interaction graph to locate the core members of a community
- …