43,438 research outputs found
Beyond Sentiment: The Manifold of Human Emotions
Sentiment analysis predicts the presence of positive or negative emotions in
a text document. In this paper we consider higher dimensional extensions of the
sentiment concept, which represent a richer set of human emotions. Our approach
goes beyond previous work in that our model contains a continuous manifold
rather than a finite set of human emotions. We investigate the resulting model,
compare it to psychological observations, and explore its predictive
capabilities. Besides obtaining significant improvements over a baseline
without manifold, we are also able to visualize different notions of positive
sentiment in different domains.Comment: 15 pages, 7 figure
Automated user modeling for personalized digital libraries
Digital libraries (DL) have become one of the most typical ways of accessing any kind of digitalized information. Due to this key role, users welcome any improvements on the services they receive from digital libraries. One trend used to
improve digital services is through personalization. Up to now, the most common approach for personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, in
an automatic way. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to construct digital libraries that satisfy user’s necessity for information: Adaptive Digital Libraries, libraries that automatically learn user preferences and goals and personalize their interaction using this information
A Short Survey on Data Clustering Algorithms
With rapidly increasing data, clustering algorithms are important tools for
data analytics in modern research. They have been successfully applied to a
wide range of domains; for instance, bioinformatics, speech recognition, and
financial analysis. Formally speaking, given a set of data instances, a
clustering algorithm is expected to divide the set of data instances into the
subsets which maximize the intra-subset similarity and inter-subset
dissimilarity, where a similarity measure is defined beforehand. In this work,
the state-of-the-arts clustering algorithms are reviewed from design concept to
methodology; Different clustering paradigms are discussed. Advanced clustering
algorithms are also discussed. After that, the existing clustering evaluation
metrics are reviewed. A summary with future insights is provided at the end
Inferring Interpersonal Relations in Narrative Summaries
Characterizing relationships between people is fundamental for the
understanding of narratives. In this work, we address the problem of inferring
the polarity of relationships between people in narrative summaries. We
formulate the problem as a joint structured prediction for each narrative, and
present a model that combines evidence from linguistic and semantic features,
as well as features based on the structure of the social community in the text.
We also provide a clustering-based approach that can exploit regularities in
narrative types. e.g., learn an affinity for love-triangles in romantic
stories. On a dataset of movie summaries from Wikipedia, our structured models
provide more than a 30% error-reduction over a competitive baseline that
considers pairs of characters in isolation
Gene Expression based Survival Prediction for Cancer Patients: A Topic Modeling Approach
Cancer is one of the leading cause of death, worldwide. Many believe that
genomic data will enable us to better predict the survival time of these
patients, which will lead to better, more personalized treatment options and
patient care. As standard survival prediction models have a hard time coping
with the high-dimensionality of such gene expression (GE) data, many projects
use some dimensionality reduction techniques to overcome this hurdle. We
introduce a novel methodology, inspired by topic modeling from the natural
language domain, to derive expressive features from the high-dimensional GE
data. There, a document is represented as a mixture over a relatively small
number of topics, where each topic corresponds to a distribution over the
words; here, to accommodate the heterogeneity of a patient's cancer, we
represent each patient (~document) as a mixture over cancer-topics, where each
cancer-topic is a mixture over GE values (~words). This required some
extensions to the standard LDA model eg: to accommodate the "real-valued"
expression values - leading to our novel "discretized" Latent Dirichlet
Allocation (dLDA) procedure. We initially focus on the METABRIC dataset, which
describes breast cancer patients using the r=49,576 GE values, from
microarrays. Our results show that our approach provides survival estimates
that are more accurate than standard models, in terms of the standard
Concordance measure. We then validate this approach by running it on the
Pan-kidney (KIPAN) dataset, over r=15,529 GE values - here using the mRNAseq
modality - and find that it again achieves excellent results. In both cases, we
also show that the resulting model is calibrated, using the recent
"D-calibrated" measure. These successes, in two different cancer types and
expression modalities, demonstrates the generality, and the effectiveness, of
this approach
- …