Six papers on computational methods for the analysis of structured and unstructured data in the economic domain
This work investigates the application of computational methods to structured and unstructured data. The domains of application are two closely connected fields with the common
goal of promoting the stability of the financial system: systemic risk and bank supervision.
The work explores different families of models and applies them to different tasks: graphical Gaussian network models to address bank interconnectivity, topic models to monitor
bank news, and deep learning for text classification. New applications and variants of these
models are investigated, with particular attention to the combined use of textual and structured data. The penultimate chapter introduces a deep-learning-based sentiment polarity
classification tool for Italian, intended to simplify future research relying on sentiment analysis.
The different models have proven useful for leveraging numerical (structured) and textual (unstructured) data. Graphical Gaussian models and topic models have been adopted
for inspection and descriptive tasks, while deep learning has been applied mostly to predictive
(classification) problems. Overall, the integration of textual (unstructured) and numerical
(structured) information has proven useful for analyses related to systemic risk and bank supervision.
In fact, integrating textual data with numerical data has led either to higher predictive
performance or to an enhanced capability to explain phenomena and correlate them with other events.
NEXUS Network: Connecting the Preceding and the Following in Dialogue Generation
Sequence-to-Sequence (seq2seq) models have become overwhelmingly popular in
building end-to-end trainable dialogue systems. Though highly efficient in
learning the backbone of human-computer communications, they suffer from the
problem of strongly favoring short generic responses. In this paper, we argue
that a good response should smoothly connect both the preceding dialogue
history and the following conversations. We strengthen this connection through
mutual information maximization. To sidestep the non-differentiability of
discrete natural language tokens, we introduce an auxiliary continuous code
space and map such code space to a learnable prior distribution for generation
purpose. Experiments on two dialogue datasets validate the effectiveness of our
model, where the generated responses are closely related to the dialogue
context and lead to more interactive conversations.
Comment: Accepted by EMNLP201
Ranking coherence in Topic Models using Statistically Validated Networks
Probabilistic topic models have become one of the most widespread
machine learning techniques in textual analysis. Topic discovering is
an unsupervised process that does not guarantee the interpretability
of its output. Hence, the automatic evaluation of topic coherence
has attracted the interest of many researchers over the last decade,
and it is an open research area. The present article offers a new
quality evaluation method based on Statistically Validated Networks
(SVNs). The proposed probabilistic approach consists of representing
each topic as a weighted network of its most probable words. The
presence of a link between each pair of words is assessed by
statistically validating their co-occurrence in sentences against the null
hypothesis of random co-occurrence. The proposed method allows one
to distinguish between high-quality and low-quality topics, by making
use of a battery of statistical tests. The statistically significant pairwise
associations of words represented by the links in the SVN might
reasonably be expected to be strictly related to the semantic coherence
and interpretability of a topic. Therefore, the more connected the
network, the more coherent the topic in question. We demonstrate the
effectiveness of the method through an analysis of a real text corpus,
which shows that the proposed measure is more correlated with human
judgement than the state-of-the-art coherence measures.
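The validation step described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: a hypergeometric test is assumed as the null model of random co-occurrence, and a Bonferroni correction stands in for the paper's battery of statistical tests.

```python
from itertools import combinations
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n marked, N drawn)."""
    lo, hi = max(k, 0, n + N - M), min(n, N)
    return sum(comb(n, j) * comb(M - n, N - j) for j in range(lo, hi + 1)) / comb(M, N)

def svn_coherence(topic_words, sentences, alpha=0.05):
    """Score a topic by the share of word pairs whose sentence co-occurrence
    is statistically over-represented versus random co-occurrence."""
    sent_sets = [set(s) for s in sentences]
    M = len(sent_sets)
    occ = {w: sum(w in s for s in sent_sets) for w in topic_words}
    pairs = list(combinations(topic_words, 2))
    threshold = alpha / len(pairs)        # Bonferroni multiple-test correction
    links = 0
    for w1, w2 in pairs:
        k = sum(w1 in s and w2 in s for s in sent_sets)  # joint sentence count
        if hypergeom_sf(k, M, occ[w1], occ[w2]) < threshold:
            links += 1                    # statistically validated link
    return links / len(pairs)             # connectivity of the validated network
```

The more connected the validated network, the higher the score, mirroring the coherence criterion in the abstract.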
Machine Learning of Generic and User-Focused Summarization
A key problem in text summarization is finding a salience function which
determines what information in the source should be included in the summary.
This paper describes the use of machine learning on a training corpus of
documents and their abstracts to discover salience functions which describe
what combination of features is optimal for a given summarization task. The
method addresses both "generic" and user-focused summaries.
Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98), p. 821-82
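A minimal sketch of learning such a salience function. The positional and term-frequency features and the logistic-regression learner are illustrative assumptions; the paper's actual feature set and learning method may differ.

```python
import math

def sentence_features(idx, sent, doc_tf, n_sents):
    """Bias + position + average term frequency (illustrative features only)."""
    position = 1.0 - idx / max(n_sents - 1, 1)   # earlier sentences score higher
    avg_tf = sum(doc_tf.get(w, 0) for w in sent) / max(len(sent), 1)
    return [1.0, position, avg_tf]

def train_salience(examples, epochs=200, lr=0.5):
    """Fit a logistic-regression salience function on (features, label) pairs,
    where label 1 means the sentence appeared in the reference abstract."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w

def salience(w, x):
    """Predicted probability that a sentence belongs in the summary."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
```

Training on (document, abstract) pairs yields weights that encode which feature combination is optimal for the summarization task at hand.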
Query Expansion with Locally-Trained Word Embeddings
Continuous space word embeddings have received a great deal of attention in
the natural language processing and machine learning communities for their
ability to model term similarity and other relationships. We study the use of
term relatedness in the context of query expansion for ad hoc information
retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when
trained globally, underperform corpus and query specific embeddings for
retrieval tasks. These results suggest that other tasks benefiting from global
embeddings may also benefit from local embeddings.
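The global-versus-local contrast can be illustrated with a toy sketch that builds term representations only from a locally retrieved document set and expands the query with the most similar terms. This simplifies the paper's approach (which trains word2vec-style embeddings on topically retrieved documents) to plain co-occurrence vectors.

```python
from collections import Counter
from math import sqrt

def context_vectors(docs, window=2):
    """Co-occurrence vectors built only from a local (retrieved) document set."""
    vecs = {}
    for doc in docs:
        for i, w in enumerate(doc):
            ctx = doc[max(0, i - window):i] + doc[i + 1:i + 1 + window]
            vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand_query(query, local_docs, k=2):
    """Add the k terms most similar to each query term, judged locally."""
    vecs = context_vectors(local_docs)
    expansions = []
    for term in query:
        if term not in vecs:
            continue
        sims = sorted(((cosine(vecs[term], vecs[w]), w) for w in vecs
                       if w != term and w not in query), reverse=True)
        expansions += [w for _, w in sims[:k]]
    return query + expansions
```

Because the vectors are estimated from the query-specific document set, terms that behave alike in that local context (rather than in the whole corpus) drive the expansion.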
A Topic Coverage Approach to Evaluation of Topic Models
Topic models are widely used unsupervised models of text capable of learning
topics - weighted lists of words and documents - from large collections of text
documents. When topic models are used for discovery of topics in text
collections, a question that arises naturally is how well the model-induced
topics correspond to topics of interest to the analyst. In this paper we
revisit and extend a so far neglected approach to topic model evaluation based
on measuring topic coverage - computationally matching model topics with a set
of reference topics that models are expected to uncover. The approach is well
suited for analyzing models' performance in topic discovery and for large-scale
analysis of both topic models and measures of model quality. We propose new
measures of coverage and evaluate, in a series of experiments, different types
of topic models on two distinct text domains for which interest for topic
discovery exists. The experiments include evaluation of model quality, analysis
of coverage of distinct topic categories, and the analysis of the relationship
between coverage and other methods of topic model evaluation. The contributions
of the paper include new measures of coverage, insights into both topic models
and other methods of model evaluation, and the datasets and code for
facilitating future research of both topic coverage and other approaches to
topic model evaluation.
Comment: Results and contributions unchanged; Added new references; Improved the contextualization and the description of the work (abstr, intro, 7.1 concl, rw, concl); Moved technical details of data and model building to appendices; Improved layout
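The coverage idea of matching model topics against reference topics can be sketched as follows; matching by word-set overlap with a Jaccard threshold is an assumption made here for illustration, not necessarily one of the paper's proposed measures.

```python
def jaccard(a, b):
    """Word-set overlap between two topics given as word lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def topic_coverage(model_topics, reference_topics, threshold=0.5):
    """Fraction of reference topics matched by at least one model topic."""
    covered = sum(
        any(jaccard(ref, mod) >= threshold for mod in model_topics)
        for ref in reference_topics
    )
    return covered / len(reference_topics)
```

A model that rediscovers most of the analyst's reference topics scores near 1.0; a model whose topics miss them scores near 0.0, which is the behavior a coverage-based evaluation rewards.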