87,548 research outputs found
Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning
Deep compositional models of meaning acting on distributional representations
of words in order to produce vectors of larger text constituents are evolving
to a popular area of NLP research. We detail a compositional distributional
framework based on a rich form of word embeddings that aims at facilitating the
interactions between words in the context of a sentence. Embeddings and
composition layers are jointly learned against a generic objective that
enhances the vectors with syntactic information from the surrounding context.
Furthermore, each word is associated with a number of senses, the most
plausible of which is selected dynamically during the composition process. We
evaluate the produced vectors qualitatively and quantitatively with positive
results. At the sentence level, the effectiveness of the framework is
demonstrated on the MSRPar task, for which we report results within the
state-of-the-art range.Comment: Accepted for presentation at EMNLP 201
Ground truth? Concept-based communities versus the external classification of physics manuscripts
Community detection techniques are widely used to infer hidden structures
within interconnected systems. Despite demonstrating high accuracy on
benchmarks, they reproduce the external classification for many real-world
systems with a significant level of discrepancy. A widely accepted reason
behind such outcome is the unavoidable loss of non-topological information
(such as node attributes) encountered when the original complex system is
represented as a network. In this article we emphasize that the observed
discrepancies may also be caused by a different reason: the external
classification itself. For this end we use scientific publication data which i)
exhibit a well defined modular structure and ii) hold an expert-made
classification of research articles. Having represented the articles and the
extracted scientific concepts both as a bipartite network and as its unipartite
projection, we applied modularity optimization to uncover the inner thematic
structure. The resulting clusters are shown to partly reflect the author-made
classification, although some significant discrepancies are observed. A
detailed analysis of these discrepancies shows that they carry essential
information about the system, mainly related to the use of similar techniques
and methods across different (sub)disciplines, that is otherwise omitted when
only the external classification is considered.Comment: 15 pages, 2 figure
Bayesian outlier detection in Capital Asset Pricing Model
We propose a novel Bayesian optimisation procedure for outlier detection in
the Capital Asset Pricing Model. We use a parametric product partition model to
robustly estimate the systematic risk of an asset. We assume that the returns
follow independent normal distributions and we impose a partition structure on
the parameters of interest. The partition structure imposed on the parameters
induces a corresponding clustering of the returns. We identify via an
optimisation procedure the partition that best separates standard observations
from the atypical ones. The methodology is illustrated with reference to a real
data set, for which we also provide a microeconomic interpretation of the
detected outliers
Identifying overlapping terrorist cells from the Noordin Top actor-event network
Actor-event data are common in sociological settings, whereby one registers
the pattern of attendance of a group of social actors to a number of events. We
focus on 79 members of the Noordin Top terrorist network, who were monitored
attending 45 events. The attendance or non-attendance of the terrorist to
events defines the social fabric, such as group coherence and social
communities. The aim of the analysis of such data is to learn about the
affiliation structure. Actor-event data is often transformed to actor-actor
data in order to be further analysed by network models, such as stochastic
block models. This transformation and such analyses lead to a natural loss of
information, particularly when one is interested in identifying, possibly
overlapping, subgroups or communities of actors on the basis of their
attendances to events. In this paper we propose an actor-event model for
overlapping communities of terrorists, which simplifies interpretation of the
network. We propose a mixture model with overlapping clusters for the analysis
of the binary actor-event network data, called {\tt manet}, and develop a
Bayesian procedure for inference. After a simulation study, we show how this
analysis of the terrorist network has clear interpretative advantages over the
more traditional approaches of affiliation network analysis.Comment: 24 pages, 5 figures; related R package (manet) available on CRA
Possibilistic and fuzzy clustering methods for robust analysis of non-precise data
This work focuses on robust clustering of data affected by imprecision. The imprecision is managed in terms of fuzzy sets. The clustering process is based on the fuzzy and possibilistic approaches. In both approaches the observations are assigned to the clusters by means of membership degrees. In fuzzy clustering the membership degrees express the degrees of sharing of the observations to the clusters. In contrast, in possibilistic clustering the membership degrees are degrees of typicality. These two sources of information are complementary because the former helps to discover the best fuzzy partition of the observations while the latter reflects how well the observations are described by the centroids and, therefore, is helpful to identify outliers. First, a fully possibilistic k-means clustering procedure is suggested. Then, in order to exploit the benefits of both the approaches, a joint possibilistic and fuzzy clustering method for fuzzy data is proposed. A selection procedure for choosing the parameters of the new clustering method is introduced. The effectiveness of the proposal is investigated by means of simulated and
real-life data
- …