87,548 research outputs found

    Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning

    Full text link
    Deep compositional models of meaning acting on distributional representations of words in order to produce vectors of larger text constituents are evolving to a popular area of NLP research. We detail a compositional distributional framework based on a rich form of word embeddings that aims at facilitating the interactions between words in the context of a sentence. Embeddings and composition layers are jointly learned against a generic objective that enhances the vectors with syntactic information from the surrounding context. Furthermore, each word is associated with a number of senses, the most plausible of which is selected dynamically during the composition process. We evaluate the produced vectors qualitatively and quantitatively with positive results. At the sentence level, the effectiveness of the framework is demonstrated on the MSRPar task, for which we report results within the state-of-the-art range.Comment: Accepted for presentation at EMNLP 201

    Ground truth? Concept-based communities versus the external classification of physics manuscripts

    Full text link
    Community detection techniques are widely used to infer hidden structures within interconnected systems. Despite demonstrating high accuracy on benchmarks, they reproduce the external classification for many real-world systems with a significant level of discrepancy. A widely accepted reason behind such outcome is the unavoidable loss of non-topological information (such as node attributes) encountered when the original complex system is represented as a network. In this article we emphasize that the observed discrepancies may also be caused by a different reason: the external classification itself. For this end we use scientific publication data which i) exhibit a well defined modular structure and ii) hold an expert-made classification of research articles. Having represented the articles and the extracted scientific concepts both as a bipartite network and as its unipartite projection, we applied modularity optimization to uncover the inner thematic structure. The resulting clusters are shown to partly reflect the author-made classification, although some significant discrepancies are observed. A detailed analysis of these discrepancies shows that they carry essential information about the system, mainly related to the use of similar techniques and methods across different (sub)disciplines, that is otherwise omitted when only the external classification is considered.Comment: 15 pages, 2 figure

    Bayesian outlier detection in Capital Asset Pricing Model

    Full text link
    We propose a novel Bayesian optimisation procedure for outlier detection in the Capital Asset Pricing Model. We use a parametric product partition model to robustly estimate the systematic risk of an asset. We assume that the returns follow independent normal distributions and we impose a partition structure on the parameters of interest. The partition structure imposed on the parameters induces a corresponding clustering of the returns. We identify via an optimisation procedure the partition that best separates standard observations from the atypical ones. The methodology is illustrated with reference to a real data set, for which we also provide a microeconomic interpretation of the detected outliers

    Identifying overlapping terrorist cells from the Noordin Top actor-event network

    Get PDF
    Actor-event data are common in sociological settings, whereby one registers the pattern of attendance of a group of social actors to a number of events. We focus on 79 members of the Noordin Top terrorist network, who were monitored attending 45 events. The attendance or non-attendance of the terrorist to events defines the social fabric, such as group coherence and social communities. The aim of the analysis of such data is to learn about the affiliation structure. Actor-event data is often transformed to actor-actor data in order to be further analysed by network models, such as stochastic block models. This transformation and such analyses lead to a natural loss of information, particularly when one is interested in identifying, possibly overlapping, subgroups or communities of actors on the basis of their attendances to events. In this paper we propose an actor-event model for overlapping communities of terrorists, which simplifies interpretation of the network. We propose a mixture model with overlapping clusters for the analysis of the binary actor-event network data, called {\tt manet}, and develop a Bayesian procedure for inference. After a simulation study, we show how this analysis of the terrorist network has clear interpretative advantages over the more traditional approaches of affiliation network analysis.Comment: 24 pages, 5 figures; related R package (manet) available on CRA

    Possibilistic and fuzzy clustering methods for robust analysis of non-precise data

    Get PDF
    This work focuses on robust clustering of data affected by imprecision. The imprecision is managed in terms of fuzzy sets. The clustering process is based on the fuzzy and possibilistic approaches. In both approaches the observations are assigned to the clusters by means of membership degrees. In fuzzy clustering the membership degrees express the degrees of sharing of the observations to the clusters. In contrast, in possibilistic clustering the membership degrees are degrees of typicality. These two sources of information are complementary because the former helps to discover the best fuzzy partition of the observations while the latter reflects how well the observations are described by the centroids and, therefore, is helpful to identify outliers. First, a fully possibilistic k-means clustering procedure is suggested. Then, in order to exploit the benefits of both the approaches, a joint possibilistic and fuzzy clustering method for fuzzy data is proposed. A selection procedure for choosing the parameters of the new clustering method is introduced. The effectiveness of the proposal is investigated by means of simulated and real-life data
    • …
    corecore