
    Transforming Graph Representations for Statistical Relational Learning

    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
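
The first transformation task in the taxonomy, predicting link existence, can be illustrated with a minimal common-neighbors scorer on a toy graph. This is a generic illustration of the task, not a method from the survey; the graph and node names are invented.

```python
from itertools import combinations

# Toy undirected graph (invented for illustration); a simple common-neighbors
# score illustrates the "predict link existence" transformation task.
edges = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def common_neighbors(u, v):
    # Nodes sharing many neighbors are more likely to be linked.
    return len(adj.get(u, set()) & adj.get(v, set()))

# Score every non-edge; a higher score suggests a more likely missing link.
nodes = sorted(adj)
scores = {
    (u, v): common_neighbors(u, v)
    for u, v in combinations(nodes, 2)
    if (u, v) not in edges and (v, u) not in edges
}
```

The same pattern extends to the other taxonomy tasks: replace the score with a label classifier for link type, or with an edge-strength estimate for link weight.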

    A Context-theoretic Framework for Compositionality in Distributional Semantics

    Techniques in which words are represented as vectors have proved useful in many applications in computational linguistics; however, there is currently no general semantic formalism for representing meaning in terms of vectors. We present a framework for natural language semantics in which words, phrases, and sentences are all represented as vectors, based on a theoretical analysis which assumes that meaning is determined by context. In the theoretical analysis, we define a corpus model as a mathematical abstraction of a text corpus. The meaning of a string of words is assumed to be a vector representing the contexts in which it occurs in the corpus model. Based on this assumption, we show that the vector representations of words can be considered elements of an algebra over a field. We note that in applications of vector spaces to representing the meanings of words there is an underlying lattice structure; we interpret the partial ordering of the lattice as describing entailment between meanings. We also define the context-theoretic probability of a string and, based on this and the lattice structure, a degree of entailment between strings. We relate the framework to existing methods of composing vector-based representations of meaning, and show that our approach generalises many of these, including vector addition, component-wise multiplication, and the tensor product. Comment: Submitted to Computational Linguistics on 20th January 2010 for review.
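
The three composition operations the framework generalises can be sketched concretely on toy context vectors. The vectors and dimensionality below are invented for illustration; in the framework each dimension would count the contexts a word occurs in.

```python
import numpy as np

# Hypothetical 3-dimensional context vectors for two words (toy values).
dog = np.array([2.0, 1.0, 0.0])
big = np.array([1.0, 0.0, 3.0])

addition = dog + big         # additive composition
pointwise = dog * big        # component-wise multiplication
tensor = np.outer(dog, big)  # tensor (outer) product

# The lattice meet (component-wise minimum) is one natural way to compare
# meanings under the partial ordering interpreted as entailment.
meet = np.minimum(dog, big)
```

Note how the tensor product preserves the most information (a full 3x3 array), while addition and component-wise multiplication collapse the pair back into a single vector.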

    Localized anomaly detection via hierarchical integrated activity discovery

    2014 Spring. Includes bibliographical references. With the increasing number and variety of camera installations, unsupervised methods that learn typical activities have become popular for anomaly detection. In this thesis, we consider recent methods based on temporal probabilistic models and improve them in multiple ways. Our contributions are the following: (i) we integrate the low-level processing and the temporal activity modeling, showing how this feedback improves the overall quality of the captured information; (ii) we show how the same approach can be taken to do hierarchical multi-camera processing; (iii) we use spatial analysis of the anomalies both to perform local anomaly detection and to automatically frame the detected anomalies. We illustrate the approach on both traffic data and videos coming from a metro station. We also investigate the application of topic models in Brain-Computer Interfaces for mental task classification. We observe a classification accuracy of up to 68% for four mental tasks on individual subjects.

    Computational algorithms to predict Gene Ontology annotations

    Background Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from definitive and rapidly evolves, and some erroneous annotations may be present. Since the curation of novel annotations is costly, in terms of both money and time, computational tools that can reliably predict likely annotations, and thus quicken the discovery of new gene annotations, are very useful. Methods We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We used the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set. Results We tested our methods and their weighted variants on the Gene Ontology annotation sets of three model organisms (Bos taurus, Danio rerio, and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the obtained results vary with the size of the input annotation set and the considered algorithm. Conclusions Of the three considered methods, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it is able to predict a significant number of novel annotations, demonstrating that it is a helpful tool for supporting scientists in the curation of gene functional annotations.
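
The core Latent Semantic Indexing step can be sketched as a truncated SVD of a gene-by-term annotation matrix, with reconstructed values scoring candidate novel annotations. The matrix, threshold, and rank below are toy values for illustration, not the paper's datasets or tuned parameters.

```python
import numpy as np

# Toy gene x GO-term annotation matrix (1 = known annotation); values are
# illustrative, not from the paper's Bos taurus / Danio rerio / Drosophila sets.
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
], dtype=float)

# Latent Semantic Indexing: keep only k latent factors of the SVD and
# reconstruct; entries that rise toward 1 suggest likely missing annotations.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Candidate novel annotations: high reconstructed scores where A had a 0.
candidates = (A == 0) & (A_hat > 0.5)
```

The weighting schemes the paper proposes would enter by replacing the binary entries of `A` with annotation weights before the decomposition.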

    Extracting topic-sensitive content from textual documents - A hybrid topic model approach

    When exploring information on a topic, users are often concerned with its different aspects. For instance, product designers are interested in seeking information on specific topic aspects, such as technical challenges and usability, from online consumer opinions, while potential buyers wish to obtain the general sentiment of public opinion. In this paper, we study an interesting problem called topic-sensitive content extraction (TSCE). TSCE aims to extract content relevant to the samples of topic aspects highlighted by users from a single document in a given text collection. To tackle TSCE, we propose a new hybrid topic model which integrates different structures in both topic space and context space. It focuses on identifying content associated with a specified topic aspect in each document. By modeling gradient documents via term profiles for context modeling, and by leveraging local and global differences between probability distributions over words in both topic modeling and context modeling, it better captures the features of various language patterns. Hence, sentence relevance ranking according to a specific topic aspect is largely improved. Experimental studies on extracting critical content for specific aspects, including motivation and design solution, from technical patents for design analysis have shown the merits of the proposed model.

    Semantic multimedia analysis using knowledge and context

    PhD thesis. The difficulty of semantic multimedia analysis can be attributed to the extended diversity in form and appearance exhibited by the majority of semantic concepts and the difficulty of expressing them using a finite number of patterns. In meeting this challenge, there has been a scientific debate on whether the problem should be addressed from the perspective of using overwhelming amounts of training data to capture all possible instantiations of a concept, or from the perspective of using explicit knowledge about the concepts' relations to infer their presence. In this thesis we address three problems of pattern recognition and propose solutions that combine the knowledge extracted implicitly from training data with the knowledge provided explicitly in structured form. First, we propose a BN modeling approach that defines a conceptual space where both domain-related evidence and evidence derived from content analysis can be jointly considered to support or disprove a hypothesis. The use of this space leads to significant gains in performance compared to analysis methods that cannot handle combined knowledge. Then, we present an unsupervised method that exploits the collective nature of social media to automatically obtain large amounts of annotated image regions. By proving that the quality of the obtained samples can be almost as good as that of manually annotated images when working with large datasets, we contribute significantly towards scalable object detection. Finally, we introduce a method that treats images, visual features, and tags as the three observable variables of an aspect model and extracts a set of latent topics that incorporates the semantics of both the visual and the tag information space. By showing that the cross-modal dependencies of tagged images can be exploited to increase the semantic capacity of the resulting space, we advocate the use of all existing information facets in the semantic analysis of social media.

    Learning Contextualized Music Semantics from Tags via a Siamese Network

    Music information retrieval faces a challenge in modeling contextualized musical concepts formulated by a set of co-occurring tags. In this paper, we investigate the suitability of our recently proposed approach, based on a Siamese neural network, for addressing this challenge. By means of tag features and probabilistic topic models, the network captures contextualized semantics from tags via unsupervised learning. This leads to a distributed semantic space and a potential solution to the out-of-vocabulary problem, which has yet to be sufficiently addressed. We explore the nature of the resultant music-based semantics and address computational needs. We conduct experiments on three public music tag collections (namely CAL500, MagTag5K, and the Million Song Dataset) and compare our approach to a number of state-of-the-art semantics learning approaches. Comparative results suggest that this approach outperforms previous approaches in terms of semantic priming and music tag completion. Comment: 20 pages. To appear in ACM TIST: Intelligent Music Systems and Applications.
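
The defining property of a Siamese network is that both inputs pass through the same weights, so similarity is computed in one shared embedding space. The sketch below shows only that shared-tower structure with an invented random projection; the paper's actual architecture, tag features, and training objective are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared projection: the "twin" towers of a Siamese network use identical
# weights. Dimensions are illustrative, not taken from the paper.
W = rng.normal(size=(8, 3))

def embed(tag_features):
    # Both inputs pass through the same mapping into the shared space.
    return np.tanh(tag_features @ W)

def similarity(x, y):
    # Cosine similarity between the two embeddings.
    a, b = embed(x), embed(y)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x = rng.normal(size=8)
sim_self = similarity(x, x)  # identical inputs embed identically
```

Training would then pull embeddings of co-occurring tag sets together and push unrelated ones apart, which is what yields the contextualized semantic space described above.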

    TSVD as a Statistical Estimator in the Latent Semantic Analysis Paradigm

    The aim of this paper is to present a new point of view that makes it possible to give a statistical interpretation of the traditional latent semantic analysis (LSA) paradigm based on the truncated singular value decomposition (TSVD) technique. We show how the TSVD can be interpreted as a statistical estimator derived from the LSA co-occurrence relationship matrix by mapping probability distributions onto Riemannian manifolds. Moreover, the quality of the estimator model can be expressed by introducing a figure of merit arising from the Solomonoff approach. This figure of merit takes into account both the adherence to the sample data and the simplicity of the model. In our model, the simplicity parameter of the proposed figure of merit depends on the number of singular values retained after the truncation process, while the TSVD estimator, according to the Hellinger distance, guarantees the minimal distance between the sample probability distribution and the inferred probabilistic model.
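
The Hellinger distance used to compare the sample distribution with the inferred model has a simple closed form for discrete distributions. The sketch below shows that formula on invented toy distributions; it does not reproduce the paper's estimator construction.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions:
    H(p, q) = sqrt(1/2 * sum((sqrt(p_i) - sqrt(q_i))^2)), bounded in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Toy distributions standing in for the sample distribution and the
# probabilistic model inferred from a TSVD of the co-occurrence matrix.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
d = hellinger(p, q)
```

In the paper's setting, lowering the truncation rank simplifies the model but tends to increase this distance to the sample distribution, which is exactly the fit-versus-simplicity trade-off the figure of merit captures.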