5,696 research outputs found

    Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

    Full text link
    Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referenced as entity linking, this step is a fundamental component of many NLP tasks such as text understanding, automatic summarization, semantic search or machine translation. Name ambiguity, word polysemy, context dependencies and a heavy-tailed distribution of entities contribute to the complexity of this problem. We here propose a probabilistic approach that makes use of an effective graphical model to perform collective entity disambiguation. Input mentions (i.e.,~linkable token spans) are disambiguated jointly across an entire document by combining a document-level prior of entity co-occurrences with local information captured from mentions and their surrounding context. The model is based on simple sufficient statistics extracted from data, thus relying on few parameters to be learned. Our method does not require extensive feature engineering, nor an expensive training procedure. We use loopy belief propagation to perform approximate inference. The low complexity of our model makes this step sufficiently fast for real-time usage. We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods

    Adaptive Graph via Multiple Kernel Learning for Nonnegative Matrix Factorization

    Full text link
    Nonnegative Matrix Factorization (NMF) has been continuously evolving in several areas like pattern recognition and information retrieval methods. It factorizes a matrix into a product of 2 low-rank non-negative matrices that will define parts-based, and linear representation of nonnegative data. Recently, Graph regularized NMF (GrNMF) is proposed to find a compact representation,which uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In GNMF, an affinity graph is constructed from the original data space to encode the geometrical information. In this paper, we propose a novel idea which engages a Multiple Kernel Learning approach into refining the graph structure that reflects the factorization of the matrix and the new data space. The GrNMF is improved by utilizing the graph refined by the kernel learning, and then a novel kernel learning method is introduced under the GrNMF framework. Our approach shows encouraging results of the proposed algorithm in comparison to the state-of-the-art clustering algorithms like NMF, GrNMF, SVD etc.Comment: This paper has been withdrawn by the author due to the terrible writin

    Reproducible Econometric Research. A Critical Review of the State of the Art.

    Get PDF
    Recent software developments are reviewed from the vantage point of reproducible econometric research. We argue that the emergence of new tools, particularly in the open-source community, have greatly eased the burden of documenting and archiving both empirical and simulation work in econometrics. Some of these tools are highlighted in the discussion of three small replication exercises.Series: Research Report Series / Department of Statistics and Mathematic

    Layout-based substitution tree indexing and retrieval for mathematical expressions

    Get PDF
    We introduce a new system for layout-based indexing and retrieval of mathematical expressions using substitution trees. Substitution trees can efficiently store and find hierarchically-structured data based on similarity. Previously Kolhase and Sucan applied substitution trees to indexing mathematical expressions in operator tree representation (Content MathML) and query-by-expression retrieval. In this investigation, we use substitution trees to index mathematical expressions in symbol layout tree representation (LaTeX) to group expressions based on the similarity of their symbols, symbol layout, sub-expressions and size. We describe our novel substitution tree indexing and retrieval algorithms and our many significant contributions to the behavior of these algorithms, including: allowing substitution trees to index and retrieve layout-based mathematical expressions instead of predicates; introducing a bias in the insertion function that helps group expressions in the index based on similarity in baseline size; modifying the search function to find expressions that are not identical yet still structurally similar to a search query; and ranking search results based on their similarity in symbols and symbol layout to the search query. We provide an experiment testing our system against the term frequency-inverse document frequency (TF-IDF) keyword-based system of Zanibbi and Yuan and demonstrate that: in many cases, the two systems are comparable; our system excelled at finding expressions identical to the search query and expressions containing relevant sub-expressions; and our system experiences some limitations due to the insertion bias and the presence of LaTeX formatting in expressions. Future work includes: designing a different insertion bias that improves the quality of search results; modifying the behavior of the search and ranking functions; and extending the scope of the system so that it can index websites or non-LaTeX expressions (such as MathML or images). Overall, we present a promising first attempt at layout-based substitution tree indexing and retrieval for mathematical expressions

    Retrieval of similar chess positions

    Get PDF
    We address the problem of retrieving chess game positions similar to a given query position from a collection of archived chess games. We investigate this problem from an information retrieval (IR) perspective. The advantage of our proposed IR-based approach is that it allows using the standard inverted organization of stored chess positions, leading to an ecient retrieval. Moreover, in contrast to retrieving exactly identical board positions, the IR-based approach is able to provide approximate search functionality. In order to define the similarity between two chess board positions, we encode each game state with a textual representation. This textual encoding is designed to represent the position, reachability and the connectivity between chess pieces. Due to the absence of a standard IR dataset that can be used for this search task, a new evaluation benchmark dataset was constructed comprising of documents (chess positions) from a freely available chess game archive. Experiments conducted on this dataset demonstrate that our proposed method of similarity computation, which takes into account a combination of the mobility and the connectivities between the chess pieces, performs well on the search task, achieving MAP and nDCG values of 0:4233 and 0:6922 respectively

    Dealing with uncertain entities in ontology alignment using rough sets

    Get PDF
    This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.Ontology alignment facilitates exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, it remains a key challenge in dealing with uncertain entities for which the employed ontology alignment measures produce conflicting results on similarity of the mapped entities. This paper presents OARS, a rough-set based approach to ontology alignment which achieves a high degree of accuracy in situations where uncertainty arises because of the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated with the benchmark ontologies of the ontology alignment evaluation initiative (OAEI) 2010, and performs best in the aspect of recall in comparison with a number of alignment systems while generating a comparable performance in precision

    Mathematical Formula Recognition and Automatic Detection and Translation of Algorithmic Components into Stochastic Petri Nets in Scientific Documents

    Get PDF
    A great percentage of documents in scientific and engineering disciplines include mathematical formulas and/or algorithms. Exploring the mathematical formulas in the technical documents, we focused on the mathematical operations associations, their syntactical correctness, and the association of these components into attributed graphs and Stochastic Petri Nets (SPN). We also introduce a formal language to generate mathematical formulas and evaluate their syntactical correctness. The main contribution of this work focuses on the automatic segmentation of mathematical documents for the parsing and analysis of detected algorithmic components. To achieve this, we present a synergy of methods, such as string parsing according to mathematical rules, Formal Language Modeling, optical analysis of technical documents in forms of images, structural analysis of text in images, and graph and Stochastic Petri Net mapping. Finally, for the recognition of the algorithms, we enriched our rule based model with machine learning techniques to acquire better results

    A framework for supporting knowledge representation – an ontological based approach

    Get PDF
    Dissertação para obtenção do Grau de Mestre em Engenharia ElectrotĂ©cnica e de ComputadoresThe World Wide Web has had a tremendous impact on society and business in just a few years by making information instantly available. During this transition from physical to electronic means for information transport, the content and encoding of information has remained natural language and is only identified by its URL. Today, this is perhaps the most significant obstacle to streamlining business processes via the web. In order that processes may execute without human intervention, knowledge sources, such as documents, must become more machine understandable and must contain other information besides their main contents and URLs. The Semantic Web is a vision of a future web of machine-understandable data. On a machine understandable web, it will be possible for programs to easily determine what knowledge sources are about. This work introduces a conceptual framework and its implementation to support the classification and discovery of knowledge sources, supported by the above vision, where such sources’ information is structured and represented through a mathematical vector that semantically pinpoints the relevance of those knowledge sources within the domain of interest of each user. The presented work also addresses the enrichment of such knowledge representations, using the statistical relevance of keywords based on the classical vector space model concept, and extending it with ontological support, by using concepts and semantic relations, contained in a domain-specific ontology, to enrich knowledge sources’ semantic vectors. Semantic vectors are compared against each other, in order to obtain the similarity between them, and better support end users with knowledge source retrieval capabilities

    Aerospace Medicine and Biology: A continuing bibliography with indexes (supplement 141)

    Get PDF
    This special bibliography lists 267 reports, articles, and other documents introduced into the NASA scientific and technical information system in April 1975
    • 

    corecore