    Potential Maximal Clique Algorithms for Perfect Phylogeny Problems

    Kloks, Kratsch, and Spinrad showed how treewidth and minimum-fill, NP-hard combinatorial optimization problems related to minimal triangulations, are broken into subproblems by block subgraphs defined by minimal separators. These ideas were expanded on by Bouchitté and Todinca, who used potential maximal cliques to solve these problems using a dynamic programming approach in time polynomial in the number of minimal separators of a graph. It is known that solutions to the perfect phylogeny problem, maximum compatibility problem, and unique perfect phylogeny problem are characterized by minimal triangulations of the partition intersection graph. In this paper, we show that techniques similar to those proposed by Bouchitté and Todinca can be used to solve the perfect phylogeny problem with missing data, the two-state maximum compatibility problem with missing data, and the unique perfect phylogeny problem with missing data in time polynomial in the number of minimal separators of the partition intersection graph.

    Unique Perfect Phylogeny Characterizations via Uniquely Representable Chordal Graphs

    The perfect phylogeny problem is a classic problem in computational biology, where we seek an unrooted phylogeny that is compatible with a set of qualitative characters. Such a tree exists precisely when an intersection graph associated with the character set, called the partition intersection graph, can be triangulated using a restricted set of fill edges. Semple and Steel used the partition intersection graph to characterize when a character set has a unique perfect phylogeny. Bordewich, Huber, and Semple showed how to use the partition intersection graph to find a maximum compatible set of characters. In this paper, we build on these results, characterizing when a unique perfect phylogeny exists for a subset of partial characters. Our characterization is stated in terms of minimal triangulations of the partition intersection graph that are uniquely representable, also known as ur-chordal graphs. Our characterization is motivated by the structure of ur-chordal graphs, and the fact that the block structure of minimal triangulations is mirrored in the graph that has been triangulated.
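The partition intersection graph named in the abstract above can be built directly from a character set. The following is a minimal sketch, not code from the paper: it assumes each character is given as a dict mapping taxon to state, with taxa that have missing data simply left out, and the function name `partition_intersection_graph` is hypothetical. Each vertex is a (character index, state) pair, and two vertices are adjacent when some taxon exhibits both states.

```python
from itertools import combinations


def partition_intersection_graph(characters):
    """Build the partition intersection graph of a character set.

    characters: list of dicts mapping taxon -> state. Taxa with missing
    data for a character are omitted from that character's dict
    (an assumption of this sketch).
    Returns (vertices, edges); each vertex is a (character index, state) pair.
    """
    # Group taxa by (character index, state); each group is one vertex.
    cells = {}
    for i, character in enumerate(characters):
        for taxon, state in character.items():
            cells.setdefault((i, state), set()).add(taxon)

    # Two vertices are adjacent when their taxon sets share a taxon.
    # Vertices of the same character never share a taxon, so they are
    # never adjacent.
    edges = set()
    for u, v in combinations(sorted(cells), 2):
        if cells[u] & cells[v]:
            edges.add((u, v))
    return set(cells), edges


if __name__ == "__main__":
    chars = [
        {"a": 0, "b": 0, "c": 1, "d": 1},
        {"a": 0, "b": 1, "c": 1, "d": 1},
    ]
    vertices, edges = partition_intersection_graph(chars)
    print(sorted(vertices))
    print(sorted(edges))
```

The sketch only constructs the graph; deciding whether it admits a suitable triangulation is the hard part that the papers address.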

    Semantic Entity Retrieval Toolkit

    Unsupervised learning of low-dimensional, semantic representations of words and entities has recently gained attention. In this paper we describe the Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our previously published entity representation models. The toolkit provides a unified interface to different representation learning algorithms and fine-grained parsing configuration, and it can be used transparently with GPUs. In addition, users can easily modify existing models or implement their own models in the framework. After model training, SERT can be used to rank entities according to a textual query and extract the learned entity/word representation for use in downstream algorithms, such as clustering or recommendation. Comment: SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17). 201

    Pyndri: a Python Interface to the Indri Search Engine

    We introduce pyndri, a Python interface to the Indri search engine. Pyndri provides access to Indri indexes from Python at two levels: (1) the dictionary and tokenized document collection, and (2) query evaluation on the index. We hope that with the release of pyndri, we will stimulate reproducible, open and fast-paced IR research. Comment: ECIR2017. Proceedings of the 39th European Conference on Information Retrieval. 2017. The final publication will be available at Springe

    Lexical Query Modeling in Session Search

    Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC session query logs and compare the performance of different lexical matching approaches for session search. Naive methods based on term frequency weighting perform on par with specialized session models. In addition, we investigate the viability of lexical query models in the setting of session search. We give important insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field of session search. Comment: ICTIR2016, Proceedings of the 2nd ACM International Conference on the Theory of Information Retrieval. 201
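To make "term frequency weighting" over a session concrete, here is a minimal sketch of one naive variant. It is an illustration under stated assumptions, not the method evaluated in the paper: terms from all queries in a session are pooled, weighted by relative frequency, and a document is scored by the weighted count of matching terms. The function names and whitespace tokenization are hypothetical choices.

```python
from collections import Counter


def session_query_model(session_queries):
    """Pool the terms of all queries in a session and weight each term
    by its relative frequency across the session."""
    counts = Counter(
        term for query in session_queries for term in query.lower().split()
    )
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def score(document, query_model):
    """Score a document as the weighted count of query-model terms it contains."""
    doc_counts = Counter(document.lower().split())
    return sum(weight * doc_counts[term] for term, weight in query_model.items())


if __name__ == "__main__":
    model = session_query_model(["indri search", "indri python"])
    print(model)
    print(score("indri indri python", model))
```

Terms repeated across the queries of a session receive more weight, which is the basic intuition behind frequency-based session models.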

    Neural Vector Spaces for Unsupervised Information Retrieval

    We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness. Consequently, NVSM adds a complementary relevance signal. Next to semantic matching, we find that NVSM performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments. Comment: TOIS 201
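The ranking step described above, where a query representation is composed from word representations and documents are ordered by similarity to it, can be sketched as follows. This is an illustration only: it assumes the word and document vectors already exist and share one space, composes the query by simple averaging, and uses cosine similarity. Training by gradient descent and any learned transformation used by NVSM itself are omitted, and all names are hypothetical.

```python
import math


def compose_query(terms, word_vectors):
    """Average the vectors of the query terms that have a representation."""
    known = [word_vectors[t] for t in terms if t in word_vectors]
    if not known:
        raise ValueError("no query term has a word representation")
    dim = len(known[0])
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(terms, word_vectors, doc_vectors):
    """Return document ids ordered by decreasing similarity to the query."""
    query = compose_query(terms, word_vectors)
    return sorted(
        doc_vectors,
        key=lambda doc_id: cosine(query, doc_vectors[doc_id]),
        reverse=True,
    )


if __name__ == "__main__":
    words = {"neural": [1.0, 0.0], "retrieval": [0.0, 1.0]}
    docs = {"d1": [1.0, 1.0], "d2": [1.0, 0.0], "d3": [-1.0, -1.0]}
    print(rank(["neural", "retrieval"], words, docs))
```

Query terms without a representation are skipped, so out-of-vocabulary words do not affect the ranking.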

    Structural Regularities in Text-based Entity Vector Spaces

    Entity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in finite-dimensional vector spaces, where both are constructed from text sequences. We investigate entity vector spaces and the degree to which they capture structural regularities. Such vector spaces are constructed in an unsupervised manner without explicit information about structural aspects. For concreteness, we address these questions for a specific type of entity: experts in the context of expert finding. We discover how clusterings of experts correspond to committees in organizations, the ability of expert representations to encode the co-author graph, and the degree to which they encode academic rank. We compare latent, continuous representations created using methods based on distributional semantics (LSI), topic models (LDA) and neural networks (word2vec, doc2vec, SERT). Vector spaces created using neural methods, such as doc2vec and SERT, systematically perform better at clustering than LSI, LDA and word2vec. When it comes to encoding entity relations, SERT performs best. Comment: ICTIR2017. Proceedings of the 3rd ACM International Conference on the Theory of Information Retrieval. 201