1,363 research outputs found

Potential Maximal Clique Algorithms for Perfect Phylogeny Problems

Author: Rob Gysel
Publication date: 15/03/2013

Kloks, Kratsch, and Spinrad showed how treewidth and minimum-fill, NP-hard combinatorial optimization problems related to minimal triangulations, are broken into subproblems by block subgraphs defined by minimal separators. These ideas were expanded on by Bouchitté and Todinca, who used potential maximal cliques to solve these problems using a dynamic programming approach in time polynomial in the number of minimal separators of a graph. It is known that solutions to the perfect phylogeny problem, maximum compatibility problem, and unique perfect phylogeny problem are characterized by minimal triangulations of the partition intersection graph. In this paper, we show that techniques similar to those proposed by Bouchitté and Todinca can be used to solve the perfect phylogeny problem with missing data, the two-state maximum compatibility problem with missing data, and the unique perfect phylogeny problem with missing data in time polynomial in the number of minimal separators of the partition intersection graph.

arXiv.org e-Print Archive

eScholarship - University of California

Unique Perfect Phylogeny Characterizations via Uniquely Representable Chordal Graphs

Author: Rob Gysel
Publication date: 06/05/2013

The perfect phylogeny problem is a classic problem in computational biology, where we seek an unrooted phylogeny that is compatible with a set of qualitative characters. Such a tree exists precisely when an intersection graph associated with the character set, called the partition intersection graph, can be triangulated using a restricted set of fill edges. Semple and Steel used the partition intersection graph to characterize when a character set has a unique perfect phylogeny. Bordewich, Huber, and Semple showed how to use the partition intersection graph to find a maximum compatible set of characters. In this paper, we build on these results, characterizing when a unique perfect phylogeny exists for a subset of partial characters. Our characterization is stated in terms of minimal triangulations of the partition intersection graph that are uniquely representable, also known as ur-chordal graphs. Our characterization is motivated by the structure of ur-chordal graphs, and the fact that the block structure of minimal triangulations is mirrored in the graph that has been triangulated.
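
As a rough illustration of the partition intersection graph mentioned in this abstract, the sketch below builds one for a toy character set. It is not taken from the paper: the function name, the input layout (a dict of per-character taxon-to-state maps), and the example data are all assumptions made for illustration. It uses the standard construction, with one vertex per (character, state) pair and an edge wherever two such taxon sets share a taxon. The triangulation step the abstract describes is not implemented here.

```python
from itertools import combinations


def partition_intersection_graph(characters):
    """Build the partition intersection graph of a character set.

    `characters` (hypothetical layout) maps a character name to a dict
    of taxon -> state. A taxon absent from a character's dict is treated
    as missing data for that character.
    """
    # One vertex per (character, state) pair, labelled by its taxon set.
    cells = {}
    for char, assignment in characters.items():
        for taxon, state in assignment.items():
            cells.setdefault((char, state), set()).add(taxon)

    # Two vertices are adjacent when their taxon sets intersect.
    edges = set()
    for u, v in combinations(sorted(cells), 2):
        if cells[u] & cells[v]:
            edges.add((u, v))
    return cells, edges


# Toy data: taxon "d" has no recorded state for character "c2".
example = {
    "c1": {"a": 0, "b": 0, "c": 1, "d": 1},
    "c2": {"a": 0, "b": 1, "c": 1},
}
cells, edges = partition_intersection_graph(example)
for edge in sorted(edges):
    print(edge)
```

Two vertices of the same character are never adjacent, because each taxon has at most one state per character. The "restricted set of fill edges" in the abstract corresponds to only allowing added edges between vertices of different characters.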

arXiv.org e-Print Archive

eScholarship - University of California

Semantic Entity Retrieval Toolkit

Authors: Maarten de Rijke, Evangelos Kanoulas, Christophe Van Gysel
Publication date: 01/01/2017

Unsupervised learning of low-dimensional, semantic representations of words and entities has recently gained attention. In this paper we describe the Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our previously published entity representation models. The toolkit provides a unified interface to different representation learning algorithms, fine-grained parsing configuration and can be used transparently with GPUs. In addition, users can easily modify existing models or implement their own models in the framework. After model training, SERT can be used to rank entities according to a textual query and extract the learned entity/word representation for use in downstream algorithms, such as clustering or recommendation. Comment: SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17). 201

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Pyndri: a Python Interface to the Indri Search Engine

Authors: Maarten de Rijke, Evangelos Kanoulas, Christophe Van Gysel
Publication date: 01/01/2017

We introduce pyndri, a Python interface to the Indri search engine. Pyndri allows access to Indri indexes from Python at two levels: (1) dictionary and tokenized document collection, (2) evaluating queries on the index. We hope that with the release of pyndri, we will stimulate reproducible, open and fast-paced IR research. Comment: ECIR2017. Proceedings of the 39th European Conference on Information Retrieval. 2017. The final publication will be available at Springe

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Lexical Query Modeling in Session Search

Authors: Maarten de Rijke, Evangelos Kanoulas, Christophe Van Gysel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC session query logs and compare the performance of different lexical matching approaches for session search. Naive methods based on term frequency weighting perform on par with specialized session models. In addition, we investigate the viability of lexical query models in the setting of session search. We give important insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field of session search. Comment: ICTIR2016, Proceedings of the 2nd ACM International Conference on the Theory of Information Retrieval. 201

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Neural Vector Spaces for Unsupervised Information Retrieval

Authors: Maarten de Rijke, Evangelos Kanoulas, Christophe Van Gysel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/08/2018

We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness. Consequently, NVSM adds a complementary relevance signal. Next to semantic matching, we find that NVSM performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments. Comment: TOIS 201
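
The ranking step described in this abstract (compose a query representation from word representations, then order documents by similarity) can be sketched as follows. This is only a toy illustration, not the NVSM implementation: it assumes that words and documents share one vector space, that the query is the plain average of its word vectors, and that similarity is cosine. The vectors are hand-picked and not learned, and all function and variable names are hypothetical.

```python
import math


def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_terms, word_vecs, doc_vecs):
    """Return document ids ordered by similarity to the composed query."""
    # Out-of-vocabulary terms are skipped (an assumption of this sketch).
    known = [word_vecs[t] for t in query_terms if t in word_vecs]
    if not known:
        return []
    query = average(known)
    return sorted(doc_vecs, key=lambda d: cosine(query, doc_vecs[d]), reverse=True)


# Hand-picked toy vectors standing in for learned representations.
word_vecs = {
    "neural": [1.0, 0.0],
    "retrieval": [0.8, 0.2],
    "phylogeny": [0.0, 1.0],
}
doc_vecs = {"d_ir": [0.9, 0.1], "d_bio": [0.1, 0.9]}

print(rank(["neural", "retrieval"], word_vecs, doc_vecs))
print(rank(["phylogeny"], word_vecs, doc_vecs))
```

In a learned model the word and document vectors would come from training with gradient descent, as the abstract states. Here they are fixed so that only the compose-and-rank step is visible.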

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Structural Regularities in Text-based Entity Vector Spaces

Authors: Maarten de Rijke, Evangelos Kanoulas, Christophe Van Gysel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Entity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in finite-dimensional vector spaces, where both are constructed from text sequences. We investigate entity vector spaces and the degree to which they capture structural regularities. Such vector spaces are constructed in an unsupervised manner without explicit information about structural aspects. For concreteness, we address these questions for a specific type of entity: experts in the context of expert finding. We discover how clusterings of experts correspond to committees in organizations, the ability of expert representations to encode the co-author graph, and the degree to which they encode academic rank. We compare latent, continuous representations created using methods based on distributional semantics (LSI), topic models (LDA) and neural networks (word2vec, doc2vec, SERT). Vector spaces created using neural methods, such as doc2vec and SERT, systematically perform better at clustering than LSI, LDA and word2vec. When it comes to encoding entity relations, SERT performs best. Comment: ICTIR2017. Proceedings of the 3rd ACM International Conference on the Theory of Information Retrieval. 201

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications