14,319 research outputs found

    The study of probability model for compound similarity searching

    Get PDF
    Information Retrieval or IR system main task is to retrieve relevant documents according to the users query. One of IR most popular retrieval model is the Vector Space Model. This model assumes relevance based on similarity, which is defined as the distance between query and document in the concept space. All currently existing chemical compound database systems have adapt the vector space model to calculate the similarity of a database entry to a query compound. However, it assumes that fragments represented by the bits are independent of one another, which is not necessarily true. Hence, the possibility of applying another IR model is explored, which is the Probabilistic Model, for chemical compound searching. This model estimates the probabilities of a chemical structure to have the same bioactivity as a target compound. It is envisioned that by ranking chemical structures in decreasing order of their probability of relevance to the query structure, the effectiveness of a molecular similarity searching system can be increased. Both fragment dependencies and independencies assumption are taken into consideration in achieving improvement towards compound similarity searching system. After conducting a series of simulated similarity searching, it is concluded that PM approaches really did perform better than the existing similarity searching. It gave better result in all evaluation criteria to confirm this statement. In terms of which probability model performs better, the BD model shown improvement over the BIR model

    Exploring Protein-Protein Interactions as Drug Targets for Anti-cancer Therapy with In Silico Workflows

    Get PDF
    We describe a computational protocol to aid the design of small molecule and peptide drugs that target protein-protein interactions, particularly for anti-cancer therapy. To achieve this goal, we explore multiple strategies, including finding binding hot spots, incorporating chemical similarity and bioactivity data, and sampling similar binding sites from homologous protein complexes. We demonstrate how to combine existing interdisciplinary resources with examples of semi-automated workflows. Finally, we discuss several major problems, including the occurrence of drug-resistant mutations, drug promiscuity, and the design of dual-effect inhibitors.Fil: Goncearenco, Alexander. National Institutes of Health; Estados UnidosFil: Li, Minghui. Soochow University; China. National Institutes of Health; Estados UnidosFil: Simonetti, Franco Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; ArgentinaFil: Shoemaker, Benjamin A. National Institutes of Health; Estados UnidosFil: Panchenko, Anna R. National Institutes of Health; Estados Unido

    Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search

    Full text link
    Despite substantial interest in applications of neural networks to information retrieval, neural ranking models have only been applied to standard ad hoc retrieval tasks over web pages and newswire documents. This paper proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network) a novel neural ranking model specifically designed for ranking short social media posts. We identify document length, informal language, and heterogeneous relevance signals as features that distinguish documents in our domain, and present a model specifically designed with these characteristics in mind. Our model uses hierarchical convolutional layers to learn latent semantic soft-match relevance signals at the character, word, and phrase levels. A pooling-based similarity measurement layer integrates evidence from multiple types of matches between the query, the social media post, as well as URLs contained in the post. Extensive experiments using Twitter data from the TREC Microblog Tracks 2011--2014 show that our model significantly outperforms prior feature-based as well and existing neural ranking models. To our best knowledge, this paper presents the first substantial work tackling search over social media posts using neural ranking models.Comment: AAAI 2019, 10 page

    BIKE: Bilingual Keyphrase Experiments

    Get PDF
    This paper presents a novel strategy for translating lists of keyphrases. Typical keyphrase lists appear in scientific articles, information retrieval systems and web page meta-data. Our system combines a statistical translation model trained on a bilingual corpus of scientific papers with sense-focused look-up in a large bilingual terminological resource. For the latter, we developed a novel technique that benefits from viewing the keyphrase list as contextual help for sense disambiguation. The optimal combination of modules was discovered by a genetic algorithm. Our work applies to the French / English language pair

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Automated simulation as part of a design workstation

    Get PDF
    A development project for a design workstation for advanced life-support systems (called the DAWN Project, for Design Assistant Workstation), incorporating qualitative simulation, required the implementation of a useful qualitative simulation capability and the integration of qualitative and quantitative simulation such that simulation capabilities are maximized without duplication. The reason is that to produce design solutions to a system goal, the behavior of the system in both a steady and perturbed state must be represented. The Qualitative Simulation Tool (QST), on an expert-system-like model building and simulation interface toll called ScratchPad (SP), and on the integration of QST and SP with more conventional, commercially available simulation packages now being applied in the evaluation of life-support system processes and components are discussed

    Cross-lingual Question Answering with QED

    Get PDF
    We present improvements and modifications of the QED open-domain question answering system developed for TREC-2003 to make it cross-lingual for participation in the CrossLinguistic Evaluation Forum (CLEF) Question Answering Track 2004 for the source languages French and German and the target language English. We use rule-based question translation extended with surface pattern-oriented pre- and post-processing rules for question reformulation to create and English query from its French or German original. Our system uses deep processing for the question and answers, which requires efficient and radical prior search space pruning. For answering factoid questions, we report an accuracy of 16% (German to English) and 20% (French to English), respectively

    Dynamic load balancing for the distributed mining of molecular structures

    Get PDF
    In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids
    corecore