Search CORE


The study of probability model for compound similarity searching

Author: Abd. Wahid Mohd. Taib
Alwee Razana
Dollah @ Md. Zain Rozilawati
Salim Naomie
Publication venue: Faculty of Computer Science and Information System
Publication date: 30/09/2006

The main task of an Information Retrieval (IR) system is to retrieve documents relevant to the user's query. One of the most popular retrieval models in IR is the Vector Space Model. This model assumes relevance based on similarity, which is defined as the distance between the query and the document in the concept space. All currently existing chemical compound database systems have adopted the vector space model to calculate the similarity of a database entry to a query compound. However, this model assumes that the fragments represented by the bits are independent of one another, which is not necessarily true. Hence, the possibility of applying another IR model, the Probabilistic Model (PM), to chemical compound searching is explored. This model estimates the probability that a chemical structure has the same bioactivity as a target compound. It is envisioned that ranking chemical structures in decreasing order of their probability of relevance to the query structure can increase the effectiveness of a molecular similarity searching system. Both the fragment-dependence and the fragment-independence assumptions are taken into consideration in improving the compound similarity searching system. After a series of simulated similarity searches, it is concluded that the PM approaches performed better than the existing similarity searching, giving better results on all evaluation criteria. In terms of which probability model performs better, the BD model showed improvement over the BIR model.

Universiti Teknologi Malaysia Institutional Repository
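The abstract contrasts vector-space similarity, which treats fingerprint bits as independent, with probabilistic ranking. The sketch below, assuming fingerprints stored as sets of on-bit indices and a set of known actives serving as relevance information, computes the Tanimoto coefficient (a common fingerprint similarity) and the textbook Robertson-Sparck Jones weights of the Binary Independence Retrieval (BIR) model, then ranks compounds by summed bit weights. It illustrates standard BIR weighting, not the authors' implementation, and it does not model the fragment dependencies that the BD model accounts for; all function names are hypothetical.

```python
import math

def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rsj_weights(database, relevant, n_bits):
    """Robertson-Sparck Jones (BIR) relevance weight for every fingerprint bit.

    `relevant` must be a subset of `database` (e.g. known actives).
    The 0.5 terms smooth the counts so no ratio is zero or undefined.
    """
    N, R = len(database), len(relevant)
    weights = []
    for i in range(n_bits):
        n = sum(1 for fp in database if i in fp)   # compounds with bit i set
        r = sum(1 for fp in relevant if i in fp)   # relevant compounds with bit i set
        weights.append(math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                                ((n - r + 0.5) * (R - r + 0.5))))
    return weights

def bir_score(query, fp, weights):
    """Sum the weights of bits shared by query and compound (bits treated as independent)."""
    return sum(weights[i] for i in query & fp)

def rank(query, database, weights):
    """Indices of database compounds in decreasing order of BIR score."""
    return sorted(range(len(database)),
                  key=lambda j: bir_score(query, database[j], weights),
                  reverse=True)
```

With no relevance information (R = r = 0), the weight reduces to log((N - n + 0.5) / (n + 0.5)), an IDF-like weight that favours rare fragments.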

Spherical harmonics coefficients for ligand-based virtual screening of cyclooxygenase inhibitors

Author: Angioni Carlo Federico
Birod Kerstin
Geppert Tim
Grösch Sabine
Rupp Matthias
Schneider Gisbert (Prof. Dr.)
Schneider Petra
Wang Quan
Publication date: 27/07/2011

Background: Molecular descriptors are essential for many applications in computational chemistry, such as ligand-based similarity searching. Spherical harmonics have previously been suggested as comprehensive descriptors of molecular structure and properties. We investigate a spherical harmonics descriptor for shape-based virtual screening. Methodology/Principal Findings: We introduce and validate a partially rotation-invariant three-dimensional molecular shape descriptor based on the norm of spherical harmonics expansion coefficients. Using this molecular representation, we parameterize molecular surfaces, i.e., isosurfaces of spatial molecular property distributions. We validate the shape descriptor in a comprehensive retrospective virtual screening experiment. In a prospective study, we virtually screen a large compound library for cyclooxygenase inhibitors, using a self-organizing map as a pre-filter and the shape descriptor for candidate prioritization. Conclusions/Significance: Twelve compounds were tested in vitro for direct enzyme inhibition and in a whole blood assay. Active compounds containing a triazole scaffold were identified as direct cyclooxygenase-1 inhibitors. This outcome corroborates the usefulness of spherical harmonics for representation of molecular shape in virtual screening of large compound collections. The combination of pharmacophore and shape-based filtering of screening candidates proved to be a straightforward approach to finding novel bioactive chemotypes with minimal experimental effort.

Hochschulschriftenserver - Universität Frankfurt am Main
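The rotation invariance mentioned in the abstract comes from taking norms of expansion coefficients. A minimal NumPy sketch, assuming the molecular surface or property is given as a real function f(x, y, z) on the unit sphere: it projects f onto complex spherical harmonics by Gauss-Legendre quadrature and returns the norm of the coefficients of each degree l, which does not change when f is rotated. This is the standard per-degree invariant, not necessarily the authors' partially rotation-invariant descriptor; the function names, degree cutoff, and grid sizes are illustrative assumptions.

```python
import math
import numpy as np

def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x), 0 <= m <= l, via the standard recurrence."""
    # P_m^m = (-1)^m (2m-1)!! (1 - x^2)^(m/2)
    p_prev = (-1) ** m * math.prod(range(2 * m - 1, 0, -2)) * (1.0 - x**2) ** (m / 2)
    if l == m:
        return p_prev
    p_curr = x * (2 * m + 1) * p_prev  # P_{m+1}^m
    for k in range(m + 2, l + 1):
        p_prev, p_curr = p_curr, ((2 * k - 1) * x * p_curr - (k + m - 1) * p_prev) / (k - m)
    return p_curr

def sh_degree_norms(f, l_max=4, n_theta=16, n_phi=32):
    """Per-degree norms ||f_l|| of the spherical harmonic expansion of a real function f.

    f(x, y, z) is evaluated on unit vectors; rotating f leaves these norms unchanged.
    """
    # Gauss-Legendre nodes in cos(theta) and a uniform grid in phi give exact
    # quadrature for band-limited functions of modest degree.
    cos_t, w_t = np.polynomial.legendre.leggauss(n_theta)
    phi = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    ct, ph = np.meshgrid(cos_t, phi, indexing="ij")
    st = np.sqrt(1.0 - ct**2)
    values = f(st * np.cos(ph), st * np.sin(ph), ct)
    weights = w_t[:, None] * (2.0 * np.pi / n_phi)

    norms = []
    for l in range(l_max + 1):
        energy = 0.0
        for m in range(l + 1):
            k_lm = math.sqrt((2 * l + 1) / (4 * math.pi)
                             * math.factorial(l - m) / math.factorial(l + m))
            y_lm = k_lm * assoc_legendre(l, m, ct) * np.exp(1j * m * ph)
            a_lm = np.sum(weights * values * np.conj(y_lm))
            # For real f, |a_{l,-m}| = |a_{l,m}|, so each m > 0 term counts twice.
            energy += abs(a_lm) ** 2 * (1 if m == 0 else 2)
        norms.append(math.sqrt(energy))
    return np.array(norms)
```

For a surface r(θ, φ), the returned vector could serve as a fixed-length shape fingerprint compared with, for example, Euclidean distance; that comparison choice is likewise an assumption.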

Query Expansion of Zero-Hit Subject Searches: Using a Thesaurus in Conjunction with NLP Techniques

Author: A. Shiri
E.P. Lau
J. Greenberg
L. Hollink
L. Villén-Rueda
R. Mandala
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2012

The focus of our study is zero-hit queries in keyword subject searches and the effort to increase recall in these cases by reformulating and then expanding the initial queries using an external source of knowledge, namely a thesaurus. To this end, the study has two objectives. First, we map the query terms to thesaurus terms. Second, we use the matched terms to expand the user's initial query by taking advantage of the thesaurus relations and applying natural language processing (NLP) techniques. We report on the overall procedure and elaborate on the key points and considerations of each step of the process.

E-LIS

Crossref
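The two steps in the abstract (mapping query terms to thesaurus terms, then expanding through thesaurus relations) can be sketched as follows. Everything here is hypothetical: the toy thesaurus, the relation codes (UF, BT, NT, RT, as commonly used in thesauri), the crude plural-stripping normalizer standing in for real NLP techniques, and the choice of which relations to follow.

```python
import re

# Hypothetical toy thesaurus: preferred term -> relations
# (UF = "used for" entry terms, BT = broader, NT = narrower, RT = related).
THESAURUS = {
    "automobile": {"UF": ["car", "motorcar"], "BT": ["vehicle"], "NT": ["electric car"], "RT": ["road"]},
    "vehicle": {"UF": [], "BT": [], "NT": ["automobile"], "RT": []},
}

def normalize(term):
    """Crude NLP normalization: lowercase, keep letters and spaces, strip a plural -s."""
    term = re.sub(r"[^a-z ]", "", term.lower()).strip()
    return term[:-1] if term.endswith("s") and not term.endswith("ss") else term

def build_index(thesaurus):
    """Map each normalized preferred term and entry (UF) term to its preferred term."""
    index = {}
    for preferred, rel in thesaurus.items():
        index[normalize(preferred)] = preferred
        for entry in rel["UF"]:
            index[normalize(entry)] = preferred
    return index

def expand_query(query, thesaurus, relations=("UF", "NT", "RT")):
    """Step 1: map query words to thesaurus terms. Step 2: expand via selected relations."""
    index = build_index(thesaurus)
    groups = []
    for word in query.split():
        preferred = index.get(normalize(word))
        if preferred is None:
            terms = [word]  # unmatched words are kept unchanged
        else:
            terms = [preferred] + [t for r in relations for t in thesaurus[preferred][r]]
        groups.append(" OR ".join(f'"{t}"' if " " in t else t for t in terms))
    # Alternatives for one concept are OR-ed; different concepts are AND-ed.
    return " AND ".join(f"({g})" for g in groups)
```

Following narrower (NT) rather than broader (BT) terms is a conservative default that raises recall without drifting far from the query; passing `relations=("UF", "BT")` would broaden more aggressively.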

¹³C NMR metabolomics: applications at natural abundance.

Author: Clendinen Chaevien
Edison Arthur
Hahn Daniel
Lee-McMullen Brittany
Stupp Gregory
Vandenborne Krista
Walter Glenn
Williams Caroline
Publication venue: eScholarship, University of California
Publication date: 01/09/2014

¹³C NMR has many advantages for a metabolomics study, including a large spectral dispersion, narrow singlets at natural abundance, and a direct measure of the backbone structures of metabolites. However, it has not had widespread use because of its relatively low sensitivity compounded by low natural abundance. Here we demonstrate the utility of high-quality ¹³C NMR spectra obtained using a custom ¹³C-optimized probe on metabolomic mixtures. A workflow was developed to use statistical correlations between replicate 1D ¹³C and ¹H spectra, leading to composite spin systems that can be used to search publicly available databases for compound identification. This was developed using synthetic mixtures and then applied to two biological samples, Drosophila melanogaster extracts and mouse serum. Using the synthetic mixtures we were able to obtain useful ¹³C-¹³C statistical correlations from metabolites with as little as 60 nmol of material. The lower limit of ¹³C NMR detection under our experimental conditions is approximately 40 nmol, slightly lower than the requirement for statistical analysis. The ¹³C and ¹H data together led to 15 matches in the database compared to just 7 using ¹H alone, and the ¹³C correlated peak lists had far fewer false positives than the ¹H generated lists. In addition, the ¹³C 1D data provided improved metabolite identification and separation of biologically distinct groups using multivariate statistical analysis in the D. melanogaster extracts and mouse serum.

PubMed Central

eScholarship - University of California
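The statistical-correlation step can be illustrated with a generic sketch, assuming a matrix of peak intensities with one row per replicate spectrum: peaks whose intensities co-vary across samples (Pearson r at or above a threshold) are linked, and connected groups are treated as candidate spin systems from the same metabolite. This is a simplified stand-in for the authors' workflow; the function name and the 0.9 threshold are assumptions.

```python
import numpy as np

def correlated_peak_groups(intensities, threshold=0.9):
    """Group peaks whose intensities co-vary across replicate samples.

    intensities: array of shape (n_samples, n_peaks), one row per spectrum.
    Returns peak-index groups: connected components of the graph whose
    edges join peak pairs with Pearson r >= threshold.
    """
    r = np.corrcoef(intensities, rowvar=False)  # peak-by-peak correlation matrix
    seen, groups = set(), []
    for start in range(r.shape[0]):
        if start in seen:
            continue
        # Depth-first search over strongly correlated peaks.
        stack, group = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            group.append(i)
            for j in np.flatnonzero(r[i] >= threshold):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups
```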

Query recovery of short user queries: on query expansion with stopwords

Author: Jones Gareth J.F.
Leveling Johannes
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2010

User queries to search engines are observed to contain predominantly inflected content words but to lack stopwords and capitalization. Thus, they often resemble natural language queries after case folding and stopword removal. Query recovery aims to generate a linguistically well-formed query from a given user query, to serve as input for natural language processing tasks and cross-language information retrieval (CLIR). The evaluation of query translation shows that translation scores (NIST and BLEU) decrease after case folding, stopword removal, and stemming. A baseline method for query recovery reconstructs capitalization and stopwords, which considerably increases translation scores and significantly increases mean average precision for a standard CLIR task.

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service
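The abstract does not specify how the baseline reconstructs capitalization and stopwords. One simple frequency-based approach, shown here as a hypothetical sketch, learns from a cased corpus the most frequent spelling of each word and the most frequent stopword sequence between pairs of adjacent content words, then applies both to a case-folded, stopword-free query. The corpus, stopword list, and function names are illustrative.

```python
from collections import Counter

def train(sentences, stopwords):
    """Count surface (case) forms of each word and stopword runs between content words."""
    case_forms = {}  # lowercased word -> Counter of observed spellings
    gap_fill = {}    # (content word, next content word) -> Counter of stopword tuples
    for sentence in sentences:
        tokens = sentence.split()
        for tok in tokens:
            case_forms.setdefault(tok.lower(), Counter())[tok] += 1
        content = [i for i, tok in enumerate(tokens) if tok.lower() not in stopwords]
        for a, b in zip(content, content[1:]):
            key = (tokens[a].lower(), tokens[b].lower())
            gap = tuple(tok.lower() for tok in tokens[a + 1:b])
            gap_fill.setdefault(key, Counter())[gap] += 1
    return case_forms, gap_fill

def recover(query, case_forms, gap_fill):
    """Restore the most frequent capitalization and re-insert the most frequent stopwords."""
    words = query.lower().split()
    out = []
    for i, word in enumerate(words):
        forms = case_forms.get(word)
        out.append(forms.most_common(1)[0][0] if forms else word)
        if i + 1 < len(words):
            gaps = gap_fill.get((word, words[i + 1]))
            if gaps:
                out.extend(gaps.most_common(1)[0][0])
    return " ".join(out)
```

For example, after training on "The Bank of Ireland raised rates" and "Bank of Ireland shares fell" with stopwords {"the", "of"}, the query "bank ireland" is recovered as "Bank of Ireland".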

Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR

Author: Jones Gareth J.F.
Leveling Johannes
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/09/2010

The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: (1) how to create a simple, language-independent, corpus-based stemmer; (2) how to identify sub-words and which types of sub-words are suitable as indexing units; and (3) how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are as follows. The corpus-based stemming approach is effective as a knowledge-light term conflation step and is useful when few language-specific resources are available. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR. Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. No single method performs best for all languages. For English, indexing using the Porter stemmer performs best; for Bengali and Marathi, overlapping 3-grams obtain the best result; and for Hindi, 4-prefixes yield the highest mean average precision (MAP). However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form, which increases the number of index terms but decreases their average length.
The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms for different languages.

Irish Universities

DCU Online Research Access Service
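The sub-word indexing units and blind relevance feedback described in the abstract can be sketched with a few helpers: overlapping character n-grams, k-prefixes, frequency-based selection of expansion terms from the top-ranked documents, and a rule that scales the number of feedback terms by the ratio of word to sub-word vocabulary size. The abstract does not give the exact term-scoring or scaling formulas, so both (raw frequency in the feedback documents, and 20 × |V_word| / |V_subword|) are assumptions, as are all function names.

```python
from collections import Counter

def char_ngrams(word, n=3):
    """Overlapping character n-grams; a word shorter than n is kept whole."""
    return [word[i:i + n] for i in range(len(word) - n + 1)] or [word]

def prefix(word, k=4):
    """k-prefix indexing unit: the first k characters of the word (as a one-item list)."""
    return [word[:k]]

def index_units(text, unit):
    """Turn text into indexing units (sub-words) using the given unit function."""
    return [u for word in text.lower().split() for u in unit(word)]

def feedback_terms(top_docs, query_units, n_terms):
    """Pick the n_terms most frequent new units in the top-ranked (pseudo-relevant) docs."""
    counts = Counter(u for doc in top_docs for u in doc if u not in query_units)
    return [u for u, _ in counts.most_common(n_terms)]

def feedback_term_count(word_vocab, subword_vocab, base_terms=20):
    """Hypothetical rule: scale the number of feedback terms by |V_word| / |V_subword|."""
    return max(1, round(base_terms * word_vocab / subword_vocab))
```

Under this rule, a sub-word vocabulary half the size of the word vocabulary doubles the number of feedback terms, consistent with the observation that feedback on sub-words benefits from selecting more terms.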

Exploring Protein-Protein Interactions as Drug Targets for Anti-cancer Therapy with In Silico Workflows

Author: A Goncearenco
A Marchler-Bauer
A Truszkowski
AA Bogan
B Graves
B Ma
BA Shoemaker
BJ Smith
CA Goble
CM Yates
D Petrey
E Cukuroglu
FP Davis
H Perez-Sanchez
HS Haase
J Bhagat
J Cinatl
JA Wells
K Wolstencroft
M Guharoy
M Li
M Petukh
M Tyagi
MK Gilson
MP Mazanetz
N Estrada-Ortiz
P Aloy
P Filippakopoulos
R Mosca
RR Thangudu
S Beisken
S Kim
S Shangary
S Teng
T Rolland
W Yang
WS Valdar
Y Wang
Y Zhao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

We describe a computational protocol to aid the design of small-molecule and peptide drugs that target protein-protein interactions, particularly for anti-cancer therapy. To achieve this goal, we explore multiple strategies, including finding binding hot spots, incorporating chemical similarity and bioactivity data, and sampling similar binding sites from homologous protein complexes. We demonstrate how to combine existing interdisciplinary resources with examples of semi-automated workflows. Finally, we discuss several major problems, including the occurrence of drug-resistant mutations, drug promiscuity, and the design of dual-effect inhibitors.

Affiliation: Goncearenco, Alexander. National Institutes of Health; United States
Affiliation: Li, Minghui. Soochow University; China. National Institutes of Health; United States
Affiliation: Simonetti, Franco Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina
Affiliation: Shoemaker, Benjamin A. National Institutes of Health; United States
Affiliation: Panchenko, Anna R. National Institutes of Health; United States

Crossref

CONICET Digital
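Two of the strategies in the abstract, finding binding hot spots and using chemical similarity, can be chained in a minimal workflow sketch. It assumes alanine-scanning ΔΔG values per interface residue, uses the commonly cited 2.0 kcal/mol hot-spot cutoff, and ranks candidate compounds (fingerprints as sets of on-bit indices) by Tanimoto similarity to a known inhibitor. The residue labels, values, and function names are hypothetical and are not taken from the paper.

```python
HOT_SPOT_DDG = 2.0  # kcal/mol; a widely used alanine-scanning hot-spot cutoff

def hot_spots(ddg_by_residue, cutoff=HOT_SPOT_DDG):
    """Residues whose alanine mutation destabilizes binding by at least `cutoff`."""
    return sorted(res for res, ddg in ddg_by_residue.items() if ddg >= cutoff)

def tanimoto(a, b):
    """Tanimoto coefficient between fingerprints given as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def prioritize(candidates, reference_fp, min_sim=0.5):
    """Candidate names ranked by similarity to a known inhibitor, keeping sim >= min_sim."""
    scored = [(tanimoto(fp, reference_fp), name) for name, fp in candidates.items()]
    return [name for sim, name in sorted(scored, reverse=True) if sim >= min_sim]
```

In a fuller workflow, the hot-spot residues would guide which interface region to target before similarity-based prioritization; that coupling is left out here.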

Incorporation of two terminology projects into a system for information retrieval using NLP for term expansion

Author: Van Wiele Kurt
Vanopstal Klaar
Publication venue: Universidad de Alicante. Instituto interUniversitario de Lenguas Modernas Aplicadas (IULMA)
Publication date: 01/01/2007

In this paper, we discuss two medical terminology projects at the University College of Ghent, Faculty of Translation Studies, and the benefits of combining them to provide Dutch professionals and laypeople with better access to information in biomedical databases. Our first project, the MeSH Termbase Project (MTB), is aimed at health care professionals, medical translators, and patients in need of language support. The main aim of our second project, the Multilingual Glossary of Technical and Popular Medical Terms, is the simplification of the terminology used in patient information leaflets.

Ghent University Academic Bibliography

AFLOW-ML: A RESTful API for machine-learning predictions of materials properties

Author: Carrete Jesús
Curtarolo Stefano
Gossett Eric
Isayev Olexandr
Legrain Fleur
Mingo Natalio
Oses Corey
Rose Frisco
Toher Cormac
Tropsha Alexander
Zurek Eva
Publication date: 29/11/2017

Machine learning approaches, enabled by the emergence of comprehensive databases of materials properties, are becoming a fruitful direction for materials analysis. As a result, a plethora of models have been constructed and trained on existing data to predict properties of new systems. These powerful methods allow researchers to target studies only at interesting materials (neglecting the non-synthesizable systems and those without the desired properties), thus reducing the amount of resources spent on expensive computations and/or time-consuming experimental synthesis. However, using these predictive models is not always straightforward. Often, they require a panoply of technical expertise, creating barriers for general users. AFLOW-ML (AFLOW Machine Learning) overcomes the problem by streamlining the use of the machine learning methods developed within the AFLOW consortium. The framework provides an open RESTful API to directly access the continuously updated algorithms, which can be transparently integrated into any workflow to retrieve predictions of electronic, thermal and mechanical properties. These types of interconnected cloud-based applications are envisioned to be capable of further accelerating the adoption of machine learning methods into materials development.

Comment: 10 pages, 2 figures

arXiv.org e-Print Archive

MPG.PuRe
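Integrating a RESTful prediction service into a workflow typically means submitting an input, receiving a task identifier, and polling for the result. The sketch below shows that pattern with the Python standard library. The base URL, route names, model name, request fields, and response fields (`id`, `status`, `predictions`) are hypothetical placeholders, not the documented AFLOW-ML API; the service's own documentation defines the real endpoints.

```python
import json
import time
import urllib.request

# Hypothetical placeholder; not the real AFLOW-ML endpoint.
BASE_URL = "https://example.org/ml-api/v1"

def http_json(method, url, payload=None):
    """Send a JSON request and decode the JSON response."""
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=body, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def predict(structure, model="example-model", request=http_json, poll_s=2.0, max_polls=30):
    """Submit a structure for prediction, then poll until the (hypothetical) task succeeds."""
    task = request("POST", f"{BASE_URL}/{model}/prediction", {"structure": structure})
    for _ in range(max_polls):
        result = request("GET", f"{BASE_URL}/prediction/result/{task['id']}")
        if result.get("status") == "SUCCESS":
            return result["predictions"]
        time.sleep(poll_s)
    raise TimeoutError("prediction did not finish in time")
```

The `request` parameter lets a test or an alternative HTTP client be substituted without network access.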