44 research outputs found

A practical index for approximate dictionary matching with few mismatches

Author: Cisłak Aleksander
Grabowski Szymon
Publication date: 11/02/2016

Approximate dictionary matching is a classic string matching problem (checking if a query string occurs in a collection of strings) with applications in, e.g., spellchecking, online catalogs, geolocation, and web searchers. We present a surprisingly simple solution called a split index, which is based on the Dirichlet principle, for matching a keyword with few mismatches, and experimentally show that it offers competitive space-time tradeoffs. Our implementation in the C++ language is focused mostly on data compaction, which is beneficial for the search speed (e.g., by being cache friendly). We compare our solution with other algorithms and we show that it performs better for the Hamming distance. Query times in the order of 1 microsecond were reported for one mismatch for the dictionary size of a few megabytes on a medium-end PC. We also demonstrate that a basic compression technique consisting in q-gram substitution can significantly reduce the index size (up to 50% of the input text size for the DNA), while still keeping the query time relatively low
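
The Dirichlet (pigeonhole) principle behind the split index can be illustrated with a small sketch. If a word is cut into two pieces and at most one character differs from the query, at least one piece must match exactly, so that piece can serve as an exact lookup key and only the other piece needs verifying. The code below is an illustrative Python toy, not the authors' C++ implementation; the class name `SplitIndex`, the halfway split point, and the dictionary-of-lists layout are assumptions made for this example.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


class SplitIndex:
    """Toy index answering Hamming-distance <= 1 queries (hypothetical layout)."""

    def __init__(self, words):
        # Each word is stored twice: once under its first half, once under
        # its second half. The word length is part of the key because
        # Hamming distance is only defined for equal-length strings.
        self.by_prefix = defaultdict(list)
        self.by_suffix = defaultdict(list)
        for w in set(words):
            mid = len(w) // 2
            self.by_prefix[(len(w), w[:mid])].append(w)
            self.by_suffix[(len(w), w[mid:])].append(w)

    def search(self, query):
        """Return all indexed words within one mismatch of the query."""
        n, mid = len(query), len(query) // 2
        hits = set()
        # Case 1: the first half matches exactly, so verify the second half.
        for w in self.by_prefix.get((n, query[:mid]), ()):
            if hamming(w[mid:], query[mid:]) <= 1:
                hits.add(w)
        # Case 2: the second half matches exactly, so verify the first half.
        for w in self.by_suffix.get((n, query[mid:]), ()):
            if hamming(w[:mid], query[:mid]) <= 1:
                hits.add(w)
        return sorted(hits)


index = SplitIndex(["hello", "hallo", "help", "jello", "world"])
print(index.search("hello"))
```

For k mismatches the same argument applies with k + 1 pieces. The data compaction and q-gram substitution mentioned in the abstract are not modelled here.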

arXiv.org e-Print Archive

Crossref

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Similarity-based virtual screening using 2D fingerprints

Author: Peter Willett
Publication venue: 'Elsevier BV'
Publication date: 01/12/2006

This paper summarises recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for the computation of fingerprint-based similarity, despite possessing some inherent biases related to the sizes of the molecules that are being sought. Group fusion involves combining the results of similarity searches based on multiple reference structures and a single similarity measure. We demonstrate the effectiveness of this approach to screening, and also describe an approximate form of group fusion, turbo similarity searching, that can be used when just a single reference structure is available
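
As a rough illustration of the ideas in this abstract, the sketch below computes the Tanimoto coefficient on fingerprints stored as sets of on-bit positions, then ranks a database by group fusion. The abstract does not say how the per-reference scores are combined, so the use of `max` as the fusion rule is an assumption. The `turbo_search` helper likewise follows the commonly described form of turbo similarity searching, in which the nearest neighbours of the single reference are reused as extra references. All function names, the parameter `k`, and the toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion(references, database, fuse=max):
    """Rank database entries by fusing their similarity to every reference.

    `fuse` combines the per-reference scores; max is an assumed choice.
    """
    scores = {
        name: fuse(tanimoto(ref, fp) for ref in references)
        for name, fp in database.items()
    }
    # Highest score first; ties are broken by name for a stable ordering.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def turbo_search(reference, database, k=2):
    """Approximate group fusion starting from a single reference structure."""
    first_pass = group_fusion([reference], database)
    # Treat the k nearest neighbours of the reference as extra references.
    neighbours = [database[name] for name, _ in first_pass[:k]]
    return group_fusion([reference] + neighbours, database)


database = {"m1": {1, 2, 3, 4}, "m2": {1, 2}, "m3": {7, 8}}
print(group_fusion([{1, 2, 3}, {7, 8, 9}], database))
print(turbo_search({1, 2, 3}, database, k=1))
```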

Crossref

White Rose Research Online

IMPROVING MOLECULAR FINGERPRINT SIMILARITY VIA ENHANCED FOLDING

Author: Chen Victor
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2011

Drug discovery depends on scientists finding similarity in molecular fingerprints to the drug target. A new way to improve the accuracy of molecular fingerprint folding is presented. The goal is to alleviate a growing challenge due to excessively long fingerprints. This improved method generates a new shorter fingerprint that is more accurate than the basic folded fingerprint. Information gathered during preprocessing is used to determine an optimal attribute order. The most commonly used blocks of bits can then be organized and used to generate a new improved fingerprint for more optimal folding. We then apply the widely used Tanimoto similarity search algorithm to benchmark our results. We show an improvement in the final results using this method to generate an improved fingerprint when compared against other traditional folding methods
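
For context, the basic folding that this work sets out to improve can be sketched as follows: a long bit vector is shortened by OR-ing together the bits that land on the same position modulo the target length. This is only the conventional baseline, written under the assumption that fingerprints are plain lists of 0/1 values; it does not reproduce the attribute reordering proposed in the thesis, and the function name `fold` is hypothetical.

```python
def fold(bits, target_len):
    """Fold a 0/1 fingerprint down to target_len bits by OR-ing colliding positions."""
    if target_len <= 0 or len(bits) % target_len != 0:
        raise ValueError("target_len must be a positive divisor of len(bits)")
    folded = [0] * target_len
    for i, bit in enumerate(bits):
        # Bits at positions i, i + target_len, i + 2 * target_len, ... collide.
        folded[i % target_len] |= bit
    return folded


# Two different 8-bit fingerprints that become identical after folding to 4 bits,
# which shows the information loss that folding introduces.
fp_a = [1, 0, 0, 0, 0, 1, 0, 0]
fp_b = [0, 1, 0, 0, 1, 0, 0, 0]
print(fold(fp_a, 4), fold(fp_b, 4))
```

Because colliding bits are merged, two molecules with no bits in common can look identical after folding, which is the kind of accuracy loss a better bit ordering would try to reduce.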

SJSU ScholarWorks

The use of MoStBioDat for rapid screening of molecular diversity

Author: Bąk Andrzej
Kurczyk Agata
Polański Jarosław
Publication venue: 'MDPI AG'
Publication date: 01/01/2009

MoStBioDat is a uniform data storage and extraction system with an extensive array of tools for structural similarity measures and pattern matching which is essential to facilitate the drug discovery process. Structure-based database screening has recently become a common and efficient technique in early stages of the drug development, shifting the emphasis from rational drug design into the probability domain of more or less random discovery. The virtual ligand screening (VLS), an approach based on high-throughput flexible docking, samples a virtually infinite molecular diversity of chemical libraries increasing the concentration of molecules with high binding affinity. The rapid process of subsequent examination of a large number of molecules in order to optimize the molecular diversity is an attractive alternative to the traditional methods of lead discovery. This paper presents the application of the MoStBioDat package not only as a data management platform but mainly in substructure searching. In particular, examples of the applications of MoStBioDat are discussed and analyze

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Repozytorium Uniwersytetu Śląskiego RE-BUŚ

RASCAL: calculation of graph similarity using maximum common edge subgraphs

Author: Gardiner E.J.
Raymond J.W.
Willett P.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2002

A new graph similarity calculation procedure is introduced for comparing labeled graphs. Given a minimum similarity threshold, the procedure consists of an initial screening process to determine whether it is possible for the measure of similarity between the two graphs to exceed the minimum threshold, followed by a rigorous maximum common edge subgraph (MCES) detection algorithm to compute the exact degree and composition of similarity. The proposed MCES algorithm is based on a maximum clique formulation of the problem and is a significant improvement over other published algorithms. It presents new approaches to both lower and upper bounding as well as vertex selection
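
The two-stage idea (a cheap screen first, the exact MCES computation only if needed) can be sketched with a simplified upper bound. The code below is not RASCAL's actual screening procedure. It assumes the similarity has the form (|V_c| + |E_c|)^2 / ((|V_1| + |E_1|)(|V_2| + |E_2|)), where V_c and E_c are the vertices and edges of the common subgraph, and it bounds those counts from above by matching vertex labels and edge endpoint-label pairs as multisets. The graph representation and all function names are hypothetical.

```python
from collections import Counter


def edge_types(labels, edges):
    """Multiset of edges described by the unordered pair of their endpoint labels."""
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)


def similarity_upper_bound(g1, g2):
    """Upper bound on the assumed similarity; each graph is (labels dict, edge list)."""
    (labels1, edges1), (labels2, edges2) = g1, g2
    # A common subgraph cannot contain more vertices of a label, or more edges
    # of an endpoint-label type, than the smaller count in either graph.
    v_common = sum((Counter(labels1.values()) & Counter(labels2.values())).values())
    e_common = sum((edge_types(labels1, edges1) & edge_types(labels2, edges2)).values())
    denom = (len(labels1) + len(edges1)) * (len(labels2) + len(edges2))
    return (v_common + e_common) ** 2 / denom if denom else 0.0


def passes_screen(g1, g2, threshold):
    """Run the expensive MCES search only if the bound can reach the threshold."""
    return similarity_upper_bound(g1, g2) >= threshold


g1 = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
g2 = ({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)])
print(similarity_upper_bound(g1, g2), passes_screen(g1, g2, 0.5))
```

Pairs that fail the screen can be discarded without ever building the maximum clique formulation, which is where most of the running time would otherwise go.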

CiteSeerX

Crossref

White Rose Research Online

Chemical Diversity of Scarab Beetle Pheromones and its Implication in Chemical Evolution

Author: Janairo Jose Isagani B.
Lumabas Jason Lorenzo
Sioson John Carlos
Publication venue: European Online Journal of Natural and Social Sciences
Publication date: 28/01/2016

Pheromones are species-specific chemical signals used by insects to communicate, to find a mate, and to identify their territory. In this paper, we analyzed the structural similarity of scarab beetle pheromones using the Tanimoto coefficient in an attempt to draw insights regarding their ecology, evolution and chemotaxonomy. The results showed a very diverse scarab beetle pheromone structure which provides further support to an earlier hypothesis regarding beetle pheromone evolution. In addition, it was found that the scarab beetle pheromone structure cannot be used as a species marker in chemotaxonomy owing to the observed high structural diversity

European Online Journal of Natural and Social Sciences

European Online Journal of Natural and Social Sciences (ES)

Clustering Libraries of Compounds into Families: Asymmetry-Based Similarity Measure to Categorize Small Molecules

Author: Aci Samia
Bisson Gilles
Gordon Mirta
Lafanechère Laurence
Maréchal Eric
Roy Sylvaine
Wieczorek Samuel
Publication venue: 'IntechOpen'
Publication date: 12/09/2011

IntechOpen

Crossref

Functional Group and Substructure Searching as a Tool in Metabolomics

Author: Andrew G. McDonald
Keith F. Tipton
Masaaki Kotera
Sinéad Boyce
Publication venue: Public Library of Science
Publication date: 06/02/2008

BACKGROUND: A direct link between the names and structures of compounds and the functional groups contained within them is important, not only because biochemists frequently rely on literature that uses a free-text format to describe functional groups, but also because metabolic models depend upon the connections between enzymes and substrates being known and appropriately stored in databases. METHODOLOGY: We have developed a database named "Biochemical Substructure Search Catalogue" (BiSSCat), which contains 489 functional groups, >200,000 compounds and >1,000,000 different computationally constructed substructures, to allow identification of chemical compounds of biological interest. CONCLUSIONS: This database and its associated web-based search program (http://bisscat.org/) can be used to find compounds containing selected combinations of substructures and functional groups. It can be used to determine possible additional substrates for known enzymes and for putative enzymes found in genome projects. Its applications to enzyme inhibitor design are also discussed

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Identifying new topoisomerase II poison scaffolds by combining publicly available toxicity data and 2D/3D-based virtual screening

Author: Heffeter Petra
Kalaszi Adrian
Lovrics Anna
Magyar Csaba
Pape Veronika
Szakács Gergely
Szisz Daniel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Molecular descriptor (2D) and three dimensional (3D) shape based similarity methods are widely used in ligand based virtual drug design. In the present study pairwise structure comparisons among a set of 4858 DTP compounds tested in the NCI60 tumor cell line anticancer drug screen were computed using chemical hashed fingerprints and 3D molecule shapes to calculate 2D and 3D similarities, respectively. Additionally, pairwise biological activity similarities were calculated by correlating the 60 element vectors of pGI50 values corresponding to the cytotoxicity of the compounds across the NCI60 panel. Subsequently, we compared the power of 2D and 3D structural similarity metrics to predict the toxicity pattern of compounds. We found that while the positive predictive value and sensitivity of 3D and molecular descriptor based approaches to predict biological activity are similar, a subset of molecule pairs yielded contradictory results. By simultaneously requiring similarity of biological activities and 3D shapes, and dissimilarity of molecular descriptor based comparisons, we identify pairs of scaffold hopping candidates displaying characteristic core structural changes such as heteroatom/heterocycle change and ring closure. Attempts to discover scaffold hopping candidates of mitoxantrone recovered known Topoisomerase II (Top2) inhibitors, and also predicted new, previously unknown chemotypes possessing in vitro Top2 inhibitory activity
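
The selection logic described here can be sketched in two steps: score biological activity similarity by correlating two activity vectors, then keep pairs that are similar in activity and 3D shape but dissimilar by 2D fingerprint. The abstract only says the pGI50 vectors were correlated, so the use of the Pearson coefficient is an assumption, and the threshold values below are placeholders chosen for illustration rather than values from the study. Function and parameter names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def is_scaffold_hop_candidate(activity_sim, shape_sim, fingerprint_sim,
                              min_activity=0.8, min_shape=0.8, max_fingerprint=0.3):
    """Similar activity and 3D shape, but dissimilar 2D fingerprint.

    The default thresholds are illustrative placeholders only.
    """
    return (activity_sim >= min_activity
            and shape_sim >= min_shape
            and fingerprint_sim <= max_fingerprint)


# Short stand-ins for the 60-element pGI50 vectors of two compounds.
activity = pearson([5.1, 6.0, 4.8, 7.2], [5.0, 6.2, 4.9, 7.0])
print(is_scaffold_hop_candidate(activity, shape_sim=0.85, fingerprint_sim=0.2))
```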

Repository of the Academy's Library

Semmelweis Repository