1,663 research outputs found

    Analysis of Random Fragment Profiles for the Detection of Structure-Activity Relationships

    Substructure- or fragment-type descriptors are effective and widely used tools for chemical similarity searching and other applications in chemoinformatics and computer-aided drug discovery. Therefore, a large number of well-defined computational fragmentation schemes have been devised, including hierarchical fragmentation of molecules for the analysis of core structures in drugs and retrosynthetic fragmentation of compounds for de novo ligand design. Furthermore, the generation of dictionaries of structural key-type descriptors, which are important tools in pharmaceutical research, involves knowledge-based fragment design. Currently, more than 5,000 standard descriptors are available for the representation of molecular structures, and therefore the selection of suitable combinations of descriptors for specific chemoinformatic applications is a crucial task. This thesis departs from well-defined substructure design approaches. Random fragment populations are generated and mined for substructures associated with different compound classes. A novel method termed MolBlaster is introduced for the evaluation of molecular similarity relationships on the basis of randomly generated fragment populations. Fragment profiles of molecules are generated by random deletion of bonds in connectivity tables and quantitatively compared using entropy-based metrics. In test calculations, MolBlaster accurately reproduced a structural key-based similarity ranking of druglike molecules. To adapt the generation and comparison of random fragment populations for large-scale compound screening, different fragmentation schemes are compared and a novel entropic similarity metric termed PSE is introduced for compound ranking. The approach is extensively tested on different compound activity classes with varying degrees of intra-class structural diversity and produces promising results in these calculations, comparable to similarity searching using state-of-the-art fingerprints. 
These results demonstrate the potential of randomly generated fragments for the detection of structure-activity relationships. Furthermore, a methodology to analyze random fragment populations at the molecular level of detail is introduced. It determines conditional probability relationships between fragments. Random fragment profiles are generated for an arbitrary set of molecules, and a frequency vector is assigned to each observed fragment. An algorithm is designed to compare frequency vectors and derive dependencies of fragment occurrence. Using calculated dependency values, random fragment populations can be organized in graphs that capture their relationships and make it possible to map fragment pathways of biologically active molecules. For sets of molecules having similar activity, unique fragment signatures, so-called Activity Class Characteristic Substructures (ACCS), are identified. Random fragment profiles are found to contain compound class-specific information and activity-specific fragment hierarchies. In virtual screening trials, short ACCS fingerprints perform well on many compound classes when compared to more complex state-of-the-art 2D fingerprints. In order to elucidate potential reasons for the high predictive utility of ACCS, a thorough systematic analysis of their distribution in active and database compounds has been carried out. This reveals that the discriminatory power of ACCS results from the rare occurrence of individual ACCS and of ACCS combinations in screening databases. Furthermore, it is shown that ACCS sets isolated from random populations typically form coherent molecular cores in active compounds. Characteristic core regions are already formed by small numbers of substructures and remain stable when more fragments are added. Thus, class-specific random fragment hierarchies encode meaningful structural information, providing a structural rationale for the signature character of activity-specific fragment hierarchies. 
It follows that compound-class-directed structural descriptors that do not depend on the application of predefined fragmentation or design schemes can be isolated from random fragment populations.
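The random fragmentation idea in this abstract (delete bonds at random from a connectivity table, collect the resulting fragments, and summarize the population with an entropy measure) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy atom and bond lists, the function names, the use of sorted atom labels as a crude fragment identifier, and plain Shannon entropy as the summary statistic. The abstract does not specify how MolBlaster or the PSE metric are actually defined, so this is not a reimplementation of either.

```python
import math
import random
from collections import Counter


def fragments_after_deletion(atoms, bonds, n_delete, rng):
    """Delete n_delete randomly chosen bonds and return the fragments.

    atoms is a list of element labels, bonds a list of (i, j) index pairs.
    Each fragment is reported as a sorted tuple of atom labels, which is
    only a crude stand-in for a canonical substructure identifier.
    """
    kept = list(bonds)
    rng.shuffle(kept)
    kept = kept[n_delete:]

    # Union-find over atom indices to collect connected components.
    parent = list(range(len(atoms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in kept:
        parent[find(a)] = find(b)

    groups = {}
    for i, label in enumerate(atoms):
        groups.setdefault(find(i), []).append(label)
    return [tuple(sorted(g)) for g in groups.values()]


def fragment_profile(atoms, bonds, iterations=200, n_delete=2, seed=0):
    """Accumulate fragment counts over repeated random bond deletions."""
    rng = random.Random(seed)
    profile = Counter()
    for _ in range(iterations):
        profile.update(fragments_after_deletion(atoms, bonds, n_delete, rng))
    return profile


def shannon_entropy(profile):
    """Shannon entropy (in bits) of a fragment frequency profile."""
    total = sum(profile.values())
    return -sum((c / total) * math.log2(c / total) for c in profile.values())


if __name__ == "__main__":
    # Hypothetical toy molecule: heavy atoms of ethanol, C-C-O.
    atoms = ["C", "C", "O"]
    bonds = [(0, 1), (1, 2)]
    profile = fragment_profile(atoms, bonds, iterations=100, n_delete=1)
    print(profile)
    print(shannon_entropy(profile))
```

Two molecules could then be compared through their profiles, for example by contrasting the entropy of each profile with the entropy of the pooled profile; that comparison is a generic choice and is not claimed to match the thesis.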

    Computational Methods for the Integration of Biological Activity and Chemical Space

    One general aim of medicinal chemistry is the understanding of structure-activity relationships of ligands that bind to biological targets. Advances in combinatorial chemistry and biological screening technologies allow the analysis of ligand-target relationships on a large scale. However, in order to extract useful information from biological activity data, computational methods are needed that link the activity of ligands to their chemical structure. This thesis investigates how fragment-type descriptors of molecular structure can be used to create a link between activity and chemical ligand space. First, an activity class-dependent hierarchical fragmentation scheme is introduced that generates fragmentation pathways, which are aligned using established methodologies for multiple alignment of biological sequences. These alignments are then used to extract consensus fragment sequences that serve as a structural signature for individual biological activity classes. It is also investigated how defined, chemically intuitive molecular fragments can be organized based on their topological environment and co-occurrence in compounds active against closely related targets. To this end, the Topological Fragment Index is introduced, which quantifies the topological environment complexity of a fragment in a given molecule and thus goes beyond fragment frequency analysis. Fragment dependencies have been established on the basis of common topological environments, which facilitates the identification of activity class-characteristic fragment dependency pathways that describe fragment relationships beyond structural resemblance. Because fragments are often dependent on each other in an activity class-specific manner, the importance of defined fragment combinations for similarity searching is further assessed. To this end, Feature Co-occurrence Networks are introduced that allow the identification of feature cliques characteristic of individual activity classes. 
Three differently designed molecular fingerprints are compared for their ability to provide such cliques, and a clique-based similarity searching strategy is established. For molecule- and activity class-centric fingerprint designs, feature combinations are shown to improve similarity search performance in comparison to standard methods. Moreover, it is demonstrated that individual features can form activity class-specific combinations. Extending the analysis of feature cliques characteristic of individual activity classes, the distribution of defined fragment combinations among several compound classes acting against closely related targets is assessed. Fragment Formal Concept Analysis is introduced for flexible mining of complex structure-activity relationships. It allows the interactive assembly of fragment queries that yield fragment combinations characteristic of defined activity and potency profiles. It is shown that pairs and triplets, rather than individual fragments, distinguish between different activity profiles. A classifier is built based on these fragment signatures that distinguishes between ligands of closely related targets. Going beyond activity profiles, compound selectivity is also analyzed. To this end, Molecular Formal Concept Analysis is introduced for the systematic mining of compound selectivity profiles on a whole-molecule basis. Using this approach, structurally diverse compounds are identified that share a selectivity profile with selected template compounds. Structure-selectivity relationships of the obtained compound sets are further analyzed.

    Development of a Novel Virtual Screening Cascade Protocol to Identify Potential Trypanothione Reductase Inhibitors

    The implementation of a novel sequential computational approach that can be used effectively for virtual screening and identification of prospective ligands that bind to trypanothione reductase (TryR) is reported. The multistep strategy combines ligand-based virtual screening, used to build an enriched library of small molecules, with a docking protocol (AutoDock, X-Score) for screening against the TryR target. Compounds were ranked by an exhaustive conformational consensus scoring approach that employs a rank-by-rank strategy combining both scoring functions. Analysis of the predicted ligand-protein interactions highlights the role of bulky quaternary amine moieties in binding affinity. The scaffold hopping (SHOP) process derived from this computational approach allowed the identification of several chemotypes not previously reported as antiprotozoal agents, including dibenzothiepine, dibenzooxathiepine, dibenzodithiepine, and polycyclic cationic structures like thiaazatetracyclo-nonadeca-hexaen-3-ium. Assays measuring the inhibitory effect of these compounds on T. cruzi and T. brucei TryR confirm their potential for further rational optimization.
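A rank-by-rank consensus of two scoring functions, as mentioned in this abstract, can be sketched as follows. The compound names and score values are hypothetical, and the sketch assumes that lower values are better for the first score (as with a docking energy) and higher values are better for the second (as with a predicted affinity); the abstract does not give these details, nor the exact consensus procedure the authors used.

```python
def ranks(scores, lower_is_better=True):
    """Map each compound to its 1-based rank under one scoring function."""
    order = sorted(scores, key=scores.get, reverse=not lower_is_better)
    return {name: position + 1 for position, name in enumerate(order)}


def rank_by_rank(score_tables):
    """Order compounds by their mean rank across several scoring functions.

    score_tables is a list of (scores, lower_is_better) pairs, where scores
    maps compound name to score. Ties in mean rank are broken by name.
    """
    per_table = [ranks(scores, low) for scores, low in score_tables]
    names = per_table[0].keys()
    consensus = {n: sum(r[n] for r in per_table) / len(per_table) for n in names}
    return sorted(consensus, key=lambda n: (consensus[n], n))


if __name__ == "__main__":
    # Hypothetical scores for three compounds.
    docking_energy = {"cpd_a": -9.1, "cpd_b": -7.4, "cpd_c": -8.2}
    predicted_affinity = {"cpd_a": 6.0, "cpd_b": 6.8, "cpd_c": 5.1}
    print(rank_by_rank([(docking_energy, True), (predicted_affinity, False)]))
```

Averaging ranks rather than raw scores avoids having to put scoring functions with different units on a common scale.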

    Study of ligand-based virtual screening tools in computer-aided drug design

    Virtual screening is a central technique in drug discovery today. Millions of molecules can be tested in silico with the aim of selecting only the most promising ones for experimental testing. The topic of this thesis is ligand-based virtual screening tools, which take existing active molecules as the starting point for finding new drug candidates. One goal of this thesis was to build a model that gives the probability that two molecules are biologically similar as a function of one or more chemical similarity scores. Another important goal was to evaluate how well different ligand-based virtual screening tools are able to distinguish active molecules from inactives. A further criterion set for the virtual screening tools was their applicability in scaffold-hopping, i.e. finding new active chemotypes. In the first part of the work, a link was defined between the abstract chemical similarity score given by a screening tool and the probability that the two molecules are biologically similar. These results help to decide objectively which virtual screening hits to test experimentally. The work also resulted in a new type of data fusion method for use with two or more tools. In the second part, five ligand-based virtual screening tools were evaluated and their performance was found to be generally poor. Three reasons for this were proposed: false negatives in the benchmark sets, active molecules that do not share the binding mode, and activity cliffs. In the third part of the study, a novel visualization and quantification method is presented for evaluation of the scaffold-hopping ability of virtual screening tools.

    Virtual compound screening and SAR analysis: method development and practical applications in the design of new serine and cysteine protease inhibitors

    Virtual screening is an important tool in drug discovery that uses different computational methods to screen chemical databases for the identification of possible drug candidates. Most virtual screening methodologies are knowledge driven, in that information on either the nature of the target binding pocket or the type of ligand that is expected to bind is essential. In this regard, the information contained in X-ray crystal structures of protein-ligand complexes provides detailed insight into the interactions between the protein and the ligand and opens the opportunity for further understanding of drug action and structure-activity relationships (SARs) at the molecular level. Protein-ligand interaction information can be utilized to introduce target-specific interaction-based constraints in the design of focused combinatorial libraries. It can also be directly transformed into structural interaction fingerprints and applied in virtual screening to analyze docking studies or filter compounds. However, the integration of protein-ligand interaction information into two-dimensional compound similarity searching is not fully explored. Therefore, novel methods are still required to efficiently utilize protein-ligand interaction information in two-dimensional ligand similarity searching. Furthermore, application of protein-ligand interaction information in the interpretation of SARs at the ligand level needs further exploration. Thus, utilization of three-dimensional protein-ligand interaction information in virtual screening and SAR analysis was the major aim of this thesis. The thesis is presented in two major parts. The first part presents the utilization of three-dimensional protein-ligand interaction information for the development of a new hybrid virtual screening method and for the analysis of the nature of SARs in analog series at the molecular level. 
The second part of the thesis focuses on the application of different virtual screening methods for the identification of new inhibitors of cysteine proteases and membrane-bound serine proteases. In addition, molecular modeling studies were applied to analyze the binding mode of structurally complex cyclic peptide inhibitors.

    Evolutionary Computation and QSAR Research

    The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening relies greatly on quantitative structure-activity relationship (QSAR) analysis, which builds a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of relevant variables in the context of the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR, and the future trend towards joint or multi-task feature selection methods. Funding: Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia. Consellería de Economía e Industria, 10SIN105004P.

    Prediction of antigen binding reactivity through supervised learning-based analysis of clonal enrichment patterns in bio-panning

    Doctoral thesis -- Seoul National University Graduate School: College of Medicine, Department of Biomedical Sciences, August 2021. 정준호. Background: Monoclonal antibodies (mAbs) are produced by B cells and bind specifically to target antigens. Technical advances in molecular and cellular cloning have made it possible to purify recombinant mAbs on a large scale, broadening their use in research and their potential for clinical application. As the importance of therapeutic mAbs has increased, mAbs have become one of the predominant drug classes for various diseases over the past decades. During that time, immense technological advances have made the discovery and development of mAb therapeutics more efficient. Owing to advances in high-throughput methodology in genomic sequencing, phenotype screening, and computational data analysis, it is conceivable to generate a panel of antibodies with annotated characteristics without experiments. Thesis objective: This thesis aims to develop next-generation antibody discovery methods utilizing high-throughput antibody repertoire sequencing and bioinformatics analysis. I developed novel methods for the construction of in vitro display antibody libraries and for machine learning-based antibody discovery. In chapter 3, I describe a new method for generating an immunoglobulin (Ig) gene repertoire that minimizes the amplification bias originating from the large number of primers targeting diverse Ig germline genes. A universal primer-based amplification method was employed to generate the Ig gene repertoire and was then validated by high-throughput antibody repertoire sequencing with respect to clonal diversity and immune repertoire reproducibility. The results of this work are published in Journal of Immunological Methods (2021), doi: 10.1016/j.jim.2021.113089. In chapter 4, I describe a novel machine learning-based antibody discovery method. 
In the conventional colony screening approach, antigen-specific binders with low clonal abundance cannot be identified, and screening is hindered by non-specific phage particles whose p8 coat protein is antigen-reactive. To overcome these limitations, I applied a supervised learning algorithm to high-throughput sequencing data annotated with binding property and clonal frequency through bio-panning. NGS analysis was performed to generate a large number of antibody sequences annotated with their clonal frequency at each selection round of the bio-panning. Using the random forest (RF) algorithm, antigen-reactive binders were predicted and then validated in an in vitro screening experiment. The results of this work are published in Experimental & Molecular Medicine (2017), doi: 10.1038/emm.2017.22, and Biomolecules (2020), doi: 10.3390/biom10030421. Conclusion: By combining conventional antibody discovery techniques with high-throughput antibody repertoire sequencing, advances were made in multiple attributes of the previous methodology. Multi-cycle amplification with Ig germline gene-specific primers showed a high level of repertoire distortion, which could be reduced by employing the universal primer-based amplification method. The RF model yielded a large number of antigen-reactive antibody sequences with various clonal enrichment patterns. This result offers new insight into the clonal enrichment process: the frequency of an antigen-specific binder does not increase gradually but depends on multiple selection rounds. The supervised learning-based method also provides more diverse antigen-specific clonotypes than conventional antibody discovery methods.
Abstract in Korean (translated): Background: A monoclonal antibody (mAb) is a polypeptide complex that is produced by B cells and binds specifically to a target antigen. Advances in molecular and cellular cloning have made it possible to produce recombinant mAbs on a large scale, which has expanded their use across research and clinical fields. Technologies for efficiently discovering and developing therapeutic antibodies have also advanced rapidly. Through advances in high-throughput methodology for gene sequencing, phenotype screening, and computational analysis, and through their application, it has become possible to generate panels of antigen-reactive antibodies by non-experimental means. Thesis objective: This doctoral thesis aims to develop novel next-generation antibody discovery methods using high-throughput antibody repertoire sequencing and bioinformatics. In this work, a new protocol for constructing in vitro display antibody libraries and a machine learning-based antibody discovery method were developed. Chapter 3 describes a method that minimizes the amplification bias caused by the use of many germline immunoglobulin gene-specific primers when amplifying the antibody repertoire. Multi-cycle amplification with universal primers was used, and the new method was validated by measuring clonal diversity and immune repertoire reproducibility with bioinformatic techniques applied to high-throughput antibody repertoire sequencing data. The results were published in Journal of Immunological Methods (2021), doi: 10.1016/j.jim.2021.113089. Chapter 4 describes the development of a machine learning-based antibody discovery method. Conventional colony screening cannot discover clones with low clonal abundance, nor can it eliminate the non-specific antigen reactivity of the p8 coat protein while selective pressure is applied. To overcome these limitations, a supervised learning algorithm was applied to high-throughput antibody sequence data annotated with antigen binding and with clonal frequency during bio-panning. Antigen-specific antibody clones were predicted with the random forest (RF) algorithm, and their antigen specificity was validated by in vitro screening. The results were published in 1) Experimental & Molecular Medicine (2017), doi: 10.1038/emm.2017.22, and 2) Biomolecules (2020), doi: 10.3390/biom10030421. Conclusion: Combining conventional antibody discovery techniques with high-throughput antibody repertoire sequencing overcame several limitations of the existing methodology. Multi-cycle amplification with Ig germline gene-specific primers distorted clonal frequency and diversity, whereas amplification with universal primers was observed to reduce this repertoire distortion efficiently. The RF model yielded antigen-reactive antibody sequences with diverse clonal enrichment patterns. This showed that antigen-specific clones are not enriched stepwise but depend on multiple early and late selection rounds, which offers a new interpretation of clonal enrichment in bio-panning. Clones discovered by supervised learning also showed higher sequence diversity than those found by conventional colony screening.
Contents: 1. Introduction; 1.1. Antibody and immunoglobulin repertoire; 1.2. Antibody therapeutics; 1.3. Methodology: antibody discovery and engineering; 2. Thesis objective; 3. Establishment of minimally biased phage display library construction method for antibody discovery; 3.1. Abstract; 3.2. Introduction; 3.3. Results; 3.4. Discussion; 3.5. Methods; 4. In silico identification of target specific antibodies by high-throughput antibody repertoire sequencing and machine learning; 4.1. Abstract; 4.2. Introduction; 4.3. Results; 4.4. Discussion; 4.5. Methods; 5. Future perspectives; 6. References; 7. Abstract in Korean

    Molecular Distance Maps: An alignment-free computational tool for analyzing and visualizing DNA sequences' interrelationships

    In an attempt to identify and classify species based on genetic evidence, we propose a novel combination of methods to quantify and visualize the interrelationships between thousands of species. This is possible by using Chaos Game Representation (CGR) of DNA sequences to compute genomic signatures, which we then compare by computing pairwise distances. In the last step, the original DNA sequences are embedded in a high-dimensional space using Multi-Dimensional Scaling (MDS) before everything is projected onto a Euclidean 3D space. To start with, we apply this method to a mitochondrial DNA dataset from NCBI containing over 3,000 species. The analysis shows that the oligomer composition of full mtDNA sequences can be a source of taxonomic information, suggesting that this method could be used for unclassified species and taxonomic controversies. Next, we test the hypothesis that a CGR-based genomic signature is preserved along a species' genome by comparing inter- and intra-genomic signatures of nuclear DNA sequences from six different organisms, one from each kingdom of life. We also compare six different distances and assess their performance using statistical measures. Our results support the existence of a genomic signature for a species' genome at the kingdom level. In addition, we test whether CGR-based genomic signatures originating only from nuclear DNA can be used to distinguish between closely related species, and we answer in the negative. To overcome this limitation, we propose the concept of "composite signatures", which combine information from different types of DNA, and we show that they can effectively distinguish all closely related species under consideration. We also propose the concept of "assembled signatures", which, among other advantages, do not require a long contiguous DNA sequence but can be built from smaller ones consisting of ~100-300 base pairs. 
Finally, we design an interactive web tool, MoDMaps3D, for building three-dimensional Molecular Distance Maps. The user can explore an existing map or build their own using NCBI's accession numbers as input. MoDMaps3D is platform independent, written in JavaScript, and runs in all major modern browsers.
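The first two steps named in this abstract, computing a Chaos Game Representation and comparing signatures by pairwise distance, can be sketched as below. The corner assignment (A, C, G, T on the unit square) is one common convention, and using normalized k-mer frequencies as the "genomic signature" together with a Euclidean distance are assumptions for illustration; the abstract does not state which signature definition or which six distances were used.

```python
import math
from collections import Counter
from itertools import product

# One common corner convention for CGR on the unit square (an assumption here).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory: each point lies halfway between the
    previous point and the corner of the current nucleotide."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_signature(seq, k=3):
    """Normalized k-mer frequency vector over the alphabet ACGT, which
    corresponds to counting CGR points per cell of a 2^k x 2^k grid."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two signature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Hypothetical short sequences, for illustration only.
    s1 = kmer_signature("ACGTACGTTTGACA", k=2)
    s2 = kmer_signature("GGGCGCGCATGCGC", k=2)
    print(euclidean(s1, s2))
```

A matrix of such pairwise distances is the kind of input that multi-dimensional scaling would then embed in three dimensions.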

    Clustering for 2D chemical structures

    The clustering of chemical structures is important and widely used in several areas of chemoinformatics. A little-discussed aspect of clustering is standardization, which ensures that all descriptors in a chemical representation make a comparable contribution to the measurement of similarity. The initial study compares the effectiveness of seven different standardization procedures that have been suggested previously; the results were also compared with unstandardized datasets. It was found that no single standardization method consistently offered the best performance. Comparative studies of clustering effectiveness help to establish the suitability of different methods and guidelines for their use. In order to examine the suitability of different clustering methods for application in chemoinformatics, especially those that had not previously been applied in the field, the second study compares the effectiveness of nine clustering methods. The results revealed that it is unlikely that a single clustering method can consistently provide the best partition under all circumstances. Consensus clustering is a technique that combines multiple input partitions of the same set of objects into a single clustering, which is expected to provide a more robust and more generally effective representation of the partitions that are submitted. The third study reports the use of seven different consensus clustering methods that had not previously been used on sets of chemical compounds represented by 2D fingerprints. Their effectiveness was compared with that of some traditional clustering methods discussed in the second study. No consensus clustering method was found to be consistently the best.
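Consensus clustering, as defined in this abstract, combines several partitions of the same objects into one. A minimal sketch of one generic way to do this is a co-association matrix followed by thresholding. The abstract does not name the seven consensus methods that were studied, so the approach, the function names, and the 0.5 threshold below are all illustrative assumptions.

```python
def co_association(partitions):
    """Fraction of input partitions in which each pair of objects shares
    a cluster. Each partition is a list of cluster labels, one per object."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Link objects that co-cluster in more than `threshold` of the
    partitions and return the connected components as consensus clusters."""
    matrix = co_association(partitions)
    n = len(matrix)
    assigned = [False] * n
    clusters = []
    for start in range(n):
        if assigned[start]:
            continue
        assigned[start] = True
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if not assigned[j] and matrix[i][j] > threshold:
                    assigned[j] = True
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Three hypothetical partitions of four compounds (label values are arbitrary).
    partitions = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    print(consensus_clusters(partitions))
```

Because only label equality within each partition matters, the input partitions do not need to use matching cluster labels.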