319 research outputs found

    Exploring the potential of Spherical Harmonics and PCVM for compounds activity prediction

    Biologically active chemical compounds may provide remedies for several diseases. Meanwhile, Machine Learning techniques applied to Drug Discovery, which are cheaper and faster than wet-lab experiments, can more effectively identify molecules with the expected pharmacological activity. Therefore, it is urgent and essential to develop more representative descriptors and reliable classification methods to accurately predict molecular activity. In this paper, we investigate the potential of a novel representation based on Spherical Harmonics fed into a Probabilistic Classification Vector Machines classifier, namely SHPCVM, for the compound activity prediction task. We make use of representation learning to acquire features that describe the molecules as precisely as possible. To verify the performance of SHPCVM, ten-fold cross-validation tests are performed on twenty-one G protein-coupled receptors (GPCRs). Experimental outcomes (accuracy of 0.86), assessed by classification accuracy, precision, recall, Matthews’ Correlation Coefficient and Cohen’s kappa, reveal that our relatively short Spherical Harmonics-based representation combined with Probabilistic Classification Vector Machines can achieve very satisfactory performance on GPCRs.
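
The abstract above evaluates the classifier with accuracy, precision, recall, Matthews' Correlation Coefficient and Cohen's kappa. As a rough illustration of how these five measures relate to a binary confusion matrix, here is a minimal Python sketch. The function names and the toy labels are assumptions made for illustration and do not come from the paper.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, MCC and Cohen's kappa as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # MCC: correlation between predicted and true labels
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc, "kappa": kappa}

# Toy labels (hypothetical): 4 active and 4 inactive compounds
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

On balanced data like this toy set, MCC and kappa coincide. They diverge from accuracy mainly when the classes are imbalanced, which is presumably why the abstract reports them alongside it.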

    Robust Object Classification Approach using Spherical Harmonics

    Point clouds produced by either 3D scanners or multi-view images are often imperfect and contain noise or outliers. This paper presents an end-to-end robust spherical harmonics approach to classifying 3D objects. The proposed framework first uses a voxel grid of concentric spheres to learn features over the unit ball. We then limit the spherical harmonics order to suppress the effect of noise and outliers. In addition, the entire classification operation is performed in the Fourier domain. As a result, our proposed model learns features that are less sensitive to data perturbations and corruptions. We tested our proposed model against several types of data perturbations and corruptions, such as noise and outliers. Our results show that the proposed model has fewer parameters, competes with state-of-the-art networks in terms of robustness to data inaccuracies, and is faster than other robust methods. Our implementation code is also publicly available.
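
The abstract above limits the spherical harmonics order to suppress noise and outliers. The following Python sketch illustrates only that general idea, assuming the expansion coefficients are already available as a dictionary keyed by degree and order `(l, m)`. It is not the paper's network. The names `truncate_coefficients` and `degree_norms` and the sample coefficients are hypothetical.

```python
import math

def truncate_coefficients(coeffs, l_max):
    """Keep only coefficients with degree l <= l_max (a low-pass filter on the sphere)."""
    return {(l, m): c for (l, m), c in coeffs.items() if l <= l_max}

def degree_norms(coeffs, l_max):
    """Per-degree norm sqrt(sum_m |c_lm|^2) for l = 0..l_max.

    These norms are unchanged by rotations of the underlying function,
    which makes them a common rotation-invariant shape signature.
    """
    power = [0.0] * (l_max + 1)
    for (l, m), c in coeffs.items():
        if l <= l_max:
            power[l] += abs(c) ** 2
    return [math.sqrt(p) for p in power]

# Hypothetical coefficients: low degrees carry the shape, degree 5 mimics noise
coeffs = {(0, 0): 2.0, (1, -1): 3.0, (1, 0): 0.0, (1, 1): 4.0, (5, 2): 0.1}
low_pass = truncate_coefficients(coeffs, l_max=2)
signature = degree_norms(coeffs, l_max=2)
```

High-degree terms encode fine angular detail, where small perturbations tend to show up, so dropping them trades resolution for robustness.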

    Geometric guides for interactive evolutionary design

    This thesis describes the addition of novel Geometric Guides to a generative Computer-Aided Design (CAD) application that supports early-stage concept generation. The application generates and evolves abstract 3D shapes, used to inspire the form of new product concepts. It was previously a conventional Interactive Evolutionary system where users selected shapes from evolving populations. However, design industry users wanted more control over the shapes, for example by allowing the system to influence the proportions of evolving forms. The solution researched, developed, integrated and tested is a more cooperative human-machine system combining classic user interaction with innovative geometric analysis. In the literature review, different types of Interactive Evolutionary Computation (IEC), Pose Normalisation (PN), Shape Comparison, and Minimum-Volume Bounding Box approaches are compared, with some of these technologies identified as applicable for this research. Using its Application Programming Interface, add-ins for the Siemens NX CAD system have been developed and integrated with an existing Interactive Evolutionary CAD system. These add-ins allow users to create a Geometric Guide (GG) at the start of a shape exploration session. Before evolving shapes can be compared with the GG, they must be aligned and scaled (known as Pose Normalisation in the literature). Computationally efficient PN has been achieved using geometric functions such as Bounding Box for translation and scaling, and Principal Axes for the orientation. A shape comparison algorithm has been developed that is based on the principle of non-intersecting volumes. This algorithm is also implemented with standard, readily available geometric functions, is conceptually simple, accessible to other researchers and also offers appropriate efficacy.
Objective geometric testing showed that the PN and Shape Comparison methods developed are suitable for this guiding application and can be efficiently adapted to enhance an Interactive Evolutionary Design system. System performance with different population sizes was examined to indicate how best to use the new guiding capabilities to assist users in evolutionary shape searching. This was backed up by participant testing research into two user interaction strategies. A Large Background Population (LBP) approach, where the GG is used to select a subset of shapes to show to the user, was shown to be the most effective. The inclusion of Geometric Guides has taken the research from the existing aesthetics-focused tool to a system capable of application to a wider range of engineering design problems. This system supports earlier design processes and ideation in conceptual design and allows a designer to experiment with ideas freely and to interactively explore populations of evolving solutions. The design approach has been further improved and expanded beyond the previous, quite limited scope of form exploration.
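
The thesis abstract above describes Pose Normalisation via the bounding box, followed by a shape comparison based on non-intersecting volumes. The Python sketch below is a simplified illustration under several assumptions. Shapes are represented as point samples rather than CAD solids, volumes are approximated on a coarse voxel grid, and the Principal Axes orientation step is omitted. None of the function names come from the thesis or from the Siemens NX API.

```python
def bounding_box(points):
    """Axis-aligned bounding box of 3D points as (min corner, max corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def normalise_pose(points):
    """Move the bounding-box centre to the origin and scale the longest side to 1."""
    lo, hi = bounding_box(points)
    centre = [(a + b) / 2 for a, b in zip(lo, hi)]
    longest = max(b - a for a, b in zip(lo, hi)) or 1.0  # guard degenerate shapes
    return [tuple((p[i] - centre[i]) / longest for i in range(3)) for p in points]

def voxelise(points, resolution=8):
    """Map normalised points in [-0.5, 0.5] to a set of occupied voxel indices."""
    return {
        tuple(min(int((c + 0.5) * resolution), resolution - 1) for c in p)
        for p in points
    }

def non_intersecting_fraction(a, b):
    """Share of the combined occupied volume that the two shapes do NOT share."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0

# A cube, a translated and scaled copy of it, and a flat plate (all hypothetical)
cube = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
copy = [(2 * x + 10, 2 * y - 3, 2 * z + 7) for x, y, z in cube]
plate = [(x, y, 0) for x in range(4) for y in range(4)]

cube_cells = voxelise(normalise_pose(cube))
copy_cells = voxelise(normalise_pose(copy))
plate_cells = voxelise(normalise_pose(plate))
same = non_intersecting_fraction(cube_cells, copy_cells)
different = non_intersecting_fraction(cube_cells, plate_cells)
```

A score of 0 means the normalised shapes occupy the same voxels, and larger scores mean the shapes differ more. In a guiding application like the one described, such a score could rank evolving shapes against the Geometric Guide.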

    Protein-protein docking using region-based 3D Zernike descriptors

    Background: Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. Results: We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000.
Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. Conclusion: We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.

    Machine learning approaches for lung cancer diagnosis.

    Medical imaging technology has advanced enormously. It not only covers the techniques and processes for constructing visual representations of the interior of the body for medical analysis and for revealing the internal structure of organs beneath the skin, but also provides a noninvasive way to diagnose various diseases and suggests efficient ways to treat them. While data about all aspects of our lives are collected and stored for analysis by data scientists, medical images are a rich source of data that cannot easily be read by physicians and radiologists but that contain valuable information from which new knowledge can be discovered. Therefore, the design of a computer-aided diagnostic (CAD) system that can be approved for use in clinical practice and that aids radiologists in diagnosing and detecting potential abnormalities is of great importance. This dissertation deals with the development of a CAD system for lung cancer diagnosis. Lung cancer is the second most common cancer in men after prostate cancer and in women after breast cancer. Moreover, lung cancer is considered the leading cause of cancer death among both genders in the USA. Recently, the number of lung cancer patients has increased dramatically worldwide, and early detection doubles a patient’s chance of survival. Histological examination through biopsies is considered the gold standard for the final diagnosis of pulmonary nodules. Even though resection of pulmonary nodules is the ideal and most reliable way of diagnosis, many other methods are often used to avoid the risks associated with the surgical procedure. Lung nodules are approximately spherical regions of primarily high-density tissue that are visible in computed tomography (CT) images of the lung.
A pulmonary nodule is the first indication to start diagnosing lung cancer. Lung nodules can be benign (normal subjects) or malignant (cancerous subjects). Large malignant nodules (generally defined as greater than 2 cm in diameter) can be easily detected with traditional CT scanning techniques. However, the diagnostic options for small indeterminate nodules are limited due to problems associated with accessing small tumors. Therefore, additional diagnostic and imaging techniques that depend on the nodules’ shape and appearance are needed. The ultimate goal of this dissertation is to develop a fast noninvasive diagnostic system that can enhance the accuracy of early lung cancer diagnosis, based on the well-known hypothesis that malignant nodules differ in shape and appearance from benign nodules because of their high growth rate. The proposed methodologies introduce new shape and appearance features that can distinguish between benign and malignant nodules. To achieve this goal, a CAD system is implemented and validated using different datasets. This CAD system integrates two types of features, appearance features and shape features, to give a full description of the pulmonary nodule. For the appearance features, different texture appearance descriptors are developed, namely the 3D histogram of oriented gradient, 3D spherical sector isosurface histogram of oriented gradient, 3D adjusted local binary pattern, 3D resolved ambiguity local binary pattern, multi-view analytical local binary pattern, and Markov Gibbs random field. Each of these descriptors gives a good description of the nodule texture and the level of its signal homogeneity, which is a distinguishing feature between benign and malignant nodules.
For the shape features, multi-view peripheral sum curvature scale space, spherical harmonics expansions, and different groups of fundamental geometric features are utilized to describe the nodule shape complexity. Finally, a two-stage fusion of different combinations of these features is introduced. The first stage generates a primary estimation for every descriptor. The second stage consists of a single-layer autoencoder augmented with a softmax classifier that provides the final classification of the nodule. These combinations of descriptors are assembled into different frameworks that are evaluated using different datasets. The first dataset is the Lung Image Database Consortium, a publicly available benchmark dataset for lung nodule detection and diagnosis. The second dataset is our locally acquired computed tomography imaging data, collected from the University of Louisville hospital under a research protocol approved by the Institutional Review Board at the University of Louisville (IRB number 10.0642). The accuracy of these frameworks was about 94%, which suggests that they are promising tools for the detection of lung cancer.

    Untersuchung der Struktur und Interaktion mit allosterischen Modulatoren der Familie C GPCRs mit Hilfe von Sequenz-, Struktur- und Ligand-basierten Verfahren

    This study focuses on structural features of a particular GPCR type, the family C GPCRs. Structure- and ligand-based approaches were adopted for the prediction of novel mGluR5-binding ligands and their binding modes. The objectives of this study were: 1. An analysis of the functional and structural implications of amino acids in the TM region of family C GPCRs. 2. The prediction of the TM domain structure of mGluR5. 3. The discovery of novel selective allosteric modulators of mGluR5 by virtual screening. 4. The prediction of a ligand binding mode for the allosteric binding site in mGluR5. GPCRs are a super-family of structurally related proteins, although their primary amino acid sequences can be diverse. Using sequence information, a conservation analysis of family C GPCRs should be applied to reveal characteristic differences and similarities with respect to function, folding and ligand binding. Using experimental data and conservation analysis, the allosteric binding site of mGluR5 should be characterized regarding NAM, PAM and selective ligand binding. For further evaluation, experimental knowledge about family A GPCRs as well as conservation between vertebrate rhodopsins was planned to be compared to results obtained for family C GPCRs (Section 4.1 Conservation analysis of family C GPCRs). Since no receptor structure is available for any family C GPCR, discussion of conserved sequence positions between family A and C GPCRs requires the prediction of a receptor structure for mGluR5 using a family A receptor as template. In order to predict the mGluR5 structure, a sequence alignment to a GPCR template protein will have to be proposed and GPCR-specific features considered in the structure calculation (Section 4.1.4 Structure prediction of mGluR5). The obtained structure was intended to be used in the ligand binding mode prediction of newly discovered active molecules.
For the discovery of novel selective mGluR modulators, several ligand-based virtual screening protocols were adapted and evaluated. Prediction models were derived for the selection of possibly active molecules using a diverse collection of known mGluR-binding ligands. For that purpose, a data collection of known mGluR-binding ligands should be established, and this reference collection analyzed with respect to the different ligand activity classes: NAM, PAM and selective modulators. The prediction of novel NAMs and PAMs using several combinations of 2D, 3D, pharmacophore or molecular shape encoding methods with machine learning techniques and similarity-determining methods should be tested in a prospective manner (Section 4.2 Virtual screening for novel mGluR modulators). In collaboration with Merz Pharmaceuticals (Merz GmbH & Co. KGaA, Frankfurt am Main, Germany), the modulating effect of a few hundred molecules should be verified in a functional cell-based assay. With the objective of predicting a binding mode of the discovered active molecules, molecular docking should be applied using the allosteric binding site of the modeled mGluR5 structure (Section 4.2.4 Modeling of binding modes). Predicted ligand binding modes are to be correlated with conservation profiles resulting from the sequence-based entropy analysis and with information from mutation experiments, and shall be compared to known ligand binding poses from crystal structures of family A GPCRs. In this work, concepts for elucidating the structural and functional properties of family C G protein-coupled receptors (GPCRs) were developed and applied. Questions concerning the functional mechanism of GPCRs were investigated using a range of bioinformatics and cheminformatics methods guided by experimental results.
In the course of the work, conserved regions of receptor families A and C were compared on the basis of available experimental data from mutation and ligand-binding studies. The conservation analysis was based on the Shannon entropy, computed for a multiple sequence alignment of the transmembrane domains of 96 different family C GPCRs. Conserved regions were interpreted with the help of experimental data and used in particular to define regions of the allosteric binding pocket with respect to selectivity. With the aim of finding new selective allosteric modulators of the metabotropic glutamate receptor type 5 (mGluR5), several ligand-based approaches for the virtual prediction of molecular activity were developed and tested. The strategy relied on already known ligands, whose structures and activity values could be used to build prediction models. The prospective prediction was based on different similarity calculation methods and types of molecular encoding. The molecules were tested for their modulatory effect on mGluR5. The assay measured changes in the intracellular Ca2+ level. Modulators binding mGluR5 were additionally tested on mGluR1 to determine selectivity. In total, 8 of the 228 tested molecules showed activity below 10 μM, including one positive allosteric modulator. Of the remaining seven negative allosteric modulators (NAMs), five were selective for mGluR5. All identified NAMs were examined by molecular docking for possible interactions with the transmembrane domain of mGluR5. The binding hypothesis corresponded to a superposition of the molecules found and their possible interaction points.
Using mGluR5 as an example, the suitability of a modelled GPCR structure for generating hypotheses about ligand binding and structural relationships could thus be investigated.
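
The abstract above bases its conservation analysis on the Shannon entropy of a multiple sequence alignment. The following Python sketch shows the standard per-column entropy calculation, under the assumption that the alignment is given as equal-length strings. The toy sequences and function names are hypothetical, and gap characters are treated like any other symbol, which a real analysis may handle differently.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def alignment_entropies(sequences):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*sequences)]

# Toy alignment (hypothetical): column 0 is fully conserved, the others are not
alignment = ["ACD", "ACE", "AGD", "AGE"]
entropies = alignment_entropies(alignment)
```

Low entropy marks conserved positions and high entropy marks variable ones, so columns can be ranked by this value to highlight candidate functional or structural residues.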

    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, through reviewing the literature and relating the existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing. Comment: 10 pages, 19 figures

    Exploring the potential of 3D Zernike descriptors and SVM for protein–protein interface prediction

    Background: The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. Results: In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). Conclusions: The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties.
The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.