5,722 research outputs found

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    Full text link
    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test, respectively, the scoring power and the virtual screening power of the proposed topological approaches. The present approaches are demonstrated to outperform modern machine-learning-based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
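
    To make the pipeline concrete, here is a minimal sketch of the core primitive the abstract describes: computing a persistence diagram for a point cloud and comparing two diagrams with the Wasserstein distance. It assumes the `gudhi` library (with the POT package installed for `gudhi.wasserstein`), and the random point clouds are hypothetical stand-ins for atomic coordinates, not the paper's molecular data.

```python
# Sketch: persistent homology features for point clouds, compared with the
# Wasserstein distance. Point clouds are placeholders for atom coordinates.
import numpy as np
import gudhi
from gudhi.wasserstein import wasserstein_distance

def h1_diagram(points, max_edge=4.0):
    """Dimension-1 persistence diagram of a Vietoris-Rips filtration."""
    rips = gudhi.RipsComplex(points=points, max_edge_length=max_edge)
    st = rips.create_simplex_tree(max_dimension=2)
    st.compute_persistence()
    return st.persistence_intervals_in_dimension(1)

mol_a = np.random.rand(60, 3) * 10.0  # hypothetical atom coordinates (angstroms)
mol_b = np.random.rand(55, 3) * 10.0

# The distance between diagrams quantifies topological similarity; such
# distances (or vectorized diagrams) can then feed k-NN, tree ensembles, or CNNs.
d = wasserstein_distance(h1_diagram(mol_a), h1_diagram(mol_b), order=1.0, internal_p=2.0)
print(f"1-Wasserstein distance between H1 diagrams: {d:.3f}")
```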

    Prediction of protein-protein interaction types using association rule based classification

    Get PDF
    This article has been made available through the Brunel Open Access Publishing Fund - Copyright @ 2009 Park et al. Background: Protein-protein interactions (PPI) can be classified according to their characteristics into, for example, obligate or transient interactions. The identification and characterization of these PPI types may help in the functional annotation of new protein complexes and in the prediction of protein interaction partners by knowledge-driven approaches. Results: This work addresses pattern discovery at the interaction sites of four different interaction types in order to characterize them and use them for the prediction of PPI types, employing Association Rule Based Classification (ARBC), which includes association rule generation and posterior classification. We incorporated domain information from protein complexes in SCOP proteins and identified 354 domain-interaction sites. Fourteen interface properties were calculated from amino acid and secondary structure composition and then used to generate a set of association rules characterizing these domain-interaction sites, employing the APRIORI algorithm. Our results on the classification of PPI types based on a set of discovered association rules show that the discriminative ability of association rules can significantly improve the predictive power of classification models. We also show that classification accuracy can be improved through the use of structural domain information and of secondary structure content. Conclusion: The advantage of our approach is that the discovered association rules are understandable and interpretable, so biologically significant information can be extracted from them. A web application based on our method can be found at http://bioinfo.ssu.ac.kr/~shpark/picasso/. SHP was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2005-214-E00050). JAR has been supported by the Programme Alβan, the European Union Programme of High Level Scholarships for Latin America, scholarship E04D034854CL. SK was supported by the Soongsil University Research Fund.
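
    The rule-generation step is standard APRIORI mining over discretized interface properties. A minimal sketch follows, using the `mlxtend` library; the tiny boolean property matrix is a made-up toy, not the paper's 14-property dataset, and the property names are hypothetical.

```python
# Sketch: association-rule generation with the APRIORI algorithm, in the
# spirit of ARBC. Rows are domain-interaction sites; columns are
# discretized interface properties plus an interaction-type label.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

data = pd.DataFrame({
    "high_hydrophobicity": [1, 1, 0, 1, 0, 1],
    "large_interface":     [1, 0, 0, 1, 1, 1],
    "helix_rich":          [0, 1, 1, 0, 0, 1],
    "obligate":            [1, 0, 0, 1, 1, 1],  # interaction-type label
}, dtype=bool)

frequent = apriori(data, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)

# Rules whose consequent is the class label can serve as classification rules.
class_rules = rules[rules["consequents"].apply(lambda c: c == frozenset({"obligate"}))]
print(class_rules[["antecedents", "support", "confidence"]])
```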

    Scalable Neural Network Decoders for Higher Dimensional Quantum Codes

    Get PDF
    Machine learning has the potential to become an important tool in quantum error correction, as it allows the decoder to adapt to the error distribution of a quantum chip. An additional motivation for using neural networks is that they can be evaluated by dedicated hardware that is very fast and consumes little power. Machine learning has previously been applied to decode the surface code. However, these approaches are not scalable, as the training has to be redone for every system size, which becomes increasingly difficult. In this work, the existence of local decoders for higher-dimensional codes leads us to use a low-depth convolutional neural network to locally assign a likelihood of error to each qubit. For noiseless syndrome measurements, numerical simulations show that the decoder has a threshold of around 7.1% when applied to the 4D toric code. When the syndrome measurements are noisy, the decoder performs better for larger code sizes when the error probability is low. We also give a theoretical and numerical analysis showing how a convolutional neural network differs from the 1-nearest-neighbor algorithm, a baseline machine learning method.
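
    The local-decoder idea is easiest to see in code. Below is a minimal PyTorch sketch of a low-depth convolutional network mapping a syndrome lattice to a per-qubit error likelihood; it is a 2D toy with illustrative shapes, not the paper's 4D toric code architecture.

```python
# Sketch: a low-depth CNN assigning an error likelihood to each qubit site,
# illustrated on a 2D periodic lattice rather than the 4D toric code.
import torch
import torch.nn as nn

class LocalDecoder(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            # padding_mode="circular" respects the torus' periodic boundary
            nn.Conv2d(1, channels, kernel_size=3, padding=1, padding_mode="circular"),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, padding_mode="circular"),
            nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1, padding_mode="circular"),
        )

    def forward(self, syndrome):
        # Per-site probability of an error, from local syndrome information only.
        return torch.sigmoid(self.net(syndrome))

decoder = LocalDecoder()
syndrome = torch.randint(0, 2, (8, 1, 16, 16)).float()  # batch of toy syndromes
error_probs = decoder(syndrome)
print(error_probs.shape)  # (8, 1, 16, 16): one likelihood per qubit site
```

    Because such a network is fully convolutional, the same trained weights can in principle be applied at any lattice size, which is the scalability argument the abstract makes.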

    Data Mining and Machine Learning in Astronomy

    Full text link
    We review the current state of data mining and machine learning in astronomy. 'Data mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data and promising great scientific advances. However, if misused, it can be little more than the black-box application of complex computing algorithms that gives little physical insight and provides questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of astronomy, emphasizing those in which data mining techniques directly resulted in improved science; and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be a very powerful tool, and not a questionable black box. (Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text.)
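
    As one concrete instance of the algorithms the review surveys, here is a minimal scikit-learn sketch of a support vector machine classifier applied to photometric colors. The features, labels, and decision rule are synthetic placeholders, not a real catalog or any result from the review.

```python
# Sketch: SVM classification of made-up photometric colors (e.g., star vs. galaxy).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
colors = rng.normal(size=(1000, 4))                      # e.g., u-g, g-r, r-i, i-z
labels = (colors[:, 0] + colors[:, 1] > 0).astype(int)   # toy decision rule

X_train, X_test, y_train, y_test = train_test_split(colors, labels, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```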

    Urban Image Classification: Per-Pixel Classifiers, Sub-Pixel Analysis, Object-Based Image Analysis, and Geospatial Methods

    Get PDF
    Remote sensing methods used to generate base maps for analyzing the urban environment rely predominantly on digital sensor data from space-borne platforms. This is due in part to new sources of high-spatial-resolution data covering the globe, a variety of multispectral and multitemporal sources, sophisticated statistical and geospatial methods, and compatibility with GIS data sources and methods. The goal of this chapter is to review the four groups of classification methods for digital sensor data from space-borne platforms: per-pixel, sub-pixel, object-based (spatial-based), and geospatial methods. Per-pixel methods are widely used methods that classify pixels into distinct categories based solely on the spectral and ancillary information within each pixel. They are used for everything from simple calculations of environmental indices (e.g., NDVI) to sophisticated expert systems that assign urban land covers. Researchers recognize, however, that even at the smallest pixel size the spectral information within a pixel is really a combination of multiple urban surfaces. Sub-pixel classification methods therefore aim to statistically quantify this mixture of surfaces to improve overall classification accuracy. While within-pixel variations exist, there is also significant evidence that groups of nearby pixels have similar spectral information and therefore belong to the same classification category. Object-oriented methods have emerged that group pixels prior to classification based on spectral similarity and spatial proximity. Classification accuracy using object-based methods shows significant success and promise for numerous urban applications. Like the object-oriented methods that recognize the importance of spatial proximity, geospatial methods for urban mapping also utilize neighboring pixels in the classification process. The primary difference, though, is that geostatistical methods (e.g., spatial autocorrelation methods) are utilized during both the pre- and post-classification steps. Within this chapter, each of the four approaches is described in terms of its scale and accuracy in classifying urban land use and land cover, and in terms of its range of urban applications. Figure 1 gives an overview of the four main classification groups, while Table 1 details the approaches with respect to classification requirements and procedures (e.g., reflectance conversion, steps before training sample selection, training samples, spatial approaches commonly used, classifiers, primary inputs for classification, output structures, number of output layers, and accuracy assessment). The chapter concludes with a brief summary of the methods reviewed and of the challenges that remain in developing new classification methods to improve the efficiency and accuracy of mapping urban areas.
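
    The NDVI mentioned above is the simplest example of a per-pixel method: the index is computed independently for every pixel from the red and near-infrared bands. A minimal NumPy sketch, with random arrays standing in for real sensor reflectance:

```python
# Sketch: per-pixel NDVI computation; arrays are placeholders for real bands.
import numpy as np

red = np.random.rand(256, 256)   # red-band reflectance (0-1)
nir = np.random.rand(256, 256)   # near-infrared reflectance (0-1)

# NDVI = (NIR - Red) / (NIR + Red), in [-1, 1]; higher values indicate vegetation.
ndvi = (nir - red) / (nir + red + 1e-9)  # epsilon guards against divide-by-zero
print(ndvi.min(), ndvi.max())
```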

    Triboinformatic Approaches for Surface Characterization: Tribological and Wetting Properties

    Get PDF
    Tribology is the study of surface roughness, adhesion, friction, wear, and lubrication of interacting solid surfaces in relative motion. In addition, wetting properties are very important for surface characterization. The combination of tribology with machine learning (ML) and other data-centric methods is often called triboinformatics. In this dissertation, triboinformatic methods are applied to the study of aluminum (Al) composites, antimicrobial and water-repellent metallic surfaces, and organic coatings. Al and its alloys are often the preferred materials for aerospace and automotive applications due to their light weight, high strength, corrosion resistance, and other desirable material properties. However, Al exhibits high friction and wear rates, along with a tendency to seize under dry sliding or poorly lubricated conditions. Graphite and graphene particle-reinforced Al metal matrix composites (MMCs) exhibit self-lubricating properties and are potential alternatives to Al alloys in dry or starved lubrication conditions. In this dissertation, artificial neural network (ANN), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), gradient boosting machine (GBM), and hybrid ensemble algorithm-based ML models have been developed to correlate the dry friction and wear of aluminum alloys, Al-graphite, and Al-graphene MMCs with material properties, the composition of the alloys and MMCs, and tribological parameters. ML analysis reveals that the hardness, sliding distance, and tensile strength of the alloys influence the COF most significantly, whereas the normal load, sliding speed, and hardness are the most influential parameters in predicting wear rate. The graphite content is the most significant parameter for friction and wear prediction in Al-graphite MMCs. For Al-graphene MMCs, the normal load, graphene content, and hardness are identified as the most influential parameters for COF prediction, while the graphene content, load, and hardness have the greatest influence on the wear rate. ANN, KNN, SVM, RF, and GBM models, as well as hybrid regression models (RF-GBM) with principal component analysis (PCA) descriptors, were also developed for the COF and wear rate of Al-graphite MMCs in liquid-lubricated conditions. The hybrid RF-GBM models exhibited the best predictive performance for COF and wear rate. Lubrication condition, lubricant viscosity, and applied load are identified as the most important variables for predicting wear rate and COF, and the transition from dry to lubricated friction and wear is studied. The micro- and nanoscale roughness of zinc oxide-coated stainless steel and sonochemically treated brass (Cu-Zn alloy) samples is studied using atomic force microscopy (AFM) images to obtain roughness parameters (standard deviation of the profile height, correlation length, extreme point location, persistence diagrams, and barcodes). A new method for calculating roughness parameters involving correlation lengths, extremum point distributions, persistence diagrams, and barcodes is developed for studying the roughness patterns and anisotropic distributions inherent in coated surfaces. The analysis of the 3×3, 4×4, and 5×5 sub-matrices or patches has revealed the anisotropic nature of the roughness profile at the nanoscale. The scale dependency of the roughness features is explained by the persistence diagrams and barcodes.
    Solid surfaces with water-repellent, antimicrobial, and anticorrosive properties are desired for many practical applications. TiO2/ZnO phosphate and polymethyl hydrogen siloxane (PMHS) based two-layer antimicrobial and anticorrosive coatings are synthesized and applied to steel, ceramic, and concrete substrates. Surfaces with these coatings possess complex topographies and roughness patterns, which cannot be characterized completely by traditional analysis. Correlations between surface roughness, coefficient of friction (COF), and water contact angle for these surfaces are obtained. The hydrophobic modification in the anticorrosive coatings does not make the coated surfaces slippery; they retain adequate friction for transportation applications. The dissertation demonstrates that triboinformatic approaches can be successfully implemented in surface science and tribology, and that they can generate novel insights into structure-property relationships in various classes of materials.
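
    A minimal sketch of a hybrid RF-GBM regressor with PCA descriptors, assembled from scikit-learn pieces, illustrates the modeling setup. The feature matrix and target are synthetic toys, and the simple prediction-averaging hybridization shown here is an assumption; the dissertation's actual scheme may differ.

```python
# Sketch: PCA descriptors feeding a hybrid RF-GBM regressor for COF prediction.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, VotingRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))   # e.g., load, speed, hardness, graphite content, ...
y = 0.3 + 0.05 * X[:, 0] - 0.02 * X[:, 2] + rng.normal(scale=0.01, size=300)  # toy COF

hybrid = make_pipeline(
    PCA(n_components=5),                     # PCA descriptors
    VotingRegressor([                        # averages the two models' predictions
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("gbm", GradientBoostingRegressor(random_state=0)),
    ]),
)
hybrid.fit(X, y)
print("In-sample R^2:", round(hybrid.score(X, y), 3))
```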

    Hierarchical Visualization of Materials Space with Graph Convolutional Neural Networks

    Full text link
    The combination of high-throughput computation and machine learning has led to a new paradigm in materials design by allowing for the direct screening of vast portions of structural, chemical, and property space. The use of these powerful techniques generates enormous amounts of data, which in turn calls for new techniques to efficiently explore and visualize the materials space and help identify underlying patterns. In this work, we develop a unified framework to hierarchically visualize the compositional and structural similarities between materials in an arbitrary material space, with representations learned from different layers of graph convolutional neural networks. We demonstrate the potential of such a visualization approach by showing that patterns emerge automatically that reflect similarities at different scales in three representative classes of materials: perovskites, elemental boron, and general inorganic crystals, covering material spaces that differ in composition, in structure, or in both. For perovskites, elemental similarities are learned that reflect multiple aspects of atomic properties. For elemental boron, structural motifs emerge automatically, showing characteristic boron local environments. For inorganic crystals, the similarity and stability of local coordination environments are shown for combinations of different center and neighbor atoms. The method could help the transition to a data-centered exploration of materials space in automated materials design. (Comment: 22 + 7 pages, 6 + 5 figures.)
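
    In outline, the visualization step amounts to projecting per-material embeddings taken from different network layers down to 2D. A minimal sketch with scikit-learn follows; the layer activations here are random stand-ins for real GCN outputs, and t-SNE is one reasonable projection choice, not necessarily the paper's.

```python
# Sketch: projecting learned materials representations from different GCN
# layers to 2D for hierarchical visualization. Embeddings are placeholders.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
layer_embeddings = {
    "layer_1": rng.normal(size=(500, 64)),  # earlier layer: compositional similarity
    "layer_3": rng.normal(size=(500, 64)),  # later layer: larger structural motifs
}

for name, emb in layer_embeddings.items():
    xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(emb)
    print(name, xy.shape)  # 2D coordinates ready for a scatter plot at this scale
```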

    A Genetic Algorithm Approach for Technology Characterization

    Get PDF
    It is important for engineers to understand the capabilities and limitations of the technologies they consider for use in their systems. Several researchers have investigated approaches for modeling the capabilities of a technology with the aim of supporting the design process. In these works, information about the physical form is typically abstracted away. However, the efficient generation of an accurate model of technical capabilities remains a challenge. Pareto-frontier-based methods are often used but yield results that are of limited use for subsequent decision making and analysis. Models based on parameterized Pareto frontiers, termed Technology Characterization Models (TCMs), are much more reusable and composable. However, no efficient technique exists for modeling the parameterized Pareto frontier. The contribution of this thesis is a new algorithm for modeling the parameterized Pareto frontier to be used as a model of the characteristics of a technology. The novelty of the algorithm lies in a new concept termed predicted dominance. The proposed algorithm uses fundamental concepts from multi-objective optimization and machine learning to generate a model of the technology frontier.
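
    For context, here is a minimal NumPy sketch of classical Pareto dominance filtering, the multi-objective-optimization building block that parameterized Pareto methods and the thesis' "predicted dominance" concept extend. Minimization of every objective is assumed, and the objective values are toy data.

```python
# Sketch: extracting the non-dominated (Pareto) set from a set of designs.
import numpy as np

def pareto_mask(points):
    """Boolean mask of non-dominated rows (minimize all columns)."""
    n = len(points)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Point i is dominated if some point is <= in all objectives and < in one.
        dominated_by = np.all(points <= points[i], axis=1) & np.any(points < points[i], axis=1)
        mask[i] = not dominated_by.any()
    return mask

pts = np.random.rand(200, 2)  # toy objective values, e.g., cost vs. mass
frontier = pts[pareto_mask(pts)]
print(f"{len(frontier)} non-dominated designs out of {len(pts)}")
```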