Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are
performed to test, respectively, the scoring power and the virtual screening
power of the proposed topological approaches. It is demonstrated that the
present approaches outperform modern machine learning based methods in
protein-ligand binding affinity prediction and ligand-decoy discrimination.
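The abstract pairs its topological invariants with the Wasserstein distance between persistence diagrams. As a minimal illustration only (not the authors' implementation, and all names below are ours), the following pure-Python sketch computes the exact 1-Wasserstein distance between two tiny diagrams by brute-force matching, using the convention that unmatched points may be retired to the diagonal:

```python
from itertools import permutations

def _diag(p):
    # Orthogonal projection of a point onto the diagonal birth == death.
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def _cost(a, b):
    # Matching two diagonal points is free; otherwise use the
    # L-infinity ground distance conventional for diagrams.
    if a[0] == a[1] and b[0] == b[1]:
        return 0.0
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def wasserstein1(d1, d2):
    """Exact 1-Wasserstein distance between two small persistence
    diagrams (lists of (birth, death) pairs) by brute-force matching.
    Each diagram is augmented with the diagonal projections of the
    other, so every matching is a perfect matching."""
    a = list(d1) + [_diag(p) for p in d2]
    b = list(d2) + [_diag(p) for p in d1]
    best = float("inf")
    for perm in permutations(range(len(b))):
        total = sum(_cost(a[i], b[j]) for i, j in enumerate(perm))
        best = min(best, total)
    return best
```

The factorial-time matching is only viable for toy diagrams; practical implementations solve the same assignment problem with optimal-transport or Hungarian-algorithm solvers.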
Prediction of protein-protein interaction types using association rule based classification
This article has been made available through the Brunel Open Access Publishing Fund. Copyright © 2009 Park et al.
Background: Protein-protein interactions (PPI) can be classified according to their characteristics into, for example, obligate or transient interactions. The identification and characterization of these PPI types may help in the functional annotation of new protein complexes and in the prediction of protein interaction partners by knowledge-driven approaches.
Results: This work addresses pattern discovery of the interaction sites for four different interaction types, characterizes them, and uses them for the prediction of PPI types employing Association Rule Based Classification (ARBC), which combines association rule generation and posterior classification. We incorporated domain information from protein complexes in SCOP and identified 354 domain-interaction sites. 14 interface properties were calculated from amino acid and secondary structure composition and then used to generate a set of association rules characterizing these domain-interaction sites with the APRIORI algorithm. Our results on the classification of PPI types based on a set of discovered association rules show that the discriminative ability of association rules can significantly affect the predictive power of classification models. We also showed that the accuracy of the classification can be improved through the use of structural domain information and of secondary structure content.
Conclusion: The advantage of our approach is that biologically significant information can be extracted from the discovered association rules, which remain understandable and interpretable. A web application based on our method can be found at http://bioinfo.ssu.ac.kr/~shpark/picasso/
SHP was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2005-214-E00050). JAR has been supported by the Programme Alβan, the European Union Programme of High Level Scholarships for Latin America, scholarship E04D034854CL. SK was supported by the Soongsil University Research Fund.
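The abstract's rule generation rests on the APRIORI algorithm. As a hedged sketch of that algorithm's frequent-itemset stage only (the rule-generation and classification stages of ARBC are not shown, and the toy data are ours), in pure Python:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent-itemset half of the APRIORI algorithm: level-wise
    candidate generation with support-based pruning."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item in itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune step (the APRIORI property): every k-subset of a
        # surviving candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent
```

Association rules are then read off the frequent itemsets by splitting each into an antecedent and consequent and checking confidence against a threshold.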
Scalable Neural Network Decoders for Higher Dimensional Quantum Codes
Machine learning has the potential to become an important tool in quantum
error correction as it allows the decoder to adapt to the error distribution of
a quantum chip. An additional motivation for using neural networks is the fact
that they can be evaluated by dedicated hardware which is very fast and
consumes little power. Machine learning has been previously applied to decode
the surface code. However, these approaches are not scalable as the training
has to be redone for every system size, which becomes increasingly difficult. In
this work, the existence of local decoders for higher dimensional codes leads us
to use a low-depth convolutional neural network to locally assign a likelihood
of error on each qubit. For noiseless syndrome measurements, numerical
simulations show that the decoder has a threshold of around when
applied to the 4D toric code. When the syndrome measurements are noisy, the
decoder performs better for larger code sizes when the error probability is
low. We also give theoretical and numerical analysis to show how a
convolutional neural network is different from the 1-nearest neighbor
algorithm, which is a baseline machine learning method.
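The 1-nearest-neighbor baseline mentioned above is simple enough to state in full. A minimal sketch, assuming Euclidean distance and binary syndrome-like feature vectors (the training data and labels below are hypothetical, not taken from the paper):

```python
def one_nn_predict(train_x, train_y, x):
    """Return the label of the single nearest training point under
    squared Euclidean distance -- the whole 'model' is the training
    set, with no learned weights, which is why 1-NN serves as a
    baseline for learned decoders."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: dist2(train_x[i], x))
    return train_y[best]

# Toy usage on syndrome-like bit vectors (labels are illustrative):
syndromes = [(0, 0, 0, 0), (1, 1, 0, 0)]
labels = ["no error", "error on qubit 1"]
one_nn_predict(syndromes, labels, (1, 1, 0, 1))  # -> "error on qubit 1"
```

Unlike the convolutional decoder in the paper, this baseline cannot exploit locality or share parameters across system sizes, which is precisely the contrast the authors analyze.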
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much a powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text.
Urban Image Classification: Per-Pixel Classifiers, Sub-Pixel Analysis, Object-Based Image Analysis, and Geospatial Methods
Remote sensing methods used to generate base maps to analyze the urban environment rely predominantly on digital sensor data from space-borne platforms. This is due in part to new sources of high spatial resolution data covering the globe, a variety of multispectral and multitemporal sources, sophisticated statistical and geospatial methods, and compatibility with GIS data sources and methods. The goal of this chapter is to review the four groups of classification methods for digital sensor data from space-borne platforms: per-pixel, sub-pixel, object-based (spatial-based), and geospatial methods. Per-pixel methods are widely used methods that classify pixels into distinct categories based solely on the spectral and ancillary information within that pixel. They are used for everything from simple calculations of environmental indices (e.g., NDVI) to sophisticated expert systems that assign urban land covers. Researchers recognize, however, that even with the smallest pixel size the spectral information within a pixel is really a combination of multiple urban surfaces. Sub-pixel classification methods therefore aim to statistically quantify the mixture of surfaces to improve overall classification accuracy. While within-pixel variations exist, there is also significant evidence that groups of nearby pixels have similar spectral information and therefore belong to the same classification category. Object-oriented methods have emerged that group pixels prior to classification based on spectral similarity and spatial proximity. Classification using object-based methods shows significant success and promise for numerous urban applications. Like the object-oriented methods that recognize the importance of spatial proximity, geospatial methods for urban mapping also utilize neighboring pixels in the classification process.
The primary difference, though, is that geostatistical methods (e.g., spatial autocorrelation methods) are utilized during both the pre- and post-classification steps. Within this chapter, each of the four approaches is described in terms of scale and accuracy in classifying urban land use and urban land cover, and in terms of its range of urban applications. Figure 1 gives an overview of the four main classification groups, while Table 1 details the approaches with respect to classification requirements and procedures (e.g., reflectance conversion, steps before training sample selection, training samples, spatial approaches commonly used, classifiers, primary inputs for classification, output structures, number of output layers, and accuracy assessment). The chapter concludes with a brief summary of the methods reviewed and the challenges that remain in developing new classification methods for improving the efficiency and accuracy of mapping urban areas.
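As a concrete instance of the per-pixel index computation the chapter mentions, a minimal NDVI sketch over two co-registered bands; the threshold rule below is a common heuristic, not a method from the chapter, and all names are ours:

```python
def ndvi_map(nir_band, red_band, eps=1e-12):
    """Per-pixel NDVI, (NIR - Red) / (NIR + Red), over two
    co-registered bands given as nested lists of reflectances;
    eps guards against division by zero on dark pixels."""
    return [[(n - r) / (n + r + eps) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]

def classify_vegetation(ndvi_value, threshold=0.3):
    # A toy per-pixel rule: threshold the index into two classes.
    # The 0.3 cutoff is a common rule of thumb, not a fixed standard.
    return "vegetation" if ndvi_value > threshold else "non-vegetation"
```

This illustrates why per-pixel methods are attractive (each pixel is classified independently from its own spectra) and also their limitation: a mixed urban pixel yields a single blended index value, which motivates the sub-pixel unmixing methods discussed next.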
Triboinformatic Approaches for Surface Characterization: Tribological and Wetting Properties
Tribology is the study of surface roughness, adhesion, friction, wear, and lubrication of interacting solid surfaces in relative motion. In addition, wetting properties are very important for surface characterization. The combination of tribology with machine learning (ML) and other data-centric methods is often called triboinformatics. In this dissertation, triboinformatic methods are applied to the study of aluminum (Al) composites, antimicrobial and water-repellent metallic surfaces, and organic coatings. Al and its alloys are often preferred materials for aerospace and automotive applications due to their light weight, high strength, corrosion resistance, and other desired material properties. However, Al exhibits high friction and wear rates along with a tendency to seize under dry sliding or poor lubricating conditions. Graphite and graphene particle-reinforced Al metal matrix composites (MMCs) exhibit self-lubricating properties and can be potential alternatives to Al alloys in dry or starved lubrication conditions. In this dissertation, artificial neural network (ANN), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), gradient boosting machine (GBM), and hybrid ensemble algorithm-based ML models have been developed to correlate the dry friction and wear of aluminum alloys, Al-graphite, and Al-graphene MMCs with material properties, the composition of alloys and MMCs, and tribological parameters. ML analysis reveals that the hardness, sliding distance, and tensile strength of the alloys influence the COF most significantly. On the other hand, the normal load, sliding speed, and hardness were the most influential parameters in predicting wear rate. The graphite content is the most significant parameter for friction and wear prediction in Al-graphite MMCs.
For Al-graphene MMCs, the normal load, graphene content, and hardness are identified as the most influential parameters for COF prediction, while the graphene content, load, and hardness have the greatest influence on the wear rate. The ANN, KNN, SVM, RF, and GBM models, as well as hybrid regression models (RF-GBM) with principal component analysis (PCA) descriptors for COF and wear rate, were also developed for Al-graphite MMCs in liquid-lubricated conditions. The hybrid RF-GBM models exhibited the best predictive performance for COF and wear rate. Lubrication condition, lubricant viscosity, and applied load are identified as the most important variables for predicting wear rate and COF, and the transition from dry to lubricated friction and wear is studied. The micro- and nanoscale roughness of zinc (Zn) oxide-coated stainless steel and sonochemically treated brass (Cu-Zn alloy) samples is studied using atomic force microscopy (AFM) images to obtain the roughness parameters (standard deviation of the profile height, correlation length, extreme point locations, persistence diagrams, and barcodes). A new method of calculating roughness parameters involving correlation lengths, extremum point distributions, persistence diagrams, and barcodes is developed for studying the roughness patterns and anisotropic distributions inherent in coated surfaces. The analysis of the 3×3, 4×4, and 5×5 sub-matrices or patches reveals the anisotropic nature of the roughness profile at the nanoscale. The scale dependency of the roughness features is explained by the persistence diagrams and barcodes. Solid surfaces with water-repellent, antimicrobial, and anticorrosive properties are desired for many practical applications. TiO2/ZnO phosphate and polymethyl hydrogen siloxane (PMHS) based two-layer antimicrobial and anticorrosive coatings are synthesized and applied to steel, ceramic, and concrete substrates.
Surfaces with these coatings possess complex topographies and roughness patterns, which cannot be characterized completely by traditional analysis. Correlations between surface roughness, coefficient of friction (COF), and water contact angle for these surfaces are obtained. The hydrophobic modification in the anticorrosive coatings does not make the coated surfaces slippery; the surfaces retain adequate friction for transportation applications. The dissertation demonstrates that triboinformatic approaches can be successfully implemented in surface science and tribology, and that they can generate novel insights into structure-property relationships in various classes of materials.
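Two of the roughness parameters named in the abstract, the standard deviation of the profile height and the correlation length, admit a compact definition. A minimal sketch for a 1-D height profile, assuming the correlation length is taken as the first lag at which the normalized autocorrelation falls below 1/e (one common convention; the dissertation's exact definition may differ):

```python
import math

def roughness_params(profile):
    """RMS roughness (standard deviation of the profile height) and
    correlation length for a 1-D height profile, with the correlation
    length taken as the first lag where the normalized autocorrelation
    drops below 1/e."""
    n = len(profile)
    mean = sum(profile) / n
    h = [z - mean for z in profile]          # centered heights
    var = sum(z * z for z in h) / n
    sigma = math.sqrt(var)                   # RMS roughness
    corr_len = n - 1                         # fallback: never decays
    for lag in range(1, n):
        acf = (sum(h[i] * h[i + lag] for i in range(n - lag))
               / ((n - lag) * var))
        if acf < 1 / math.e:
            corr_len = lag
            break
    return sigma, corr_len
```

For AFM images, the same statistics are computed per scan line or over the full 2-D height field; the persistence diagrams and barcodes mentioned above capture topological features these two scalars cannot.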
Hierarchical Visualization of Materials Space with Graph Convolutional Neural Networks
The combination of high throughput computation and machine learning has led
to a new paradigm in materials design by allowing for the direct screening of
vast portions of structural, chemical, and property space. The use of these
powerful techniques leads to the generation of enormous amounts of data, which
in turn calls for new techniques to efficiently explore and visualize the
materials space to help identify underlying patterns. In this work, we develop
a unified framework to hierarchically visualize the compositional and
structural similarities between materials in an arbitrary material space with
representations learned from different layers of graph convolutional neural
networks. We demonstrate the potential for such a visualization approach by
showing that patterns emerge automatically that reflect similarities at
different scales in three representative classes of materials: perovskites,
elemental boron, and general inorganic crystals, covering material spaces that
differ in composition, in structure, or in both. For perovskites, elemental
similarities are learned that reflect multiple aspects of atom properties. For
elemental boron, structural motifs emerge automatically, showing characteristic
boron local environments. For inorganic crystals, the similarity and stability
of local coordination environments are shown for combinations of different
center and neighbor atoms. The method could help the transition to a
data-centered exploration of materials space in automated materials design.
Comment: 22 + 7 pages, 6 + 5 figures.
A Genetic Algorithm Approach for Technology Characterization
It is important for engineers to understand the capabilities and limitations of the technologies they consider for use in their systems. Several researchers have investigated approaches for modeling the capabilities of a technology with the aim of supporting the design process. In these works, the information about the physical form is typically abstracted away. However, the efficient generation of an accurate model of technical capabilities remains a challenge. Pareto frontier based methods are often used but yield results that are of limited use for subsequent decision making and analysis. Models based on parameterized Pareto frontiers, termed Technology Characterization Models (TCMs), are much more reusable and composable. However, there exists no efficient technique for modeling the parameterized Pareto frontier. The contribution of this thesis is a new algorithm for modeling the parameterized Pareto frontier to be used as a model of the characteristics of a technology. The novelty of the algorithm lies in a new concept termed predicted dominance. The proposed algorithm uses fundamental concepts from multi-objective optimization and machine learning to generate a model of the technology frontier.
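The classical Pareto dominance underlying the frontier methods discussed above can be stated in a few lines. A minimal sketch, assuming all objectives are minimized (this shows only ordinary dominance filtering, not the thesis's parameterized frontier or its "predicted dominance" concept):

```python
def dominates(a, b):
    """True if attribute vector a Pareto-dominates b: a is no worse
    than b in every objective and strictly better in at least one
    (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_frontier(points):
    # Keep exactly the non-dominated points.
    return [p for p in points
            if not any(dominates(q, p) for q in points)]
```

A parameterized Pareto frontier generalizes this by retaining, for each setting of the non-tradeoff parameters, the points that are non-dominated with respect to the tradeoff attributes, which is what makes TCMs reusable across design contexts.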