1,420 research outputs found

    Insightful classification of crystal structures using deep learning

    Computational methods that automatically extract knowledge from data are critical for enabling data-driven materials science. A reliable identification of lattice symmetry is a crucial first step for materials characterization and analytics. Current methods require a user-specified threshold, and are unable to detect average symmetries for defective structures. Here, we propose a machine-learning-based approach to automatically classify structures by crystal symmetry. First, we represent crystals by calculating a diffraction image, then construct a deep-learning neural-network model for classification. Our approach is able to correctly classify a dataset comprising more than 100 000 simulated crystal structures, including heavily defective ones. The internal operations of the neural network are unraveled through attentive response maps, demonstrating that it uses the same landmarks a materials scientist would use, although never explicitly instructed to do so. Our study paves the way for crystal-structure recognition of - possibly noisy and incomplete - three-dimensional structural data in big-data materials science. Comment: Nature Communications, in press (2018)

    Machine learning approaches for tomato crop yield prediction in precision agriculture

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. The objective of this project was to apply ML techniques to predict processing tomato crop yield given information on soil properties, weather conditions, and applied fertilizers. Besides being robust enough for predicting tomato productivity, the model needed to be interpretable and transparent for the business. The models assessed were Decision Tree Regression, ensemble bagging models like Random Forest Regression, boosting techniques like Gradient Boosting Regression, and Support Vector Regression. Overall, the Gradient Boosting and Support Vector models presented the best performance. To improve the predictive power, we combined the predictions of our two best models into a stacked approach with a Ridge Regression as the final model. The generalization error of the final chosen model on new data was 9.02 ton/ha for the MAE metric, 9.5% for the MAPE, and 13.5 ton/ha for the RMSE. This means that our model can predict tomato crop yield with an approximate error of 9 ton/ha. Even though our final model was complex and not intrinsically interpretable, we were able to apply model-agnostic interpretation methods: the SHAP summary plot, to better understand feature importance and feature effects, and the Accumulated Local Effects (ALE) plot, to explain how features influence the outcome of the model on average. In general, the objectives of the project were accomplished and the company was satisfied with the result of the model and its interpretation.
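    The three error metrics quoted in this abstract (MAE, MAPE, RMSE) can be illustrated with a minimal sketch. The function name `regression_errors` and the yield values are hypothetical and chosen only for illustration; they are not the report's code or data.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MAPE in percent, RMSE) for paired observations.

    Assumes no observed value is zero, since MAPE divides by it.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    mape = 100.0 * sum(abs(r) / abs(t) for r, t in zip(residuals, y_true)) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return mae, mape, rmse


# Hypothetical yields in ton/ha (illustrative only, not from the report).
observed = [100.0, 80.0, 120.0]
predicted = [90.0, 88.0, 120.0]
mae, mape, rmse = regression_errors(observed, predicted)
print(f"MAE={mae:.2f} ton/ha, MAPE={mape:.2f}%, RMSE={rmse:.2f} ton/ha")
```

    RMSE squares each residual before averaging, so it is never smaller than MAE and grows faster when a few predictions miss badly. That is consistent with the report's RMSE (13.5 ton/ha) exceeding its MAE (9.02 ton/ha).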

    In-situ surface porosity prediction in DED (directed energy deposition) printed SS316L parts using multimodal sensor fusion

    This study aims to relate the time-frequency patterns of acoustic emission (AE) and other multi-modal sensor data collected in a hybrid directed energy deposition (DED) process to the pore formations at high spatial (0.5 mm) and time (< 1 ms) resolutions. By adapting the explainable AI method LIME (Local Interpretable Model-Agnostic Explanations), certain high-frequency waveform signatures of AE are attributed to two major pathways for pore formation in a DED process, namely, spatter events and insufficient fusion between adjacent printing tracks from low heat input. This approach opens an exciting possibility to predict, in real time, the presence of a pore in every voxel (0.5 mm in size) as it is printed, a major leap forward compared to prior efforts. Synchronized multimodal sensor data including force, AE, vibration and temperature were gathered while an SS316L material sample was printed and subsequently machined. A deep convolutional neural network classifier was used to identify the presence of pores on a voxel surface based on time-frequency patterns (spectrograms) of the sensor data collected during the process chain. The results suggest that signals collected during DED were more sensitive than those from machining for detecting porosity in voxels (classification test accuracy of 87%). The underlying explanations drawn from LIME analysis suggest that the energy captured in high-frequency AE waveforms is 33% lower for porous voxels, indicating a relatively lower laser-material interaction in the melt pool, and hence insufficient fusion and poor overlap between adjacent printing tracks. The porous voxels for which spatter events were prevalent during printing had about 27% higher energy content in the high-frequency AE band compared to other porous voxels. These signatures from the AE signal can further the understanding of pore formation from spatter and insufficient fusion.

    Obstruction level detection of sewers videos using convolutional neural networks

    Worldwide, sewer networks are designed to transport wastewater to a centralized treatment plant to be treated and returned to the environment. This is a critical process for preventing waterborne illnesses, providing safe drinking water and enhancing general sanitation in society. To keep a sewer network fully operational, several inspections are manually performed with a Closed-Circuit Television system to report the obstruction level, which may trigger a cleaning operation. In this work, we design a methodology to train a Convolutional Neural Network (CNN) for identifying the level of obstruction in pipes. We gathered a database of videos to generate useful frames to feed into the model. Our resulting classifier obtains deployment-ready performance. To validate the consistency of the approach and its industrial applicability, we integrate the Layer-wise Relevance Propagation (LRP) algorithm, which provides further understanding of the neural network behavior. The proposed system provides higher speed, accuracy, and consistency in the sewer examination process. This work is partially supported by the Consejo Nacional de Ciencia y Tecnologia (CONACYT), Estudiante No. CVU: 630716, by the RIS3CAT Utilities 4.0 SENIX project (COMRDI16-1-0055), co-funded by the European Regional Development Fund (FEDER) under the FEDER Catalonia Operative Programme 2014-2020. It is also partially supported by the Spanish Government through Programa Severo Ochoa (SEV2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project, and by the Generalitat de Catalunya (contracts 2017-SGR-1414). Peer Reviewed. Postprint (published version)

    Shelling the Voronoi interface of protein-protein complexes predicts residue activity and conservation

    The accurate description of protein-protein interfaces remains a challenging task. Traditional criteria, based on atomic contacts or changes in solvent accessibility, tend to over- or underpredict the interface itself and cannot discriminate active from less relevant parts. A recent simulation study by Mihalek and co-authors (2007, JMB 369, 584-95) concluded that active residues tend to be 'dry', that is, insulated from water fluctuations. We show that patterns of 'dry' residues can, to a large extent, be predicted by a fast, parameter-free and purely geometric analysis of protein interfaces. We introduce the shelling order of Voronoi facets as a straightforward quantitative measure of an atom's depth inside an interface. We analyze the correlation between Voronoi shelling order, dryness, and conservation on a set of 54 protein-protein complexes. Residues with high shelling order tend to be dry; evolutionary conservation also correlates with dryness and shelling order but, perhaps not surprisingly, is a much less accurate predictor of either property. Voronoi shelling order thus seems a meaningful and efficient descriptor of protein interfaces. Moreover, the strong correlation with dryness suggests that water dynamics within protein interfaces may, in first approximation, be described by simple diffusion models.

    BagStack Classification for Data Imbalance Problems with Application to Defect Detection and Labeling in Semiconductor Units

    Although machine learning supports the development of computer vision applications by shortening the development cycle, finding a general learning algorithm that solves a wide range of applications is still bounded by the "no free lunch" theorem. The search for the right algorithm to solve a specific problem is driven by the problem itself, the data availability and many other requirements. Automated visual inspection (AVI) systems represent a major part of these challenging computer vision applications. They are gaining growing interest in the manufacturing industry to detect defective products and keep these from reaching customers. The process of defect detection and classification in semiconductor units is challenging due to the different acceptable variations that the manufacturing process introduces. Other variations are also typically introduced when using optical inspection systems due to changes in lighting conditions and misalignment of the imaged units, which makes the defect detection process more challenging. In this thesis, a BagStack classification framework is proposed, which makes use of stacking and bagging concepts to handle both variance and bias errors. The classifier is designed to handle the data imbalance and overfitting problems by: adaptively transforming the multi-class classification problem into multiple binary classification problems; applying a bagging approach to train a set of base learners for each specific problem; adaptively specifying the number of base learners assigned to each problem; adaptively specifying the number of samples to use from each class; applying a novel data-imbalance-aware cross-validation technique to generate the meta-data while taking into account the data imbalance problem at the meta-data level; and, finally, using a multi-response random forest regression classifier as a meta-classifier.
    The BagStack classifier makes use of multiple features to solve the defect classification problem. In order to detect defects, a locally adaptive statistical background modeling is proposed. The proposed BagStack classifier outperforms state-of-the-art image classification techniques on our dataset in terms of overall classification accuracy and average per-class classification accuracy. The proposed detection method achieves high performance on the considered dataset in terms of recall and precision. Dissertation/Thesis. Doctoral Dissertation, Computer Engineering 201
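    Two of the ingredients listed in this abstract, splitting a multi-class problem into binary problems and drawing class-balanced bags for the base learners, can be sketched as follows. This is a simplified illustration and not the thesis's BagStack algorithm: the helper names, the class labels, and the fixed bag sizes are assumptions, whereas the thesis chooses the number of learners and samples adaptively.

```python
import random


def one_vs_rest_labels(labels, positive_class):
    """Relabel a multi-class problem as 'positive_class vs. everything else'."""
    return [1 if y == positive_class else 0 for y in labels]


def balanced_bootstrap(binary_labels, n_per_class, rng):
    """Draw sample indices with replacement, n_per_class from each side.

    Sampling both sides equally keeps a rare class from being swamped.
    """
    pos = [i for i, y in enumerate(binary_labels) if y == 1]
    neg = [i for i, y in enumerate(binary_labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    picks = [rng.choice(pos) for _ in range(n_per_class)]
    picks += [rng.choice(neg) for _ in range(n_per_class)]
    return picks


# Hypothetical, heavily imbalanced defect labels (illustrative only).
labels = ["ok"] * 90 + ["scratch"] * 7 + ["void"] * 3
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Three bags per binary problem; each bag would train one base learner.
bags = {}
for cls in sorted(set(labels)):
    binary = one_vs_rest_labels(labels, cls)
    bags[cls] = [balanced_bootstrap(binary, 5, rng) for _ in range(3)]

for cls, cls_bags in bags.items():
    print(cls, [len(bag) for bag in cls_bags])
```

    In a stacked setup, the predictions of the base learners trained on these bags would become the inputs of a meta-classifier; that step is omitted here.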

    Data analytics 2016: proceedings of the fifth international conference on data analytics
