27 research outputs found

    Discrete Fourier Transform Improves the Prediction of the Electronic Properties of Molecules in Quantum Machine Learning

    High-throughput approximations of quantum mechanics calculations and combinatorial experiments have traditionally been used to reduce the search space of possible molecules, drugs and materials. However, the interplay of structural and chemical degrees of freedom introduces enormous complexity, which current state-of-the-art tools are not yet designed to handle. The availability of large molecular databases generated by first-principles quantum mechanics (QM) computations opens new avenues for data science to accelerate the discovery of new compounds. In recent years, models that combine QM with machine learning (ML), known as QM/ML models, have been successful at delivering the accuracy of QM at the speed of ML. The goals are to develop a framework that accelerates the extraction of knowledge and insight from the quantitative process-structure-property-performance relationships hidden in materials data via a better search of the chemical compound space, and to infer new materials with targeted properties. In this study, we show that integrating well-known signal processing techniques such as the discrete Fourier transform into the QM/ML pipeline can significantly improve the outcomes in some cases. We also show that the spectrogram of a molecule may represent an interesting molecular visualization tool. Comment: 4 pages, 3 figures, 2 tables. Accepted to present at 32nd IEEE Canadian Conference in Electrical Engineering and Computer Scienc
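
The abstract does not say which molecular representation is transformed, so the following is only a minimal sketch of the general idea: it assumes a hypothetical fixed-length, real-valued descriptor vector and replaces it with the magnitudes of its discrete Fourier transform before it would be passed to an ML model. The function names are illustrative and not taken from the paper.

```python
import cmath


def dft(signal):
    """Plain O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def dft_magnitudes(signal):
    """Magnitude spectrum, usable as an alternative feature vector."""
    return [abs(c) for c in dft(signal)]


# Hypothetical descriptor vector (assumption: any fixed-length real vector).
descriptor = [1.0, 0.0, -1.0, 0.0]
features = dft_magnitudes(descriptor)
```

For realistic vector lengths a fast Fourier transform routine from a numerical library would be the usual choice; the direct sum is used here only to keep the sketch dependency-free.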

    An evolutionary variational autoencoder for perovskite discovery

    Machine learning (ML) techniques have emerged as a viable means for novel materials discovery and target property determination. At the vanguard of discoverable energy materials are perovskite crystalline materials, which are known for their robust design space and multifunctionality. Previous efforts to simulate the discovery of novel perovskites via ML have often been limited to straightforward tabular-dataset models and compositional phase-field representations. The present study therefore expands ML capability by demonstrating the efficacy of a new deep evolutionary learning framework for discovering stable and functional inorganic materials that adopt the complex A2BB′X6 and AA′BB′X6 double perovskite stoichiometries. The model design, called the Evolutionary Variational Autoencoder for Perovskite Discovery (EVAPD), comprises a semi-supervised variational autoencoder (SS-VAE), an evolutionary-based genetic algorithm, and a one-to-one similarity analytical model. The genetic algorithm performs adaptive metaheuristic search operations to find the most theoretically stable candidates emerging from a target-learnable latent space of the generative SS-VAE model. The integrated similarity analytical model assesses the deviation in three-dimensional atomic coordination between newly generated perovskites and proven standards and, on that basis, recommends the most promising and experimentally feasible candidates. Using Density Functional Theory (DFT), the novel perovskites are subjected to thorough variable-cell optimization and property determination. The current study presents 137 new perovskite materials generated by the proposed EVAPD model and identifies potential candidates for photovoltaic and optoelectronic applications. The new materials data are archived in the NOMAD repository (doi.org/10.17172/NOMAD/2023.05.31-1) and are openly available to interested users.

    Extracting biologically significant patterns from short time series gene expression data

    Background: Time series gene expression data analysis is widely used to study the dynamics of various cell processes. Most of the time series data available today consist of only a few time points, which makes the application of standard clustering techniques difficult. Results: We developed two new algorithms that are capable of extracting biological patterns from short time series gene expression data. The two algorithms, ASTRO and MiMeSR, are inspired by the rank order preserving framework and the minimum mean squared residue approach, respectively. However, ASTRO and MiMeSR differ from previous approaches in that they take advantage of the relatively small number of time points to reduce the problem from NP-hard to linear. Tested on well-defined short time series expression data, our approaches proved robust to noise as well as to random patterns, and they correctly detected the temporal expression profile of relevant functional categories. Our methods were evaluated using Gene Ontology (GO) annotations and chromatin immunoprecipitation (ChIP-chip) data. Conclusion: Our approaches generally outperform both standard clustering algorithms and algorithms designed specifically for clustering short time series gene expression data. Both algorithms are available at http://www.benoslab.pitt.edu/astro/.
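
To illustrate why a small number of time points makes a rank-order approach cheap, here is a minimal sketch that groups genes by the ordering of their expression values across time. It is not the ASTRO algorithm itself, whose details are not given in the abstract; the function names and the toy data are assumptions made for illustration. Each gene is processed once, so the cost grows linearly with the number of genes, and with T time points there are at most T! possible patterns.

```python
from collections import defaultdict


def rank_pattern(profile):
    """Time-point indices ordered by increasing expression value.

    Ties are broken by time index because sorted() is stable.
    """
    return tuple(sorted(range(len(profile)), key=lambda t: profile[t]))


def group_by_rank_pattern(expression):
    """Group gene names that share the same rank-order pattern."""
    groups = defaultdict(list)
    for gene, profile in expression.items():
        groups[rank_pattern(profile)].append(gene)
    return dict(groups)


# Toy data (hypothetical): three genes measured at three time points.
expression = {
    "geneA": [0.1, 0.5, 0.9],
    "geneB": [1.0, 2.0, 3.5],
    "geneC": [0.9, 0.2, 0.4],
}
groups = group_by_rank_pattern(expression)
```

Here geneA and geneB land in the same group because both increase monotonically, even though their absolute expression levels differ.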

    Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm

    Background: It is now possible to collect the expression levels of a set of genes from a set of biological samples over a series of time points. Such data have three dimensions, gene-sample-time (GST), and are therefore called 3D microarray gene expression data. To take advantage of the 3D data collected and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. Results: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster) for 3D short time-series data mining. OPTricluster identifies 3D clusters with coherent evolution in a given 3D dataset using a combinatorial approach on the sample dimension and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. Conclusions: Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in 3D short time-series gene expression data.

    V2B/V2G on Energy Cost and Battery Degradation under Different Driving Scenarios, Peak Shaving, and Frequency Regulations

    In the near future, the energy stored in electric vehicles (EVs) could be made available to commercial buildings to actively manage energy consumption and costs. These concepts, known as vehicle-to-building (V2B) and vehicle-to-grid (V2G) technologies, have the potential to provide storage capacity that benefits both EV and building owners by offsetting some of the high cost of EVs, reducing buildings' energy cost, and providing reliable emergency backup services. In this study, we considered a vehicle-to-building/grid (V2B/V2G) system used simultaneously for peak shaving and frequency regulation via a combined multi-objective optimization strategy that captures battery state of charge (SoC), EV battery degradation, EV driving scenarios, and operational constraints. Under these assumptions, we showed that the electricity bill can be reduced by 0.1 on a scale of 0 to 1 (where 1 is the normalized original electricity cost), and that EV batteries can also achieve superior economic benefits under controlled SoC limits (e.g., when kept within the range SoCmin > 30% and SoCmax < 90%) and when subjected to very restricted charge-discharge battery cycling.

    DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach

    Biclustering algorithms refer to a distinct class of clustering algorithms that perform simultaneous row-column clustering. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray experimental data for example, the goal of biclustering algorithms is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed biclustering algorithms can be used to search for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values from a set of data in a timely manner and without solving any optimization problem. We also show how one of the proposed biclustering algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, while almost all previous biclustering approaches will miss some.
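
The bicluster types named above can be made concrete with simple arithmetic checks on a candidate submatrix. The sketch below only tests whether a given block belongs to each type; it does not search for biclusters and is not the authors' algorithm. For coherent values it assumes the common additive model, in which every entry equals a base value plus a row offset plus a column offset; the abstract does not say which model the paper uses, and the function names and tolerance parameter are illustrative.

```python
def is_constant_on_rows(block, tol=1e-9):
    """Every row holds a single repeated value (rows may differ)."""
    return all(abs(v - row[0]) <= tol for row in block for v in row)


def is_constant_on_columns(block, tol=1e-9):
    """Every column holds a single repeated value (columns may differ)."""
    first = block[0]
    return all(
        abs(row[j] - first[j]) <= tol for row in block for j in range(len(row))
    )


def is_additive_coherent(block, tol=1e-9):
    """Additive model: a[i][j] - a[i][0] - a[0][j] + a[0][0] == 0 everywhere."""
    base = block[0][0]
    first = block[0]
    return all(
        abs(row[j] - row[0] - first[j] + base) <= tol
        for row in block
        for j in range(len(row))
    )


# Toy blocks (hypothetical values).
shifted = [[1, 2, 4], [3, 4, 6], [0, 1, 3]]  # rows differ by a constant offset
flat_rows = [[1, 1], [5, 5]]                 # constant within each row
```

Under the additive model, constant-row and constant-column blocks are special cases of coherent blocks, which the checks on `flat_rows` reflect.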