
    Automated Parsing of Flexible Molecular Systems using Principal Component Analysis and K-Means Clustering Techniques

    Computational investigation of molecular structures and reactions of biological and pharmaceutical interest remains a grand scientific challenge due to the size and conformational flexibility of these systems. The work requires parsing and analyzing thousands of conformations in each molecular state for meaningful chemical information and subjecting the ensemble to costly quantum chemical calculations. The status quo is typically a manual process in which the investigator must inspect each conformation and sort it into a structural family. This is time-intensive and tedious, which makes it infeasible in some cases and limits the ability of theoreticians to study these systems. Computational software, however, allows for the necessary exhaustive investigation without the bottlenecks of a brute-force approach to each flexible system. I aim to create a solution to this problem. In my thesis project, I seek to develop Python software that will (i) automate the parsing of each conformation within a conformational ensemble, (ii) use principal component analysis (PCA) and clustering to find and investigate conformational families within the ensemble, (iii) separate and visualize conformational families in a user-friendly manner, and (iv) convey to the user how conformational families were delineated by way of features found within the data. Results explored in this work show that the program is able to separate conformational families of varying difficulty.
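
    As a rough illustration of steps (ii) and (iv), the sketch below runs PCA and a basic k-means on a synthetic ensemble. It is not the thesis software: the four descriptor columns (imagined here as dihedral angles in degrees), the two synthetic families, the helper names `pca_project` and `kmeans`, and the choice of k = 2 are all assumptions made for the example. It uses only NumPy and treats angles as plain numbers, ignoring their periodicity.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project rows of `features` onto the leading principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ components.T, components, explained[:n_components]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical ensemble: 100 conformations, 4 descriptors each.
# The two synthetic families differ only in descriptor index 2.
rng = np.random.default_rng(42)
family_a = rng.normal(loc=[60.0, 180.0, -60.0, 175.0], scale=5.0, size=(50, 4))
family_b = rng.normal(loc=[60.0, 180.0, 60.0, 175.0], scale=5.0, size=(50, 4))
ensemble = np.vstack([family_a, family_b])

scores, components, explained = pca_project(ensemble, n_components=2)
labels, centers = kmeans(scores, k=2)

# The descriptor with the largest absolute loading on PC1 is the one
# that contributes most to separating the families.
dominant_descriptor = int(np.abs(components[0]).argmax())
print("family sizes:", np.bincount(labels, minlength=2))
print("descriptor driving PC1:", dominant_descriptor)
```

    Here `labels` assigns each synthetic conformation to a family, and the largest absolute entries of `components[0]` point to the descriptors that drive the separation, which is one simple way to report how the families were delineated.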