Deep Learning for Deep Chemistry: Optimizing the Prediction of Chemical Patterns
Computational Chemistry is currently a synergistic assembly of ab initio calculations, simulation, machine learning (ML) and optimization strategies for describing, solving and predicting chemical data and related phenomena. These include accelerated literature searches, analysis and prediction of physical and quantum chemical properties, transition states, chemical structures, chemical reactions, and also new catalysts and drug candidates. The generalization of scalability to larger chemical problems, rather than specialization, is now the main principle for transforming chemical tasks on multiple fronts, for which systematic and cost-effective solutions have benefited from ML approaches, including those based on deep learning (e.g. quantum chemistry, molecular screening, synthetic route design, catalysis, drug discovery). The latter class of ML algorithms is capable of combining raw input into layers of intermediate features, enabling bench-to-bytes designs with the potential to transform several chemical domains. In this review, the most exciting developments concerning the use of ML in a range of different chemical scenarios are described. A range of chemical problems, and their rationalization, that have hitherto been inaccessible owing to the lack of suitable analysis tools is thus detailed, evidencing the breadth of potential applications of these emerging multidimensional approaches. Focus is given to the models, algorithms and methods proposed to facilitate research on compound design and synthesis, materials design, prediction of binding, molecular activity, and soft matter behavior.
The information produced by pairing Chemistry and ML, through data-driven analyses, neural network predictions and monitoring of chemical systems, makes it possible to (i) better understand the complexity of chemical data, (ii) streamline and design experiments, (iii) discover new molecular targets and materials, and (iv) plan or rethink forthcoming chemical challenges. In fact, optimization engulfs all these tasks directly.
Computational Approaches in Theranostics: Mining and Predicting Cancer Data
The ability to understand the complexity of cancer-related data has been prompted by the applications of (1) computer and data sciences, including data mining, predictive analytics, machine learning, and artificial intelligence, and (2) advances in imaging technology and probe development. Computational modelling and simulation are systematic and cost-effective tools able to identify important temporal/spatial patterns (and relationships), characterize distinct molecular features of cancer states, and address other relevant aspects, including tumor detection and heterogeneity, progression and metastasis, and drug resistance. These approaches have provided invaluable insights for improving the experimental design of therapeutic delivery systems and for increasing the translational value of the results obtained from early and preclinical studies. The big question is: Could cancer theranostics be determined and controlled in silico? This review describes the recent progress in the development of computational models and methods used to facilitate research on the molecular basis of cancer and on the respective diagnosis and optimized treatment, with particular emphasis on the design and optimization of theranostic systems. The current role of computational approaches is to provide innovative, incremental, and complementary data-driven solutions for the prediction, simplification, and characterization of cancer and its intrinsic mechanisms, and to promote new data-intensive, accurate diagnostics and therapeutics.
Is standard multivariate analysis sufficient in clinical and epidemiological studies?
Clinical tests and epidemiological studies often produce large amounts of data that are multivariate in nature. The respective analysis is, in most cases, of importance comparable to the clinical and sampling tasks. Simple, easily interpretable techniques from chemometrics provide most of the ingredients to carry out this analysis. We have selected available data from different sources pertaining to cancer diagnosis and incidence: (1) cytological diagnosis of breast cancer, (2) classification of breast tissues through parameters obtained from impedance spectra and (3) distribution of new cancer cases in the United States. Hierarchical cluster analysis (HCA) is needed especially in cases where there is no a priori identification of classes, suggesting a structure of the data based on clusters. These clusters, or the classes, are then further detailed and rationalized by principal component analysis (PCA). Partial least squares (PLS) and linear discriminant analysis (LDA) provide further insight into the systems. An additional step for understanding the data set is the removal of less characteristic data (NR) using a density-based approach, so as to make it more clearly defined. Results clearly reveal that breast cytology diagnosis relies on variables conveying mostly the same type of information, being thus interchangeable in nature. In the study on tissue characterization by electrical measurements, the distribution of the different types of tissues can be easily constructed. Finally, the distribution of new cancer cases possesses clear, easily unravelled, geographical patterns.
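The HCA-then-PCA sequence described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the six-sample data set is invented, average linkage and Euclidean distance are assumed choices, and the function names (`average_linkage_hca`, `first_principal_component`) are hypothetical. Only the first principal component is computed, by power iteration, to keep the example free of external libraries.

```python
import math

# Hypothetical toy data: 6 samples x 3 variables forming two loose groups.
# These numbers are invented for illustration and do not come from the study.
samples = [
    [1.0, 2.0, 1.5],
    [1.2, 1.9, 1.4],
    [0.9, 2.2, 1.6],
    [5.0, 6.1, 5.5],
    [5.2, 5.9, 5.4],
    [4.8, 6.0, 5.7],
]

def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage_hca(data, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Cluster distance is the mean pairwise distance (average linkage),
    an assumed choice; the abstract does not state the linkage used.
    """
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [euclidean(data[p], data[q])
                         for p in clusters[i] for q in clusters[j]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

def first_principal_component(data, n_iter=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    n, m = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(m)]
    centered = [[row[k] - means[k] for k in range(m)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    v = [1.0] * m
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores are the projections of the centered samples onto the loading vector.
    scores = [sum(r[k] * v[k] for k in range(m)) for r in centered]
    return v, scores

clusters = average_linkage_hca(samples, n_clusters=2)
loadings, scores = first_principal_component(samples)
print("clusters:", clusters)
print("PC1 loadings:", [round(x, 3) for x in loadings])
print("PC1 scores:", [round(s, 3) for s in scores])
```

In this toy case the clusters suggested by HCA are separated by the sign of the PC1 scores, and the loadings indicate which variables carry that separation. Real data would normally be scaled before either step.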
A new perspective on correlated polyelectrolyte adsorption: positioning, conformation, and patterns
This work focuses on multiple chain deposition, using a coarse-grained model. The phenomenon is assessed from a novel perspective which emphasizes the conformation and relative arrangement of the deposited chains. Variations in chain number and length are considered, and the surface charge in the different systems ranges from partially neutralized to reversed by backbone deposition. New tools are proposed for the analysis of these systems, in which focus is given to configuration-wise approaches that allow the interpretation of correlated multi-chain behavior. It is seen that adsorption occurs, with a minimal effect upon the bulk conformation, even when overcharging occurs. Also, chain ends create a lower electrostatic potential, which makes them both the least adsorbed region of the backbone, and the prevalent site of closer proximity with other chains. Additionally, adsorption into the most favorable region of the surface overrides, to a large degree, interchain repulsion.
Reconstructing the historical synthesis of mauveine from Perkin and Caro: procedure and details
Mauveine, an iconic dye first synthesised in 1856, still has secrets to unveil. If one wanted to prepare Perkin's original mauveine nowadays, what would be the procedure? It is described in this work and relies on the use of a 1:2:1 (mole) ratio of aniline, p-toluidine and o-toluidine. This was found by comparing a series of products, synthesized from different proportions of these starting materials, with a set of historical samples of mauveine; the products were further analysed with two unsupervised chemometrics methods.
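As a small worked illustration of what a 1:2:1 mole ratio means at the bench, the sketch below converts the ratio into masses of each amine. It assumes the ratio maps onto aniline, p-toluidine and o-toluidine in the order listed; the total amount of 0.04 mol is a hypothetical scale, not a value from the work, and `masses_for_total_moles` is an illustrative helper name. Molar masses are the standard values for C6H7N and C7H9N.

```python
# Molar masses in g/mol: aniline is C6H7N; both toluidines are C7H9N isomers.
MOLAR_MASS = {"aniline": 93.13, "p-toluidine": 107.16, "o-toluidine": 107.16}

# 1:2:1 mole ratio, assumed to follow the order given in the abstract.
MOLE_RATIO = {"aniline": 1, "p-toluidine": 2, "o-toluidine": 1}

def masses_for_total_moles(total_moles):
    """Split a total amount (mol) by the mole ratio and convert to grams."""
    parts = sum(MOLE_RATIO.values())
    return {
        name: total_moles * MOLE_RATIO[name] / parts * MOLAR_MASS[name]
        for name in MOLE_RATIO
    }

# Hypothetical scale: 0.04 mol of amines in total (not taken from the work).
masses = masses_for_total_moles(0.04)
for name, grams in masses.items():
    print(f"{name}: {grams:.3f} g")
```

Because the two toluidines share a molar mass, the 1:2:1 mole ratio puts exactly twice as much p-toluidine as o-toluidine by mass as well.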
Improving discrimination in the grading of rat mammary tumors using two-dimensional mapping of histopathological observations
This work aims at characterizing rat mammary tumors induced by 7,12-dimethylbenz(a)anthracene (DMBA) and the respective malignancy potential, commonly graded with histopathology features grouped by intensity levels. Tumors were described over fourteen multiple ranged microscopic parameters and a comprehensive characterization of the histological patterns and their relation with tumor grade was carried out by principal component analysis (PCA). The number of histological patterns present in a tumor tends to correlate with malignant features. High grade tumors are characterized by the presence of several structural patterns, with cribriform prevalence and necrosis. The cribriform pattern correlates with grading, i.e., tumors having a higher predominance of the cribriform pattern are likely to be more malignant. The findings may represent a benchmark for similar characterization studies in other models.
Exploring PAZ/3′-overhang interaction to improve siRNA specificity. A combined experimental and modeling study
The understanding of the dynamical and mechanistic aspects that lie behind siRNA-based gene regulation is a requisite to boost the performance of siRNA therapeutics. A systematic experimental and computational study on the 3′-overhang structural requirements for the design of more specific and potent siRNA molecules was carried out using nucleotide analogues differing in structural parameters, such as sugar constraint, lack of nucleobase, distance between the phosphodiester backbone and nucleobase, enantioselectivity, and steric hindrance. The results established a set of rules governing the siRNA-mediated silencing, indicating that the thermodynamic stability of the 5′-end is a crucial determinant for antisense-mediated silencing but is not sufficient to avoid sense-mediated silencing. Both theoretical and experimental approaches consistently evidence the existence of a direct connection between the PAZ/3′-overhang binding affinity and siRNA’s potency and specificity. An overall description of the systems is thus achieved by atomistic simulations and free energy calculations that allow us to propose a robust and self-contained procedure for studying the factors implied in PAZ/3′-overhang siRNA interactions. A higher RNAi activity is associated with a moderate-to-strong PAZ/3′-overhang binding. Conversely, lower binding energies compromise siRNA potency, increase specificity, and favor siRNA downregulation by Ago2-independent mechanisms. This work provides in-depth details for the design of powerful and safe synthetic nucleotide analogues for substitution at the 3′-overhang, enabling some of the intrinsic siRNA disadvantages to be overcome.
We thank Dr J. C. Morales, R. Lucas, P. Peñalver, and M. Terrazas for fruitful discussions and the synthesis of the thymine glycol phosphoramidite. The financial support by the Spanish Ministerio de Ciencia e Innovación (MICINN) (Projects CTQ2014-52588-R, CTQ2017-84415-R and CTQ2016-78636-P) and Generalitat de Catalunya is gratefully acknowledged. CIBER-BBN is an initiative funded by the VI National R + D + I Plan 2008–2011, Iniciativa Ingenio 2010, Consolider Program, CIBER Actions and financed by the Instituto de Salud Carlos III with assistance from the European Regional Development Fund. A. F. J. and T. F. G. G. C. acknowledge Fundação para a Ciência e Tecnologia (FCT), Portugal, for financial support regarding the Post-Doctoral grant SFRH/BPD/104544/2014 and the PhD grant SFRH/BD/95459/2013. CQC is supported by FCT through projects PEst-OE/QUI/UI0313/2014 and POCI-01-0145-FEDER-007630. The authors acknowledge the Laboratory for Advanced Computing at the University of Coimbra for providing {HPC, computing, consulting} resources that have contributed to the research results reported within this paper (URL http://www.lca.uc.pt).
Peer reviewed