12 research outputs found

    Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction

    In pharmaceutical research and the biomedical sciences, QSAR modeling is an established approach for predicting the biological activity of drug candidates during drug discovery. Yet QSAR modeling poses a series of open challenges. First, chemical compounds are represented in a high-dimensional space, so feature selection is typically applied, although this task entails a challenging combinatorial problem with potential loss of information. Second, defining the applicability domain of a QSAR model is desirable for determining the reliability of predictions on unseen chemicals, but it is often difficult to assess because of the extent of the chemical space. Finally, interpretability of these models is also a critical issue for drug designers. The purpose of this work is to thoroughly assess the application of neural-based methods and recent advances in deep learning for QSAR modeling. We hypothesize that neural-based methods can overcome the need to perform a descriptor selection phase. We developed three QSAR models based on neural networks for prediction of relevant chemical and biomedical properties that, in the absence of any feature selection step, can outperform the state-of-the-art models for such properties. We also implemented an embedded applicability domain technique based on network output probabilities that proved to be effective; its application improved the predictive performance of the model. Finally, we proposed the use of a post hoc feature analysis technique based on an aggregation of network weights, which enabled effective detection of relevant features in the model.
    Affiliation: Sabando, María Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Soto, Axel Juan. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
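    The abstract above mentions an applicability domain based on network output probabilities but does not give the rule. The following is a minimal sketch under one assumed rule: a prediction counts as in-domain when its highest class probability reaches a threshold. The function names, the 0.8 threshold, and the sample values are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def in_domain_mask(probs: Sequence[Sequence[float]], threshold: float = 0.8) -> List[bool]:
    """Mark a prediction as in-domain when its top class probability reaches the threshold.

    The thresholding rule is an assumption made for illustration only.
    """
    return [max(p) >= threshold for p in probs]


def accuracy(probs, labels, mask: Optional[Sequence[bool]] = None) -> float:
    """Accuracy of argmax predictions, optionally restricted to in-domain samples."""
    kept = [
        (p, y)
        for i, (p, y) in enumerate(zip(probs, labels))
        if mask is None or mask[i]
    ]
    if not kept:
        return float("nan")
    hits = sum(1 for p, y in kept if max(range(len(p)), key=lambda k: p[k]) == y)
    return hits / len(kept)


# Hypothetical softmax outputs for four compounds (two classes) and their true labels.
probs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.48, 0.52]]
labels = [0, 1, 1, 0]
mask = in_domain_mask(probs)
print(accuracy(probs, labels), accuracy(probs, labels, mask))
```

    On this toy input the two low-confidence predictions are excluded, so accuracy on the retained compounds is higher than on the full set. This mirrors the kind of improvement the abstract reports, at the cost of covering fewer compounds.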

    Using Molecular Embeddings in QSAR Modeling: Does it Make a Difference?

    With the consolidation of deep learning in drug discovery, several novel algorithms for learning molecular representations have been proposed. Despite the interest of the community in developing new methods for learning molecular embeddings and their theoretical benefits, comparing molecular embeddings with each other and with traditional representations is not straightforward, which in turn hinders the process of choosing a suitable representation for QSAR modeling. A reason behind this issue is the difficulty of conducting a fair and thorough comparison of the different existing embedding approaches, which requires numerous experiments on various datasets and training scenarios. To close this gap, we reviewed the literature on methods for molecular embeddings and reproduced three unsupervised and two supervised molecular embedding techniques recently proposed in the literature. We compared these five methods concerning their performance in QSAR scenarios using different classification and regression datasets. We also compared these representations to traditional molecular representations, namely molecular descriptors and fingerprints. As opposed to the expected outcome, our experimental setup consisting of over 25,000 trained models and statistical tests revealed that the predictive performance using molecular embeddings did not significantly surpass that of traditional representations. While supervised embeddings yielded competitive results compared to those using traditional molecular representations, unsupervised embeddings tended to perform worse than traditional representations. Our results highlight the need for conducting a careful comparison and analysis of the different embedding techniques prior to using them in drug design tasks, and motivate a discussion about the potential of molecular embeddings in computer-aided drug design

    Comprehensive assessment and its relationship with high school students learning

    The research aims to analyze comprehensive assessment and its relationship with student learning in a particular educational unit in the city of Portoviejo. The methodology follows a mixed approach (qualitative and quantitative), and exploratory research was carried out to analyze the problem. The research techniques used were a survey of the institution's teaching staff and the academic averages of the elementary school students. The results showed that comprehensive evaluation is only partially applied in the institution, which is reflected in an acceptable academic performance. It was concluded that teachers apply evaluative instruments as they deem necessary and that there is no homogeneity between hetero-evaluation, co-evaluation and self-evaluation.

    ChemVA: Interactive visual analysis of chemical compound similarity in virtual screening

    In the modern drug discovery process, medicinal chemists deal with the complexity of analysis of large ensembles of candidate molecules. Computational tools, such as dimensionality reduction (DR) and classification, are commonly used to efficiently process the multidimensional space of features. These underlying calculations often hinder interpretability of results and prevent experts from assessing the impact of individual molecular features on the resulting representations. To provide a solution for scrutinizing such complex data, we introduce ChemVA, an interactive application for the visual exploration of large molecular ensembles and their features. Our tool consists of multiple coordinated views: Hexagonal view, Detail view, 3D view, Table view, and a newly proposed Difference view designed for the comparison of DR projections. These views display DR projections combined with biological activity, selected molecular features, and confidence scores for each of these projections. This conjunction of views allows the user to drill down through the dataset and to efficiently select candidate compounds. Our approach was evaluated on two case studies of finding structurally similar ligands with similar binding affinity to a target protein, as well as on an external qualitative evaluation. The results suggest that our system allows effective visual inspection and comparison of different high-dimensional molecular representations. Furthermore, ChemVA assists in the identification of candidate compounds while providing information on the certainty behind different molecular representations.
    Affiliation: Sabando, María Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Ulbrich, Pavol. Masaryk University. Faculty of Sciences; Czech Republic
    Affiliation: Selzer, Matias Nicolas. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Laboratorio de Ciencias de la Imágenes; Argentina
    Affiliation: Byska, Jan. Masaryk University. Faculty of Sciences; Czech Republic
    Affiliation: Mican, Jan. Masaryk University. Faculty of Sciences; Czech Republic
    Affiliation: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Soto, Axel Juan. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Ganuza, María Luján. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Laboratorio de Ciencias de la Imágenes; Argentina
    Affiliation: Kozlikova, Barbora. Masaryk University. Faculty of Sciences; Czech Republic

    Implementation of ict in active methodologies for the teaching of mathematics

    The objective of the research is to analyze the incidence of information and communication technologies as active methodologies for teaching mathematics, based on a search of various contributions that have highlighted the importance of technology in times of pandemic. In the educational field, their implementation has had a positive impact on students, but not on teachers who are resistant to change, who see themselves as learners and avoid training on virtual platforms that allow individual and group interaction. Documentary research was applied to search for information, using a qualitative approach to content analysis and assessment. The results indicate that innovative technologies are currently necessary tools for the development of the teaching and learning process.

    Multitask deep neural networks for ames mutagenicity prediction

    29 p., 2 fig., 3 tab., 1 graph. abst. + Sup. Inf.: 4 p., 4 tab.
    The Ames mutagenicity test constitutes the most frequently used assay to estimate the mutagenic potential of drug candidates. While this test employs experimental results using various strains of Salmonella typhimurium, the vast majority of the published in silico models for predicting mutagenicity do not take into account the test results of the individual experiments conducted for each strain. Instead, such QSAR models are generally trained employing overall labels (i.e., mutagenic and nonmutagenic). Recently, neural-based models combined with multitask learning strategies have yielded interesting results in different domains, given their capabilities to model multitarget functions. In this scenario, we propose a novel neural-based QSAR model to predict mutagenicity that leverages experimental results from different strains involved in the Ames test by means of a multitask learning approach. To the best of our knowledge, the modeling strategy hereby proposed has not been applied to model Ames mutagenicity previously. The results yielded by our model surpass those obtained by single-task modeling strategies, such as models that predict the overall Ames label or ensemble models built from individual strains. For reproducibility and accessibility purposes, all source code and datasets used in our experiments are publicly available.
    This work was partially supported by the Argentinean National Council of Scientific and Technological Research (CONICET for its acronym in Spanish) [Grant No. PIP 112-2017-0100829], by the National Agency for the Promotion of Research, Technological Development and Innovation of Argentina (AGENCIA I+D+i in Spanish), through the Fund for Scientific and Technological Research (FONCyT for its acronym in Spanish) [Grant No. PICT 2019-03350], by the Universidad Nacional del Sur (UNS), Bahía Blanca, Argentina [Grant No. PGI 24/N042], by the Ministerio de Economía, Industria y Competitividad, Gobierno de España, under Grant No. RTI2018-096100B-100, and by a Google Latin America Research Award 2021-2022.
    Peer reviewed

    Multi-Task Deep Neural Networks for Ames Mutagenicity Prediction

    The Ames mutagenicity test constitutes the most frequently used assay to estimate the mutagenic potential of drug candidates. While this test employs experimental results using various strains of Salmonella typhimurium, the vast majority of the published in silico models for predicting mutagenicity do not take into account the test results of the individual experiments conducted for each strain. Instead, such QSAR models are generally trained employing overall labels (i.e. mutagenic and non-mutagenic). Recently, neural-based models combined with multi-task learning strategies have yielded interesting results in different domains, given their capabilities to model multi-target functions. In this scenario, we propose a novel neural-based QSAR model to predict mutagenicity that leverages experimental results from different strains involved in the Ames test by means of a multi-task learning approach. To the best of our knowledge, the modeling strategy hereby proposed has not been applied to model Ames mutagenicity previously. The results yielded by our model surpass those obtained by single-task modeling strategies, such as models that predict the overall Ames label or ensemble models built from individual strains. For reproducibility and accessibility purposes, all source code and datasets used in our experiments are publicly available
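    The multi-task idea described above, one model trained on the per-strain results rather than on a single overall label, can be illustrated with a small loss function. This is a sketch under stated assumptions and not the authors' implementation. The strain names are the ones listed in the Ames Mutagenicity Dataset entry of this listing. The choice of binary cross-entropy and the handling of unlabeled strains (marked `None`) are assumptions, and `multitask_bce` is a hypothetical name.

```python
import math
from typing import Dict, Optional

# Strain names as listed in the dataset description; one prediction task per strain.
STRAINS = ("TA98", "TA100", "TA102", "TA1535", "TA1537")


def multitask_bce(pred: Dict[str, float], target: Dict[str, Optional[int]]) -> float:
    """Mean binary cross-entropy over the strains that carry a label.

    Assumption: a strain whose label is None is skipped, so it does not
    contribute to the loss for that compound.
    """
    losses = []
    for strain in STRAINS:
        y = target.get(strain)
        if y is None:
            continue
        # Clamp the probability to avoid log(0).
        p = min(max(pred[strain], 1e-7), 1.0 - 1e-7)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical predictions and labels for one compound.
pred = {s: 0.5 for s in STRAINS}
target = {"TA98": 1, "TA100": 0, "TA102": None, "TA1535": 1, "TA1537": 0}
print(multitask_bce(pred, target))
```

    Averaging over the labeled strains only keeps compounds with different numbers of labels on a comparable scale. Whether the published model weights the tasks this way is not stated in the abstract.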

    Ames Mutagenicity Dataset

    The dataset contains 5,536 molecular compounds represented by their SMILES code and 1,360 molecular descriptors calculated with Mordred. Moreover, it contains the respective labels for each compound (1: mutagenic / 0: non-mutagenic) for each of the five strains (TA98, TA100, TA102, TA1535, TA1537) and a general label (Overall) that corresponds to the ground-truth consensus label used for evaluating the final Ames mutagenicity prediction. The compounds listed were originally compiled by the Istituto Superiore di Sanità (https://www.iss.it/isstox) and result from an exhaustive pre-processing stage, consisting of different filtering, sanitization, and canonicalization steps.
    Affiliation: Martínez, María Jimena. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
    Affiliation: Sabando, María Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Soto, Axel Juan. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Roca Magadán, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas Margarita Salas; Spain
    Affiliation: Requena Triguero, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas Margarita Salas; Spain
    Affiliation: Campillo Martín, Nuria Eugenia. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas Margarita Salas; Spain
    Affiliation: Páez, Juan A. Consejo Superior de Investigaciones Científicas. Instituto de Química Médica; Spain
    Affiliation: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina

    Spanish HTT gene study reveals haplotype and allelic diversity with possible implications for germline expansion dynamics in Huntington disease

    We aimed to determine the genetic diversity and molecular characteristics of the Huntington disease (HD) gene (HTT) in Spain. We performed an extended haplotype and exon one deep sequencing analysis of the HTT gene in a nationwide cohort of population-based controls (n = 520) and families with symptomatic individuals referred for HD genetic testing. This group included 331 HD cases and 140 carriers of intermediate alleles. Clinical and family history data were obtained when available. Spanish normal alleles are enriched in C haplotypes (40.1%), while A1 (39.8%) and A2 (31.6%) prevail among intermediate and expanded alleles, respectively. Alleles ≥50 CAG repeats are primarily associated with haplotypes A2 (38.9%) and C (32%), which are also present in 50% and 21.4%, respectively, of HD families with large intergenerational expansions. Non-canonical variants of exon one sequence are less frequent, but much more diverse, in alleles of ≥27 CAG repeats. The deletion of CAACAG, one of the six rare variants not observed among smaller normal alleles, is associated with haplotype C and appears to correlate with larger intergenerational expansions and early onset of symptoms. Spanish HD haplotypes are characterised by a high genetic diversity, potentially admixed with other non-Caucasian populations, with a higher representation of A2 and C haplotypes than most European populations. Differences in haplotype distributions across the CAG length range support differential germline expansion dynamics, with A2 and C showing the largest intergenerational expansions. This haplotype-dependent germline instability may be driven by specific cis-elements, such as the CAACAG deletion