24 research outputs found

    SDBAYES SOFTWARE: AN AID FOR PREDICTING STUDENT DROPOUT

    This article presents software that resulted from a research project aimed at predicting student dropout in higher education. The software was created to identify the students most likely to drop out of a higher education institution, presenting each student's individual probability and the strongest reason driving the student to drop out, thereby helping academic managers make proactive decisions. The software uses Bayesian networks, one of the classification methods described in the data mining literature and widely used to generate discrete estimates for the problem at hand. The software has two modes of operation. The first calculates the probability that a single student will drop out and offers a more complete and intuitive way of analysing that student's dropout risk. The second analyses a set of students, assigning each a dropout probability, and can also display the variables that most influence that probability

    An activity prediction model using shape-based descriptor method

    In similarity searching, the activity of an unknown compound (the target) is predicted by comparing it with a set of compounds whose activities are known. The known activities of the most similar compounds are assigned to the unknown compound. Different machine learning methods and Multilevel Neighborhoods of Atoms (MNA) structure descriptors have been applied to activity prediction. In this paper, we introduce a new activity prediction model based on the Shape-Based Descriptor Method (SBDM). Experimental results show that SBDM-MNA provides a useful way of using prior knowledge of target class information (active and inactive compounds) to predict the activity of orphan compounds. To validate our method, we applied SBDM-MNA to different established data sets from the literature and compared its performance with the classical MNA descriptor for activity prediction
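
    The general idea of similarity searching described above can be sketched as a nearest-neighbour lookup. This is a minimal illustration, not the SBDM-MNA method: the Tanimoto coefficient, the representation of a compound as a set of integer feature identifiers, and the names `tanimoto`, `predict_activity` and `reference` are all assumptions made for the example, and the toy compounds are invented.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_activity(query: frozenset, reference: list, k: int = 1) -> str:
    """Assign the majority activity label of the k most similar reference compounds.

    reference is a list of (feature_set, activity_label) pairs.
    """
    ranked = sorted(reference, key=lambda item: tanimoto(query, item[0]), reverse=True)
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical toy reference set: feature identifiers are arbitrary integers.
reference = [
    (frozenset({1, 2, 3, 4}), "active"),
    (frozenset({1, 2, 3, 9}), "active"),
    (frozenset({7, 8, 9}), "inactive"),
]

print(predict_activity(frozenset({1, 2, 3, 5}), reference, k=1))
```

    A real descriptor such as MNA would replace the integer sets, and a larger k would smooth out the influence of a single close neighbour.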

    Review of QSAR Models and Software Tools for Predicting of Genotoxicity and Carcinogenicity

    This review of QSARs for genotoxicity and carcinogenicity was performed in a broad sense, considering both models available in software tools and models that are published in the literature. The review considered the potential applicability of diverse models to pesticides as well as to other types of regulated chemicals and pharmaceuticals. The availability of models and information on their applicability is summarised in tables, and a range of illustrative or informative examples are described in more detail in the text. In many cases, promising models were identified but they are still at the research stage. For routine application in a regulatory setting, further efforts will be needed to explore the applicability of such models for specific purposes, and to implement them in a practically useful form (i.e. user-friendly software). It is also noted that a range of software tools are research tools suitable for model development, and these require more specialised expertise than other tools that are aimed primarily at end-users such as risk assessors. It is concluded that the most useful models are those which are implemented in software tools and associated with transparent documentation on the model development and validation process. However, it is emphasised that the assessment of model predictions requires a reasonable amount of QSAR knowledge, even if it is not necessary to be a QSAR practitioner. JRC.DG.I.6-Systems toxicolog

    Inductive queries for a drug designing robot scientist

    It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments

    Novel topological descriptors for analyzing biological networks

    Background: Topological descriptors, other graph measures, and in a broader sense, graph-theoretical methods have proven to be powerful tools for biological network analysis. However, the majority of the developed descriptors and graph-theoretical methods cannot take vertex and edge labels into account, e.g., atom and bond types when considering molecular graphs. This feature is important for characterizing biological networks more meaningfully than pure topological information allows. Results: In this paper, we focus on a special type of biological network, namely biochemical structures. First, we derive entropic measures to calculate the information content of vertex- and edge-labeled graphs and investigate some of their useful properties. Second, we apply these measures, combined with other well-known descriptors, to supervised machine learning methods for predicting Ames mutagenicity. Moreover, we investigate the influence of our topological descriptors (measures for unlabeled graphs only vs. measures for labeled graphs) on the prediction performance of the underlying graph classification problem. Conclusions: Our study demonstrates that applying entropic measures to graphs representing molecules is useful for characterizing such structures meaningfully. For instance, we have found that if one extends the measures for determining the structural information content of unlabeled graphs to labeled graphs, the uniqueness of the resulting indices is higher. Because measures to structurally characterize labeled graphs are clearly underrepresented so far, the further development of such methods might be valuable and fruitful for solving problems within biological network analysis.
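
    To give a feel for what an entropic measure on a labeled graph can look like, the sketch below computes the Shannon entropy of the vertex-label and edge-label distributions of a small molecular graph. The abstract does not state the paper's actual definitions, so this is only an assumed, simplified stand-in; the function name `label_entropy` and the heavy-atom encoding of ethanol are choices made for the example.

```python
import math
from collections import Counter


def label_entropy(labels) -> float:
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Ethanol as a heavy-atom graph: vertices C, C, O joined by two single bonds.
vertex_labels = ["C", "C", "O"]
edge_labels = ["single", "single"]

print(round(label_entropy(vertex_labels), 4))  # atom-type information content
print(label_entropy(edge_labels) == 0)         # one bond type only, so no information
```

    An unlabeled version of the same graph would treat all vertices alike, which is why label-aware measures can distinguish structures that purely topological indices cannot.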

    Fusion of molecular representations and prediction of biological activity using convolutional neural network and transfer learning

    The basic structural features and physicochemical properties of chemical molecules determine their behaviour during chemical, physical, biological and environmental processes, and hence need to be investigated to determine and model the actions of a molecule. Computational approaches such as machine learning methods are alternatives for predicting the physicochemical properties of molecules from their structures. However, the limited accuracy and the error rates of these predictions restrict their use. This study developed three classes of new methods based on deep convolutional neural networks for bioactivity prediction of chemical compounds. The molecules are presented to a convolutional neural network (CNN) in a new matrix format that represents the molecular structures. The first class of methods involved the introduction of three new molecular descriptors, namely Mol2toxicophore, based on molecular interaction with toxicophore features; Mol2Fgs, based on distributed representation for constructing abstract feature maps of a selected set of small molecules; and Mol2mat, a molecular matrix representation adapted from the well-known 2D-fingerprint descriptors. The second class of methods was based on merging multi-CNN models that combined all the molecular representations. The third class of methods was based on automatic learning of features using the values within the neurons of the last layer of the proposed CNN architecture. To evaluate the performance of the methods, a series of experiments was conducted using two standard datasets, namely the MDL Drug Data Report (MDDR) and Sutherland datasets. The MDDR datasets comprised 10 homogeneous and 10 heterogeneous activity classes, whilst the Sutherland datasets comprised four homogeneous activity classes. In the experiments, Mol2toxicophore showed satisfactory prediction rates of 92% and 80% for homogeneous and heterogeneous activity classes, respectively. Mol2Fgs was better than Mol2toxicophore, with a prediction accuracy of 95% for homogeneous and 90% for heterogeneous activity classes. The Mol2mat molecular representation had the highest prediction accuracy, with 97% and 94% for homogeneous and heterogeneous datasets, respectively. The combined multi-CNN model, leveraging the knowledge acquired from the three molecular representations, produced a better accuracy of 99% for the homogeneous and 98% for the heterogeneous datasets. In terms of molecular similarity, using the values in the neurons of the last hidden layer of the multi-CNN model as an automatically learned, novel molecular representation was found to perform well, with an average recall of 88.6% within the 5% of structures most similar to the target search. The results demonstrate that the newly developed methods can be used effectively for bioactivity prediction and molecular similarity searching
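
    The recall figure quoted for the top 5% of most similar structures can be illustrated as follows. The abstract does not say exactly how the study computed it, so this is a sketch of one common definition, assumed for the example: the share of all same-class compounds that appear within the top fraction of a similarity-ranked list. The function name `recall_at_top_fraction` and the toy ranking are hypothetical.

```python
import math


def recall_at_top_fraction(ranked_hits, fraction=0.05):
    """Recall within the top fraction of a ranked list.

    ranked_hits is a list of booleans ordered from most to least similar;
    True marks a compound that shares the query's activity class.
    """
    total_hits = sum(ranked_hits)
    if total_hits == 0:
        return 0.0
    # Always inspect at least one compound, even for very short lists.
    cutoff = max(1, math.ceil(len(ranked_hits) * fraction))
    return sum(ranked_hits[:cutoff]) / total_hits


# Hypothetical ranking of eight compounds, inspected at the top 25%.
ranking = [True, False, True, False, False, True, False, False]
print(recall_at_top_fraction(ranking, fraction=0.25))
```

    Averaging this value over many query compounds gives a single summary number of the kind reported in the abstract.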

    Plant extracts and natural products - Predictive structural and biodiversity-based analyses of uses, bioactivity, and 'research and development' potential

    Over the last 30 years, the process of drug discovery and development has been increasingly shaped by formulaic approaches, and natural products, which are integral to the drug discovery process and widely recognized as the most successful class of drug leads, have been significantly deprioritized by a struggling worldwide pharmaceutical industry. Alkaloids, historically the most important superclass of medically important secondary metabolites, have been used worldwide as a source of remedies to treat a wide variety of illnesses, yet there is a wide discrepancy between their historical and modern significance. To understand these trends from an insider’s perspective, 52 senior stakeholders in industry and academia were engaged to provide insights on a series of qualitative and quantitative aspects of developments in drug discovery from natural products. Stakeholders highlighted the dissonance between the perceived high potential of natural products as drug leads and overall industry- and company-level strategies. Many industry contacts were highly critical of prevalent company and industry-wide drug discovery strategies, indicating a high level of dissatisfaction within the industry. One promising strategy highlighted by respondents was virtual screening, which has largely not been explored in natural products research. Furthermore, the physicochemical features of 27,783 alkaloids from the Dictionary of Natural Products were cross-referenced to pharmacologically significant and other metrics from various databases, including the European Bioinformatics Institute’s ChEMBL and the Global Biodiversity Information Facility’s GBIF biodiversity data. The combined dataset revealed that a compound's likelihood of medicinal use can be linked to its host species’ abundance, and it was input into target-independent machine learning algorithms to predict the likelihood of pharmaceutical use. The neural network model demonstrated an accuracy of >57% for all pharmaceutical alkaloids and 98% for all alkaloids. This study is the first to incorporate the biodiversity of host organisms in a machine learning scheme characterizing druglikeness and thus demonstrates the link between host species’ abundance and druglikeness. These findings yield new insights into cost-effective, real-world indicators of drug development potential across the diverse field of natural products

    In silico modeling of chemical and biological interactions at different scales

    In the past decades, government, society and industry at large have taken a keen interest in the impact, at different scales, that exposure to chemicals has on humans and the environment. Many governments have imposed regulations under which it has become important to establish the potential effects of these chemical entities with respect to human health and environmental endpoints. Given the time taken by traditional tests, their cost and the large number of chemicals to be evaluated, there has been rapid growth in the number of computational models that link the structure of chemicals to their biological activity. To extend the knowledge that currently exists in Structure Activity Relationship (SAR) models for chemicals, a similar approach was used to develop a new model and generate sets of metabolic triggers that can be used together with Q(SAR) methods. This thesis presents SAR rules for the prediction of mutagenicity in vitro, along with metabolic triggers for the prediction of mutagenicity in vitro and in vivo. These give a preliminary indication of whether a chemical exhibits the same mutagenic behaviour in vitro and in vivo. Along with chemical compounds, nanoparticles are also being used increasingly across different classes of consumer products. Since, in a physiological context, the protein corona constitutes the interface between the nanoparticle and cells, it plays a fundamental role in nanoparticle-cell association. In this thesis, the physicochemical properties of the protein corona were used to develop a model to predict cell association. Lastly, this thesis focuses on drug resistance in bacteria, which has become a matter of global concern. With bacteria growing resistant to antibiotics faster than new antibiotics are discovered, information is needed on the response that new bacterial proteins would have to currently available antibiotics, based on their similarity to known antibiotic-resistant proteins. An alignment-free method was developed to improve the resistance profile classification of bacterial proteins based on their physicochemical properties