Applicability domains of neural networks for toxicity prediction
In this paper, the term "applicability domain" refers to the range of chemical compounds for which a statistical quantitative structure-activity relationship (QSAR) model can accurately predict toxicity. This concept is crucial to the development and practical use of these models. First, a multidisciplinary review of the theory and practice of applicability domains is provided in the context of toxicity problems addressed with classical QSAR models. Then, the advantages and improved performance of neural networks (NNs), which are the most promising machine learning algorithms, are reviewed. In medicinal chemistry, nine NN-based methods for toxicity prediction were compared with 29 alternative artificial intelligence (AI) techniques. Similarly, seven NN-based methodologies were compared with six other AI techniques in food safety, 11 with 16 different AI approaches in the environmental sciences, and four with nine alternative AI techniques in industrial hygiene. Across the reviewed approaches, even when the descriptors and behaviors of known toxic compounds were available, extrapolating to predict the effects of untested chemical compounds proved difficult. Different methods can be used for unsupervised clustering, such as distance-based approaches and consensus-based decision methods. Additionally, the importance of model validation is highlighted within a regulatory context, according to the Organisation for Economic Co-operation and Development (OECD) principles, for predicting the toxicity of potential new drugs in medicinal chemistry, determining the limits of detection for harmful substances in food, predicting the toxicity limits of chemicals in the environment, and predicting the exposure limits to harmful substances in the workplace.
Despite its importance, the thorough application of toxicity models remains limited in medicinal chemistry and is virtually overlooked in other scientific domains. Consequently, only a small proportion of the toxicity studies conducted in medicinal chemistry consider the applicability domain in their mathematical models, which limits their predictive power for untested drugs. Although the applicability of these models is crucial, it has not been sufficiently assessed in toxicity prediction or in related areas such as food science, environmental science, and industrial hygiene. This review therefore sheds light on the prevalent use of neural networks in toxicity prediction, serving as a resource for researchers and practitioners across these domains, and its scope could be extended to other fields in future research.
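The distance-based approach to delimiting an applicability domain, mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes that compounds are represented as NumPy descriptor vectors, scores each compound by its mean Euclidean distance to the k nearest training compounds, and sets the cutoff at a percentile of the training compounds' own scores. The function names, the value of k, the percentile, and the synthetic data are all illustrative assumptions, not choices taken from the reviewed paper.

```python
import numpy as np

def knn_mean_distance(X_ref, X_query, k, exclude_self=False):
    """Mean Euclidean distance from each query row to its k nearest reference rows."""
    # Pairwise distance matrix of shape (n_query, n_ref).
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    d.sort(axis=1)
    # When scoring the training set against itself, skip the zero self-distance.
    start = 1 if exclude_self else 0
    return d[:, start:start + k].mean(axis=1)

def fit_domain_threshold(X_train, k=3, percentile=95.0):
    """Cutoff = chosen percentile of the training compounds' own k-NN scores."""
    train_scores = knn_mean_distance(X_train, X_train, k, exclude_self=True)
    return float(np.percentile(train_scores, percentile))

def in_domain(X_train, X_query, threshold, k=3):
    """True where a query compound lies within the distance-based domain."""
    return knn_mean_distance(X_train, X_query, k) <= threshold

# Synthetic descriptors standing in for real molecular descriptors (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
threshold = fit_domain_threshold(X_train)

central = np.zeros((1, 5))        # near the bulk of the training data
distant = np.full((1, 5), 25.0)   # far from every training compound
print(in_domain(X_train, central, threshold), in_domain(X_train, distant, threshold))
```

A prediction for a compound flagged as outside the domain would then be reported as unreliable rather than discarded silently; how strict the cutoff should be is a modelling decision this sketch does not settle.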
A Study of Geometric Semantic Genetic Programming with Linear Scaling
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Machine Learning (ML) is a scientific discipline that endeavors to enable computers
to learn without the need for explicit programming. Evolutionary Algorithms (EAs),
a subset of ML algorithms, mimic Darwin’s Theory of Evolution by using natural
selection mechanisms (i.e., survival of the fittest) to evolve a group of individuals
(i.e., possible solutions to a given problem). Genetic Programming (GP) is the most
recent type of EA and it evolves computer programs (i.e., individuals) to map a set of
input data into known expected outputs. Geometric Semantic Genetic Programming
(GSGP) extends this concept by allowing individuals to evolve and vary in the semantic
space, where the output vectors are located, rather than being constrained by syntax-based
structures. Linear Scaling (LS) is a method that was introduced to facilitate the
task of GP of searching for the best function matching a set of known data. GSGP
and LS have both, independently, shown the ability to outperform standard GP for
symbolic regression. GSGP uses Geometric Semantic Operators (GSOs), different
from the standard ones, without altering the fitness, while LS modifies the fitness
without altering the genetic operators. To the best of our knowledge, there has been
no prior utilization of the combined methodology of GSGP and LS for classification
problems. Furthermore, despite the fact that they have been used together in one
practical regression application, a methodological evaluation of the advantages and
disadvantages of integrating these methods for regression or classification problems
has never been performed. In this dissertation, a study of a system that integrates both
GSGP and LS (GSGP-LS) is presented. The performance of the proposed method, GSGP-LS,
was tested on six hand-tailored regression benchmarks, nine real-life regression
problems and three real-life classification problems. The obtained results indicate that
GSGP-LS outperforms GSGP in the majority of the cases, confirming the expected
benefit of this integration. However, for some particularly hard regression datasets,
GSGP-LS overfits training data, being outperformed by GSGP on unseen data. This
contradicts the idea that LS is always beneficial for GP, warning the practitioners about
its risk of overfitting in some specific cases.
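The way LS modifies the fitness without touching the genetic operators can be sketched as follows. The sketch assumes the usual least-squares formulation of linear scaling, in which an intercept a and a slope b are chosen to minimize the squared error between the targets and a + b * outputs, and the error is then measured on the scaled outputs. The function names and the toy data are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def linear_scaling_coefficients(outputs, targets):
    """Least-squares intercept a and slope b so that a + b * outputs fits targets."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    y_mean, t_mean = outputs.mean(), targets.mean()
    var_y = np.sum((outputs - y_mean) ** 2)
    if var_y == 0.0:
        # A constant program cannot be stretched: fall back to the target mean.
        return float(t_mean), 0.0
    b = np.sum((targets - t_mean) * (outputs - y_mean)) / var_y
    a = t_mean - b * y_mean
    return float(a), float(b)

def scaled_mse(outputs, targets):
    """Mean squared error after applying the optimal linear scaling."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    a, b = linear_scaling_coefficients(outputs, targets)
    return float(np.mean((targets - (a + b * outputs)) ** 2))

# Toy case: the program has the right shape but the wrong slope and offset.
raw_outputs = np.array([0.0, 1.0, 2.0, 3.0])
targets = 3.0 * raw_outputs + 2.0
a, b = linear_scaling_coefficients(raw_outputs, targets)
print(a, b, scaled_mse(raw_outputs, targets))
```

In this formulation the coefficients are fitted on the training data only, which is consistent with the abstract's observation that the scaled model can overfit on some hard regression datasets.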
Computational pharmacology and computational chemistry of 4-hydroxyisoleucine: Physicochemical, pharmacokinetic, and DFT-based approaches
Computational pharmacology and chemistry, through the study of drug-like properties together with pharmacokinetics, have made it easier to identify or predict potential drug candidates. 4-Hydroxyisoleucine is a pharmacologically active natural product with prominent antidiabetic properties. In this study, ADMETLab 2.0 was used to determine its important drug-related properties. 4-Hydroxyisoleucine complies with important drug-like physicochemical criteria and with established druggability rules such as the Lipinski, Pfizer, and GlaxoSmithKline (GSK) rules. Pharmacokinetically, it is predicted to have satisfactory cell permeability. Its predicted blood–brain barrier permeation may lead to central nervous system (CNS) effects, and there is a very slight probability that it is a CYP2C9 substrate. None of the well-known toxicities were predicted in silico, in agreement with wet-lab results, except for a “very slight risk” of respiratory toxicity. The molecule is non-ecotoxic according to common indicators such as bioconcentration and LC50 for fathead minnow and Daphnia magna. The toxicity parameters identified 4-hydroxyisoleucine as non-toxic to androgen receptors, PPAR-γ, mitochondrial membrane receptor, heat shock element, and p53. Moreover, across seven parameters, not a single toxicophore was found. The density functional theory (DFT) study supported the findings obtained from the drug-like property predictions. Hence, it is logical to proceed with detailed pharmacokinetic studies and a drug development process for 4-hydroxyisoleucine.
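As an illustration of what a drug-likeness rule of the kind cited above checks, the following minimal sketch encodes the widely used thresholds of Lipinski's rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, with one violation commonly tolerated). It is not how ADMETLab 2.0 computes its results, and the descriptor values in the example are hypothetical placeholders, not measured or predicted values for 4-hydroxyisoleucine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Descriptors:
    molecular_weight: float  # in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(d):
    """Return the names of the rule-of-five criteria that the compound violates."""
    checks = {
        "molecular weight > 500": d.molecular_weight > 500,
        "logP > 5": d.log_p > 5,
        "H-bond donors > 5": d.h_bond_donors > 5,
        "H-bond acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]

def passes_lipinski(d, max_violations=1):
    """Commonly, a compound is accepted with at most one violation."""
    return len(lipinski_violations(d)) <= max_violations

# Hypothetical descriptor sets, for illustration only.
small_polar = Descriptors(150.0, -2.0, 3, 4)
large_lipophilic = Descriptors(720.0, 6.3, 7, 12)

print(passes_lipinski(small_polar), lipinski_violations(large_lipophilic))
```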
The Design, Synthesis, and Biological Evaluation of Compounds with Medicinal Value
The book explores issues concerning the design, synthetic methods, and biological evaluation of molecules of pharmaceutical interest.
Improving Tree-based Pipeline Optimization Tool with Geometric Semantic Genetic Programming
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Machine Learning (ML) is becoming part of our lives, from face recognition to the sensors of the latest cars. However, building ML pipelines is a time-consuming and expensive process, even for experts with knowledge of ML algorithms, because of the many options available at each step. To overcome this issue, Automated ML (AutoML) was introduced, automating some steps of this process. One of its recent algorithms is the Tree-Based Pipeline Optimization Tool (TPOT), an Evolutionary Algorithm (EA) that automatically designs and optimizes ML pipelines using Genetic Programming (GP). Another recent algorithm is Geometric Semantic Genetic Programming (GSGP), an EA characterized by its use of semantics, that is, the vector of a program's outputs on the training data, and by searching directly in the semantic space of programs through geometric semantic operators, which leads to a unimodal fitness landscape. In this work, a new version of TPOT was created, called TPOT-GSGP, in which GSGP is one of the options for model selection. This new algorithm was implemented in Python, for regression problems only, using Negative Mean Absolute Error as the error measure. Five case studies were used to compare the performance of three algorithms: TPOT-GSGP, the original TPOT, and GSGP. Additionally, the statistical significance of the difference in the last generation’s score for each pair of algorithms was checked with Wilcoxon tests. No single algorithm outperformed the others on all datasets: sometimes TPOT-GSGP performed best and sometimes TPOT, depending on the case study and on the score analysed (learning or test). It was concluded that whenever GSGP was chosen as the root in 50% or more of the cases, TPOT-GSGP outperformed TPOT on the test set.
Therefore, the advantages of this new algorithm could be extraordinary with further development and adjustment in future work.
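The idea of searching directly in the semantic space can be sketched by applying geometric semantic operators to semantics vectors instead of program trees. This is a simplification under stated assumptions: a real GSGP system builds new trees, whereas here the random trees are stood in for by random numbers, the crossover uses a single random constant in [0, 1], and the function names, mutation step, and toy vectors are illustrative rather than taken from TPOT-GSGP.

```python
import numpy as np

def gs_crossover(sem1, sem2, rng):
    """Offspring semantics: a random convex combination of the two parents."""
    r = rng.uniform(0.0, 1.0)  # random constant in [0, 1]
    return r * sem1 + (1.0 - r) * sem2

def gs_mutation(sem, rng, mutation_step=0.1):
    """Perturb the semantics by mutation_step times the difference of two
    random vectors with entries in [0, 1] (stand-ins for random tree outputs)."""
    r1 = rng.uniform(0.0, 1.0, size=sem.shape)
    r2 = rng.uniform(0.0, 1.0, size=sem.shape)
    return sem + mutation_step * (r1 - r2)

def rmse(sem, target):
    """Root mean squared error between a semantics vector and the target."""
    return float(np.sqrt(np.mean((sem - target) ** 2)))

rng = np.random.default_rng(42)
target = np.array([1.0, 2.0, 3.0, 4.0])   # expected outputs on the training cases
parent1 = np.array([0.0, 2.5, 2.0, 5.0])  # outputs of a first program
parent2 = np.array([2.0, 1.0, 3.5, 3.0])  # outputs of a second program

child = gs_crossover(parent1, parent2, rng)
mutant = gs_mutation(child, rng)
print(rmse(parent1, target), rmse(parent2, target), rmse(child, target))
```

Because the child lies on the segment between its parents in semantic space, its Euclidean error cannot exceed that of the worse parent, and each mutation moves every output by at most the mutation step. These two properties are what make the induced fitness landscape unimodal for such error measures.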
The Potential of Dietary Antioxidants
Oxidative stress causes chronic diseases such as cardiovascular disease, cancer, Alzheimer's disease, chronic obstructive pulmonary disease, and neurodegenerative pathologies. Antioxidant systems defend human cells from free radicals. They act by stopping free radicals, limiting their formation, and quenching the reactive oxygen species (ROS) and reactive nitrogen species (RNS) that are formed. Antioxidant molecules are classified into primary and secondary defense molecules. The primary antioxidant molecules (e.g., vitamins C and E, ubiquinone, and glutathione) reduce the effects of oxidation by transferring a proton to the free radical species, by acting as electron donors, or by terminating the chain reactions. The secondary antioxidants (e.g., N-acetyl cysteine and lipoic acid) act as cofactors for some enzyme systems or neutralize the production of free radicals by transition metals. This work comprises original research papers and reviews on antioxidant molecules in food, the agricultural practices that maximize their levels in plants, the potential preventive effects of selected classes of antioxidant molecules, their potential use in functional foods, and the pharmaceutical delivery systems that maximize their potential activity when used as supplements.