72 research outputs found
A Python graphical user interface for molecular descriptors based on RDKit
Funding Information:
It was also funded by the project “NORTE‐01‐0247‐FEDER‐047212”, supported by Northern Portugal Regional Operational Programme (Norte2020), under the Portugal 2020 Partnership Agreement, through the European Regional Development Fund (ERDF) and the Portuguese National Innovation Agency (ANI).
Publisher Copyright:
© 2023 Wiley-VCH GmbH.
GUIDEMOL is a Python computer program based on the RDKit software to process molecular structures and calculate molecular descriptors with a graphical user interface using the tkinter package. It can calculate descriptors already implemented in RDKit as well as grid representations of 3D molecular structures using the electrostatic potential or voxels. The GUIDEMOL app provides easy access to RDKit tools for chemoinformatics users with no programming skills and can be adapted to calculate other descriptors or to trigger other procedures. A command line interface (CLI) is also provided for the calculation of grid representations. The source code is available at https://github.com/jairesdesousa/guidemol.
Forecasting Demand in the Pharmaceutical Industry Using Machine Learning
Internship Report presented as the partial requirement for obtaining a Master's degree in Data Driven Marketing, specialization in Data Science for Marketing.
This study evaluates three machine learning models, namely Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and the novel Prophet algorithm, for the challenge of demand forecasting in the pharmaceutical industry. Following the CRISP-DM framework, we studied, cleaned, and transformed a dataset of historical sales data from a major Portuguese pharmaceutical company and trained the models on it to enable accurate sales forecasting. Our findings align with the literature, underlining the robustness of XGBoost and the inefficacy of LSTM for this task, given the particularities of the provided data. Furthermore, this research highlights the potential of Prophet in terms of both effectiveness and efficiency. The work also reinforces the view held in the literature that forecasting needs to be product-specific, showing that no single model achieves the best accuracy for all drugs.
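The product-specific conclusion above can be illustrated with a minimal sketch that picks, for each product, the model with the lowest forecast error. The error metric (mean absolute error), the function names, and all numbers are assumptions made for illustration; the report's actual evaluation procedure is not described in this abstract.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and forecast values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def best_model_per_product(actuals, forecasts):
    """Return {product: model name with the lowest MAE}.

    actuals:   {product: [observed values]}
    forecasts: {model: {product: [forecast values]}}
    """
    best = {}
    for product, actual in actuals.items():
        errors = {
            model: mean_absolute_error(actual, per_product[product])
            for model, per_product in forecasts.items()
        }
        best[product] = min(errors, key=errors.get)
    return best


# Made-up hold-out sales and forecasts, for illustration only.
actuals = {"drug_A": [100, 120, 130], "drug_B": [40, 35, 50]}
forecasts = {
    "xgboost": {"drug_A": [102, 118, 131], "drug_B": [30, 45, 60]},
    "prophet": {"drug_A": [90, 130, 120], "drug_B": [41, 36, 48]},
}

selection = best_model_per_product(actuals, forecasts)
print(selection)
```

With these toy numbers a different model wins for each product, which is the situation the abstract describes.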
Atomic Descriptors and Molecular Operators
Funding Information: This work was supported by the Associate Laboratory for Green Chemistry (LAQV), which is financed by national funds from the Fundação para a Ciência e Tecnologia (FCT/MECI), Portugal, under grants LA/P/0008/2020 DOI 10.54499/LA/P/0008/2020, UIDP/50006/2020 DOI 10.54499/UIDP/50006/2020, and UIDB/50006/2020 DOI 10.54499/UIDB/50006/2020. This work was co-funded by the European Union through scholarships awarded to N.B. and X.G. by the Erasmus Mundus Joint Masters ChEMoinformaticsplus project (program ERASMUS2027, ERASMUS-EDU-2021-PEX-EMJM-MOB; project number 101050809). Publisher Copyright: © 2024 by the authors.
A variational heteroencoder based on recurrent neural networks, trained with SMILES linear notations of molecular structures, was used to derive the following atomic descriptors: delta latent space vectors (DLSVs) obtained from the original SMILES of the whole molecule and the SMILES of the same molecule with the target atom replaced. Different replacements were explored, namely, changing the atomic element, replacement with a character of the model vocabulary not used in the training set, or the removal of the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed a remarkable clustering according to the atomic element, hybridization, atomic type, and aromaticity. Atomic DLSV descriptors were used to train machine learning (ML) models to predict ¹⁹F NMR chemical shifts. An R² of up to 0.89 and mean absolute errors of up to 5.5 ppm were obtained for an independent test set of 1046 molecules with random forests or a gradient-boosting regressor. Intermediate representations from a Transformer model yielded comparable results.
Furthermore, DLSVs were applied as molecular operators in the latent space: the DLSV of a halogenation (H→F substitution) was added to the LSVs of 4135 new molecules with no fluorine atom and the results were decoded into SMILES, yielding 99% valid SMILES, with 75% of the SMILES incorporating fluorine and 56% of the structures incorporating fluorine with no other structural change.
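The vector arithmetic behind DLSVs can be sketched in a few lines. This is an illustration only: the latent vectors below are toy values rather than encoder outputs, the function names are hypothetical, and the sign convention (original minus replaced) is an assumption, since the abstract does not state it.

```python
def delta_latent_vector(lsv_original, lsv_replaced):
    """Element-wise difference between two latent space vectors (LSVs).

    Assumed convention: original minus replaced.
    """
    return [o - r for o, r in zip(lsv_original, lsv_replaced)]


def apply_operator(lsv, operator):
    """Add an operator vector (a DLSV) to the LSV of another molecule."""
    return [x + d for x, d in zip(lsv, operator)]


# Toy 3-dimensional vectors standing in for encoder outputs.
lsv_molecule = [1.0, 2.0, 3.0]        # whole molecule
lsv_atom_replaced = [0.5, 2.0, 4.0]   # same molecule, target atom replaced

# Atomic descriptor for the target atom.
descriptor = delta_latent_vector(lsv_molecule, lsv_atom_replaced)

# Operator use: shift the LSV of a new molecule by a DLSV; in the study the
# shifted vector would then be passed to the decoder to obtain a SMILES.
lsv_new_molecule = [1.0, 1.0, 1.0]
shifted = apply_operator(lsv_new_molecule, descriptor)
print(descriptor, shifted)
```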
Synthesis of chiral N-arylaziridines
Dissertation presented to obtain the degree of Doctor in Chemistry, specialty of Organic Chemistry, from the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
The occurrence of enantioselectivity or diastereoselectivity was studied in the synthesis of aziridines from electron-deficient olefins and N-arylhydroxamic acids in the presence of a base.
Aziridination of the chiral olefins (-)-8-phenylmenthyl acrylate and (-)-2,5-bornanesultam acrylate gave diastereomeric excesses below 50%. In one of the cases it was possible to separate the two diastereomers and, after methanolysis, to obtain each of the corresponding (enantiomeric) carbomethoxyaziridines in enantiomerically pure form.
Chiral N-phenylhydroxamic acids derived from dehydroabietic acid, camphanic acid, and Mosher's acid were tested; aziridination occurred only with the first two, with e.e. no higher than 18%.
The use of non-chiral reagents in a heterogeneous medium (aqueous base / organic solvent) with chiral phase-transfer catalysis by quaternary cinchonine salts gave aziridines with e.e. up to 62%. The factors that influence the reaction were studied, namely the structures of the olefin, hydroxamic acid, and catalyst, the type of base, the solvent, and the temperature. In particular, it was observed that cinchonidine salts give rise to the same major enantiomers. A model was proposed to explain the observed enantioselectivities, in which the involvement of the vinyl group of the catalyst is essential.
X-ray crystallography of an aziridine derived from (-)-2,5-bornanesultam acrylate allowed the absolute configuration of the carbomethoxyaziridine obtained from it by methanolysis to be deduced, as well as that of other carbomethoxy-, carboethoxy-, and carbo-tert-butoxyaziridines, on the basis of mechanistic considerations, optical rotation, and 1H NMR spectroscopy with chiral lanthanide complexes.
JNICT, doctoral grant 2658/93 of the Ciência and Praxis XX programmes
Implementation of virtual exhibitions in a three-dimensional environment in science and technology museums
Master's thesis. Multimedia. Faculdade de Engenharia. Universidade do Porto. 201
Automatic assignment of absolute configuration from 1D NMR data
Opposite enantiomers exhibit different NMR properties in the presence of an external common chiral element, and a chiral molecule exhibits different NMR properties in the presence of external enantiomeric chiral elements. Automatic prediction of such differences, and comparison with experimental values, leads to the assignment of the absolute configuration. Here two cases are reported, one using a dataset of 80 chiral secondary alcohols esterified with (R)-MTPA and the corresponding 1H NMR chemical shifts, and the other with 94 13C NMR chemical shifts of chiral secondary alcohols in two enantiomeric chiral solvents. For the first application, counterpropagation neural networks were trained to predict the sign of the difference between chemical shifts of opposite stereoisomers. The neural networks were trained to process the chirality code of the alcohol as the input, and to give the NMR property as the output. In the second application, similar neural networks were employed, but the property to predict was the difference of chemical shifts in the two enantiomeric solvents. For independent test sets of 20 objects, 100% correct predictions of the sign of the chemical shift differences were obtained in both applications. Additionally, with the second dataset, the difference of chemical shifts in the two enantiomeric solvents was quantitatively predicted, yielding an r² of 0.936 between the predicted and experimental values for the test set.
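The two figures of merit quoted above, the fraction of correctly predicted signs and r², can be computed as in the following minimal sketch. It assumes that r² is the squared Pearson correlation between predicted and experimental values, which the abstract does not specify; the function names are hypothetical and the numbers are made up.

```python
def sign_accuracy(predicted, experimental):
    """Fraction of objects whose predicted difference has the correct sign."""
    hits = sum(1 for p, e in zip(predicted, experimental) if (p > 0) == (e > 0))
    return hits / len(experimental)


def r_squared(predicted, experimental):
    """Squared Pearson correlation (assumed definition of r²)."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov * cov / (var_p * var_e)


# Made-up chemical shift differences (ppm), for illustration only.
predicted = [0.10, -0.20, 0.30, -0.40]
experimental = [0.20, -0.10, 0.25, -0.50]

print(sign_accuracy(predicted, experimental))
print(r_squared(predicted, experimental))
```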
Theoretical and experimental studies of aryl-bithiophene based push-pull pi-conjugated heterocyclic systems bearing cyanoacetic or rhodanine-3-acetic acid acceptors for SHG nonlinear optical applications
A series of push-pull aryl-bithiophene based systems 2-3 were designed and synthesized in order to understand how structural modifications influence the electronic, linear and nonlinear optical properties. The push-pull conjugated chromophores 2-3 bear a bithiophene spacer conjugated with a phenyl ring functionalized with N,N-dialkylamino electron-donor groups together with cyanoacetic or rhodanine-3-acetic acid acceptor groups. Theoretical (DFT calculations) and experimental studies were carried out to obtain information on conformation, electronic structure, electron distribution, dipole moment, and molecular nonlinearity response of the push-pull bithiophene derivatives. This multidisciplinary study revealed that chromophore 2e exhibits the highest value for hyperpolarizability beta (10440 × 10⁻³⁰ esu) due to the strong electron donating ability of the N,N-diethylamino group, and the ethyne linker that not only lengthens the pi-conjugation path but also grants less distortion to the system.
Thanks are due to Fundação para a Ciência e Tecnologia (FCT) for a PhD grant to S. S. M. Fernandes (SFRH/BD/87786/2012) and FEDER-COMPETE for financial support through the CQ/UM (Ref. UID/QUI/00686/2013 and UID/QUI/0686/2016). The NMR spectrometer Bruker Avance III 400 is part of the National NMR Network and was purchased within the framework of the National Program for Scientific Reequipment, contract REDE/1517/RMN/2005 with funds from POCI 2010 (FEDER) and FCT. The pulsed laser system was acquired within the framework of the grant (PTDC/CTM/105597/2008) from the Fundação para a Ciência e Tecnologia (FCT) with funding from FEDER-COMPETE. This work was also supported by the Associated Laboratory for Sustainable Chemistry - Clean Processes and Technologies - LAQV which is financed by Portuguese national funds from FCT/MEC (UID/QUI/50006/2013) and co-financed by the ERDF under the PT2020 Partnership Agreement (POCI-01-0145-FEDER-007265).
An Arduino-Based Talking Calorimeter for Inclusive Lab Activities
UID/QUI/50006/2019 POCI-01-0145-FEDER-007265 PEest/UID/CEC/04516/2019 02/2018/ARNI (International Academic Mobility Program.PMAI) 10/2018/PROPPIT (Course Completion Work Program.PROTCC)
This work describes a simple talking calorimeter for the visually impaired based on the Arduino Uno without any shield. An electronic interface was designed using a Wheatstone bridge, a thermistor, or an operational amplifier (opamp). The temperature values are communicated by a loudspeaker connected to pulse-width modulation (PWM) digital output pins 3 and 11 of the Arduino Uno. The system is based on the Talkie library for Arduino Uno. This library was developed using Linear Predictive Coding and includes about 1000 English words. Two new Talkie libraries were constructed, one for Portuguese and another for German. This device can be easily implemented in any teaching laboratory with extremely reduced costs.
QSPR Modeling of Liquid-liquid Equilibria in Two-phase Systems of Water and Ionic Liquid
UIDB/50006/2020
The increasing application of new ionic liquids (ILs) creates the need for liquid-liquid equilibria data for both miscible and quasi-immiscible systems. In this study, equilibrium concentrations at different temperatures for ionic liquid + water two-phase systems were modeled using a Quantitative Structure-Property Relationship (QSPR) method. Data on equilibrium concentrations were taken from the ILThermo Ionic Liquids database, curated, and used to build models that predict the weight fraction of water in the ionic liquid rich phase and of ionic liquid in the aqueous phase as two separate properties. The major modeling challenge stems from the fact that each single IL is characterized by several data points, since equilibrium concentrations are temperature dependent. Thus, new approaches for the detection of potential data point outliers, test set selection, and quality prediction have been developed. The training sets comprised equilibrium concentration data for 67 and 68 ILs for the water-in-IL and IL-in-water models, respectively. SiRMS, MOLMAPS, Rcdk and Chemaxon descriptors were used to build Random Forest models for both properties. Models were subjected to the Y-scrambling test for robustness assessment. The best models have also been validated using an external test set that is not part of the ILThermo database. A two-phase equilibrium diagram for one of the external test set ILs is presented for better visualization of the results and potential derivation of tie lines.
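Because each IL contributes several temperature-dependent data points, a test set drawn at the level of individual points would place the same IL in both training and test data. A common way to avoid this, sketched below, is to split by IL so that all points of an IL stay together. This is a generic illustration and not necessarily the selection approach developed in the study; the function name, record fields, and IL labels are hypothetical.

```python
import random


def split_by_ionic_liquid(records, test_fraction=0.2, seed=0):
    """Split records so that all data points of an IL land in the same set."""
    ils = sorted({r["il"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(ils)
    n_test = max(1, round(len(ils) * test_fraction))
    test_ils = set(ils[:n_test])
    train = [r for r in records if r["il"] not in test_ils]
    test = [r for r in records if r["il"] in test_ils]
    return train, test


# Placeholder records: five hypothetical ILs at three temperatures each.
# Property values are omitted (None) because they are not needed for the split.
records = [
    {"il": f"IL_{i}", "temperature_K": t, "water_weight_fraction": None}
    for i in range(1, 6)
    for t in (293.15, 303.15, 313.15)
]

train, test = split_by_ionic_liquid(records)
print(len(train), len(test))
```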