122 research outputs found

    Comparative Analysis of Predictive Data-Mining Techniques

    This thesis compares five predictive data-mining techniques (four linear and one nonlinear) on four distinct data sets: the Boston Housing data set, a collinear data set (called "the COL" data set in this thesis), an airliner data set ("the Airliner" data) and a simulated data set ("the Simulated" data). These data sets are unique in combining the following characteristics: few predictor variables, many predictor variables, highly collinear variables, very redundant variables, and the presence of outliers. The nature of each data set is explored and its unique qualities defined; this is data pre-processing and preparation, and to a large extent it helps the miner/analyst choose which predictive technique to apply. The central problem is how to reduce the variables to a minimal number that can still fully predict the response variable.
    Five data-mining techniques were applied to each data set: multiple linear regression (MLR), based on the ordinary least-squares approach; principal component regression (PCR), an unsupervised technique based on principal component analysis; ridge regression, which uses a regularization coefficient (a smoothing technique); partial least squares (PLS), a supervised technique; and nonlinear partial least squares (NLPLS), which uses neural-network functions to map nonlinearity into the models. Each technique can be used in several ways; these variants were tried on each data set first, and the best variant of each technique was then used for the global comparison with the other techniques on the same data set.
    Based on the five model-adequacy criteria used, PLS outperformed all the other techniques on the Boston Housing data set: using only the first nine factors, it gave an MSE of 21.1395, a condition number below 29, and a modified coefficient of efficiency (E-mod) of 0.4408. The closest models were those built with all the variables in MLR, all PCs in PCR, and all factors in PLS. On mean absolute error (MAE) alone, ridge regression with a regularization parameter of 1 outperformed all other models, but the condition number (CN) of the nine-factor PLS model was better. With the COL data, a highly collinear data set, the best model based on the condition number (<100) and MSE (57.8274) was the PLS with two factors. Judged on MSE only, ridge regression with an alpha value of 3.08 would be best, with an MSE of 31.8292. The NLPLS model was not considered, even though it gave an MSE of 22.7552, because it mapped nonlinearity into the model and, in this case, the solution was not stable. With the Airliner data set, which is also highly ill-conditioned with redundant input variables, ridge regression with a regularization coefficient of 6.65 outperformed all the other models (MSE of 2.874 and condition number of 61.8195), giving a good compromise between smoothing and bias. The lowest MSE and MAE were recorded for PLS (all factors), PCR (all PCs), and MLR (all variables), but their condition numbers were far above 100. For the Simulated data set, the best model was the optimal PLS (eight factors) model, with an MSE of 0.0601, an MAE of 0.1942 and a condition number of 12.2668. The MSE and MAE were the same for the PCR model built with the PCs accounting for 90% of the variation in the data, but the condition numbers were all above 1000.
    Overall, PLS gave better models in most cases, both for ill-conditioned data sets and for data sets with redundant input variables. Principal component regression and ridge regression, methods designed for highly ill-conditioned data matrices, also performed well on the ill-conditioned data sets.
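    The comparison protocol described above can be sketched with off-the-shelf estimators. A minimal illustration, assuming scikit-learn and using its bundled diabetes data as a stand-in (the thesis's own data sets are not distributed with it); the single condition number of the scaled model matrix is a simplification of the per-model CN the thesis tracks:

        # Sketch: fit MLR, ridge, PCR and PLS on one data set and score each
        # by MSE, MAE and condition number, the criteria used in the thesis.
        import numpy as np
        from sklearn.datasets import load_diabetes            # bundled stand-in data
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.linear_model import LinearRegression, Ridge
        from sklearn.decomposition import PCA
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.metrics import mean_squared_error, mean_absolute_error

        X, y = load_diabetes(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Condition number of the standardized model matrix (simplification:
        # the thesis reports the CN of each model's effective matrix).
        cn = np.linalg.cond(StandardScaler().fit_transform(X_tr))

        models = {
            "MLR": make_pipeline(StandardScaler(), LinearRegression()),
            "Ridge (alpha=1)": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
            "PCR (5 PCs)": make_pipeline(StandardScaler(), PCA(5), LinearRegression()),
            "PLS (5 factors)": make_pipeline(StandardScaler(), PLSRegression(5)),
        }
        for name, model in models.items():
            pred = np.ravel(model.fit(X_tr, y_tr).predict(X_te))
            print(f"{name}: MSE={mean_squared_error(y_te, pred):.3f}, "
                  f"MAE={mean_absolute_error(y_te, pred):.3f}, CN={cn:.1f}")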

    Impact of Urban Surface Characteristics and Socio-Economic Variables on the Spatial Variation of Land Surface Temperature in Lagos City, Nigeria

    The urban heat island (UHI) and its consequences have become a key research focus across disciplines because of its negative externalities for urban ecology and the overall livability of cities. Identifying the spatial variation of land surface temperature (LST) gives a clear picture of the UHI phenomenon and helps in introducing appropriate mitigation techniques to address the adverse impact of UHI. The aim of this research is therefore to examine the spatial variation of LST with respect to the UHI phenomenon in rapidly urbanizing Lagos City. Four variables were examined to identify the impact of urban surface characteristics and socio-economic activities on LST. Gradient analysis was employed to assess the distribution of LST from the city center to rural areas over vegetation and built-up areas, and partial least squares (PLS) regression analysis was used to assess the correlation and statistical significance of the variables. Landsat data captured in 2002 and 2013 were used as the primary data sources, together with gridded data such as PD and FFCOE. The results show that the distribution pattern of LST changed between 2002 and 2013 as a result of changing urban surface characteristics (USC) and the influence of socio-economic activities. LST has a strong positive relationship with NDBI and a strong negative relationship with NDVI. The rapid development of Lagos City has been directly driven by the conversion of green areas to built-up areas over time, which has intensified the surface urban heat island (SUHI). Further, the growing population and its socio-economic activities, including industrialization and infrastructure development, have also had a significant impact on LST changes. We recommend that the results of this research be used as a proxy tool for appropriate landscape and town planning from a sustainability viewpoint, to make the urban environment of Lagos City, Nigeria healthier and more livable.
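    The index-plus-regression step is compact enough to sketch. A minimal example, assuming scikit-learn; the rasters below are synthetic stand-ins for the Landsat-derived red, NIR and SWIR1 reflectance bands and the LST layer, with the LST constructed to show the signs the study reports:

        # Sketch: derive NDVI and NDBI from reflectance bands, then regress
        # LST on them with PLS, mirroring the study's correlation analysis.
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(0)
        red, nir, swir1 = (rng.uniform(0.01, 0.6, (100, 100)) for _ in range(3))

        ndvi = (nir - red) / (nir + red)        # vegetation index
        ndbi = (swir1 - nir) / (swir1 + nir)    # built-up index
        lst = 20 + 15 * ndbi - 10 * ndvi + rng.normal(0, 1, ndvi.shape)  # synthetic LST

        # Flatten the rasters into a sample-by-predictor matrix.
        X = np.column_stack([ndvi.ravel(), ndbi.ravel()])
        y = lst.ravel()

        pls = PLSRegression(n_components=2).fit(X, y)
        print(pls.score(X, y))   # share of LST variance explained
        print(pls.coef_)         # expected signs: negative NDVI, positive NDBI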

    Regulation of Microclimatic Conditions inside Native Beehives and Its Relationship with Climate in Southern Spain

    In this study, the Wbee Sensor System was used to record data from 10 Iberian beehives over two years in southern Spain. These data were used to identify climatic factors that potentially condition the internal regulatory behavior of the hive and its weight. Categorical principal components analysis (CATPCA) was used to determine the minimum number of factors able to capture the maximum percentage of variability in the recorded data. Categorical regression (CATREG) was then used to select the factors linearly related to hive internal humidity, temperature and weight, and to derive predictive regression equations for Iberian bees. Average relative humidity was 51.7% ± 10.4 in the brood nest and 54.2% ± 11.7 in the food area, while average temperatures were 34.3 °C ± 1.5 in the brood nest and 29.9 °C ± 5.8 in the food area. Average beehive weight was 38.2 kg ± 13.6. Some of our data, especially those related to humidity, contrast with previously published results from studies of bees in Central and northern Europe. In conclusion, certain combinations of climatic factors may condition within-hive humidity, temperature and hive weight. The brood-nest humidity regulatory capacity of southern Iberian honeybees may be lower than their brood-nest thermoregulatory capacity, which maintains values close to 34 °C even in dry conditions.
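    CATPCA and CATREG are SPSS optimal-scaling procedures; a loose scikit-learn stand-in (ordinal coding plus PCA plus linear regression in place of optimal scaling) conveys the shape of the workflow. All category labels and hive weights below are synthetic, for illustration only:

        # Sketch: reduce coded climate categories to a few components, then
        # regress hive weight on the component scores (CATPCA/CATREG analogue).
        import numpy as np
        import pandas as pd
        from sklearn.preprocessing import OrdinalEncoder, StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(1)
        climate = pd.DataFrame({
            "temp_band": rng.choice(["cold", "mild", "hot"], 500),
            "humidity_band": rng.choice(["dry", "normal", "humid"], 500),
            "wind_band": rng.choice(["calm", "breezy", "windy"], 500),
        })
        weight = 38.2 + rng.normal(0, 13.6, 500)           # hive weight, kg

        coded = OrdinalEncoder().fit_transform(climate)     # categories -> codes
        scores = PCA(n_components=2).fit_transform(
            StandardScaler().fit_transform(coded))          # minimal factor set
        model = LinearRegression().fit(scores, weight)
        print(model.score(scores, weight))                  # R^2 of the regression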

    Kern-basierte Lernverfahren für das virtuelle Screening (Kernel-Based Learning Methods for Virtual Screening)

    We investigate the utility of modern kernel-based machine learning methods for ligand-based virtual screening. In particular, we introduce a new graph kernel based on iterative graph similarity and optimal assignments, apply kernel principal component analysis to projection-error-based novelty detection, and discover a new selective agonist of the peroxisome proliferator-activated receptor gamma using Gaussian process regression.
    Virtual screening, the computational ranking of compounds with respect to a predicted property, is a cheminformatics problem relevant to the hit-generation phase of drug development. Its ligand-based variant relies on the similarity principle, which states that (structurally) similar compounds tend to have similar properties. We describe the kernel-based machine learning approach to ligand-based virtual screening; in this, we stress the role of molecular representations, including the (dis)similarity measures defined on them, investigate effects in high-dimensional chemical descriptor spaces and their consequences for similarity-based approaches, review literature recommendations on retrospective virtual screening, and present an example workflow.
    Graph kernels are formal similarity measures that are defined directly on graphs, such as the annotated molecular structure graph, and correspond to inner products. We review graph kernels, in particular those based on random walks, subgraphs, and optimal vertex assignments. Combining the latter with an iterative graph similarity scheme, we develop the iterative similarity optimal assignment graph kernel, give an iterative algorithm for its computation, prove convergence of the algorithm and the uniqueness of the solution, and provide an upper bound on the number of iterations necessary to achieve a desired precision. In a retrospective virtual screening study, our kernel consistently improved performance over chemical descriptors as well as other optimal assignment graph kernels.
    Chemical data sets often lie on manifolds of lower dimensionality than the embedding chemical descriptor space. Dimensionality reduction methods try to identify these manifolds, effectively providing descriptive models of the data. For spectral methods based on kernel principal component analysis, the projection error is a quantitative measure of how well new samples are described by such models. This can be used to identify compounds structurally dissimilar to the training samples, leading to projection-error-based novelty detection for virtual screening using only positive samples. We provide proof of principle by using principal component analysis to learn the concept of fatty acids.
    The peroxisome proliferator-activated receptor (PPAR) is a nuclear transcription factor that regulates lipid and glucose metabolism, playing a crucial role in the development of type 2 diabetes and dyslipidemia. We establish a Gaussian process regression model for PPAR gamma agonists using a combination of chemical descriptors and the iterative similarity optimal assignment kernel via multiple kernel learning. Screening of a vendor library and subsequent testing of 15 selected compounds in a cell-based transactivation assay resulted in 4 active compounds. One compound, a natural product with a cyclobutane scaffold, is a full selective PPAR gamma agonist (EC50 = 10 ± 0.2 µM, inactive on PPAR alpha and PPAR beta/delta at 10 µM).
    The study delivered a novel PPAR gamma agonist, de-orphanized a natural bioactive product, and hints at the natural-product origins of pharmacophore patterns in synthetic ligands.
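    The projection-error idea lends itself to a short sketch. The thesis measures the error in kernel feature space; the example below, assuming scikit-learn, uses the approximate pre-image reconstruction error as a proxy, with synthetic vectors standing in for chemical descriptors:

        # Sketch: one-class novelty detection via kernel PCA projection error,
        # trained on positive samples only, as described in the abstract.
        import numpy as np
        from sklearn.decomposition import KernelPCA

        rng = np.random.default_rng(2)
        train = rng.normal(0.0, 1.0, (200, 16))     # "known" compounds
        novel = rng.normal(4.0, 1.0, (10, 16))      # structurally dissimilar

        kpca = KernelPCA(n_components=8, kernel="rbf", gamma=0.05,
                         fit_inverse_transform=True).fit(train)

        def projection_error(model, X):
            # Reconstruct each sample from its kPCA projection; the residual
            # norm measures how poorly the learned manifold describes it.
            recon = model.inverse_transform(model.transform(X))
            return np.linalg.norm(X - recon, axis=1)

        print(projection_error(kpca, train).mean())  # low: well described
        print(projection_error(kpca, novel).mean())  # high: flagged as novel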
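    The prospective screening step can likewise be sketched. A minimal example assuming scikit-learn, where a plain RBF kernel on synthetic descriptors stands in for the multiple-kernel combination of chemical descriptors and the iterative similarity optimal assignment graph kernel used in the thesis:

        # Sketch: Gaussian process regression to rank a vendor library by
        # predicted activity and shortlist candidates for assay testing.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(3)
        X_train = rng.normal(size=(60, 16))               # known actives: descriptors
        y_train = X_train[:, 0] + rng.normal(0, 0.1, 60)  # measured activity (synthetic)
        library = rng.normal(size=(1000, 16))             # vendor library

        gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                       normalize_y=True).fit(X_train, y_train)
        mean, std = gpr.predict(library, return_std=True)  # predictions + uncertainty
        top = np.argsort(mean)[::-1][:15]                  # shortlist 15 compounds
        print(top, mean[top].round(2))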