53 research outputs found

    Combinatorial k-means clustering as a machine learning tool applied to diabetes mellitus type 2

    A new procedure based on k-means clustering is designed to find the clinical variables that most efficiently separate similar patients diagnosed with diabetes mellitus type 2 (DMT2) and an underlying disease (arterial hypertonia (AH), ischemic heart disease (CHD), diabetic polyneuropathy (DPNP), or diabetic microangiopathy (DMA)) into groups. Clustering is a machine learning tool for discovering structures in datasets and has proven efficient for pattern recognition based on clinical records. The combinatorial k-means procedure explores all possible k-means clusterings for a fixed number of descriptors and groups. The predetermined conditions for the partitioning were as follows: every group of patients included patients with DMT2 and one of the underlying diseases; each such subgroup was partitioned into three patterns (good health status, medium health status, and degenerated health status); and optimal descriptors were sought for each disease and its groups. The best clustering is selected through a parameter called global variance, defined as the sum of the variances of all clinical variables over all clusters. The best clinical parameters are found by minimizing this global variance. The methodology is intended to identify a set of variables that separates the patients with each underlying disease efficiently into three subgroups. The hierarchical clustering obtained for these four underlying diseases could be used to build groups of patients with correlated clinical data. The proposed methodology gives surmised results from complex data based on a relationship with the health status of the group and draws a picture of the prediction rate of the ongoing health status.
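The search described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmeans`, `global_variance`, `combinatorial_kmeans`) are hypothetical, the k-means routine is plain Lloyd's algorithm with a single random start, the global variance is summed over every column of the data (one reading of "all clinical variables of all the clusters"), and the toy rows are made up.

```python
from itertools import combinations
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means with one random start; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def global_variance(rows, labels, k):
    """Sum of per-cluster population variances over all columns (assumed definition)."""
    total = 0.0
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        if not members:
            continue
        for col in zip(*members):
            mean = sum(col) / len(col)
            total += sum((x - mean) ** 2 for x in col) / len(col)
    return total


def combinatorial_kmeans(rows, n_descriptors, k=3):
    """Cluster on every descriptor subset of the given size; keep the lowest score."""
    best = None
    for subset in combinations(range(len(rows[0])), n_descriptors):
        points = [tuple(r[j] for j in subset) for r in rows]
        labels = kmeans(points, k)
        score = global_variance(rows, labels, k)
        if best is None or score < best[1]:
            best = (subset, score, labels)
    return best


if __name__ == "__main__":
    # Hypothetical, already-scaled readings: one row per patient.
    toy_rows = [(0.0, 1.0), (0.2, 3.0), (10.0, 1.1),
                (10.2, 3.1), (20.0, 1.2), (20.2, 3.2)]
    print(combinatorial_kmeans(toy_rows, n_descriptors=1, k=3))
```

In practice the variables would need to be standardized before their variances are summed, and each subset would typically be clustered from several random starts, since a single Lloyd run can stop in a local minimum.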

    Calculating the partition coefficients of organic solvents in octanol/water and octanol/air

    Partition coefficients define how a solute is distributed between two immiscible phases at equilibrium. The experimental estimation of partition coefficients in a complex system can be an expensive, difficult, and time-consuming process. Here a computational strategy to predict the distributions of a set of solutes in two relevant phase equilibria is presented. The octanol/water and octanol/air partition coefficients are predicted for a group of polar solvents using density functional theory (DFT) calculations in combination with a solvation model based on density (SMD) and are in excellent agreement with experimental data. Thus, the use of quantum-chemical calculations to predict partition coefficients from free energies should be a valuable alternative for unknown solvents. The obtained results indicate that the SMD continuum model in conjunction with any of the three DFT functionals (B3LYP, M06-2X, and M11) agrees with the observed experimental values. The highest correlation to experimental data for the octanol/water partition coefficients was reached by the M11 functional; for the octanol/air partition coefficient, the M06-2X functional yielded the best performance. To the best of our knowledge, this is the first computational approach for the prediction of octanol/air partition coefficients by DFT calculations, which has remarkable accuracy and precision.
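The step from free energies to partition coefficients mentioned in this abstract follows the standard thermodynamic relation log P = -ΔG_transfer / (RT ln 10). The sketch below assumes that the solvation free energies (for example from SMD calculations) are given in kcal/mol and share the same standard-state convention. The function names and the numerical inputs are hypothetical, and the abstract does not state the exact workflow the authors used.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def log_p_octanol_water(dg_solv_water, dg_solv_octanol, temperature=298.15):
    """log10 of the octanol/water partition coefficient.

    The transfer free energy water -> octanol is the difference of the two
    solvation free energies (both in kcal/mol).
    """
    dg_transfer = dg_solv_octanol - dg_solv_water
    return -dg_transfer / (R_KCAL * temperature * math.log(10))


def log_k_octanol_air(dg_solv_octanol, temperature=298.15):
    """log10 of the octanol/air partition coefficient (gas phase -> octanol)."""
    return -dg_solv_octanol / (R_KCAL * temperature * math.log(10))


if __name__ == "__main__":
    # Made-up solvation free energies in kcal/mol, for illustration only.
    print(log_p_octanol_water(dg_solv_water=-4.0, dg_solv_octanol=-5.0))
    print(log_k_octanol_air(dg_solv_octanol=-5.0))
```

At 298.15 K, RT ln 10 is about 1.364 kcal/mol, so each 1.364 kcal/mol of additional stabilization in octanol raises log P by roughly one unit.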

    Impact of selected parameters of the fermentation process of wine and wine itself on the biogenic amines content: Evaluation by application of chemometric tools

    The demand for safer foods has promoted more research into biogenic amines (BAs) over the past few years; however, some questions remain unanswered. Although BAs are present in wine and can have toxic effects on the body, a shared regulation limiting the amounts of BAs in wine is still lacking. A detailed understanding of their presence in wine is also important for the food trade sector. Therefore, the aim of this work was to determine the level of selected BAs in wine samples originating from Poland. Thereafter, the correlation between the concentration of BAs and selected parameters, including pH, alcohol content and fermentation temperature, was evaluated by chemometric analysis. The BAs were determined using a previously developed SPME-GC–MS methodology characterized by low detection limits ranging from 0.009 μg/L (tyramine) to 0.155 μg/L (histamine). Data obtained in this study show that none of the wine samples surpassed the toxic levels reported for BAs in the literature (the total BAs content ranged from 7 to 2174 μg/L); therefore, these wines appear to be safe as regards the risk associated with the intake of potentially toxic BAs. Moreover, several correlations between the occurrence and concentration of biogenic amines, important factors of the winemaking process, and physico-chemical parameters of wine were indicated. Even though information on BAs is currently not included in wine composition databases, information on their existence, distribution and concentration, and knowledge of existing relationships between BAs and other wine parameters, is crucial and may be useful for the food industry, health professionals and consumers.

    Searching for solvents with an increased carbon dioxide solubility using multivariate statistics

    Ionic liquids (ILs) are used in various fields of chemistry. One of them is CO2 capture, a process that is quite well described. The solubility of CO2 in ILs can be used as a model to investigate gas absorption processes. The aim is to find the relationships between the solubility of CO2 and other variables, namely physicochemical properties and parameters related to greenness. In this study, 12 variables are used to describe a dataset consisting of 26 ILs and 16 molecular solvents. We used a cluster analysis, a principal component analysis, and a K-means hierarchical clustering to find the patterns in the dataset and the discriminators between the clusters of compounds. The results showed that ILs and molecular solvents form two well-separated groups, and the variables were well separated into greenness-related and physicochemical properties. Such patterns suggest that modeling greenness properties and the solubility of CO2 from physicochemical properties can be difficult.

    Multivariate analysis for the classification of copper lead and copper zinc glasses

    The similarity patterns in the physicochemical properties of copper-lead and copper-zinc borate glasses were identified using multivariate statistical analysis. As exploratory methods of multivariate analysis, cluster analysis, principal components analysis, and two-way clustering were applied to a set of copper-lead and copper-zinc borate glasses. Specific correlations among the physicochemical properties of copper glasses were interpreted. In particular, the effect of the Pb and Zn doping metal ions on the structural and mechanical properties of copper glasses is identified. Interestingly, the degree of lead content determines two kinds of glasses with specific physicochemical properties.

    Applying discriminant and cluster analysis to separate allergenic from non-allergenic proteins

    As a result of increased healthcare requirements and the introduction of genetically modified foods, allergies are becoming a growing health problem. This has prompted the use of new methods such as genomics and proteomics to uncover the nature of allergies. In the present study, a selection of 1400 food proteins was analysed by PLS-DA (Partial Least Squares-based Discriminant Analysis) after suitable transformation of structural parameters into uniform vectors. Then, the resulting strings of different length were converted into vectors with equal length by Auto and Cross-Covariance (ACC) analysis. Hierarchical and non-hierarchical (K-means) Cluster Analysis (CA) was also performed in order to reach a certain level of separation within a small training set of plant proteins (16 allergenic and 16 non-allergenic) using a new three-dimensional descriptor based on surface protein properties in combination with amino acid hydrophobicity scales. The novelty of the approach in protein differentiation into allergenic and non-allergenic classes is described in the article. The general goal of the present study was to show the effectiveness of a traditional chemometric method for classification (PLS-DA) and the options of Cluster Analysis (CA) for separating allergenic from non-allergenic proteins by multivariate statistical methods.

    Boron oxide glasses and nanocomposites: synthetic, structural and statistical approach

    Three different boron precursors (aqueous and glycerol solutions of boric acid and an ethanol solution of trimethyl borate) were used for the preparation of organic-inorganic advanced materials. The film and bulk material samples were heat treated at 100, 400, and 800 °C for 2 h. The hybrid samples were stable and transparent up to 100 °C. A further increase of temperature to 400 °C led to destruction of the samples, and at 800 °C they were molten. The structural changes during the pyrolysis were studied by Fourier transform infrared spectroscopy, differential thermal analysis, and X-ray diffraction. Details of surface morphology were observed by scanning electron microscopy. BO3 and BO4 groups were identified in the molten materials after pyrolysis. The quantities and order of borate structural units as well as residual carbon in the networks depended on the boron precursor type. PVA/PEG/B2O3 hybrid materials were shown to be appropriate precursors for synthesizing borate and carboborate glass and carbon/borate glass nanocomposites. To assess the impact of the experimental conditions on the structural changes of the nanocomposites, cluster analysis of the IR-spectral data was used as a classification method.

    Vibrational Analysis of Manganese(II) Oxalates Hydrates: An In Silico Statistical Approach

    An experimental and computational vibrational study of three different manganese(II) oxalate hydrates was carried out. The IR and Raman spectra are interpreted on the basis of the structural singularities of the compounds, and some interesting relations between them are established using computational and statistical approaches. The density functional theory (DFT) computational approach was used for accurate prediction and interpretation of the intermolecular effects based on experimental and calculated solid-state IR and Raman spectra in combination with a multivariate statistical technique. The proposed computational scheme was also explored for the case of the isolated-molecule model. The goals of the study were to assess the accuracy of the proposed procedure for solid-state calculations along with electron calculations for the isolated molecules and to reveal the similarities within the groups of objects by cluster analysis (CA) techniques and two-way CA of the data. The presented simulation procedure should be very valuable for exploring and classifying other oxalate compounds.

    Advanced spectrophotometric chemometric methods for resolving the binary mixture of doxylamine succinate and pyridoxine hydrochloride

    The prediction power of partial least squares (PLS) and multivariate curve resolution-alternating least squares (MCR-ALS) methods has been studied for simultaneous quantitative analysis of the binary drug combination of doxylamine succinate and pyridoxine hydrochloride. Analysis of overlapped first-order UV spectra was performed using different PLS models: classical PLS1 and PLS2 as well as partial robust M-regression (PRM). These linear models were compared to MCR-ALS with equality and correlation constraints (MCR-ALS-CC). All techniques operated within the full spectral region and extracted maximum information for the drugs analysed. The developed chemometric methods were validated on external sample sets and were applied to the analyses of pharmaceutical formulations. The obtained statistical parameters were satisfactory for calibration and validation sets. All developed methods can be successfully applied for simultaneous spectrophotometric determination of doxylamine and pyridoxine both in laboratory-prepared mixtures and commercial dosage forms.

    Diabetes mellitus type 2: Exploratory data analysis based on clinical reading

    Diabetes mellitus type 2 (DMT2) is a severe and complex health problem. It is the most common type of diabetes. DMT2 is a chronic metabolic disorder that affects the way the body metabolizes sugar. With DMT2, the body either resists the effects of insulin or does not produce sufficient insulin to maintain normal glucose levels. DMT2 is a disease that requires a multifactorial approach to control that includes lifestyle change and pharmacotherapy. Less than ideal management increases the risk of developing complications and comorbidities such as cardiovascular disease, as well as numerous social and economic penalties. That is why the studies dedicated to the pathophysiological mechanisms and the treatment of DMT2 are extremely numerous and diverse. In this study, exploratory data analysis approaches are applied to clinical and anthropometric readings of patients with DMT2. Since multivariate statistics is a well-known method for classification, modeling and interpretation of large collections of data, the major aim of the present study was to reveal latent relations between the objects of the investigation (group of patients and control group) and the variables describing the objects (clinical and anthropometric parameters). In the proposed method, the application of hierarchical cluster analysis and principal component analysis makes it possible to identify a reduced number of parameters that appear to be the most significant discriminant parameters for distinguishing between four patterns of patients with DMT2. However, there is still a lack of multivariate statistical studies using DMT2 data sets to assess different aspects of the problem, such as optimal rapid monitoring of the patients or specific separation of patients into patterns of similarity related to their health status, which could help in the preparation of databases for DMT2 patients. The outcome of the study could be of use for the selection of significant tests for rapid monitoring of patients and for a more detailed approach to the health status of DMT2 patients.