13 research outputs found

    Improved probabilistic distance based locality preserving projections method to reduce dimensionality in large datasets

    In this paper, dimensionality reduction is achieved in large datasets using the proposed distance based Non-integer Matrix Factorization (NMF) technique, which is intended to solve the data dimensionality problem. Here, NMF and distance measurement aim to resolve the non-orthogonality problem that arises as dataset dimensionality increases. The method initially partitions the datasets, organizes them into a defined geometric structure, and avoids capturing the dataset structure through a distance based similarity measurement. The proposed method is designed to fit dynamic datasets and includes the intrinsic structure using data geometry. The complexity of the data is therefore further reduced using an Improved Distance based Locality Preserving Projection. The proposed method is evaluated against existing methods in terms of accuracy, average accuracy, mutual information and average mutual information.
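
For orientation, the textbook form of Locality Preserving Projections (not the improved, distance based variant this abstract proposes, whose details are not given here) can be written as below. The symbols are assumptions introduced for illustration: $x_i$ are the data points stacked as columns of $X$, $W$ is a neighbourhood similarity matrix, and $a$ is the projection vector.

```latex
% Standard LPP: keep neighbouring points close after projection.
\min_{a} \; \sum_{i,j} \left( a^{\top} x_i - a^{\top} x_j \right)^{2} W_{ij}
\quad \text{subject to} \quad a^{\top} X D X^{\top} a = 1,
\qquad D_{ii} = \sum_{j} W_{ij}, \quad L = D - W .

% The minimiser solves the generalized eigenproblem
X L X^{\top} a = \lambda \, X D X^{\top} a ,
% and the eigenvectors with the smallest eigenvalues give the projection directions.
```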

    Path Clustering: Grouping in a Efficient Way Complex Data Distributions

    This work proposes an algorithm that uses paths based on tile segmentation to build complex clusters. After data items (points) are allocated to geometric shapes in tile format, the complexity of the algorithm depends on the number of tiles rather than the number of points. The main novelty is the way the algorithm traverses the grid, which saves time and provides good results. It does not demand any configuration parameters from users, making it easier to use than other strategies. In addition, the algorithm does not create overlapping clusters, which simplifies the interpretation of results.
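
The general idea of clustering over tiles rather than points can be sketched as follows. This is a generic grid-based illustration, not the authors' path algorithm: the function name `tile_clusters`, the 8-neighbour adjacency rule, and the explicit `tile_size` argument are all assumptions (the abstract states that the actual method needs no user parameters).

```python
from collections import deque


def tile_clusters(points, tile_size):
    """Label 2-D points by connected groups of occupied tiles (hypothetical helper)."""
    # Map each occupied tile (integer grid cell) to the indices of its points.
    tiles = {}
    for idx, (x, y) in enumerate(points):
        key = (int(x // tile_size), int(y // tile_size))
        tiles.setdefault(key, []).append(idx)

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in tiles:
        if start in seen:
            continue
        # Breadth-first walk over adjacent occupied tiles; the work done here
        # scales with the number of tiles, not the number of points.
        queue = deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            for idx in tiles[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in tiles and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels


points = [(0.1, 0.2), (0.9, 0.8), (1.5, 1.1), (8.0, 8.0), (8.4, 9.1)]
labels = tile_clusters(points, tile_size=1.0)
print(labels)  # [0, 0, 0, 1, 1]
```

Each point receives exactly one label, so clusters cannot overlap, which mirrors the non-overlap property described in the abstract.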

    Machine-Learning Analysis of Serum Proteomics in Neuropathic Pain after Nerve Injury in Breast Cancer Surgery Points at Chemokine Signaling via SIRT2 Regulation

    Background: Persistent postsurgical neuropathic pain (PPSNP) can occur after intraoperative damage to somatosensory nerves, with a prevalence of 29–57% in breast cancer surgery. Proteomics is an active research field in neuropathic pain and the first results support its utility for establishing diagnoses or finding therapy strategies. Methods: 57 women (30 non-PPSNP/27 PPSNP) who had experienced a surgeon-verified intercostobrachial nerve injury during breast cancer surgery were examined for patterns in 74 serum proteomic markers that allowed discrimination between subgroups with or without PPSNP. Serum samples were obtained both before and after surgery. Results: Unsupervised data analyses, including principal component analysis and self-organizing maps of artificial neurons, revealed patterns that supported a data structure consistent with pain-related subgroup (non-PPSNP vs. PPSNP) separation. Subsequent supervised machine learning-based analyses revealed 19 proteins (CD244, SIRT2, CCL28, CXCL9, CCL20, CCL3, IL.10RA, MCP.1, TRAIL, CCL25, IL10, uPA, CCL4, DNER, STAMPB, CCL23, CST5, CCL11, FGF.23) that were informative for subgroup separation. In cross-validated training and testing of six different machine-learned algorithms, subgroup assignment was significantly better than chance, whereas this was not possible when training the algorithms with randomly permuted data or with the protein markers not selected. In particular, sirtuin 2 emerged as a key protein, presenting both before and after breast cancer treatments in the PPSNP compared with the non-PPSNP subgroup. Conclusions: The identified proteins play important roles in immune processes such as cell migration, chemotaxis, and cytokine-signaling. They also have considerable overlap with currently known targets of approved or investigational drugs. 
Taken together, several lines of unsupervised and supervised analyses pointed to structures in serum proteomics data, obtained before and after breast cancer surgery, that relate to neuroinflammatory processes associated with the development of neuropathic pain after an intraoperative nerve lesion
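
The negative-control logic described in this abstract (cross-validated accuracy on the real labels versus on randomly permuted labels) can be sketched in a few lines. Everything below is an illustrative assumption: the data are synthetic, not the study's proteomic measurements, and a leave-one-out nearest-centroid classifier stands in for the six algorithms, which the abstract does not name.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature tuples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    hits = 0
    for i in range(len(X)):
        # Hold out sample i and fit class centroids on the rest.
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        cents = {}
        for lab in {lab for _, lab in train}:
            cents[lab] = centroid([x for x, t in train if t == lab])
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
        hits += pred == y[i]
    return hits / len(X)


# Synthetic two-group data with a clear structure (assumption for illustration).
X = [(0.1 * i, 0.1 * i) for i in range(10)] + \
    [(5 + 0.1 * i, 5 + 0.1 * i) for i in range(10)]
y = [0] * 10 + [1] * 10

acc_true = loo_accuracy(X, y)

# Negative control: shuffle the labels so any feature-label link is destroyed.
rng = random.Random(0)
perm_accs = []
for _ in range(200):
    shuffled = y[:]
    rng.shuffle(shuffled)
    perm_accs.append(loo_accuracy(X, shuffled))
acc_perm = sum(perm_accs) / len(perm_accs)

print(f"true labels: {acc_true:.2f}, permuted labels (mean): {acc_perm:.2f}")
```

If the accuracy on the real labels clearly exceeds the permuted-label accuracy, the classifier is picking up structure tied to the labels rather than an artifact of the procedure.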

    A Data Science-Based Analysis Points at Distinct Patterns of Lipid Mediator Plasma Concentrations in Patients With Dementia

    Based on accumulating evidence of a role of lipid signaling in many physiological and pathophysiological processes including psychiatric diseases, the present data-driven analysis was designed to gather information needed to develop a prospective biomarker, using a targeted lipidomics approach covering different lipid mediators. Using unsupervised methods of data structure detection, implemented as hierarchical clustering, emergent self-organizing maps of neuronal networks, and principal component analysis, a cluster structure was found in the input data space comprising plasma concentrations of d = 35 different lipid markers of various classes acquired in n = 94 subjects with the clinical diagnoses depression, bipolar disorder, ADHD, dementia, or in healthy controls. The structure separated patients with dementia from the other clinical groups, indicating that dementia is associated with a distinct pattern of lipid mediator plasma concentrations, possibly providing a basis for a future biomarker. This hypothesis was subsequently assessed using supervised machine-learning methods, implemented as random forests or principal component analysis followed by computed ABC analysis used for feature selection, and as random forests, k-nearest neighbors, support vector machines, multilayer perceptron, and naïve Bayesian classifiers to estimate whether the selected lipid mediators provide sufficient information that the diagnosis of dementia can be established at a higher accuracy than by guessing. This succeeded using a set of d = 7 markers comprising GluCerC16:0, Cer24:0, Cer20:0, Cer16:0, Cer24:1, C16 sphinganine, and LacCerC16:0, at an accuracy of 77%. 
By contrast, using random lipid markers reduced the diagnostic accuracy to values of 65% or less, whereas training the algorithms with randomly permuted data was followed by complete failure to diagnose dementia, emphasizing that the selected lipid mediators display a particular pattern in this disease, possibly qualifying as biomarkers

    Ocular, Neural, and Cellular Biodistribution of Multifunctional Antioxidants

    Aging is a complex biological process which stems from a growing imbalance between the regenerative capacity of an organism and endogenous as well as exogenous damaging factors. This imbalance leads to the slow deterioration of individual cells, organs, and eventually the entire organism. The free radical theory of aging combines the evolutionary and mechanistic aspects of aging, postulating that the innate process is caused by deleterious, irreversible, and inevitable changes in biological systems caused by oxidative damage that accumulates over the lifespan. Evidence of this phenomenon is supported by the pathogenesis of age-related diseases, such as age-related macular degeneration and Alzheimer’s disease, which show that there is an age-related decrease of cellular antioxidant defenses. This results in the dyshomeostasis of redox-active metals, such as iron, copper, and zinc, and in turn exacerbates the oxidative stress induced by reactive oxygen species and free radicals such as superoxide, hydrogen peroxide, and the hydroxyl radical. Our laboratory has developed two series of multifunctional antioxidants (MFAOs), the JHX and HK series, which can simultaneously chelate biologically active transition metals and scavenge free radicals. These orally-active compounds have demonstrated therapeutic effects against age-related eye diseases, such as cataract and macular degeneration. Despite their efficacy, little is known about the ocular biodistribution of these orally-administered molecules. I have conducted a biodistribution study of 24 such molecules. These included the MFAOs, their monofunctional free radical scavenging (FRS) and biologically active transition metal chelating (CHL) analogs, as well as their nonfunctional (NF) analogs in Sprague Dawley rats. 
In Chapter Two, I demonstrate that all compounds can be detected unmetabolized in the cornea, iris with the ciliary body, lens, neural retina, retinal pigmented epithelium with the choroid, brain, sciatic nerve, kidney, and liver. In Chapter Three, I describe the predictive models of ocular, neural, and visceral tissue distribution, which I developed based on the biodistribution data from Chapter Two, using hierarchical cluster analysis (HCA) and quantitative structure-activity relationship (QSAR) analysis. The results indicated that both HCA and QSAR analysis yielded many predictive models which agree with other reported trends of drug delivery to ocular, neural, and visceral tissues. In Chapter Four, I present my investigation into the potential pharmacological chaperone activity of two oxysterols, lanosterol and 25-hydroxycholesterol, to three model αB-crystallin chaperone proteins in silico and compare their binding against the MFAOs. Our results confirm that the oxysterols fail to meet the predictive binding threshold, indicating weak binding affinity to the model αB-crystallin proteins. However, their predicted Kd values matched experimentally reported values. The MFAOs exceeded the threshold for predictive binding and support previous in vivo studies which suggest our molecules may have some chaperone activity. Finally, in Chapter Five, I present several synthetic approaches for the preparation of various novel triphenylphosphonium-linked (TPP) JHX series compounds. I also discuss their in vitro evaluation in HEI-OC1 inner ear cells. Since mitochondrial dysfunction is linked to neurodegeneration, we hypothesized that directly linking a mitochondria-targeting moiety to our compounds would increase their potency by quenching free radicals at their main generation source. Our results indicate that the TPP compounds do not adversely affect mitochondria, as shown using a viability assay and Rhodamine-123 fluorescence stain

    Projection-Based Clustering through Self-Organization and Swarm Intelligence

    It covers aspects of unsupervised machine learning used for knowledge discovery in data science and introduces a data-driven approach to cluster analysis, the Databionic swarm (DBS). DBS consists of the 3D landscape visualization and clustering of data. The 3D landscape enables 3D printing of high-dimensional data structures. The clustering and the number of clusters, or the absence of cluster structure, can be verified at a glance in the 3D landscape. DBS is the first swarm-based technique that shows emergent properties while exploiting concepts of swarm intelligence, self-organization and the Nash equilibrium concept from game theory. As a result, neither a global objective function nor the setting of parameters is required. Once the R package is downloaded, DBS can be applied to data drawn from diverse research fields and used even by non-professionals in the field of data mining

    Analysis of the trajectory of socio-economic development of regions of the Russian Federation with the use of Machine Learning Methods

    The relevance and importance of the study of the phenomenon of population differentiation by income are largely determined by its relationship with the level of economic development of the territory. 
On the one hand, the basis of the differentiation under consideration is the process of distribution of total income (gross value added) between individual households; on the other hand, the dynamics of economic development of the country as a whole and of its individual regions are largely determined by the effectiveness of this distribution, including the population's subjective perception of its fairness. The aim of the work is to develop an approach that analyzes the spatial differentiation of incomes of the population using machine learning methods. The object of the work is the trajectory of socio-economic development of regions. The subject is the application of machine learning methods to analyze the spatial differentiation of incomes of the Russian population. The scientific novelty lies in the development of a methodology for analyzing the spatial differentiation of incomes of the population and assessing its impact on the economic development of regions. The practical significance of the work is that it makes it possible to formulate the characteristics of the socio-economic development of groups of regions, on the basis of which their development strategies and investment policy are formed in the relevant spheres of life of the subjects of the Russian Federation. The developed methodology of cluster analysis makes it possible to form stable regional clusters according to the level of socio-economic development of the subjects of the Russian Federation. The performed clustering, which takes into account the degree of income differentiation of the regions, can be used in the implementation of cluster-oriented state policy to support the accelerated development of the subjects