Data compression and regression based on local principal curves.
Frequently the predictor space of a multivariate regression problem of the type y = m(x_1, …, x_p ) + ε is intrinsically one-dimensional, or at least of far lower dimension than p. Usual modeling attempts such as the additive model y = m_1(x_1) + … + m_p (x_p ) + ε, which try to reduce the complexity of the regression problem by making additional structural assumptions, are then inefficient as they ignore the inherent structure of the predictor space and involve complicated model and variable selection stages. In a fundamentally different approach, one may consider first approximating the predictor space by a (usually nonlinear) curve passing through it, and then regressing the response only against the one-dimensional projections onto this curve. This entails the reduction from a p- to a one-dimensional regression problem.
As a tool for the compression of the predictor space we apply local principal curves. Taking things on from the results presented in Einbeck et al. (Classification – The Ubiquitous Challenge. Springer, Heidelberg, 2005, pp. 256–263), we show how local principal curves can be parametrized and how the projections are obtained. The regression step can then be carried out using any nonparametric smoother. We illustrate the technique using data from the physical sciences
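The two-step idea described above (compress the predictors to one-dimensional projections, then smooth the response against them) can be illustrated with a minimal sketch. It is not the authors' method: a straight first principal component stands in for the local principal curve, a Gaussian-kernel Nadaraya-Watson estimator stands in for "any nonparametric smoother", and the data, the function names and the bandwidth are all assumptions made for illustration.

```python
import numpy as np

def project_onto_first_pc(X):
    """Scalar projections of the rows of X onto the first principal component.

    Assumption: a straight line replaces the (usually nonlinear) local
    principal curve of the abstract, purely to keep the sketch short.
    """
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

def nadaraya_watson(t_train, y_train, t_query, bandwidth):
    """Gaussian-kernel smoother of y against the one-dimensional projection t."""
    diffs = (t_query[:, None] - t_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    return (weights @ y_train) / weights.sum(axis=1)

# Synthetic data (hypothetical): p = 5 predictors lying close to a line.
rng = np.random.default_rng(0)
n, p = 200, 5
s = rng.uniform(-2.0, 2.0, size=n)            # latent one-dimensional position
direction = np.ones(p) / np.sqrt(p)
X = np.outer(s, direction) + 0.05 * rng.normal(size=(n, p))
y = np.sin(2.0 * s) + 0.1 * rng.normal(size=n)

# Step 1: compress the p-dimensional predictors to one coordinate.
t = project_onto_first_pc(X)
# Step 2: one-dimensional nonparametric regression of y on that coordinate.
y_hat = nadaraya_watson(t, y, t, bandwidth=0.2)
```

The sign of the projection `t` is arbitrary (it may equal the latent position or its negative), which does not affect the fitted values. With genuinely curved predictor spaces the linear projection would have to be replaced by a projection onto a fitted curve.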
The BioDICE Taverna plugin for clustering and visualization of biological data: a workflow for molecular compounds exploration
Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support a data-driven scientific discovery in complex and explorative bioinformatics applications.
Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds.
Conclusions: The number and variety of available tools and its extensibility have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets
Prediction of persistent post-surgery pain by preoperative cold pain sensitivity : biomarker development with machine-learning-derived analysis
Background. To prevent persistent post-surgery pain, early identification of patients at high risk is a clinical need. Supervised machine-learning techniques were used to test how accurately the patients' performance in a preoperatively performed tonic cold pain test could predict persistent post-surgery pain. Methods. We analysed 763 patients from a cohort of 900 women who were treated for breast cancer, of whom 61 patients had developed signs of persistent pain during three years of follow-up. Preoperatively, all patients underwent a cold pain test (immersion of the hand into a water bath at 2-4 °C). The patients rated the pain intensity using a numerical rating scale (NRS) from 0 to 10. Supervised machine-learning techniques were used to construct a classifier that could predict patients at risk of persistent pain. Results. Whether or not a patient rated the pain intensity at NRS=10 within less than 45 s during the cold water immersion test provided a negative predictive value of 94.4% to assign a patient to the "persistent pain" group. If NRS=10 was never reached during the cold test, the predictive value for not developing persistent pain was almost 97%. However, a low negative predictive value of 10% implied a high false positive rate. Conclusion. Results provide a robust exclusion of persistent pain in women with an accuracy of 94.4%. Moreover, results provide further support for the hypothesis that the endogenous pain inhibitory system may play an important role in the process of pain becoming persistent. Peer reviewed
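Predictive values of the kind reported above are computed from the four cells of a confusion matrix. The sketch below shows the standard definitions only; the counts are hypothetical and are not taken from the study, and the function name is an assumption made for illustration.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (positive predictive value, negative predictive value).

    PPV = TP / (TP + FP): share of positive test results that are true cases.
    NPV = TN / (TN + FN): share of negative test results that are true non-cases.
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Hypothetical counts, chosen only to illustrate a rare outcome:
# 10 true cases among 200 patients, with many false positives.
ppv, npv = predictive_values(tp=6, fp=34, fn=4, tn=156)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

When the outcome is rare, a test can exclude it reliably (high NPV) while most of its positive results are still false alarms (low PPV), which is the pattern the hypothetical counts are meant to show.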
Characterization of clastic sedimentary environments by clustering algorithm and several statistical approaches — case study, Sava Depression in Northern Croatia
Abstract
This study demonstrates a method to identify and characterize some facies of turbiditic depositional environments. The study area is a hydrocarbon field in the Sava Depression (Northern Croatia). Its Upper Miocene reservoirs have been proved to represent a lacustrine turbidite system. In the workflow, first an unsupervised neural network was applied as clustering method for two sandstone reservoirs. The elements of the input vectors were the basic petrophysical parameters. In the second step autocorrelation surfaces were used to reveal the hidden anisotropy of the grid. This anisotropy is supposed to identify the main continuity directions in the geometrical analyses of sandstone bodies. Finally, in the description of clusters several parametric and nonparametric statistics were used to characterize the identified facies. Obtained results correspond to the previously published interpretation of those reservoir facies
Exchange flow between open water and floating vegetation
This study describes the exchange flow between a region with open water and a region with a partial-depth porous obstruction, which represents the thermally-driven exchange that occurs between open water and floating vegetation. The partial-depth porous obstruction represents the root layer, which does not penetrate to the bed. Initially, a vertical wall separates the two regions, with fluid of higher density in the obstructed region and fluid of lower density in the open region. This density difference represents the influence of differential solar heating due to shading by the vegetation. For a range of root density and root depths, the velocity distribution is measured in the lab using PIV. When the vertical wall is removed, the less dense water flows into the obstructed region at the surface. This surface flow bifurcates into two layers, one flowing directly through the root layer and one flowing beneath the root layer. A flow directed out of the vegetated region occurs at the bed. A model is developed that predicts the flow rates within each layer based on energy considerations. The experiments and model together suggest that at time- and length-scales relevant to the field, the flow structure for any root layer porosity approaches that of a fully blocked layer, for which the exchange flow occurs only beneath the root layer. National Science Foundation (U.S.) (grant EAR0509658)
Hebbian STDP in mushroom bodies facilitates the synchronous flow of olfactory information in locusts
Odour representations in insects undergo progressive transformations and decorrelation from the receptor array to the presumed site of odour learning, the mushroom body. There, odours are represented by sparse assemblies of Kenyon cells in a large population. Using intracellular recordings in vivo, we examined transmission and plasticity at the synapse made by Kenyon cells onto downstream targets in locusts. We find that these individual synapses are excitatory and undergo Hebbian spike-timing dependent plasticity (STDP) on a ±25 ms timescale. When placed in the context of odour-evoked Kenyon cell activity (a 20-Hz oscillatory population discharge), this form of STDP enhances the synchronization of the Kenyon cells’ targets and thus helps preserve the propagation of the odour-specific codes through the olfactory system
A feature selection method for classification within functional genomics experiments based on the proportional overlapping score
Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurements of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves a better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expressions overlap across classes taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks along with a novel gene score are exploited to produce the selected subset of genes
The physiology of hibernation among painted turtles: the midland painted turtle (Chrysemys picta marginata)
Midland painted turtles from Michigan were submerged at 3°C in normoxic and anoxic water. In predive, and in turtles submerged for up to 150 days, plasma PO2, PCO2, pH, [ , total Mg, total Ca, lactate, glucose, and osmolality were measured; hematocrit and mass were determined, and plasma [HCO3−] was calculated. Anoxic turtles developed a severe metabolic acidosis, accumulating lactate from a predive value of 4.4 mmol/L to a 150-day value of 185 mmol/L, associated with a fall in pH from 7.983 to 7.189. To buffer lactate increase, total calcium and magnesium rose from 3.7 and 2.6 to 58.9 and 11.8 mmol/L, respectively. Plasma [HCO3−] was titrated from 39.2 to 4.8 mmol/L in anoxic turtles. Turtles in normoxic water had only minor disturbances of their acid-base and ionic statuses, associated with a much smaller increase of lactate to 23 mmol/L; there was a marked increase in hematocrit from 29.1% to 42.1%. We suggest that it is ecologic, rather than phylogenetic, relationships that determine the responses of painted turtles to prolonged submergence associated with hibernation
