
1,652 research outputs found

Weighted k-Nearest-Neighbor Techniques and Ordinal Classification

Author: Hechenbichler K.
Schliep K.
Publication date: 01/01/2004

In the field of statistical discrimination, k-nearest neighbor classification is a well-known, easy and successful method. In this paper we present an extended version of this technique, where the distances of the nearest neighbors can be taken into account. In this sense there is a close connection to LOESS, a local regression technique. In addition, we show possibilities to use nearest neighbor methods for classification in the case of an ordinal class structure. Empirical studies show the advantages of the new techniques.

Open Access LMU
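
The distance-weighted voting idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_knn_predict`, the triangular kernel, and the rescaling of distances by the (k+1)-th neighbor distance are assumptions chosen for illustration, since the abstract only says that neighbor distances are taken into account.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by distance-weighted voting among its k nearest neighbors.

    Assumptions (not taken from the abstract): Euclidean distance, a triangular
    kernel, and distances rescaled by the (k+1)-th neighbor distance.
    """
    # Distance from the query to every training point, nearest first.
    dists = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    neighbors = dists[:k]
    # Scale by the (k+1)-th neighbor so weights lie in [0, 1]; fall back to the
    # farthest available neighbor when there are not enough points.
    scale = dists[k][0] if len(dists) > k else neighbors[-1][0]
    scale = scale or 1.0  # guard against a zero distance
    votes = defaultdict(float)
    for d, label in neighbors:
        u = min(d / scale, 1.0)
        votes[label] += 1.0 - u  # triangular kernel: closer neighbors count more
    return max(votes, key=votes.get)

# One very close "a" point outweighs two distant "b" points, although an
# unweighted majority vote over the same three neighbors would pick "b".
X = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (3.2, 0.0)]
y = ["a", "b", "b", "b"]
print(weighted_knn_predict(X, y, (0.1, 0.1), k=3))
```

Other kernels (for example inverse distance or Gaussian) could be swapped in by changing the single line that computes the vote weight.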

Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle

Author: Fan Xiaodan
Liu Jun S.
Pyne Saumyadipta
Publication venue: Institute of Mathematical Statistics
Publication date: 09/11/2010

The effort to identify genes with periodic expression during the cell cycle from genome-wide microarray time series data has been ongoing for a decade. However, the lack of rigorous modeling of periodic expression, as well as the lack of a comprehensive model for integrating information across genes and experiments, has impaired the accurate identification of periodically expressed genes. To address the problem, we introduce a Bayesian model to integrate multiple independent microarray data sets from three recent genome-wide cell cycle studies on fission yeast. A hierarchical model was used for data integration. In order to facilitate efficient Monte Carlo sampling from the joint posterior distribution, we develop a novel Metropolis-Hastings group move. A surprising finding from our integrated analysis is that more than 40% of the genes in fission yeast are significantly periodically expressed, far exceeding the 10-15% of genes reported in the current literature. It calls for a reconsideration of the periodically expressed gene detection problem. Comment: Published at http://dx.doi.org/10.1214/09-AOAS300 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

Random Forest variable importance with missing data

Author: Hapfelmeier Alexander
Hothorn Torsten
Ulm Kurt
Publication date: 15/02/2012

Random Forests are commonly applied for data prediction and interpretation. The latter purpose is supported by variable importance measures that rate the relevance of predictors. Yet existing measures cannot be computed when data contain missing values. Possible solutions are given by imputation methods, complete case analysis and a newly suggested importance measure. However, it is unknown to what extent these approaches are able to provide a reliable estimate of a variable's relevance. An extensive simulation study was performed to investigate this property for a variety of missing data generating processes. Findings and recommendations: Complete case analysis should not be applied, as it inappropriately penalized variables that were completely observed. The new importance measure is much better able to reflect decreased information exclusively for variables with missing values and should therefore be used to evaluate actual data situations. By contrast, multiple imputation allows for an estimation of the importances one would potentially observe in complete data situations.

Open Access LMU

Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data

Author: A Schwarzenberg-Czerny
A Subramanian
A Tarczynski
Andrew Gracey
C Andersson
C Zhou
D Johansson
D Liu
E Glynn
F Hampel
G Bretthorst
H Kim
Harri Lähdesmäki
J Chen
J Scargle
L Tatum
L Zhao
Ilya Shmulevich
M Ahdesmäki
M Priestley
M Rasile
M Schena
M Schimmel
Miika Ahdesmäki
O Kaleva
Olli Yli-Harja
P Brockwell
P Djurić
P Frick
P Huber
P Laguna
P Rousseeuw
R Duda
R Fisher
R Klevecz
R Pearson
R Singh
S Wichert
The MathWorks Inc
U de Lichtenberg
X Lu
Y Benjamini
Y Luan
Y Qi
Publication venue: BioMed Central
Publication date: 01/07/2007

Background: In practice, many biological time series measurements, including gene microarrays, are conducted at time points that the biologist considers interesting, and not necessarily at fixed time intervals. In many circumstances we are interested in finding targets that are expressed periodically. To tackle the problems of uneven sampling and unknown type of noise in periodicity detection, we propose to use robust regression. Methods: The aim of this paper is to develop a general framework for robust periodicity detection and to review and rank different approaches by means of simulations. We also show the results for some real measurement data. Results: The simulation results clearly show that as the sampling of the time series becomes more uneven, the methods that assume even sampling become unusable. We find that M-estimation provides a good compromise between robustness and computational efficiency. Conclusion: Since uneven sampling occurs often in biological measurements, the robust methods developed in this paper are expected to have many uses. The regression-based formulation of the periodicity detection problem easily adapts to non-uniform sampling. Using robust regression helps to reject inconsistently behaving data points. Availability: The implementations are currently available for Matlab and will be made available for the users of R as well. More information can be found in the web supplement [1].

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
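
The regression-based formulation mentioned in the abstract above can be sketched as fitting a sinusoid at a candidate frequency to unevenly spaced samples, with M-estimation downweighting inconsistent points. The sketch below is an illustration under stated assumptions, not the authors' Matlab implementation: the function names, the Huber weight function with tuning constant 1.345, the MAD-based scale estimate, and the iteratively reweighted least squares (IRLS) loop are all choices made here for the example.

```python
import math

def solve_linear(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def huber_sinusoid_fit(times, values, freq, delta=1.345, iters=50):
    """Fit values ~ a*cos(2*pi*freq*t) + b*sin(2*pi*freq*t) + c by IRLS.

    Assumed for illustration: Huber weights and a MAD scale estimate.
    With iters=1 the result is the ordinary least-squares fit.
    """
    X = [[math.cos(2 * math.pi * freq * t), math.sin(2 * math.pi * freq * t), 1.0]
         for t in times]
    w = [1.0] * len(times)
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        # Weighted normal equations: (X^T W X) beta = X^T W y.
        A = [[sum(wi * xi[p] * xi[q] for wi, xi in zip(w, X)) for q in range(3)]
             for p in range(3)]
        rhs = [sum(wi * xi[p] * yi for wi, xi, yi in zip(w, X, values))
               for p in range(3)]
        beta = solve_linear(A, rhs)
        resid = [yi - sum(bp * xp for bp, xp in zip(beta, xi))
                 for xi, yi in zip(X, values)]
        # Robust residual scale from the median absolute residual.
        abs_r = sorted(abs(r) for r in resid)
        scale = (abs_r[len(abs_r) // 2] / 0.6745) or 1e-12
        # Huber weights: 1 inside the threshold, shrinking outside it.
        w = [1.0 if abs(r) <= delta * scale else delta * scale / abs(r)
             for r in resid]
    return beta

# Unevenly sampled sinusoid (period 10) with one gross outlier.
times = [0.0, 0.7, 1.1, 2.4, 3.0, 4.6, 5.2, 6.9, 7.3, 8.8,
         10.1, 11.5, 12.2, 13.9, 15.0, 16.4, 17.7, 18.1, 19.6, 21.0]
values = [2.0 * math.cos(2 * math.pi * 0.1 * t)
          + 0.5 * math.sin(2 * math.pi * 0.1 * t) + 1.0 for t in times]
values[5] += 15.0
a_rob, b_rob, c_rob = huber_sinusoid_fit(times, values, freq=0.1)
a_ols, b_ols, c_ols = huber_sinusoid_fit(times, values, freq=0.1, iters=1)
print("robust amplitude:", math.hypot(a_rob, b_rob))
print("least-squares amplitude:", math.hypot(a_ols, b_ols))
```

Because the design matrix is built directly from the sample times, nothing in the fit requires even spacing. A periodicity detector would scan `freq` over a grid and compare the fitted amplitudes or residuals, a step omitted here.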

Microarray Analysis in Drug Discovery and Biomarker Identification

Author: Joseph S. Verducci
Yushi Liu
Publication venue: IntechOpen
Publication date: 16/05/2012

IntechOpen