MissForest - nonparametric missing value imputation for mixed-type data
Modern data acquisition based on high-throughput technology often faces
the problem of missing data. Algorithms commonly used in the analysis of such
large-scale data often depend on a complete set. Missing value imputation
offers a solution to this problem. However, the majority of available
imputation methods are restricted to one type of variable only: continuous or
categorical. For mixed-type data the different types are usually handled
separately. Therefore, these methods ignore possible relations between variable
types. We propose a nonparametric method which can cope with different types of
variables simultaneously. We compare several state-of-the-art methods for the
imputation of missing values. We propose and evaluate an iterative imputation
method (missForest) based on a random forest. By averaging over many unpruned
classification or regression trees, random forest intrinsically constitutes a
multiple imputation scheme. Using the built-in out-of-bag error estimates of
random forest, we are able to estimate the imputation error without the need for
a test set. Evaluation is performed on multiple data sets coming from a diverse
selection of biological fields with artificially introduced missing values
ranging from 10% to 30%. We show that missForest can successfully handle
missing values, particularly in data sets including different types of
variables. In our comparative study, missForest outperforms other methods of
imputation, especially in data settings where complex interactions and nonlinear
relations are suspected. The out-of-bag imputation error estimates of
missForest prove to be adequate in all settings. Additionally, missForest
exhibits attractive computational efficiency and can cope with high-dimensional
data.
Comment: Submitted to Oxford Journal's Bioinformatics on 3rd of May 201
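The iterative scheme and the out-of-bag error estimate described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. To stay within the standard library, the random forest is replaced by a bagged 1-nearest-neighbour predictor, only numeric columns are handled, and the stopping rule (a small squared change in the imputed cells) is an assumption. The function names `bagged_fit_predict` and `iterative_impute` are hypothetical.

```python
import random
from statistics import mean


def nearest_neighbour_predict(train_X, train_y, x):
    """Stand-in base learner: return the target of the closest training row."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def bagged_fit_predict(X_obs, y_obs, X_mis, rng, n_learners=25):
    """Bag base learners on bootstrap samples of the observed rows.

    Returns the averaged predictions for X_mis and an out-of-bag (OOB)
    mean squared error computed on the observed rows.
    """
    n = len(X_obs)
    preds = [[] for _ in X_mis]
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        bag_X = [X_obs[i] for i in idx]
        bag_y = [y_obs[i] for i in idx]
        for j, x in enumerate(X_mis):
            preds[j].append(nearest_neighbour_predict(bag_X, bag_y, x))
        for i in range(n):
            if i not in in_bag:  # row was not used to fit this learner
                oob_preds[i].append(
                    nearest_neighbour_predict(bag_X, bag_y, X_obs[i]))
    errors = [(mean(p) - y_obs[i]) ** 2 for i, p in enumerate(oob_preds) if p]
    oob_mse = mean(errors) if errors else float("nan")
    return [mean(p) for p in preds], oob_mse


def iterative_impute(rows, max_iter=5, tol=1e-9, seed=0):
    """Impute None entries column by column until the imputed cells settle.

    Assumes every column has at least one observed value.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [[v is None for v in r] for r in rows]
    col_means = [mean(r[c] for r in rows if r[c] is not None)
                 for c in range(n_cols)]
    # Initial guess: fill each gap with the column mean.
    data = [[col_means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]

    def other_columns(i, c):
        return [data[i][k] for k in range(n_cols) if k != c]

    oob_by_col = {}
    for _ in range(max_iter):
        previous = [r[:] for r in data]
        for c in range(n_cols):
            mis = [i for i in range(n_rows) if missing[i][c]]
            if not mis:
                continue
            obs = [i for i in range(n_rows) if not missing[i][c]]
            preds, oob_mse = bagged_fit_predict(
                [other_columns(i, c) for i in obs],
                [data[i][c] for i in obs],
                [other_columns(i, c) for i in mis],
                rng)
            for i, p in zip(mis, preds):
                data[i][c] = p
            oob_by_col[c] = oob_mse
        change = sum((data[i][c] - previous[i][c]) ** 2
                     for i in range(n_rows) for c in range(n_cols)
                     if missing[i][c])
        if change < tol:  # assumed stopping rule, not taken from the abstract
            break
    return data, oob_by_col
```

Calling `iterative_impute(rows)` on a list of equal-length rows with `None` for missing entries returns the completed rows and a per-column OOB error. In this sketch, the OOB error plays the role of the test-set-free imputation error estimate that the abstract attributes to random forest.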
Automated extraction of knowledge for model-based diagnostics
The concept of accessing computer-aided design (CAD) databases and automatically extracting a process model is investigated as a possible source for generating knowledge bases for model-based reasoning systems. The resulting system, referred to as automated knowledge generation (AKG), uses an object-oriented programming structure and constraint techniques, as well as an internal database of component descriptions, to generate a frame-based structure that describes the model. The procedure has been designed to be general enough to be easily coupled to CAD systems that feature a database capable of providing label and connectivity data from the drawn system. The AKG system is capable of defining knowledge bases in the formats required by various model-based reasoning tools.
Gorceixia decurrens (Compositae: Vernonieae): nova espécie para o estado da Bahia, Brasil
Recent fieldwork and collections have added a new genus to the Compositae flora of Bahia State, Brazil. Gorceixia decurrens, collected in 2001, is newly recorded for the State from Caatinga woodland along the lower part of the Estrada Real, in the municipality of Rio de Contas. A full description is provided, and its distribution and conservation status are discussed. The likely affinities of this monotypic genus within the Vernonieae are also discussed, with the conclusion that it belongs to the subtribe Piptocarphinae.
High angular resolution mm- and submm-observations of dense molecular gas in M82
Researchers observed CO(7-6), CO(3-2), HCN(3-2) and HCO+(3-2) line emission toward the starburst nucleus of M82 and obtained an upper limit to H13CN(3-2). These are the first observations of the CO(7-6), HCN(3-2) and HCO+(3-2) lines in any extragalactic source. The CO(7-6) spectrum was taken in January 1988 at the Infrared Telescope Facility (IRTF) with the Max Planck Institute for Extraterrestrial Physics/Univ. of California, Berkeley 800 GHz Heterodyne Receiver. In March 1989 the Institute for Radio Astronomy in the Millimeter range (IRAM) 30 m telescope was used to observe the CO(3-2) line with the new MPE 350 GHz Superconductor-Insulator-Superconductor (SIS) receiver, and the HCN(3-2) and HCO+(3-2) lines with the IRAM 230 GHz SIS receiver (beam 12" FWHM, Blundell et al. 1988). The observational parameters are summarized
The validity and reliability of 1-Hz and 5-Hz global positioning systems for linear, multidirectional, and soccer-specific activities