73 research outputs found
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
This work is motivated by the needs of predictive analytics on healthcare
data as represented by Electronic Medical Records. Such data is invariably
problematic: noisy, with missing entries, with imbalance in classes of
interests, leading to serious bias in predictive modeling. Since standard data
mining methods often produce poor performance measures, we argue for
development of specialized techniques of data-preprocessing and classification.
In this paper, we propose a new method to simultaneously classify large
datasets and reduce the effects of missing values. It is based on a multilevel
framework of the cost-sensitive SVM and the expected maximization imputation
method for missing values, which relies on iterated regression analyses. We
compare classification results of multilevel SVM-based algorithms on public
benchmark datasets with imbalanced classes and missing values as well as real
data in health applications, and show that our multilevel SVM-based method
produces fast, and more accurate and robust classification results.Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
A Review of Classification Problems and Algorithms in Renewable Energy Applications
Classification problems and their corresponding solving approaches constitute one of the
fields of machine learning. The application of classification schemes in Renewable Energy (RE) has
gained significant attention in the last few years, contributing to the deployment, management and
optimization of RE systems. The main objective of this paper is to review the most important
classification algorithms applied to RE problems, including both classical and novel algorithms.
The paper also provides a comprehensive literature review and discussion on different classification
techniques in specific RE problems, including wind speed/power prediction, fault diagnosis in
RE systems, power quality disturbance classification and other applications in alternative RE systems.
In this way, the paper describes classification techniques and metrics applied to RE problems,
thus being useful both for researchers dealing with this kind of problem and for practitioners
of the field
Gene selection for classification of microarray data based on the Bayes error
<p>Abstract</p> <p>Background</p> <p>With DNA microarray data, selecting a compact subset of discriminative genes from thousands of genes is a critical step for accurate classification of phenotypes for, e.g., disease diagnosis. Several widely used gene selection methods often select top-ranked genes according to their individual discriminative power in classifying samples into distinct categories, without considering correlations among genes. A limitation of these gene selection methods is that they may result in gene sets with some redundancy and yield an unnecessary large number of candidate genes for classification analyses. Some latest studies show that incorporating gene to gene correlations into gene selection can remove redundant genes and improve classification accuracy.</p> <p>Results</p> <p>In this study, we propose a new method, Based Bayes error Filter (BBF), to select relevant genes and remove redundant genes in classification analyses of microarray data. The effectiveness and accuracy of this method is demonstrated through analyses of five publicly available microarray datasets. The results show that our gene selection method is capable of achieving better accuracies than previous studies, while being able to effectively select relevant genes, remove redundant genes and obtain efficient and small gene sets for sample classification purposes.</p> <p>Conclusion</p> <p>The proposed method can effectively identify a compact set of genes with high classification accuracy. This study also indicates that application of the Bayes error is a feasible and effective wayfor removing redundant genes in gene selection.</p
A support vector machine hydrometeor classification algorithm for dual-polarization radar
An algorithm based on a support vector machine (SVM) is proposed for hydrometeor classification. The training phase is driven by the output of a fuzzy logic hydrometeor classification algorithm, i.e., the most popular approach for hydrometer classification algorithms used for ground-based weather radar. The performance of SVM is evaluated by resorting to a weather scenario, generated by a weather model; the corresponding radar measurements are obtained by simulation and by comparing results of SVM classification with those obtained by a fuzzy logic classifier. Results based on the weather model and simulations show a higher accuracy of the SVM classification. Objective comparison of the two classifiers applied to real radar data shows that SVM classification maps are spatially more homogenous (textural indices, energy, and homogeneity increases by 21% and 12% respectively) and do not present non-classified data. The improvements found by SVM classifier, even though it is applied pixel-by-pixel, can be attributed to its ability to learn from the entire hyperspace of radar measurements and to the accurate training. The reliability of results and higher computing performance make SVM attractive for some challenging tasks such as its implementation in Decision Support Systems for helping pilots to make optimal decisions about changes in the flight route caused by unexpected adverse weather
Shape Retrieval Methods for Architectural 3D Models
This thesis introduces new methods for content-based retrieval of architecture-related 3D models. We thereby consider two different overall types of architectural 3D models. The first type consists of context objects that are used for detailed design and decoration of 3D building model drafts. This includes e.g. furnishing for interior design or barriers and fences for forming the exterior environment. The second type consists of actual building models. To enable efficient content-based retrieval for both model types that is tailored to the user requirements of the architectural domain, type-specific algorithms must be developed. On the one hand, context objects like furnishing that provide similar functions (e.g. seating furniture) often share a similar shape. Nevertheless they might be considered to belong to different object classes from an architectural point of view (e.g. armchair, elbow chair, swivel chair). The differentiation is due to small geometric details and is sometimes only obvious to an expert from the domain. Building models on the other hand are often distinguished according to the underlying floor- and room plans. Topological floor plan properties for example serve as a starting point for telling apart residential and commercial buildings. The first contribution of this thesis is a new meta descriptor for 3D retrieval that combines different types of local shape descriptors using a supervised learning approach. The approach enables the differentiation of object classes according to small geometric details and at the same time integrates expert knowledge from the field of architecture. We evaluate our approach using a database containing arbitrary 3D models as well as on one that only consists of models from the architectural domain. We then further extend our approach by adding a sophisticated shape descriptor localization strategy. Additionally, we exploit knowledge about the spatial relationship of object components to further enhance the retrieval performance. In the second part of the thesis we introduce attributed room connectivity graphs (RCGs) as a means to characterize a 3D building model according to the structure of its underlying floor plans. We first describe how RCGs are inferred from a given building model and discuss how substructures of this graph can be queried efficiently. We then introduce a new descriptor denoted as Bag-of-Attributed-Subgraphs that transforms attributed graphs into a vector-based representation using subgraph embeddings. We finally evaluate the retrieval performance of this new method on a database consisting of building models with different floor plan types. All methods presented in this thesis are aimed at an as automated as possible workflow for indexing and retrieval such that only minimum human interaction is required. Accordingly, only polygon soups are required as inputs which do not need to be manually repaired or structured. Human effort is only needed for offline groundtruth generation to enable supervised learning and for providing information about the orientation of building models and the unit of measurement used for modeling
- …