An empirical comparison of supervised machine learning techniques in bioinformatics
Research in bioinformatics is driven by experimental data, and
current biological databases hold vast amounts of it. Machine
learning has been widely applied to bioinformatics with
considerable success. With so many learning algorithms now
available in the literature, researchers have difficulty choosing
the method best suited to their data. We performed an empirical
study of 7 individual learning systems and 9 combined methods on
4 biological data sets, and we suggest issues to consider when
answering the following questions: (i) How does one choose the
algorithm best suited to a given data set? (ii) Are combined
methods better than a single approach? (iii) How does one compare
the effectiveness of a particular algorithm with that of the
others?
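Question (ii) can be made concrete with a small sketch. The abstract does not name the 9 combined methods, so the example assumes the simplest possible combination, majority voting over the predictions of several classifiers; the function names and the toy labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of lists, one inner list per classifier,
    all of the same length (one label per sample).
    Ties go to the label that appears first among the votes.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy illustration (made-up labels, not data from the study):
y_true = [1, 0, 1, 1]
single = [[1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 1, 1]]
ensemble = majority_vote(single)
```

Comparing `accuracy(y_true, ensemble)` with the accuracy of each single classifier on held-out data is one simple way to approach the question; it says nothing about which outcome the study observed.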
Supporting Data mining of large databases by visual feedback queries
In this paper, we describe a query system that provides visual relevance feedback for queries on large databases. Our goal is to support the data mining process by representing as many data items as possible on the display. Because the data items are arranged and colored as pixels according to their relevance to the query, the user gets a visual impression of the resulting data set. Through an interactive query interface, the user can change the query dynamically and receives immediate feedback from the visual representation of the resulting data set. Furthermore, by using multiple windows for the different parts of a complex query, the user gets visual feedback for each part of the query and can therefore understand the overall result more easily. Our system can represent the largest amount of data that can be visualized with current display technology, provides valuable feedback when querying the database, and allows the user to find results that would otherwise remain hidden in the database.
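The idea of arranging and coloring data items as pixels by relevance can be sketched as follows. This is a simplified illustration, not the system's actual layout: it assumes relevance scores already scaled to [0, 1], a row-major arrangement, and a gray level as the color; the function name is hypothetical.

```python
def relevance_grid(relevances, width):
    """Arrange items as pixels, most relevant first, in row-major order.

    relevances: list of floats in [0, 1], one per data item.
    width: number of pixels per display row.
    Returns rows of (item_index, gray_level) pairs, gray_level in 0..255.
    """
    # Sort item indices by descending relevance.
    order = sorted(range(len(relevances)),
                   key=lambda i: relevances[i], reverse=True)
    rows = []
    for start in range(0, len(order), width):
        row = [(i, round(255 * relevances[i]))
               for i in order[start:start + width]]
        rows.append(row)
    return rows

grid = relevance_grid([0.2, 1.0, 0.4], width=2)
```

Each item costs one pixel, which is what lets such a display show as many items as the screen has pixels.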
Automated data pre-processing via meta-learning
The final publication is available at link.springer.com.
A data mining algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around.
A dataset usually needs to be pre-processed first. Across all possible pre-processing operators, the number of alternatives is staggeringly large, and inexperienced users become overwhelmed.
We show that this problem can be addressed by an automated approach that leverages ideas from meta-learning.
Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to identify the transformations appropriate to their applications more effectively, and hence to achieve improved results.
Peer Reviewed. Postprint (published version)
Improvements on coronal hole detection in SDO/AIA images using supervised classification
We demonstrate the use of machine learning algorithms in combination with
segmentation techniques in order to distinguish coronal holes and filaments in
SDO/AIA EUV images of the Sun. Based on two coronal hole detection techniques
(intensity-based thresholding, SPoCA), we prepared data sets of manually
labeled coronal hole and filament channel regions present on the Sun during the
period 2011-2013. By mapping the extracted regions from EUV observations
onto HMI line-of-sight magnetograms we also include their magnetic
characteristics. We computed shape measures from the segmented binary maps as
well as first order and second order texture statistics from the segmented
regions in the EUV images and magnetograms. These attributes were used for data
mining investigations to identify the best-performing rule to differentiate
between coronal holes and filament channels. We applied several classifiers,
namely Support Vector Machine, Linear Support Vector Machine, Decision Tree,
and Random Forest, and found that all classification rules achieve good results
in general, with linear SVM providing the best performance (with a true skill
statistic of ~0.90). Additional information from magnetic field data
systematically improves the performance across all four classifiers for the
SPoCA detection. Since the calculation is inexpensive in computing time, this
approach is well suited for applications on real-time data. This study
demonstrates how a machine learning approach may help improve upon an
unsupervised feature extraction method.
Comment: in press for SWS
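The true skill statistic quoted above is commonly defined as the hit rate minus the false alarm rate, TSS = TP/(TP+FN) - FP/(FP+TN). A minimal sketch, assuming this standard definition and binary labels where 1 marks a coronal hole and 0 a filament channel (the label encoding and function names are illustrative, not taken from the paper):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false negatives, false positives, true negatives."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn

def true_skill_statistic(tp, fn, fp, tn):
    """TSS = TP/(TP+FN) - FP/(FP+TN); ranges from -1 to 1."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present in the true labels")
    return tp / (tp + fn) - fp / (fp + tn)

# Made-up labels for illustration only.
counts = confusion_counts([1, 1, 0, 0], [1, 0, 0, 0])
tss = true_skill_statistic(*counts)
```

Because both terms are normalized by the size of their own class, the score does not depend on the ratio of coronal holes to filament channels in the sample, which makes it convenient for imbalanced data sets.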