Examining Variations of Prominent Features in Genre Classification.
This paper investigates the correlation between features of three types (visual, stylistic and topical) and genre classes. The majority of previous studies in automated genre classification have created models based on an amalgamated representation of a document using a combination of features. In these models, the inseparable roles of different features make it difficult to determine how to improve the classifier when it performs poorly in detecting selected genres. In this paper we use classifiers independently modeled on three groups of features to examine six genre classes, and show that the strongest features for making one classification are not necessarily the best features for carrying out another.
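The idea of modelling each feature group independently and comparing results per genre can be sketched as follows. This is a minimal illustration, not the authors' method: the nearest-centroid classifier, the genre names, the two feature groups shown (the paper uses three) and all feature values are made-up assumptions, and recall is measured on the training documents purely for brevity.

```python
from collections import defaultdict
from math import dist


def fit_centroids(rows, labels):
    """Return {label: mean feature vector} for one feature group."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(rows, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the genre whose centroid is nearest."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


def per_genre_recall(centroids, rows, labels):
    """Fraction of each genre's documents that are labelled correctly."""
    hits, totals = defaultdict(int), defaultdict(int)
    for x, y in zip(rows, labels):
        totals[y] += 1
        hits[y] += predict(centroids, x) == y
    return {y: hits[y] / totals[y] for y in totals}


# Hypothetical toy data: six documents, three genres, two feature groups.
labels = ["paper", "paper", "email", "email", "slides", "slides"]
groups = {
    "visual": [[0.10], [0.20], [0.15], [0.12], [0.90], [0.80]],
    "stylistic": [[0.90], [0.80], [0.10], [0.20], [0.50], [0.84]],
}

# One classifier per feature group, evaluated separately per genre.
recall = {
    name: per_genre_recall(fit_centroids(rows, labels), rows, labels)
    for name, rows in groups.items()
}

# For each genre, the feature group whose classifier detects it best.
best = {g: max(recall, key=lambda name: recall[name][g]) for g in set(labels)}
```

In this toy setting the visual classifier is the stronger one for `slides` while the stylistic classifier is stronger for `paper` and `email`, which is the kind of per-genre variation the abstract describes.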
Inexpensive fusion methods for enhancing feature detection
Recent successful approaches to high-level feature detection in image and video data have treated the problem as a pattern classification task. These typically leverage techniques from statistical machine learning, coupled with ensemble architectures that create multiple feature detection models. Once created, co-occurrence between learned features can be captured to further boost performance. At multiple stages throughout these frameworks, various pieces of evidence can be fused together to improve results. These approaches, whilst very successful, are computationally expensive and, depending on the task, require significant computational resources. In this paper we propose two fusion methods that aim to combine the output of an initial basic statistical machine learning approach with a lower-quality information source, in order to gain diversity in the classified results whilst requiring only modest computing resources. Our approaches, validated experimentally on TRECVid data, are designed to be complementary to existing frameworks and can be regarded as possible replacements for the more computationally expensive combination strategies used elsewhere.
Machine Learning Classification of SDSS Transient Survey Images
We show that multiple machine learning algorithms can match human performance
in classifying transient imaging data from the Sloan Digital Sky Survey (SDSS)
supernova survey into real objects and artefacts. This is a first step in any
transient science pipeline and is currently still done by humans, but future
surveys such as the Large Synoptic Survey Telescope (LSST) will necessitate
fully machine-enabled solutions. Using features trained from eigenimage
analysis (principal component analysis, PCA) of single-epoch g, r and
i-difference images, we can reach a completeness (recall) of 96 per cent, while
only incorrectly classifying at most 18 per cent of artefacts as real objects,
corresponding to a precision (purity) of 84 per cent. In general, random
forests performed best, followed by the k-nearest neighbour and the SkyNet
artificial neural net algorithms, compared to other methods such as naïve
Bayes and kernel support vector machine. Our results show that PCA-based
machine learning can match human success levels and can naturally be extended
by including multiple epochs of data, transient colours and host galaxy
information which should allow for significant further improvements, especially
at low signal-to-noise.
Comment: 14 pages, 8 figures. In this version extremely minor adjustments to
the paper were made, e.g. Figure 5 is now easier to view in greyscale
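The relation between the quoted completeness, artefact misclassification rate and purity can be checked with a short calculation. The counts below are hypothetical and assume a balanced test set of 1000 real objects and 1000 artefacts; the abstract does not state the class balance, and purity changes if the balance differs.

```python
def completeness(tp, fn):
    """Recall: fraction of real objects that are recovered."""
    return tp / (tp + fn)


def purity(tp, fp):
    """Precision: fraction of objects classified as real that are real."""
    return tp / (tp + fp)


def false_positive_rate(fp, tn):
    """Fraction of artefacts incorrectly classified as real objects."""
    return fp / (fp + tn)


# Hypothetical counts (assumption: equal numbers of real objects and artefacts).
tp, fn = 960, 40    # real objects: recovered / missed
fp, tn = 180, 820   # artefacts: accepted as real / rejected

print(f"completeness = {completeness(tp, fn):.2f}")
print(f"false positive rate = {false_positive_rate(fp, tn):.2f}")
print(f"purity = {purity(tp, fp):.2f}")
```

Under that balanced-set assumption, a completeness of 0.96 with 18 per cent of artefacts accepted gives a purity of 960/1140, about 0.84, consistent with the figures quoted above.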
Evaluation of Statistical Features for Medical Image Retrieval
In this paper we present a complete system for classifying medical images in order to detect possible diseases present in them. The proposed method is developed in two distinct stages: calculation of descriptors and their classification. In the first stage we compute a vector of thirty-three statistical features: seven are related to first-order statistics, fifteen to second-order statistics, of which thirteen are calculated by means of co-occurrence matrices and two with the absolute gradient; the last thirteen are calculated using run-length matrices. In the second stage, the descriptors already calculated are used for the actual image classification. Naive Bayes, RBF, Support Vector Machine, K-Nearest Neighbor, Random Forest and Random Tree classifiers are used. The results obtained from the proposed system show that the analysis carried out on both textured and medical images leads to high accuracy
- …