One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This situation constrains the learning of efficient classifiers, because the
class boundary must be defined using knowledge of the positive class alone. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data, the
algorithms used and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figure
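To make the idea of learning a boundary from positive examples only more concrete, here is a minimal sketch. It is not a method taken from the survey: it assumes a simple centroid-plus-radius boundary, and the names `fit_one_class`, `is_positive` and the `quantile` parameter are hypothetical.

```python
import math

def fit_one_class(positives, quantile=0.95):
    """Fit a spherical boundary from positive samples only (illustrative assumption)."""
    dim = len(positives[0])
    # Centroid of the positive class.
    centroid = [sum(p[k] for p in positives) / len(positives) for k in range(dim)]
    # Radius chosen as a quantile of the training distances to the centroid.
    dists = sorted(math.dist(p, centroid) for p in positives)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]

def is_positive(x, centroid, radius):
    """Accept x when it falls inside the learned boundary."""
    return math.dist(x, centroid) <= radius

# Toy positive-class data; no negative examples are used for training.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centroid, radius = fit_one_class(positives, quantile=1.0)
inside = is_positive((0.5, 0.5), centroid, radius)
outside = is_positive((5.0, 5.0), centroid, radius)
```

Lowering `quantile` tightens the boundary, so some training positives are rejected in exchange for fewer accepted outliers.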
Learning Analogies and Semantic Relations
We present an algorithm for learning from unlabeled text, based on the
Vector Space Model (VSM) of information retrieval, that can solve verbal
analogy questions of the kind found in the Scholastic Aptitude Test (SAT).
A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D";
for example, mason:stone::carpenter:wood. SAT analogy questions provide
a word pair, A:B, and the problem is to select the most analogous word
pair, C:D, from a set of five choices. The VSM algorithm correctly
answers 47% of a collection of 374 college-level analogy questions
(random guessing would yield 20% correct). We motivate this research by
relating it to work in cognitive science and linguistics, and by applying
it to a difficult problem in natural language processing, determining
semantic relations in noun-modifier pairs. The problem is to classify a
noun-modifier pair, such as "laser printer", according to the semantic
relation between the noun (printer) and the modifier (laser). We use a
supervised nearest-neighbour algorithm that assigns a class to a given
noun-modifier pair by finding the most analogous noun-modifier pair in
the training data. With 30 classes of semantic relations, on a collection
of 600 labeled noun-modifier pairs, the learning algorithm attains an F
value of 26.5% (random guessing: 3.3%). With 5 classes of semantic
relations, the F value is 43.2% (random: 20%). The performance is
state-of-the-art for these challenging problems
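The selection step described above (pick the choice pair most analogous to the stem pair) can be sketched as follows. This assumes each word pair has already been turned into a numeric vector and that similarity is measured by cosine, the usual Vector Space Model measure; the abstract does not say how the vectors are built, so the toy vectors and the function names here are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def answer_analogy(stem_vector, choice_vectors):
    """Return the index of the choice most similar to the stem, plus all scores."""
    scores = [cosine(stem_vector, c) for c in choice_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical vectors for a stem pair A:B and five candidate pairs C:D.
stem = [3.0, 0.0, 1.0]
choices = [
    [0.0, 2.0, 0.0],
    [6.0, 0.0, 2.0],  # parallel to the stem, so it should win
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 5.0],
    [2.0, 3.0, 0.0],
]
best_index, scores = answer_analogy(stem, choices)
```

The same nearest-neighbour idea carries over to the noun-modifier task: the training pair with the highest similarity supplies the class label.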
Large scale biomedical texts classification: a kNN and an ESA-based approaches
With the large and increasing volume of textual data, automated methods for
identifying significant topics to classify textual documents have received
growing interest. While many efforts have been made in this direction, it
remains a real challenge. Moreover, the issue is even more complex as full
texts are not always freely available. Using only partial information to
annotate these documents is therefore promising but remains very ambitious.
Methods
We propose two classification methods: a k-nearest neighbours
(kNN)-based approach and an explicit semantic analysis (ESA)-based approach.
Although the kNN-based approach is widely used in text classification, it needs
to be improved to perform well in this specific classification problem which
deals with partial information. Compared to existing kNN-based methods, our
method uses classical Machine Learning (ML) algorithms for ranking the labels.
Additional features are also investigated in order to improve the classifiers'
performance. In addition, the combination of several learning algorithms with
various techniques for fixing the number of relevant topics is performed. On
the other hand, ESA seems promising for this classification task as it yielded
interesting results in related issues, such as semantic relatedness computation
between texts and text classification. Unlike existing works, which use ESA for
enriching the bag-of-words approach with additional knowledge-based features,
our ESA-based method builds a standalone classifier. Furthermore, we
investigate if the results of this method could be useful as a complementary
feature of our kNN-based approach.
Results
Experimental evaluations performed on
large standard annotated datasets, provided by the BioASQ organizers, show that
the kNN-based method with the Random Forest learning algorithm achieves good
performances compared with the current state-of-the-art methods, reaching a
competitive f-measure of 0.55% while the ESA-based approach surprisingly
yielded reserved results.
Conclusions
We have proposed simple classification
methods suitable to annotate textual documents using only partial information.
They are therefore adequate for large multi-label classification and
particularly in the biomedical domain. Thus, our work contributes to the
extraction of relevant information from unstructured documents in order to
facilitate their automated processing. Consequently, it could be used for
various purposes, including document indexing, information retrieval, etc.
Comment: Journal of Biomedical Semantics, BioMed Central, 201
Deep Learning using K-space Based Data Augmentation for Automated Cardiac MR Motion Artefact Detection
Quality assessment of medical images is essential for complete automation of
image processing pipelines. For large population studies such as the UK
Biobank, artefacts such as those caused by heart motion are problematic and
manual identification is tedious and time-consuming. Therefore, there is an
urgent need for automatic image quality assessment techniques. In this paper,
we propose a method to automatically detect the presence of motion-related
artefacts in cardiac magnetic resonance (CMR) images. As this is a highly
imbalanced classification problem (due to the high number of good quality
images compared to the low number of images with motion artefacts), we propose
a novel k-space based training data augmentation approach in order to address
this problem. Our method is based on 3D spatio-temporal Convolutional Neural
Networks, and is able to detect 2D+time short axis images with motion artefacts
in less than 1ms. We test our algorithm on a subset of the UK Biobank dataset
consisting of 3465 CMR images and achieve not only high accuracy in detection
of motion artefacts, but also high precision and recall. We compare our
approach to a range of state-of-the-art quality assessment methods.
Comment: Accepted for MICCAI2018 Conferenc
Pemilihan kerjaya di kalangan pelajar aliran perdagangan sekolah menengah teknik : satu kajian kes (Career choice among commerce-stream students of a technical secondary school: a case study)
This research is a survey to determine the career choices of form four students
in the commerce stream. Career choice was examined in three aspects:
information about careers, type of career, and the factor that most
influences students in choosing a career. The study was conducted at Sekolah
Menengah Teknik Kajang, Selangor Darul Ehsan. Thirty-six form four students were
chosen as respondents using non-random purposive sampling. All
information was gathered using a questionnaire. The data collected were analyzed in
the form of frequency, percentage and mean. Results are presented in tables and graphs.
The findings show that information about careers has improved students'
career choices and that mass media is the main factor influencing students in choosing
their career
EEG sleep stages identification based on weighted undirected complex networks
Sleep scoring is important in sleep research because any errors in the scoring of the patient's sleep electroencephalography (EEG) recordings can cause serious problems such as incorrect diagnosis, medication errors, and misinterpretations of patient's EEG recordings. The aim of this research is to develop a new automatic method for EEG sleep stages classification based on a statistical model and weighted brain networks.
Methods
Each EEG segment is partitioned into a number of blocks using a sliding window technique. A set of statistical features is extracted from each block. As a result, a vector of features is obtained to represent each EEG segment. Then, the vector of features is mapped into a weighted undirected network. Different structural and spectral attributes of the networks are extracted and forwarded to a least square support vector machine (LS-SVM) classifier. At the same time, the networks' attributes are also thoroughly investigated. It is found that the networks' characteristics vary with their sleep stages. Each sleep stage is best represented using the key features of its network.
Results
In this paper, the proposed method is evaluated using two datasets acquired from different channels of EEG (Pz-Oz and C3-A2) according to the R&K and the AASM without pre-processing the original EEG data. The obtained results by the LS-SVM are compared with those by Naïve, k-nearest and a multi-class-SVM. The proposed method is also compared with other benchmark sleep stages classification methods. The comparison results demonstrate that the proposed method has an advantage in scoring sleep stages based on single channel EEG signals.
Conclusions
An average accuracy of 96.74% is obtained with the C3-A2 channel according to the AASM standard, and 96% with the Pz-Oz channel based on the R&K standard
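The pipeline in the Methods paragraph (sliding-window blocks, statistical features, a weighted undirected network, network attributes) can be sketched roughly as below. The abstract does not state which statistics are used or how features are mapped to edges, so the four statistics, the distance-threshold rule, the `1 / (1 + d)` weighting and all function names are assumptions made for illustration only. The LS-SVM stage is omitted.

```python
import statistics

def block_features(segment, window, step):
    """Slide a window over the segment and collect simple statistics per block."""
    feats = []
    for start in range(0, len(segment) - window + 1, step):
        block = segment[start:start + window]
        feats.extend([statistics.mean(block), statistics.pstdev(block),
                      min(block), max(block)])
    return feats

def to_weighted_network(features, threshold):
    """Assumed mapping: nodes are features; close values are linked, closer = heavier."""
    weights = {}
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            d = abs(features[i] - features[j])
            if d <= threshold:
                weights[(i, j)] = 1.0 / (1.0 + d)  # undirected edge stored once, i < j
    return weights

def node_strengths(n, weights):
    """Strength of a node = sum of the weights of its incident edges."""
    strengths = [0.0] * n
    for (i, j), w in weights.items():
        strengths[i] += w
        strengths[j] += w
    return strengths

# Toy stand-in for one EEG segment.
segment = [0, 1, 2, 3, 4, 5, 6, 7]
feats = block_features(segment, window=4, step=4)
weights = to_weighted_network(feats, threshold=1.0)
strengths = node_strengths(len(feats), weights)
```

A vector such as `strengths`, possibly with further structural attributes, would then be the input to the classifier.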
Classification of sporting activities using smartphone accelerometers
In this paper we present a framework that allows for the automatic identification of sporting activities using commonly available smartphones. We extract discriminative informational features from smartphone accelerometers using the Discrete Wavelet Transform (DWT). Despite the poor quality of their accelerometers, smartphones were used as capture devices due to their prevalence in today's society. Successful classification on this basis potentially makes the technology accessible to both elite and non-elite athletes. Extracted features are used to train different categories of classifiers. No one classifier family has a reportable direct advantage in activity classification problems to date; thus we examine classifiers from each of the most widely used classifier families. We investigate three classification approaches; a commonly used SVM-based approach, an optimized classification model and a fusion of classifiers. We also investigate the effect of changing several of the DWT input parameters, including mother wavelets, window lengths and DWT decomposition levels. During the course of this work we created a challenging sports activity analysis dataset, comprised of soccer and field-hockey activities. The average maximum F-measure accuracy of 87% was achieved using a fusion of classifiers, which was 6% better than a single classifier model and 23% better than a standard SVM approach
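As an illustration of DWT-based feature extraction from an accelerometer window, the sketch below uses the Haar wavelet and per-level energies. The abstract does not say which features are computed, and it compares several mother wavelets, so the choice of Haar, the energy features and the function names are assumptions. The `levels` argument corresponds to the decomposition level mentioned above.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT (a trailing odd sample is dropped)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def dwt_energy_features(window, levels):
    """Energy of the detail coefficients at each level, then of the final approximation."""
    features = []
    approx = list(window)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features

# Toy window of accelerometer magnitudes (length is a power of two).
window = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
features = dwt_energy_features(window, levels=3)
```

Because the Haar transform is orthonormal, the feature values sum to the energy of the input window when its length is a power of two, which is a convenient sanity check.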