11,312 research outputs found
Generalization of form in visual pattern classification.
Human observers were trained to criterion in classifying compound Gabor signals with sym- metry relationships, and were then tested with each of 18 blob-only versions of the learning set. General- ization to dark-only and light-only blob versions of the learning signals, as well as to dark-and-light blob versions was found to be excellent, thus implying virtually perfect generalization of the ability to classify mirror-image signals. The hypothesis that the learning signals are internally represented in terms of a 'blob code' with explicit labelling of contrast polarities was tested by predicting observed generalization behaviour in terms of various types of signal representations (pixelwise, Laplacian pyramid, curvature pyramid, ON/OFF, local maxima of Laplacian and curvature operators) and a minimum-distance rule. Most representations could explain generalization for dark-only and light-only blob patterns but not for the high-thresholded versions thereof. This led to the proposal of a structure-oriented blob-code. Whether such a code could be used in conjunction with simple classifiers or should be transformed into a propo- sitional scheme of representation operated upon by a rule-based classification process remains an open question
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
On the Impact of Entity Linking in Microblog Real-Time Filtering
Microblogging is a model of content sharing in which the temporal locality of
posts with respect to important events, either of foreseeable or unforeseeable
nature, makes applica- tions of real-time filtering of great practical
interest. We propose the use of Entity Linking (EL) in order to improve the
retrieval effectiveness, by enriching the representation of microblog posts and
filtering queries. EL is the process of recognizing in an unstructured text the
mention of relevant entities described in a knowledge base. EL of short pieces
of text is a difficult task, but it is also a scenario in which the information
EL adds to the text can have a substantial impact on the retrieval process. We
implement a start-of-the-art filtering method, based on the best systems from
the TREC Microblog track realtime adhoc retrieval and filtering tasks , and
extend it with a Wikipedia-based EL method. Results show that the use of EL
significantly improves over non-EL based versions of the filtering methods.Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain - April 13 -
17, 201
Mosquito Detection with Neural Networks: The Buzz of Deep Learning
Many real-world time-series analysis problems are characterised by scarce
data. Solutions typically rely on hand-crafted features extracted from the time
or frequency domain allied with classification or regression engines which
condition on this (often low-dimensional) feature vector. The huge advances
enjoyed by many application domains in recent years have been fuelled by the
use of deep learning architectures trained on large data sets. This paper
presents an application of deep learning for acoustic event detection in a
challenging, data-scarce, real-world problem. Our candidate challenge is to
accurately detect the presence of a mosquito from its acoustic signature. We
develop convolutional neural networks (CNNs) operating on wavelet
transformations of audio recordings. Furthermore, we interrogate the network's
predictive power by visualising statistics of network-excitatory samples. These
visualisations offer a deep insight into the relative informativeness of
components in the detection problem. We include comparisons with conventional
classifiers, conditioned on both hand-tuned and generic features, to stress the
strength of automatic deep feature learning. Detection is achieved with
performance metrics significantly surpassing those of existing algorithmic
methods, as well as marginally exceeding those attained by individual human
experts.Comment: For data and software related to this paper, see
http://humbug.ac.uk/kiskin2017/. Submitted as a conference paper to ECML 201
Augmenting Translation Lexica by Learning Generalised Translation Patterns
Bilingual Lexicons do improve quality: of parallel corpora alignment, of newly extracted
translation pairs, of Machine Translation, of cross language information retrieval, among
other applications. In this regard, the first problem addressed in this thesis pertains to
the classification of automatically extracted translations from parallel corpora-collections
of sentence pairs that are translations of each other. The second problem is concerned
with machine learning of bilingual morphology with applications in the solution of first
problem and in the generation of Out-Of-Vocabulary translations.
With respect to the problem of translation classification, two separate classifiers for
handling multi-word and word-to-word translations are trained, using previously extracted
and manually classified translation pairs as correct or incorrect. Several insights
are useful for distinguishing the adequate multi-word candidates from those that are
inadequate such as, lack or presence of parallelism, spurious terms at translation ends
such as determiners, co-ordinated conjunctions, properties such as orthographic similarity
between translations, the occurrence and co-occurrence frequency of the translation
pairs. Morphological coverage reflecting stem and suffix agreements are explored as key
features in classifying word-to-word translations. Given that the evaluation of extracted
translation equivalents depends heavily on the human evaluator, incorporation of an
automated filter for appropriate and inappropriate translation pairs prior to human evaluation
contributes to tremendously reduce this work, thereby saving the time involved
and progressively improving alignment and extraction quality. It can also be applied
to filtering of translation tables used for training machine translation engines, and to
detect bad translation choices made by translation engines, thus enabling significative
productivity enhancements in the post-edition process of machine made translations.
An important attribute of the translation lexicon is the coverage it provides. Learning
suffixes and suffixation operations from the lexicon or corpus of a language is an extensively
researched task to tackle out-of-vocabulary terms. However, beyond mere words
or word forms are the translations and their variants, a powerful source of information
for automatic structural analysis, which is explored from the perspective of improving
word-to-word translation coverage and constitutes the second part of this thesis. In this
context, as a phase prior to the suggestion of out-of-vocabulary bilingual lexicon entries,
an approach to automatically induce segmentation and learn bilingual morph-like units by identifying and pairing word stems and suffixes is proposed, using the bilingual
corpus of translations automatically extracted from aligned parallel corpora, manually
validated or automatically classified. Minimally supervised technique is proposed to enable
bilingual morphology learning for language pairs whose bilingual lexicons are highly
defective in what concerns word-to-word translations representing inflection diversity.
Apart from the above mentioned applications in the classification of machine extracted
translations and in the generation of Out-Of-Vocabulary translations, learned bilingual
morph-units may also have a great impact on the establishment of correspondences of
sub-word constituents in the cases of word-to-multi-word and multi-word-to-multi-word
translations and in compression, full text indexing and retrieval applications
- …