11,312 research outputs found

    Generalization of form in visual pattern classification.

    Get PDF
    Human observers were trained to criterion in classifying compound Gabor signals with symmetry relationships, and were then tested with each of 18 blob-only versions of the learning set. Generalization to dark-only and light-only blob versions of the learning signals, as well as to dark-and-light blob versions, was found to be excellent, thus implying virtually perfect generalization of the ability to classify mirror-image signals. The hypothesis that the learning signals are internally represented in terms of a 'blob code' with explicit labelling of contrast polarities was tested by predicting observed generalization behaviour in terms of various types of signal representations (pixelwise, Laplacian pyramid, curvature pyramid, ON/OFF, local maxima of Laplacian and curvature operators) and a minimum-distance rule. Most representations could explain generalization for dark-only and light-only blob patterns but not for the high-thresholded versions thereof. This led to the proposal of a structure-oriented blob code. Whether such a code could be used in conjunction with simple classifiers or should be transformed into a propositional scheme of representation operated upon by a rule-based classification process remains an open question.
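    As a rough illustration of the minimum-distance rule mentioned in this abstract (not the authors' implementation), the sketch below assigns a signal to the class whose prototype is nearest under some representation; the representation function and the prototype dictionary are hypothetical placeholders.

```python
import numpy as np

def minimum_distance_classify(signal, prototypes, represent=lambda s: s):
    """Assign `signal` to the class whose prototype representation is nearest.

    `represent` maps a signal to a feature vector (e.g. pixelwise values,
    a Laplacian-pyramid level, or a thresholded blob code); here it is a
    hypothetical placeholder, not the representation studied in the paper.
    """
    x = represent(np.asarray(signal, dtype=float)).ravel()
    best_label, best_dist = None, np.inf
    for label, proto in prototypes.items():
        p = represent(np.asarray(proto, dtype=float)).ravel()
        d = np.linalg.norm(x - p)  # Euclidean distance in feature space
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```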

    Machine Learning in Automated Text Categorization

    Full text link
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
    Comment: Accepted for publication in ACM Computing Surveys
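    As a minimal sketch of the inductive approach this survey describes, the example below learns a classifier from a handful of preclassified documents and applies it to a new one; the choice of scikit-learn, TF-IDF features, and a linear SVM is an illustrative assumption, not something the survey prescribes.

```python
# Minimal sketch of learning a text classifier from preclassified documents.
# scikit-learn (TfidfVectorizer, LinearSVC) is an illustrative choice only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = ["grain exports rose", "new GPU architecture", "wheat prices fall"]
train_labels = ["commodities", "hardware", "commodities"]

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_docs, train_labels)               # inductive step: learn the categories
print(classifier.predict(["chip fabrication costs"]))  # assign a predefined category
```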

    On the Impact of Entity Linking in Microblog Real-Time Filtering

    Full text link
    Microblogging is a model of content sharing in which the temporal locality of posts with respect to important events, either of foreseeable or unforeseeable nature, makes applications of real-time filtering of great practical interest. We propose the use of Entity Linking (EL) to improve retrieval effectiveness, by enriching the representation of microblog posts and filtering queries. EL is the process of recognizing in an unstructured text the mention of relevant entities described in a knowledge base. EL of short pieces of text is a difficult task, but it is also a scenario in which the information EL adds to the text can have a substantial impact on the retrieval process. We implement a state-of-the-art filtering method, based on the best systems from the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and extend it with a Wikipedia-based EL method. Results show that the use of EL significantly improves over non-EL based versions of the filtering methods.
    Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain - April 13 - 17, 2015
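    As an illustrative sketch of the underlying idea (not the paper's actual system), the snippet below enriches a microblog post's term representation with entities recognised from a toy dictionary and scores it against a filtering query by term overlap; the entity dictionary and the scoring rule are hypothetical.

```python
# Illustrative sketch only: enrich a post's bag of terms with entities from a
# toy knowledge base, then score it against a filtering query by term overlap.
TOY_KB = {"world cup": "FIFA_World_Cup", "messi": "Lionel_Messi"}

def link_entities(text):
    text_l = text.lower()
    return {entity for mention, entity in TOY_KB.items() if mention in text_l}

def enriched_terms(text):
    # surface terms plus linked entity identifiers
    return set(text.lower().split()) | link_entities(text)

def relevance(post, query):
    p, q = enriched_terms(post), enriched_terms(query)
    return len(p & q) / len(q) if q else 0.0

print(relevance("Messi scores again at the World Cup", "FIFA World Cup news"))
```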

    Mosquito Detection with Neural Networks: The Buzz of Deep Learning

    Full text link
    Many real-world time-series analysis problems are characterised by scarce data. Solutions typically rely on hand-crafted features extracted from the time or frequency domain allied with classification or regression engines which condition on this (often low-dimensional) feature vector. The huge advances enjoyed by many application domains in recent years have been fuelled by the use of deep learning architectures trained on large data sets. This paper presents an application of deep learning for acoustic event detection in a challenging, data-scarce, real-world problem. Our candidate challenge is to accurately detect the presence of a mosquito from its acoustic signature. We develop convolutional neural networks (CNNs) operating on wavelet transformations of audio recordings. Furthermore, we interrogate the network's predictive power by visualising statistics of network-excitatory samples. These visualisations offer a deep insight into the relative informativeness of components in the detection problem. We include comparisons with conventional classifiers, conditioned on both hand-tuned and generic features, to stress the strength of automatic deep feature learning. Detection is achieved with performance metrics significantly surpassing those of existing algorithmic methods, as well as marginally exceeding those attained by individual human experts.
    Comment: For data and software related to this paper, see http://humbug.ac.uk/kiskin2017/. Submitted as a conference paper to ECML 2017
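    As a rough sketch of the general approach (not the paper's architecture), the example below turns an audio clip into a wavelet scalogram and feeds it to a small CNN; the use of PyWavelets and PyTorch, as well as all layer sizes, are illustrative assumptions.

```python
# Illustrative sketch: classify a wavelet scalogram of an audio clip with a
# small CNN (mosquito vs. background). Libraries and sizes are assumptions.
import numpy as np
import pywt
import torch
import torch.nn as nn

def scalogram(audio, scales=np.arange(1, 65)):
    coeffs, _ = pywt.cwt(audio, scales, "morl")  # continuous wavelet transform
    return torch.tensor(np.abs(coeffs), dtype=torch.float32).unsqueeze(0)  # (1, scales, time)

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                            # two classes: mosquito / background
)

clip = np.random.randn(8000)                     # stand-in for a short recording
logits = model(scalogram(clip).unsqueeze(0))     # add batch dimension
print(logits.shape)                              # torch.Size([1, 2])
```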

    Augmenting Translation Lexica by Learning Generalised Translation Patterns

    Get PDF
    Bilingual lexicons do improve the quality of parallel corpora alignment, of newly extracted translation pairs, of machine translation, and of cross-language information retrieval, among other applications. In this regard, the first problem addressed in this thesis pertains to the classification of translations automatically extracted from parallel corpora, that is, collections of sentence pairs that are translations of each other. The second problem is concerned with machine learning of bilingual morphology, with applications in the solution of the first problem and in the generation of out-of-vocabulary translations.
With respect to the problem of translation classification, two separate classifiers, one for multi-word and one for word-to-word translations, are trained using previously extracted translation pairs manually classified as correct or incorrect. Several cues are useful for distinguishing adequate multi-word candidates from inadequate ones, such as the presence or lack of parallelism, spurious terms at translation ends (for example determiners or coordinated conjunctions), orthographic similarity between translations, and the occurrence and co-occurrence frequencies of the translation pairs. Morphological coverage, reflecting stem and suffix agreements, is explored as a key feature in classifying word-to-word translations. Given that the evaluation of extracted translation equivalents depends heavily on the human evaluator, an automated filter separating appropriate from inappropriate translation pairs prior to human evaluation greatly reduces this work, saving time and progressively improving alignment and extraction quality. It can also be applied to filtering the translation tables used for training machine translation engines, and to detecting bad translation choices made by such engines, thus enabling significant productivity gains in the post-editing of machine translations.
An important attribute of a translation lexicon is the coverage it provides. Learning suffixes and suffixation operations from the lexicon or corpus of a language is an extensively researched way to tackle out-of-vocabulary terms. However, beyond mere words or word forms, translations and their variants are a powerful source of information for automatic structural analysis; this source is explored from the perspective of improving word-to-word translation coverage and constitutes the second part of this thesis. In this context, as a phase prior to suggesting out-of-vocabulary bilingual lexicon entries, an approach is proposed to automatically induce segmentation and learn bilingual morph-like units by identifying and pairing word stems and suffixes, using the bilingual corpus of translations automatically extracted from aligned parallel corpora, whether manually validated or automatically classified. A minimally supervised technique is proposed to enable bilingual morphology learning for language pairs whose bilingual lexicons are highly deficient in word-to-word translations representing inflectional diversity.
Apart from the above-mentioned applications in the classification of machine-extracted translations and in the generation of out-of-vocabulary translations, learned bilingual morph-units may also have a great impact on establishing correspondences between sub-word constituents in word-to-multi-word and multi-word-to-multi-word translations, as well as in compression, full-text indexing, and retrieval applications.
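As an illustrative sketch of the kinds of cues this abstract mentions for classifying word-to-word translation candidates (orthographic similarity, stem agreement, frequencies), the snippet below computes a toy feature set; the tiny stem lexicon, the feature set, and the example pair are hypothetical, not the thesis's actual classifier.

```python
# Illustrative sketch only: toy features for a word-to-word translation
# candidate. The stem lexicon and feature set are hypothetical.
from difflib import SequenceMatcher

# hypothetical lexicon of already-validated stem translation pairs (EN -> PT)
STEM_LEXICON = {("nation", "naç"), ("generaliz", "generaliz")}

def orthographic_similarity(src, tgt):
    return SequenceMatcher(None, src.lower(), tgt.lower()).ratio()

def stem_agreement(src, tgt):
    # crude check: does any known stem pair cover the beginnings of both words?
    return any(src.startswith(s) and tgt.startswith(t) for s, t in STEM_LEXICON)

def candidate_features(src, tgt, cooccurrence_count):
    return {
        "orth_sim": round(orthographic_similarity(src, tgt), 2),
        "stem_agreement": stem_agreement(src, tgt),
        "cooccurrence": cooccurrence_count,
    }

print(candidate_features("nations", "nações", cooccurrence_count=12))
```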