10 research outputs found

    Towards improving WEBSOM with multi-word expressions

    Dissertation submitted for the degree of Master in Computer Engineering. Large collections of free-text documents are usually rich in information and cover several topics. However, because such collections are very large, searching and filtering them is a laborious task. A large text collection covers a set of topics, where each topic is associated with a group of documents. This thesis presents a method for building a document map of the core contents covered in the collection. WEBSOM is an approach that combines document encoding methods and Self-Organising Maps (SOM) to generate a document map. However, this methodology has a weakness in its document encoding method, because it uses single words to characterise documents. Single words tend to be ambiguous and semantically vague, so some documents can be incorrectly related. This thesis proposes a new document encoding method that improves the WEBSOM approach by using multi-word expressions (MWEs) to describe documents. Previous research and ongoing experiments encourage the use of MWEs to characterise documents, because they are semantically more accurate and more descriptive than single words.
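The contrast between single-word and multi-word encoding can be illustrated with a minimal sketch. It assumes that adjacent word pairs are an acceptable stand-in for MWEs; the abstract does not say how the thesis extracts MWEs, and the helper names `encode` and `overlap` and the two sample sentences are hypothetical.

```python
from collections import Counter

def encode(text, use_pairs=False):
    """Return term frequencies over single words, or over adjacent word pairs.

    Adjacent pairs are only a stand-in for real multi-word expressions.
    """
    words = text.lower().split()
    if use_pairs:
        terms = [" ".join(pair) for pair in zip(words, words[1:])]
    else:
        terms = words
    return Counter(terms)

def overlap(a, b):
    """Count the distinct terms shared by two encodings."""
    return len(set(a) & set(b))

# Two hypothetical documents on unrelated topics that share the ambiguous word "bank".
doc_a = "the bank raised the interest rate"
doc_b = "the river bank flooded"

single = overlap(encode(doc_a), encode(doc_b))
pairs = overlap(encode(doc_a, True), encode(doc_b, True))
print(single, pairs)
```

With single words the two documents share "the" and "bank" and so look related; with word pairs they share nothing, which is the kind of false relation the abstract attributes to single-word encoding.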

    Advances in dissimilarity-based data visualisation

    Gisbrecht A. Advances in dissimilarity-based data visualisation. Bielefeld: Universitätsbibliothek Bielefeld; 2015

    Dissimilarity-based learning for complex data

    Mokbel B. Dissimilarity-based learning for complex data. Bielefeld: Universität Bielefeld; 2016. Rapid advances in information technology have produced an ever-increasing amount of digital data, which raises the demand for powerful data mining and machine learning tools. Due to modern methods for gathering, preprocessing, and storing information, the collected data become more and more complex: a simple vectorial representation, with comparison in terms of the Euclidean distance, is often no longer appropriate to capture relevant aspects of the data. Instead, problem-adapted similarity or dissimilarity measures refer directly to the given encoding scheme, allowing information constituents to be treated in a relational manner. This thesis addresses several challenges of complex data sets and their representation in the context of machine learning. The goal is to investigate possible remedies and propose corresponding improvements of established methods, accompanied by examples from various application domains. The main scientific contributions are the following: (I) Many well-established machine learning techniques are restricted to vectorial input data only. Therefore, we propose the extension of two popular prototype-based clustering and classification algorithms to non-negative symmetric dissimilarity matrices. (II) Some dissimilarity measures incorporate a fine-grained parameterization, which allows the comparison scheme to be configured with respect to the given data and the problem at hand. However, finding adequate parameters can be hard or even impossible for human users, due to the intricate effects of parameter changes and the lack of detailed prior knowledge. Therefore, we propose to integrate a metric learning scheme into a dissimilarity-based classifier, which can automatically adapt the parameters of a sequence alignment measure according to the given classification task.
(III) A valuable instrument to make complex data sets accessible are dimensionality reduction techniques, which can provide an approximate low-dimensional embedding of the given data set, and, as a special case, a planar map to visualize the data's neighborhood structure. To assess the reliability of such an embedding, we propose the extension of a well-known quality measure to enable a fine-grained, tractable quantitative analysis, which can be integrated into a visualization. This tool can also help to compare different dissimilarity measures (and parameter settings), if ground truth is not available. (IV) All techniques are demonstrated on real-world examples from a variety of application domains, including bioinformatics, motion capturing, music, and education
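Contribution (I) concerns prototype-based methods that work on a dissimilarity matrix instead of vectors. The abstract does not name the algorithms or give formulas, so the following is only a sketch of one common relational formulation, assumed here for illustration: a prototype is an implicit convex combination of the data with coefficients `alpha` that sum to 1, and its squared distance to item `i` is computed from the matrix `D` of pairwise squared dissimilarities alone. The function name and the toy data are hypothetical.

```python
def relational_sq_distance(D, alpha, i):
    """Squared distance from item i to the prototype sum_j alpha[j] * x_j.

    D holds pairwise squared dissimilarities; alpha is assumed to sum to 1.
    Uses d(x_i, w)^2 = (D alpha)_i - 0.5 * alpha^T D alpha.
    """
    n = len(alpha)
    d_alpha_i = sum(D[i][j] * alpha[j] for j in range(n))
    quad = sum(alpha[j] * D[j][k] * alpha[k] for j in range(n) for k in range(n))
    return d_alpha_i - 0.5 * quad

# Toy check on Euclidean data: three points on a line at 0, 2 and 4.
points = [0.0, 2.0, 4.0]
D = [[(p - q) ** 2 for q in points] for p in points]

# Prototype halfway between the first two points, i.e. at position 1.
alpha = [0.5, 0.5, 0.0]

for i, p in enumerate(points):
    print(p, relational_sq_distance(D, alpha, i), (p - 1.0) ** 2)
```

On this Euclidean toy data the relational value matches the direct squared distance to the point 1.0, although the function never sees the coordinates. For non-Euclidean dissimilarities the same expression can still be evaluated, but it need not behave like a true squared distance.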

    Analysis and Detection of Outliers in GNSS Measurements by Means of Machine Learning Algorithms

    The abstract is in the attachment.

    Studies on dimension reduction and feature spaces

    Today's world produces and stores huge amounts of data, which calls for methods that can tackle both growing sizes and growing dimensionalities of data sets. Dimension reduction aims at answering the challenges posed by the latter. Many dimension reduction methods consist of a metric transformation part followed by optimization of a cost function. Several classes of cost functions have been developed and studied, while metrics have received less attention. We promote the view that metrics should be lifted to a more independent role in dimension reduction research. The subject of this work is the interaction of metrics with dimension reduction. The work is built on a series of studies on current topics in dimension reduction and neural network research. Neural networks are used both as a tool and as a target for dimension reduction. When the results of modeling or clustering are represented as a metric, they can be studied using dimension reduction, or they can be used to introduce new properties into a dimension reduction method. We give two examples of such use: visualizing results of hierarchical clustering, and creating supervised variants of existing dimension reduction methods by using a metric that is built on the feature space of a neural network. Combining clustering with dimension reduction results in a novel way of creating space-efficient visualizations that convey both the hierarchical structure and the distances between clusters. We study feature spaces used in a recently developed neural network architecture called the extreme learning machine. We give a novel interpretation for such neural networks, and recognize the need to parameterize extreme learning machines with the variance of network weights. This has practical implications for the use of extreme learning machines, since the current practice emphasizes the role of hidden units and ignores the variance.
A current trend in the research of deep neural networks is to use cost functions from dimension reduction methods to train the network for supervised dimension reduction. We show that equally good results can be obtained by training a bottlenecked neural network for classification or regression, which is faster than using a dimension reduction cost. We demonstrate that, contrary to the current belief, using sparse distance matrices for creating fast dimension reduction methods is feasible if a proper balance between short-distance and long-distance entries in the sparse matrix is maintained. This observation opens up a promising research direction, with the possibility of using modern dimension reduction methods on much larger data sets than are manageable today.

    Value of Mineralogical Monitoring for the Mining and Minerals Industry: In memory of Prof. Dr. Herbert Pöllmann

    This Special Issue, focusing on the value of mineralogical monitoring for the mining and minerals industry, should include detailed investigations and characterizations of minerals and ores of the following fields for ore and process control: Lithium ores—determination of lithium contents by XRD methods; Copper ores and their different mineralogy; Nickel lateritic ores; Iron ores and sinter; Bauxite and bauxite overburden; Heavy mineral sands. The value of quantitative mineralogical analysis, mainly by XRD methods, combined with other techniques for the evaluation of typical metal ores and other important minerals, will be shown and demonstrated for different minerals. The different steps of mineral processing and metal contents bound to different minerals will be included. Additionally, some processing steps, mineral enrichments, and optimization of mineral determinations using XRD will be demonstrated. Statistical methods for the treatment of a large set of XRD patterns of ores and mineral concentrates, as well as their value for the characterization of mineral concentrates and ores, will be demonstrated. Determinations of metal concentrations in minerals by different methods will be included, as well as the direct prediction of process parameters from raw XRD data

    Smart process monitoring of machining operations

    This thesis explores the possibilities of applying artificial intelligence techniques to sensory monitoring in the manufacturing sector. Several case studies are considered in the research activity. The first case studies implement supervised and unsupervised neural networks to monitor the condition of a grinding wheel. The monitoring systems use acoustic emission sensors and a piezoelectric sensor capable of measuring electromechanical impedance. The other case study uses the Bees Algorithm to determine the wear of a tool during cutting operations on a steel cylinder. A script supports this operation: it converts the images into a numerical matrix, which allows the bees to detect tool wear correctly.

    The Acoustic Correlates of Stress-Shifting Suffixes in Native and Nonnative English

    Although laboratory phonology techniques have been widely employed to discover the interplay between the acoustic correlates of English Lexical Stress (ELS), namely fundamental frequency, duration, and intensity, studies on ELS in polysyllabic words are rare, and cross-linguistic acoustic studies in this area are even rarer. Consequently, the effects of language experience on L2 lexical stress acquisition are not clear. This investigation of adult Arabic (Saudi Arabian) and Mandarin (Mainland Chinese) speakers analyzes their ELS production in tokens with seven different stress-shifting suffixes; i.e., Level 1 [+cyclic] derivations to phonologists. Stress productions are then systematically analyzed and compared with those of speakers of Midwest American English using the acoustic phonetic software Praat. In total, one hundred subjects participated in the study, spread evenly across the three language groups, and 2,125 vowels in 800 spectrograms were analyzed (excluding stress placement and pronunciation errors). Nonnative speakers completed a sociometric survey prior to recording so that statistical sampling techniques could be used to evaluate acquisition of accurate ELS production. The speech samples of native speakers were analyzed to provide norm values for cross-reference and to provide insights into the proposed Salience Hierarchy of the Acoustic Correlates of Stress (SHACS). The results support the notion that a SHACS does exist in the L1 sound system, and that native-like command of this system through accurate ELS production can be acquired by proficient L2 learners via increased L2 input. Other findings raise questions as to the accuracy of standard American English dictionary pronunciations as well as the generalizability of claims made about the acoustic properties of tonic accent shift.