
    Transductive Learning for Spatial Data Classification

    Learning classifiers of spatial data presents several issues, such as the heterogeneity of spatial objects, the implicit definition of spatial relationships among objects, spatial autocorrelation, and the abundance of unlabelled data which potentially conveys a large amount of information. The first three issues are due to the inherent structure of spatial units of analysis, which can be easily accommodated if a (multi-)relational data mining approach is considered. The fourth issue demands the adoption of a transductive setting, which aims to make predictions for a given set of unlabelled data. Transduction is also motivated by the closeness of the concept of positive autocorrelation, which typically affects spatial phenomena, to the smoothness assumption which characterizes the transductive setting. In this work, we investigate a relational approach to spatial classification in a transductive setting. Computational solutions to the main difficulties met in this approach are presented. In particular, a relational upgrade of the naïve Bayes classifier is proposed as the discriminative model, an iterative algorithm is designed for the transductive classification of unlabelled data, and a distance measure between relational descriptions of spatial objects is defined in order to determine the k-nearest neighbors of each example in the dataset. The computational solutions have been tested on two real-world spatial datasets. The transformation of spatial data into a multi-relational representation and experimental results are reported and discussed.
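
    As a rough sketch of the iterative transductive scheme described above (a simplified propositional version in Python, not the authors' relational algorithm; the function name and parameters are invented for illustration), unlabelled examples can be given an initial naive Bayes prediction and then repeatedly relabelled by their k nearest neighbors until the labelling stabilizes:

        # Simplified sketch of iterative transductive classification.
        # The paper works on multi-relational descriptions with a custom
        # distance measure; this propositional version uses plain
        # Euclidean k-NN and assumes integer class labels 0..C-1.
        import numpy as np
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import NearestNeighbors

        def transductive_knn_nb(X_lab, y_lab, X_unl, k=5, max_iter=10):
            y_unl = GaussianNB().fit(X_lab, y_lab).predict(X_unl)  # initial guess
            X_all = np.vstack([X_lab, X_unl])
            nn = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
            _, idx = nn.kneighbors(X_unl)
            for _ in range(max_iter):
                y_all = np.concatenate([y_lab, y_unl])
                # relabel each unlabelled point by majority vote of its
                # k nearest neighbors (idx[:, 0] is the point itself)
                new = np.array([np.bincount(y_all[i[1:]]).argmax() for i in idx])
                if np.array_equal(new, y_unl):  # labelling stabilized
                    break
                y_unl = new
            return y_unl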

    Development of a Machine Learning-Based Financial Risk Control System

    With the gradual end of the COVID-19 outbreak and the gradual recovery of the economy, more and more individuals and businesses are in need of loans. This demand brings business opportunities to various financial institutions, but it also brings new risks. Traditional loan application review is mostly manual and relies on the business experience of the auditor, so it cannot process applications in large volumes and is inefficient. Since the traditional audit process is no longer suitable, financial institutions urgently need another way of reducing the rate of non-performing loans and detecting fraud in applications. In this project, a financial risk control model is built using various machine learning algorithms. The model replaces the traditional manual approach to reviewing loan applications, improving the speed of review as well as its accuracy and approval rate. Machine learning algorithms were also used in this project to create a loan user scorecard system that better reflects changes in user information than the credit scoring systems financial institutions use today. The project also explores the data imbalance problem and ways of improving model performance.
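
    A minimal sketch of the kind of model such a system might start from (the data, feature semantics, and score scaling below are invented for illustration) is a class-weighted logistic regression, a common basis for credit scorecards, with the class weighting addressing the data imbalance the project mentions:

        # Class-weighted logistic regression as a toy credit scoring model.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 4))             # e.g. income, debt ratio, age, history
        y = (rng.random(1000) < 0.05).astype(int)  # ~5% defaults: imbalanced classes

        # class_weight="balanced" upweights the rare default class
        clf = make_pipeline(StandardScaler(),
                            LogisticRegression(class_weight="balanced"))
        clf.fit(X, y)

        # map predicted default probability to a scorecard scale
        # (illustrative: base score 600, 50 points double the odds)
        p = clf.predict_proba(X[:5])[:, 1]
        score = 600 + 50 * np.log2((1 - p) / p)
        print(np.round(score))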

    Content-based Information Retrieval via Nearest Neighbor Search

    Content-based information retrieval (CBIR) has attracted significant interest in the past few years. When given a search query, the search engine compares the query with all the stored information in the database through nearest neighbor search and returns the most similar items. We contribute the following to CBIR research: firstly, Distance Metric Learning (DML) is studied to improve the retrieval accuracy of nearest neighbor search; additionally, Hash Function Learning (HFL) is considered to accelerate the retrieval process. On one hand, a new local metric learning framework is proposed: Reduced-Rank Local Metric Learning (R2LML). By considering a conical combination of Mahalanobis metrics, the proposed method is able to better capture information such as the data's similarity and location. A regularization term to suppress noise and avoid over-fitting is also incorporated into the formulation. Based on the different methods of inferring the weights of the local metrics, we considered two frameworks: Transductive Reduced-Rank Local Metric Learning (T-R2LML), which utilizes transductive learning, and Efficient Reduced-Rank Local Metric Learning (E-R2LML), which employs a simpler and faster approximation. We also study the convergence properties of the proposed block coordinate descent algorithms for both frameworks. Extensive experiments show the superiority of our approaches. On the other hand, *Supervised Hash Learning (*SHL), which can be used in supervised, semi-supervised, and unsupervised learning scenarios, is proposed in the dissertation. By considering several codewords which can be learned from the data, the proposed method naturally leads to several Support Vector Machine (SVM) problems. After providing an efficient training algorithm, we also study the theoretical generalization bound of the new hashing framework. In the final experiments, *SHL outperforms many other popular hash function learning methods. Additionally, in order to cope with large datasets, we also conducted experiments on big data using a parallel computing software package, namely LIBSKYLARK.
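
    To make the metric-learning side concrete, the sketch below (notation assumed here, not taken from the dissertation) retrieves nearest neighbors under a single Mahalanobis metric d(x, q) = sqrt((x - q)^T M (x - q)) with M = L^T L positive semi-definite; R2LML goes further by conically combining several such local metrics:

        # Nearest-neighbor retrieval under a learned Mahalanobis metric.
        # Writing M = L^T L, the map z = Lx turns the Mahalanobis distance
        # into ordinary Euclidean distance in the transformed space.
        import numpy as np

        def mahalanobis_knn(X, q, L, k=3):
            Z, zq = X @ L.T, L @ q
            d = np.linalg.norm(Z - zq, axis=1)
            return np.argsort(d)[:k]               # indices of the k nearest items

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 5))              # database items
        q = rng.normal(size=5)                     # query
        L = rng.normal(size=(5, 5))                # stand-in for a learned transform
        print(mahalanobis_knn(X, q, L))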

    Reliability of Extreme Learning Machines

    Neumann K. Reliability of Extreme Learning Machines. Bielefeld: Bielefeld University Library; 2014. The reliable application of machine learning methods becomes increasingly important in challenging engineering domains. In particular, the application of extreme learning machines (ELMs) seems promising because of their apparent simplicity and their capability of processing large and high-dimensional data sets very efficiently. However, the ELM paradigm is based on the concept of single hidden-layer neural networks with randomly initialized and fixed input weights and is thus inherently unreliable. This black-box character usually deters engineers from applying ELMs to potentially safety-critical tasks. The problem becomes even more severe since, in principle, only sparse and noisy data sets can be provided in such domains. The goal of this thesis is therefore to equip the ELM approach with the ability to perform reliably. This goal is approached in three respects: enhancing the robustness of ELMs to initialization, enabling ELMs to handle slow changes in the environment (i.e. input drifts), and allowing the incorporation of continuous constraints derived from prior knowledge. It is shown in several diverse scenarios that the novel ELM approach proposed in this thesis ensures safe and reliable application while simultaneously sustaining the full modeling power of data-driven methods.
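
    The basic ELM computation the thesis builds on can be sketched in a few lines (a plain regression ELM; the thesis's reliability mechanisms are not reproduced here): the input weights are drawn at random and kept fixed, and only the linear readout is solved for, here via ridge regression:

        # Minimal extreme learning machine: random fixed hidden layer
        # plus a ridge-regularized linear readout.
        import numpy as np

        class ELM:
            def __init__(self, n_hidden=100, alpha=1e-3, seed=0):
                self.n_hidden, self.alpha = n_hidden, alpha
                self.rng = np.random.default_rng(seed)

            def fit(self, X, y):
                d = X.shape[1]
                self.W = self.rng.normal(size=(d, self.n_hidden))  # fixed at random
                self.b = self.rng.normal(size=self.n_hidden)
                H = np.tanh(X @ self.W + self.b)                   # hidden activations
                # readout weights: beta = (H^T H + alpha I)^(-1) H^T y
                A = H.T @ H + self.alpha * np.eye(self.n_hidden)
                self.beta = np.linalg.solve(A, H.T @ y)
                return self

            def predict(self, X):
                return np.tanh(X @ self.W + self.b) @ self.beta

        # usage: fit a noisy sine curve
        X = np.linspace(-3, 3, 200)[:, None]
        y = np.sin(X[:, 0]) + 0.1 * np.random.default_rng(1).normal(size=200)
        print(ELM().fit(X, y).predict(X[:5]))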

    Proceedings. 16. Workshop Computational Intelligence, Dortmund, 29. Nov.-1. Dez. 2006

    These proceedings contain the papers of the 16th Workshop Computational Intelligence. It was organized by the Working Group 5.14 of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik (GMA) and the Working Group Fuzzy-Systems and Soft-Computing of the Gesellschaft für Informatik (GI).

    Histograms: An educational eye

    Many high-school students are not able to draw justified conclusions from statistical data in histograms. A literature review showed that most misinterpretations of histograms are related to difficulties with two statistical key concepts: data and distribution. The review also pointed to a lack of knowledge about students' strategies for solving histogram tasks. As the literature provided little guidance for the design of lesson materials, several preparatory studies were conducted. In a first study, five solution strategies were found through qualitative analysis of students' gazes when solving histogram and case-value plot tasks. Quantitative analysis of several histogram tasks through a mathematical model and a machine learning algorithm confirmed these results, which implied that these strategies could be identified reliably and automatically. The literature also suggested that dotplot tasks can support students' learning to interpret histograms. Therefore, gazes on histogram tasks were compared before and after students solved dotplot tasks. The "after" tasks contained more gazes associated with correct strategies and fewer gazes associated with incorrect strategies. Although answers did not improve significantly, students' verbal descriptions suggest that some students changed to a correct strategy. The newly designed materials therefore started with dotplot tasks. From the previous studies, we conjectured that students lacked embodied experiences with actions related to histograms. Designed from an embodied instrumentation perspective, the tested materials provide starting points for scaling up. Together, the studies address the knowledge gaps identified in the literature and contribute to knowledge about learning histograms and about the use of eye-tracking research, interpretable models and machine learning algorithms, and embodied instrumentation design in statistics education.