
5,554 research outputs found

Clustering: Methodology, hybrid systems, visualization, validation and implementation

Author: Lam Dao Minh
Publication venue: Scholars' Mine
Publication date: 01/01/2016

Unsupervised learning is one of the most important steps in machine learning applications. Besides providing insight into the data distribution, unsupervised learning is used as a preprocessing step for other machine learning algorithms. This dissertation investigates the application of unsupervised learning to various types of data for machine learning tasks such as clustering, regression, and classification. The dissertation is organized into three papers. In the first paper, unsupervised learning is applied to data with mixed categorical and numerical features to transform the data objects from the mixed-type feature domain into a new, sparser numerical domain. By making use of the data fusion capacity of adaptive resonance theory clustering, the approach is able to reduce the distinction between the numerical and categorical features. The second paper presents a novel method to improve wind forecasting performance by grouping the time series of surrounding windmills into similar groups with hidden Markov model clustering and using the clustering information to enhance the forecast. A fast forecasting method is also introduced that uses an extreme learning machine, which can be trained in analytic form, to choose the optimal number of past samples for prediction and an appropriate size for the neural network. In the third paper, unsupervised learning is used to automatically learn features from the dataset itself, without human design of sophisticated feature extractors. The paper shows that, by using unsupervised feature learning with a multi-quadric radial basis function extreme learning machine, the classifier performs better than several other supervised learning methods. The paper further improves the training speed of the neural network by presenting an algorithm that runs in parallel on a GPU. (Abstract, page iv)
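The analytic training of an extreme learning machine mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook formulation (a fixed random hidden layer, with output weights solved by a pseudo-inverse) and is not the dissertation's implementation; the function names, the tanh activation, the hidden-layer size, and the toy sine data are all assumptions made for illustration.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, seed=0):
    """Fit a basic extreme learning machine (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are drawn at random and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    # Output weights come from a closed-form least-squares solution.
    beta = np.linalg.pinv(H) @ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed hidden layer, then the solved output weights."""
    return np.tanh(X @ W + b) @ beta

# Toy regression problem (hypothetical data): approximate sin(x) on [-3, 3].
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_fit(X, y)
mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
print("training MSE:", mse)
```

Because no iterative optimization is involved, refitting with a different `n_hidden` is cheap, which is one plausible reason such a model suits a search over network sizes.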

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Clustering Data of Mixed Categorical and Numerical Type with Unsupervised Feature Learning

Author: Lam Dao
Wei Mingzhen
Wunsch Donald C.
Publication venue: Scholars' Mine
Publication date: 01/09/2015

Mixed-type categorical and numerical data are a challenge in many applications. This general area of mixed-type data is among the frontier areas, where computational intelligence approaches are often brittle compared with the capabilities of living creatures. In this paper, unsupervised feature learning (UFL) is applied to mixed-type data to achieve a sparse representation, which makes it easier for clustering algorithms to separate the data. Unlike other UFL methods that work with homogeneous data, such as image and video data, the presented UFL works with mixed-type data using fuzzy adaptive resonance theory (ART). UFL with fuzzy ART (UFLA) obtains a better clustering result by removing the differences in treating categorical and numeric features. The advantages of doing this are demonstrated on several real-world data sets with ground truth, including heart disease, teaching assistant evaluation, and credit approval. The approach is also demonstrated on noisy, mixed-type petroleum industry data. UFLA is compared with several alternative methods. To the best of our knowledge, this is the first time UFL has been extended to accomplish the fusion of mixed data types.
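For readers unfamiliar with fuzzy ART, the following is a minimal sketch of the standard algorithm (complement coding, choice function, vigilance test, and weight update). It is not the UFLA pipeline from the paper; the function names, parameter values, and toy points are assumptions made for illustration, and inputs are assumed to be scaled to [0, 1].

```python
def complement_code(x):
    """Append 1 - x to each feature so every coded input has the same L1 norm."""
    return list(x) + [1.0 - v for v in x]

def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND operator used by fuzzy ART."""
    return [min(p, q) for p, q in zip(a, b)]

def fuzzy_art(points, rho=0.8, alpha=0.001, beta=1.0):
    """Cluster points in [0, 1]^d with standard fuzzy ART (illustrative sketch)."""
    weights, labels = [], []
    for x in points:
        i_vec = complement_code(x)
        norm_i = sum(i_vec)
        # Rank existing categories by the choice function, best first.
        order = sorted(
            range(len(weights)),
            key=lambda j: sum(fuzzy_and(i_vec, weights[j])) / (alpha + sum(weights[j])),
            reverse=True,
        )
        for j in order:
            overlap = fuzzy_and(i_vec, weights[j])
            # Vigilance test: accept the category only if the match is good enough.
            if sum(overlap) / norm_i >= rho:
                weights[j] = [beta * o + (1.0 - beta) * w
                              for o, w in zip(overlap, weights[j])]
                labels.append(j)
                break
        else:
            # No category passed the vigilance test, so commit a new one.
            weights.append(i_vec)
            labels.append(len(weights) - 1)
    return labels, weights

# Hypothetical toy data: two well-separated groups of points.
labels, weights = fuzzy_art([(0.1, 0.1), (0.12, 0.09), (0.9, 0.9), (0.88, 0.92)])
print(labels)
```

The vigilance parameter `rho` controls granularity: raising it makes the match test stricter and so tends to create more, tighter categories.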

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

Author: Havinga P.J.M.
Meratnia N.
Zhang Yang
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007

The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events within numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages, and performance issues of each class of outlier detection techniques under this taxonomy framework.
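The definition of an outlier given in this abstract can be made concrete with a simple statistical rule. The sketch below uses the modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5. It is offered only as a familiar example of unsupervised outlier detection and is not taken from the paper's taxonomy; the function name and sample values are assumptions.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold (sketch)."""
    med = median(values)
    # Median absolute deviation: a robust estimate of spread.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against; this sketch flags nothing in that case.
        return []
    # 0.6745 rescales the MAD to be comparable with a standard deviation
    # under a normal distribution.
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical readings with one value far from the rest.
flagged = mad_outliers([10, 11, 9, 10, 12, 10, 50])
print(flagged)
```

A median-based rule is used here rather than the mean and standard deviation because a single extreme value can inflate both of those enough to hide itself.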

University of Twente Research Information

Automatic Bayesian Density Analysis

Author: Ghahramani Zoubin
Kersting Kristian
Molina Alejandro
Peharz Robert
Valera Isabel
Vergari Antonio
Publication date: 01/01/2019

Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt, and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible at large. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection, and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data. Comment: In proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

arXiv.org e-Print Archive

TUbiblio

Pure OAI Repository

MPG.PuRe

Association for the Advancement of Artificial Intelligence: AAAI Publications

Unsupervised Learning for Understanding Student Achievement in a Distance Learning Setting

Author: d'Aquin Mathieu
Liu Shuangyan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 08/06/2017

Many factors can affect the achievement of students in distance learning settings. Internal factors such as age, gender, previous education level, and engagement in online learning activities can play an important role in obtaining successful learning outcomes, as can external factors such as the regions students come from and the learning environment they can access. Identifying the relationships between student characteristics and distance learning outcomes is a central issue in learning analytics. This paper presents a study that applies unsupervised learning to identify how the demographic characteristics of students and their engagement in online learning activities can affect their learning achievement. We utilise the K-Prototypes clustering method to identify groups of students based on demographic characteristics and interactions with online learning environments, and also investigate the learning achievement of each group. Knowing which groups of students have successful or poor learning outcomes can aid faculty in designing online courses that adapt to different students' needs. It can also assist students in selecting online courses that are appropriate for them.
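K-Prototypes handles mixed data by combining a squared Euclidean distance on numerical features with a weighted mismatch count on categorical features. The sketch below shows only that dissimilarity and the nearest-prototype assignment step, assuming the standard formulation of the method. The feature choices, prototype values, and weight `gamma` are hypothetical and do not come from the study.

```python
def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma):
    """K-Prototypes dissimilarity between a record and a prototype (sketch)."""
    # Numerical part: squared Euclidean distance.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    # Categorical part: number of attributes whose values differ.
    mismatches = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    # gamma balances the influence of the two feature types.
    return numeric + gamma * mismatches

def nearest_prototype(x_num, x_cat, prototypes, gamma):
    """Return the index of the prototype with the smallest dissimilarity."""
    distances = [mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
                 for p_num, p_cat in prototypes]
    return distances.index(min(distances))

# Hypothetical student record: (scaled age, scaled engagement), (region, education).
student_num, student_cat = (0.2, 0.5), ("north", "degree")
prototypes = [
    ((0.1, 0.7), ("north", "school")),
    ((0.8, 0.1), ("south", "school")),
]
d0 = mixed_dissimilarity(student_num, student_cat, *prototypes[0], gamma=0.5)
assigned = nearest_prototype(student_num, student_cat, prototypes, gamma=0.5)
print(d0, assigned)
```

In a full clustering run, this assignment step would alternate with a prototype update that takes the mean of each numerical feature and the mode of each categorical feature within every cluster.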

Crossref

Irish Universities

Open Research Online (The Open University)

Access to Research at National University of Ireland, Galway

A General Spatio-Temporal Clustering-Based Non-local Formulation for Multiscale Modeling of Compartmentalized Reservoirs

Author: Castineira David
Darabi Hamed
Esmaeilzadeh Soheil
Hetz Gill
Olalotiti-lawal Feyisayo
Salehi Amir
Publication venue: Society of Petroleum Engineers (SPE)
Publication date: 01/01/2019

Representing the reservoir as a network of discrete compartments with neighbor and non-neighbor connections is a fast yet accurate method for analyzing oil and gas reservoirs. Automatic and rapid detection of coarse-scale compartments with distinct static and dynamic properties is an integral part of such high-level reservoir analysis. In this work, we present a hybrid framework specific to reservoir analysis for the automatic detection of clusters in space using spatial and temporal field data, coupled with a physics-based multiscale modeling approach. Specifically, we couple a physics-based non-local modeling framework with data-driven clustering techniques to provide fast and accurate multiscale modeling of compartmentalized reservoirs. This research also adds to the literature by presenting a comprehensive work on spatio-temporal clustering for reservoir study applications that carefully considers the clustering complexities, the intrinsically sparse and noisy nature of the data, and the interpretability of the outcome. Keywords: Artificial Intelligence; Machine Learning; Spatio-Temporal Clustering; Physics-Based Data-Driven Formulation; Multiscale Modeling

arXiv.org e-Print Archive

Crossref