112 research outputs found

    Discrete optimization methods to fit piecewise affine models to data points

    Fitting piecewise affine models to data points is a pervasive task in many scientific disciplines. In this work, we address the k-Piecewise Affine Model Fitting with Piecewise Linear Separability problem (k-PAMF-PLS) where, given a set of m points $\{a_1, \dots, a_m\} \subset \mathbb{R}^n$ and the corresponding observations $\{b_1, \dots, b_m\} \subset \mathbb{R}$, we have to partition the domain $\mathbb{R}^n$ into k piecewise linearly (or affinely) separable subdomains and to determine an affine submodel (function) for each of them so as to minimize the total linear fitting error w.r.t. the observations $b_i$. To solve k-PAMF-PLS to optimality, we propose a mixed-integer linear programming (MILP) formulation where symmetries are broken by separating shifted column inequalities. For medium-to-large scale instances, we develop a four-step heuristic involving, among others, a point reassignment step based on the identification of critical points and a domain partition step based on multicategory linear classification. Unlike traditional approaches proposed in the literature for similar fitting problems, both our exact and heuristic methods take the domain partitioning and submodel fitting aspects into account simultaneously. Computational experiments on real-world and structured randomly generated instances show that, with our MILP formulation with symmetry-breaking constraints, we can solve many small-size instances to proven optimality. Our four-step heuristic provides close-to-optimal solutions for small-size instances, while allowing us to tackle instances of much larger size. The experiments also show that the combined impact of the main features of our heuristic is quite substantial when compared to standard variants not including them. We conclude with an application to the identification of dynamical piecewise affine systems, for which we obtain promising results of comparable quality to those achieved with state-of-the-art methods from the literature on benchmark data sets.
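To make the objective in the abstract above concrete, here is a minimal sketch that evaluates the total linear (absolute) fitting error for a given assignment of points to affine submodels. It is an illustration only, not the authors' MILP or heuristic: the function name `total_l1_error`, the `(weights, offset)` model representation, and the toy data are all assumptions, and the sketch takes the partition as given rather than enforcing piecewise linear separability.

```python
def total_l1_error(points, observations, assignment, models):
    """Sum of |b_i - (w_j . a_i + c_j)| over all points, with j = assignment[i].

    `models` is a list of hypothetical (weights, offset) pairs, one per piece.
    """
    total = 0.0
    for a, b, j in zip(points, observations, assignment):
        weights, offset = models[j]
        prediction = sum(w * x for w, x in zip(weights, a)) + offset
        total += abs(b - prediction)
    return total


# Toy 1-D instance with k = 2 pieces (made-up values for illustration).
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
observations = [0.0, 1.5, 5.0, 6.0]
assignment = [0, 0, 1, 1]                  # piece index for each point
models = [((1.0,), 0.0), ((2.0,), 1.0)]    # b = 1*a + 0 and b = 2*a + 1

error = total_l1_error(points, observations, assignment, models)
print(error)
```

An exact or heuristic solver would search over both `assignment` and `models` at once; this function only scores one candidate.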

    Classification software technique assessment

    A catalog of software options is presented for local user communities seeking software for analyzing remotely sensed multispectral imagery. The resources required to use a particular software program are described, along with how the program analyzes data and how it performs for an application and data set provided by the user. An effort is made to establish a statistical performance base for various software programs across different data sets and analysis applications, in order to determine the state of the art.

    Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction

    Protein tertiary structure plays a very important role in determining its possible functional sites and chemical interactions with other related proteins. Experimental methods to determine protein structure are time consuming and expensive. As a result, the gap between protein sequence and its structure has widened substantially due to high-throughput sequencing techniques. These limitations of experimental methods motivate us to develop computational algorithms for protein structure prediction. In this work, a clustering system is used to predict local protein structure. First, recurring sequence clusters are explored with an improved K-means clustering algorithm. Carefully constructed sequence clusters are used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within sequence clusters may influence their structural similarity. Analysis of the relationship between sequence variation and structural similarity shows that sequence clusters with tight sequence variation have high structural similarity and sequence clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure for local sequence segments. Test results indicate that the highest-quality clusters give highly reliable predictions and high-quality clusters give reliable predictions. In order to improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm. The K-means clustering algorithm may not capture the nonlinear sequence-to-structure relationship effectively. As a result, we consider using a Support Vector Machine (SVM) to capture the nonlinear sequence-to-structure relationship. However, SVM is not well suited to huge datasets containing millions of samples. Therefore, we propose CSVMs. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Compared with the clustering system introduced previously, our experimental results show that accuracy for local structure prediction improves noticeably when CSVMs are applied.
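The cluster-then-classify idea described in the abstract above can be sketched as follows. This is a hedged illustration, not the authors' CSVM implementation: it uses plain Lloyd-style K-means with caller-supplied starting centroids (not the improved K-means mentioned above), a linear SVM trained by subgradient descent on the hinge loss (the abstract does not state the kernel or solver), features centred on each cluster's centroid (an added assumption), and made-up 1-D toy data. All function names are hypothetical.

```python
def squared_distance(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Empty groups keep their previous centroid.
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


def train_linear_svm(xs, ys, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss (labels are +1/-1)."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]      # regularisation shrink
            if margin < 1.0:                             # hinge loss is active
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def train_csvm(points, labels, start_centroids):
    """One linear SVM per cluster; assumes every cluster is non-empty."""
    centroids = kmeans(points, start_centroids)
    models = []
    for j, c in enumerate(centroids):
        members = [(p, y) for p, y in zip(points, labels)
                   if nearest(p, centroids) == j]
        xs = [tuple(pi - ci for pi, ci in zip(p, c)) for p, _ in members]
        models.append(train_linear_svm(xs, [y for _, y in members]))
    return centroids, models


def predict(point, centroids, models):
    """Route the point to its cluster, then apply that cluster's SVM."""
    j = nearest(point, centroids)
    w, b = models[j]
    x = [pi - ci for pi, ci in zip(point, centroids[j])]
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy data: no single linear rule fits all points, but each cluster is separable.
points = [(-2.0,), (-1.0,), (1.0,), (2.0,), (8.0,), (9.0,), (11.0,), (12.0,)]
labels = [-1, -1, 1, 1, 1, 1, -1, -1]
centroids, models = train_csvm(points, labels, [(-2.0,), (12.0,)])
```

Because each classifier only sees the samples in its own cluster, training cost scales with cluster size rather than with the full dataset, which is the motivation the abstract gives for combining clustering with SVMs.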

    A computational intelligence based prediction model for flight departure delays

    Flight departure delays are a major problem at OR Tambo International Airport (ORTIA). Departures are delayed particularly often at the beginning and at the end of the month. The increasing demand for flights departing from ORTIA often has a negative effect on business deals, individuals' health, job opportunities and tourists. When departing flights are delayed, travellers are notified at the airport every 30 minutes about the status of the flight and the reason for the delay, if it is known. This study aims to construct a flight delay prediction model using machine learning algorithms. The flight departure data were obtained from the timetable of departing flight schedules on ORTIA's website. The flight departure data from ORTIA to any destination (e.g. Johannesburg (JNB) Airport to Cape Town (CPT)) for the South African Airways (SAA) airline were used for this study. The machine learning algorithms Decision Trees (J48), Support Vector Machine (SVM), K-Means Clustering (K-Means) and Multi-Layered Perceptron (MLP) were used to construct the flight departure delay prediction models. A cross-validation (CV) method was used to evaluate the models. The best prediction model was selected using a confusion matrix. The results showed that the models constructed using Decision Trees (J48) achieved the best prediction for flight departure delays at 67.144%, while Multi-Layered Perceptron (MLP) obtained 67.010%, Support Vector Machine (SVM) obtained 66.249% and K-Means Clustering (K-Means) obtained 61.549%. Travellers wishing to travel from ORTIA can predict flight departure delays using this tool. The tool allows travellers to enter variables such as month, week of month, day of week and time of day. From the entered variables, it predicts the flight departure status as one of the target concepts On Time, Delayed or Cancelled. Travellers can only predict flight departure status; they will not have full knowledge of the flight departure volume. For that, they will depend on the flight information display system (FIDS) board. This study empowers travellers by providing them with a tool that can determine the punctuality of flights departing from ORTIA.
    M.Com. (Information Technology Management)
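The evaluation procedure mentioned in the abstract above (cross-validation summarised by a confusion matrix) can be sketched as follows. This is an illustration under stated assumptions: the classifier is a toy lookup table standing in for J48, SVM, K-Means or MLP, the folds are formed by simple interleaving, only a time-of-day feature is used, and the rows are made up rather than taken from the study's data.

```python
from collections import Counter, defaultdict

LABELS = ["On Time", "Delayed", "Cancelled"]


def fit_lookup(rows):
    """Toy stand-in classifier: most common status per time-of-day bucket."""
    per_bucket = defaultdict(Counter)
    overall = Counter()
    for bucket, status in rows:
        per_bucket[bucket][status] += 1
        overall[status] += 1
    default = overall.most_common(1)[0][0]   # fallback for unseen buckets
    table = {b: c.most_common(1)[0][0] for b, c in per_bucket.items()}
    return lambda bucket: table.get(bucket, default)


def cross_validate(rows, k):
    """k-fold CV; returns matrix[actual][predicted] summed over all folds."""
    matrix = {actual: Counter() for actual in LABELS}
    for fold in range(k):
        test = rows[fold::k]                                  # every k-th row
        train = [r for i, r in enumerate(rows) if i % k != fold]
        predict = fit_lookup(train)
        for bucket, actual in test:
            matrix[actual][predict(bucket)] += 1
    return matrix


def accuracy(matrix):
    """Fraction of test rows on the diagonal of the confusion matrix."""
    correct = sum(matrix[label][label] for label in LABELS)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


# Made-up (time of day, status) rows, for illustration only.
rows = [
    ("morning", "On Time"), ("morning", "On Time"), ("morning", "On Time"),
    ("evening", "Delayed"), ("evening", "Delayed"), ("evening", "Delayed"),
    ("morning", "On Time"), ("morning", "Delayed"), ("morning", "On Time"),
    ("evening", "Delayed"), ("evening", "On Time"), ("evening", "Delayed"),
]
matrix = cross_validate(rows, k=3)
score = accuracy(matrix)
```

Reading the off-diagonal cells of `matrix` shows which statuses are confused with each other, which a single accuracy percentage hides.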

    Doctor of Philosophy

    With the tremendous growth of data produced in recent years, it is impossible to identify patterns or test hypotheses without reducing data size. Data mining is an area of science that extracts useful information from data by discovering the patterns and structures present in it. In this dissertation, we largely focus on clustering, which is often the first step in any exploratory data mining task: items that are similar to each other are grouped together, making downstream data analysis robust. Different clustering techniques have different strengths, and the resulting groupings provide different perspectives on the data. Due to the unsupervised nature of clustering, i.e., the lack of domain experts who can label the data, validation of results is very difficult. While there are measures that compute "goodness" scores for clustering solutions as a whole, there are few methods that validate the assignment of individual data items to their clusters. To address these challenges, we focus on developing a framework that can generate, compare, combine, and evaluate different solutions to make more robust and significant statements about the data. In the first part of this dissertation, we present fast and efficient techniques to generate and combine different clustering solutions. We build on some recent ideas on efficient representations of clusters of partitions to develop a well-founded, spatially aware metric to compare clusterings. With the ability to compare clusterings, we describe a heuristic to combine different solutions to produce a single high-quality clustering. We also introduce a Markov chain Monte Carlo approach to sample different clusterings from the entire landscape to provide users with a variety of choices. In the second part of this dissertation, we build certificates for individual data items and study their influence on effective data reduction. We present a geometric approach that defines regions of influence for data items and clusters, and use it to develop adaptive sampling techniques to speed up machine learning algorithms. This dissertation is therefore a systematic approach to studying the landscape of clusterings in an attempt to provide a better understanding of the data.
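As a simple illustration of what comparing two clusterings means, the sketch below computes the Rand index, a standard pair-counting measure. It is a stand-in chosen for brevity, not the spatially aware metric developed in the dissertation; the function name and label vectors are assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as agreement when both clusterings put the two items
    together, or both put them apart. Cluster label names do not matter.
    """
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        pairs += 1
    return agree / pairs


first = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]    # same grouping with the cluster names swapped
different = [0, 0, 0, 1]

same_score = rand_index(first, relabelled)
other_score = rand_index(first, different)
```

The Rand index looks only at co-membership and ignores where the points lie in space, which is the kind of limitation a spatially aware metric is meant to address.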