127 research outputs found

    Sparse multinomial kernel discriminant analysis (sMKDA)

    Dimensionality reduction via canonical variate analysis (CVA) is important for pattern recognition and has been extended in various ways to permit more flexibility, e.g. by "kernelizing" the formulation. This can lead to over-fitting, which is usually ameliorated by regularization. Here, a method for sparse multinomial kernel discriminant analysis (sMKDA) is proposed, using a sparse basis to control complexity. It is based on the connection between CVA and least-squares, and uses forward selection via orthogonal least-squares to approximate a basis, generalizing a similar approach for binomial problems. Classification can be performed directly via minimum Mahalanobis distance in the canonical variates. sMKDA achieves state-of-the-art performance in terms of accuracy and sparseness on 11 benchmark datasets.

    Learning by correlation for computer vision applications: from Kernel methods to deep learning

    Learning to spot analogies and differences within and across visual categories is an arguably powerful approach in machine learning and pattern recognition, directly inspired by human cognition. In this thesis, we investigate a variety of approaches which are primarily driven by correlation and tackle several computer vision applications.

    Variations of Particle Swarm Optimization for Obtaining Classification Rules Applied to Credit Risk in Financial Institutions of Ecuador

    Knowledge generated using data mining techniques is of great interest to organizations, as it facilitates tactical and strategic decision making and generates a competitive advantage. In the special case of credit-granting organizations, it is important to clearly define rejection/approval criteria. To this end, classification rules are an appropriate tool, provided that the rule set has low cardinality and that the antecedent of each rule has few conditions. This paper analyzes different solutions based on Particle Swarm Optimization (PSO) techniques, which are able to construct a set of classification rules with these characteristics using information about the borrower and the macroeconomic environment at the time the loan is granted. In addition, to make the model easier to understand, fuzzy logic is incorporated into the construction of the antecedent. To reduce the search time, the particle swarm is initialized by a competitive neural network. Different variants of PSO are applied to three databases of financial institutions in Ecuador. The first institution specializes in massive credit placement. The second institution specializes in consumer credit and business credit lines. Finally, the third institution is a savings and credit cooperative. According to our results, the incorporation of fuzzy logic generates rule sets with greater precision. Instituto de Investigación en Informátic

    ASSESSMENT AND PREDICTION OF CARDIOVASCULAR STATUS DURING CARDIAC ARREST THROUGH MACHINE LEARNING AND DYNAMICAL TIME-SERIES ANALYSIS

    In this work, new methods of feature extraction, feature selection, stochastic data characterization/modeling, variance reduction and measures for parametric discrimination are proposed. These methods have implications for data mining, machine learning, and information theory. A novel decision-support system is developed in order to guide intervention during cardiac arrest. The models are built upon knowledge extracted with signal-processing, non-linear dynamic and machine-learning methods. The proposed ECG characterization, combined with information extracted from PetCO2 signals, shows viability for decision-support in clinical settings. The approach, which focuses on integration of multiple features through machine learning techniques, is well suited to the inclusion of multiple physiologic signals. Ventricular Fibrillation (VF) is a common presenting dysrhythmia in the setting of cardiac arrest whose main treatment is defibrillation through direct current countershock to achieve return of spontaneous circulation. However, defibrillation is often unsuccessful and may even lead to the transition of VF to more nefarious rhythms such as asystole or pulseless electrical activity. Multiple methods have been proposed for predicting defibrillation success based on examination of the VF waveform. To date, however, no analytical technique has been widely accepted. For a given desired sensitivity, the proposed model provides significantly higher accuracy and specificity than the state-of-the-art. Notably, within the range of 80-90% sensitivity, the method provides about 40% higher specificity. This means that when trained to have the same level of sensitivity, the model will yield far fewer false positives (unnecessary shocks). Also introduced is a new model that predicts recurrence of arrest after a successful countershock is delivered. To date, no other work has sought to build such a model.
I validate the method by reporting multiple performance metrics calculated on (blind) test sets.
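The comparison "specificity at a given sensitivity" used in the abstract above can be illustrated with a minimal sketch. It is not the thesis's model or evaluation code: the function name, the scores and the labels are all hypothetical, and the sketch only assumes a classifier that outputs one score per case, with higher scores meaning "shock more likely to succeed".

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, specificity) for the strictest
    threshold whose sensitivity reaches target_sensitivity.

    scores: hypothetical classifier outputs, higher means "more likely positive".
    labels: 1 for positive cases, 0 for negative cases.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    # Try candidate thresholds from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        sensitivity = tp / positives
        if sensitivity >= target_sensitivity:
            specificity = (negatives - fp) / negatives
            return threshold, sensitivity, specificity
    raise ValueError("target sensitivity cannot be reached")


# Toy data (invented for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
result = specificity_at_sensitivity(toy_scores, toy_labels, 0.75)
```

Two models can then be compared by fixing the same target sensitivity for both and reading off which one keeps the higher specificity, i.e. produces fewer false positives.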

    A Novel Hybrid Dimensionality Reduction Method using Support Vector Machines and Independent Component Analysis

    Due to the increasing demand for high-dimensional data analysis in applications such as electrocardiogram signal analysis and gene expression analysis for cancer detection, dimensionality reduction has become a viable way to extract essential information from data, so that high-dimensional data can be represented in a more condensed form with much lower dimensionality, both improving classification accuracy and reducing computational complexity. Conventional dimensionality reduction methods can be categorized into stand-alone and hybrid approaches. The stand-alone method utilizes a single criterion from either a supervised or an unsupervised perspective. The hybrid method, on the other hand, integrates both criteria. Compared with a variety of stand-alone dimensionality reduction methods, the hybrid approach is promising as it takes advantage of both the supervised criterion for better classification accuracy and the unsupervised criterion for better data representation. However, several issues challenge the efficiency of the hybrid approach, including (1) the difficulty of finding a subspace that seamlessly integrates both criteria in a single hybrid framework, (2) the robustness of the performance on noisy data, and (3) nonlinear data representation capability. This dissertation presents a new hybrid dimensionality reduction method that seeks a projection by optimizing both structural risk (supervised criterion) from the Support Vector Machine (SVM) and data independence (unsupervised criterion) from Independent Component Analysis (ICA). The projection from SVM contributes directly to classification performance from a supervised perspective, whereas maximizing independence among features with ICA constructs a projection that improves classification accuracy indirectly, through better intrinsic data representation from an unsupervised perspective.
For the linear dimensionality reduction model, I introduce orthogonality to interrelate the projections from SVM and ICA, while a redundancy removal process eliminates some of the projection vectors from SVM, leading to more effective dimensionality reduction. The orthogonality-based linear hybrid dimensionality reduction method is extended to an uncorrelatedness-based algorithm with nonlinear data representation capability. In the proposed approach, SVM and ICA are integrated into a single framework through an uncorrelated subspace based on a kernel implementation. Experimental results show that the proposed approaches give higher classification performance with better robustness in relatively lower dimensions than conventional methods for high-dimensional datasets.

    Cascade of classifier ensembles for reliable medical image classification

    Medical image analysis and recognition is one of the most important tools in modern medicine. Different types of imaging technologies such as X-ray, ultrasonography, biopsy, computed tomography and optical coherence tomography have been widely used in clinical diagnosis for various kinds of diseases. However, in clinical applications, it is usually time consuming to examine an image manually. Moreover, there is always a subjective element in the pathological examination of an image, which creates the risk that a doctor makes a wrong decision. Therefore, an automated technique will provide valuable assistance for physicians. By utilizing techniques from machine learning and image analysis, this thesis aims to construct reliable diagnostic models for medical image data so as to reduce the problems faced by medical experts in image examination. Through supervised learning of the image data, the diagnostic model can be constructed automatically. The process of image examination by human experts is very difficult to simulate, as the knowledge of medical experts is often fuzzy and not easy to quantify. Therefore, the problem of automatic diagnosis based on images is usually converted to the problem of image classification. In image classification tasks, it is often hard for a single classifier to capture all aspects of the image data distribution. Therefore, in this thesis, a classifier ensemble based on the random subspace method is proposed to classify microscopic images. Multi-layer perceptrons are used as the base classifiers in the ensemble. Three types of feature extraction methods are selected for microscopic image description. The proposed method was evaluated on two microscopic image sets and showed promising results compared with state-of-the-art results. In order to address classification reliability in biomedical image classification problems, a novel cascade classification system is designed.
Two random subspace based classifier ensembles are serially connected in the proposed system. The first stage of the cascade system is an ensemble with support vector machines as the base classifiers. The second stage consists of a neural network classifier ensemble. Using the reject option, images whose classification results do not reach the predefined rejection threshold at the current stage are passed to the next stage for further consideration. The proposed cascade system was evaluated on a breast cancer biopsy image set and two UCI machine learning datasets. The experimental results showed that the proposed method can achieve high classification reliability and accuracy with a small rejection rate. Many computer aided diagnosis systems face the problem of imbalanced data. The datasets used for diagnosis are often imbalanced, as the number of normal cases is usually larger than the number of disease cases. Classifiers that generalize over the data are not the most appropriate choice in such an imbalanced situation. To tackle this problem, a novel one-class classifier ensemble is proposed. Kernel Principal Components are selected as the base classifiers in the ensemble; the base classifiers are trained on different types of image features and then combined using a product combining rule. The proposed one-class classifier ensemble is also embedded into the cascade scheme to improve classification reliability and accuracy. The proposed method was evaluated on two medical image sets. Favorable results were obtained compared with state-of-the-art results.
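The reject-option cascade described in this abstract can be sketched in a few lines. This is only an illustration of the control flow, not the thesis's implementation: `cascade_classify`, the two toy stage functions, the labels and the thresholds are all hypothetical stand-ins for the SVM ensemble and the neural network ensemble, each assumed to return a label together with a confidence value.

```python
def cascade_classify(sample, stages, thresholds):
    """Pass a sample through the stages in order and accept the first
    confident answer.

    stages: callables returning (label, confidence); hypothetical stand-ins
            for the classifier ensembles of the cascade.
    thresholds: one rejection threshold per stage.
    Returns (label, stage_index), or (None, None) if every stage rejects.
    """
    for index, (stage, threshold) in enumerate(zip(stages, thresholds)):
        label, confidence = stage(sample)
        if confidence >= threshold:
            return label, index  # confident enough: stop here
        # Otherwise the sample is rejected and handed to the next stage.
    return None, None


def stage_one(x):
    # Toy first stage: confidence grows with distance from the boundary 0.5.
    return ("positive" if x >= 0.5 else "negative"), abs(x - 0.5) * 2


def stage_two(x):
    # Toy second stage: same decision rule, but more confident near the boundary.
    return ("positive" if x >= 0.5 else "negative"), min(1.0, abs(x - 0.5) * 4)


stages = [stage_one, stage_two]
thresholds = [0.6, 0.6]
easy = cascade_classify(0.9, stages, thresholds)       # accepted by stage 0
harder = cascade_classify(0.3, stages, thresholds)     # rejected once, accepted by stage 1
ambiguous = cascade_classify(0.52, stages, thresholds) # rejected by both stages
```

Raising the thresholds trades a higher rejection rate for more reliable accepted decisions, which is the trade-off the abstract refers to.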

    Finding spectral features for the early identification of biotic stress in plants

    Early detection of biotic stress in plants is vital for precision crop protection, but hard to achieve. Prediction of plant diseases or weeds at an early stage has significant influence on the extent and effectiveness of crop protection measures. The precise measure depends on specific weeds and plant diseases and their economic thresholds. Weeds and plant diseases at an early stage, however, are difficult to identify. Non-invasive optical sensors with high resolution are promising for early detection of biotic stress. The data of these sensors, e.g. hyperspectral or fluorescence signatures, contain relevant information about the occurrence of pathogens. Shape parameters, derived from bispectral images, have enormous potential for an early identification of weeds in crops. Analysing this high-dimensional data to identify weeds and pathogens as early as possible is demanding, as the sensor signal is affected by many influencing factors. Nevertheless, advanced methods of machine learning facilitate the interpretation of these signals. Whereas traditional statistics estimates the posterior probability of the class from probability distributions, machine learning methods provide algorithms for optimising prediction accuracy via the discriminant function. Machine learning methods with robust training algorithms play a key role in handling non-linear classification problems. This thesis presents an approach which integrates modern sensor techniques and advanced machine learning methods for an early detection and differentiation of plant diseases and weeds. Support vector machines (SVMs) equipped with non-linear kernels prove to be effective and robust classifiers. Furthermore, it is shown that even a presymptomatic identification based on the combination of spectral vegetation indices can be realised. With the well-established data analysis methods of this scientific field, this has not been achieved so far.
To identify disease-specific features from the underlying original high-dimensional sensor data, features are constructed and selected. The high dimensionality of the data requires a careful selection of relevant and non-redundant features, depending on the classification problem and the feature properties. In the case of fluorescence signatures, an extraction of new features is necessary. In this context, modelling the signal noise by an analytical description of the spectral signature improves the accuracy of classification substantially. In the case of weed discrimination, accuracy is improved by exploiting the hierarchy of weed species. This thesis outlines the potential of SVMs, feature construction and feature selection for precision crop protection. A problem-specific extraction and selection of relevant features, in combination with task-oriented classification methods, is essential for robust identification of pathogens and weeds as early as possible.

    Applications of artificial intelligence techniques to a spacecraft control problem

    Artificial intelligence applied to spacecraft control problem

    Positive Definite Matrices: Compression, Decomposition, Eigensolver, and Concentration

    For many decades, the study of positive-definite (PD) matrices has been one of the most popular subjects across a wide range of scientific research. A large number of successful models on PD matrices have been proposed and developed in the fields of mathematics, physics, biology, etc., leading to a celebrated richness of theories and algorithms. In this thesis, we draw our attention to a general class of PD matrices that can be decomposed as the sum of a sequence of positive-semidefinite matrices. For this class of PD matrices, we will develop theories and algorithms on operator compression, multilevel decomposition, eigenpair computation, and spectrum concentration. We divide these contents into three main parts. In the first part, we propose an adaptive fast solver for the preceding class of PD matrices, which includes the well-known graph Laplacians. We achieve this by establishing an adaptive operator compression scheme and a multiresolution matrix factorization algorithm which have nearly optimal performance on both complexity and well-posedness. To develop our methods, we introduce a novel notion of energy decomposition for PD matrices and two important local measurement quantities, which provide theoretical guarantees and computational guidance for the construction of an appropriate partition and a nested adaptive basis. In the second part, we propose a new iterative method to hierarchically compute a relatively large number of leftmost eigenpairs of a sparse PD matrix under the multiresolution matrix compression framework. We exploit the well-conditioned property of each decomposition component by integrating the multiresolution framework into the Implicitly Restarted Lanczos method. We achieve this combination by proposing an extension-refinement iterative scheme, in which the intrinsic idea is to decompose the target spectrum into several segments such that the corresponding eigenproblem in each segment is well-conditioned.
In the third part, we derive concentration inequalities on partial sums of eigenvalues of random PD matrices by introducing the notion of k-trace. For this purpose, we establish a generalized Lieb's concavity theorem, which extends the original Lieb's concavity theorem from the normal trace to k-traces. Our argument employs a variety of matrix techniques and concepts, including exterior algebra, mixed discriminant, and operator interpolation.

    Joint University Program for Air Transportation Research, 1989-1990

    Research conducted during the academic year 1989-90 under the NASA/FAA sponsored Joint University Program for Air Transportation Research is discussed. Completed works, status reports and annotated bibliographies are presented for research topics which include navigation, guidance and control theory and practice, aircraft performance, human factors, and expert systems concepts applied to airport operations. An overview of the year's activities for each university is also presented.