12 research outputs found

    ECG Classification with an Adaptive Neuro-Fuzzy Inference System

    Heart signals allow for a comprehensive analysis of the heart. Electrocardiography (ECG or EKG) uses electrodes to measure the electrical activity of the heart. Extracting ECG signals is a non-invasive process that opens the door to new possibilities for the application of advanced signal processing and data analysis techniques in the diagnosis of heart diseases. With the help of today’s large databases of ECG signals, a computationally intelligent system can learn and take the place of a cardiologist. Abnormalities in the patient’s heartbeat that point to various heart diseases can be detected with an Adaptive Neuro-Fuzzy Inference System (ANFIS) preprocessed by subtractive clustering. Six types of heartbeats are classified: normal sinus rhythm, premature ventricular contraction (PVC), atrial premature contraction (APC), left bundle branch block (LBBB), right bundle branch block (RBBB), and paced beats. The goal is to detect important characteristics of an ECG signal to determine whether the patient’s heartbeat is normal or irregular. The results from three trials indicate an average accuracy of 98.10%, average sensitivity of 94.99%, and average specificity of 98.87%. These results are comparable to those of two artificial neural network (ANN) algorithms, gradient descent and Levenberg-Marquardt, as well as the ANFIS preprocessed by grid partitioning.
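
The accuracy, sensitivity, and specificity figures quoted in this abstract are standard confusion-matrix metrics. As a rough illustration only, the sketch below computes them one-vs-rest for each class and then averages across classes. The averaging scheme, the function names, and the 3-class confusion matrix are assumptions made for the example; the abstract does not say how the authors aggregated their results, and the numbers here are not theirs.

```python
def one_vs_rest_metrics(confusion):
    """Return (accuracy, sensitivity, specificity) per class.

    confusion[i][j] counts beats whose true class is i and predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]                               # class k predicted as k
        fn = sum(confusion[k]) - tp                        # class k predicted as something else
        fp = sum(confusion[i][k] for i in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp                          # everything else
        accuracy = (tp + tn) / total
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        metrics.append((accuracy, sensitivity, specificity))
    return metrics


def macro_average(metrics):
    """Average each of the three metrics over all classes (assumed aggregation)."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics) / n for i in range(3))


# Hypothetical confusion matrix for three beat classes (made-up counts).
example = [
    [50, 2, 0],
    [3, 40, 1],
    [0, 1, 30],
]
per_class = one_vs_rest_metrics(example)
avg_accuracy, avg_sensitivity, avg_specificity = macro_average(per_class)
print(f"accuracy={avg_accuracy:.4f} sensitivity={avg_sensitivity:.4f} "
      f"specificity={avg_specificity:.4f}")
```

Sensitivity here is the fraction of beats of a class that are recognized as such, and specificity is the fraction of beats of other classes that are correctly not assigned to it, which is why a classifier can show high specificity alongside lower sensitivity on rare beat types.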

    Efficient Learning Machines

    Computer science

    Text Categorization for Intellectual Property

    This study investigates the effect of training different categorization algorithms on various patent document representations

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours or days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

    Advanced Fault Diagnosis and Health Monitoring Techniques for Complex Engineering Systems

    Over the last few decades, the field of fault diagnostics and structural health management has developed rapidly. The reliability, availability, and safety of engineering systems can be significantly improved by implementing multifaceted strategies of in situ diagnostics and prognostics. With the development of intelligent algorithms, smart sensors, and advanced data collection and modeling techniques, this challenging research area has been receiving ever-increasing attention in both fundamental research and engineering applications. This has been strongly supported by extensive applications ranging from aerospace, automotive, transport, manufacturing, and processing industries to defense and infrastructure industries.

    Distributed multi-label learning on Apache Spark

    This thesis proposes a series of multi-label learning algorithms for classification and feature selection implemented on the Apache Spark distributed computing model. Five approaches for determining the optimal architecture to speed up multi-label learning methods are presented. These approaches range from local parallelization using threads to distributed computing using independent or shared memory spaces. It is shown that the optimal approach performs hundreds of times faster than the baseline method. Three distributed multi-label k nearest neighbors methods built on top of the Spark architecture are proposed: an exact iterative method that computes pair-wise distances, an approximate tree-based method that indexes the instances across multiple nodes, and an approximate locality-sensitive hashing method that builds multiple hash tables to index the data. The results indicate that the predictions of the tree-based method are on par with those of an exact method while reducing the execution times in all the scenarios. The tree-based method is then used to evaluate the quality of a selected feature subset. The optimal adaptation for a multi-label feature selection criterion is discussed and two distributed feature selection methods for multi-label problems are proposed: a method that selects the feature subset that maximizes the Euclidean norm of the individual information measures, and a method that selects the subset of features that maximizes their geometric mean. The results indicate that each method excels in different scenarios depending on the type of features and the number of labels. Rigorous experimental studies and statistical analyses over many multi-label metrics and datasets confirm that the proposals achieve better performance and provide better scalability to bigger data than the state-of-the-art methods they are compared with.
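
The two feature selection criteria named in this abstract differ in how they combine a feature's per-label information measures into a single score. The sketch below is a simplified, single-machine illustration of that difference, not the thesis's distributed Spark implementation. It assumes that each feature has one non-negative information score per label (for example, a mutual information estimate), that features are ranked individually, and that the top k are kept; the function names, the `eps` smoothing term, and the scores are all hypothetical.

```python
import math


def euclidean_norm_score(scores):
    """Combine per-label scores with the Euclidean (L2) norm."""
    return math.sqrt(sum(s * s for s in scores))


def geometric_mean_score(scores, eps=1e-12):
    """Combine per-label scores with the geometric mean.

    Computed in the log domain; eps (an assumption of this sketch)
    avoids log(0) when a feature carries no information about a label.
    """
    return math.exp(sum(math.log(s + eps) for s in scores) / len(scores))


def select_top_k(info, k, score_fn):
    """Return the names of the k features with the highest combined score."""
    ranked = sorted(info, key=lambda name: score_fn(info[name]), reverse=True)
    return ranked[:k]


# Hypothetical information scores of three features against three labels.
info = {
    "f1": [0.9, 0.0, 0.0],  # very informative for one label only
    "f2": [0.4, 0.4, 0.4],  # moderately informative for every label
    "f3": [0.1, 0.2, 0.1],  # weakly informative for every label
}

by_norm = select_top_k(info, 1, euclidean_norm_score)
by_gmean = select_top_k(info, 1, geometric_mean_score)
print("Euclidean norm picks:", by_norm)
print("Geometric mean picks:", by_gmean)
```

With these made-up scores the Euclidean norm favors the feature that is highly informative for a single label, while the geometric mean favors the feature that is informative for all labels, since one near-zero score drags a geometric mean toward zero. This is consistent with the abstract's remark that each criterion excels in different scenarios.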

    Deep Learning-Based Machinery Fault Diagnostics

    This book offers a compilation for experts, scholars, and researchers to present the most recent advancements, from theoretical methods to the applications of sophisticated fault diagnosis techniques. The deep learning methods for analyzing and testing complex mechanical systems are of particular interest. Special attention is given to the representation and analysis of system information, operating condition monitoring, the establishment of technical standards, and scientific support of machinery fault diagnosis

    Aprendizaje multi-etiqueta distribuido en Apache Spark

    This thesis proposes a series of multi-label learning algorithms for classification and feature selection implemented on the Apache Spark distributed computing model. Five approaches for determining the optimal architecture to speed up the multi-label learning methods are presented. These approaches range from local parallelization using threads to distributed computing using independent or shared memory spaces. It is shown that the optimal approach performs hundreds of times faster than the baseline method. Three distributed multi-label k nearest neighbors methods built on top of the Spark architecture are proposed: an exact iterative method that computes pair-wise distances, an approximate tree-based method that indexes the instances across multiple nodes, and an approximate locality-sensitive hashing method that builds multiple hash tables to index the data. The results indicate that the predictions of the tree-based method are on par with those of an exact method while reducing the execution times in all the scenarios. The tree-based method is then used to evaluate the quality of a selected feature subset. The optimal adaptation for a multi-label feature selection criterion is discussed and two distributed feature selection methods for multi-label problems are proposed: a method that selects the feature subset that maximizes the Euclidean norm of the individual information measures, and a method that selects the subset of features that maximizes their geometric mean. The results indicate that each method excels in different scenarios depending on the type of features and the number of labels. Rigorous experimental studies and statistical analyses over many multi-label metrics and datasets confirm that the proposals achieve better performance and provide better scalability to bigger data than the state-of-the-art methods they are compared with.

    Apprentissage supervisés sous contraintes

    As supervised learning occupies a larger and larger place in our everyday life, it is met with more and more constrained settings. Dealing with those constraints is key to fostering new progress in the field and to expanding ever further the limits of machine learning, a likely necessary step toward artificial general intelligence. Supervised learning is an inductive paradigm in which time and data are refined into knowledge, in the form of predictive models. These models can sometimes be, it must be conceded, opaque, memory-demanding and energy-consuming. Given this setting, a constraint can mean any number of things. Essentially, a constraint is anything that stands in the way of supervised learning, be it the lack of time, of memory, of data, or of understanding. Additionally, the scope of applicability of supervised learning is so vast it can appear daunting. Usefulness can be found in areas including medical analysis and autonomous driving, areas for which strong guarantees are required. All those constraints (time, memory, data, interpretability, reliability) might somewhat conflict with the traditional goal of supervised learning. In such a case, finding a balance between the constraints and the standard objective is problem-dependent, thus requiring generic solutions. Alternatively, concerns might arise after learning, in which case solutions must be developed under sub-optimal conditions, resulting in constraints adding up. An example of such a situation is trying to enforce reliability once the data is no longer available. After detailing the background in which this thesis is situated (what supervised learning is and why it is difficult, what algorithms will be used, where it lands in the broader scope of knowledge), we will discuss four different scenarios. The first one is about trying to learn a good decision forest model of a limited size, without first learning a large model and then compressing it. 
For that, we have developed the Globally Induced Forest (GIF) algorithm, which mixes local and global optimizations to produce accurate predictions under memory constraints in reasonable time. More specifically, the global part makes it possible to sidestep the redundancy inherent in traditional decision forests. It is shown that the proposed method is more than competitive with standard tree-based ensembles under corresponding constraints, and can sometimes even surpass much larger models. The second scenario corresponds to the example given above: trying to enforce reliability without data. More specifically, the focus is on out-of-distribution (OOD) detection: recognizing samples which do not come from the original distribution the model was learned from. Tackling this problem with an utter lack of data is challenging. Our investigation focuses on image classification with convolutional neural networks. Indicators which can be computed alongside the prediction with little additional cost are proposed. These indicators prove useful, stable and complementary for OOD detection. We also introduce a surprisingly simple, yet effective summary indicator, shown to perform well across several networks and datasets. It can easily be tuned further as soon as samples become available. Overall, interesting results can be reached in all but the most severe settings, for which a data-free solution was doubtful a priori. The third scenario relates to transferring the knowledge of a large model into a smaller one in the absence of data. To do so, we propose to leverage a collection of unlabeled data, which is easy to come by in domains such as image classification. Two schemes are proposed (and then analyzed) to provide optimal transfer. First, we propose a biasing mechanism in the choice of unlabeled data to use so that the focus is on the more relevant samples. 
Second, we design a teaching mechanism, applicable to almost all pairs of large and small networks, which allows for a much better knowledge transfer between the networks. Overall, good results are obtainable in decent time provided the collection of data actually contains relevant samples. The fourth scenario tackles the problem of interpretability: what knowledge can be gleaned more or less indirectly from data. We discuss two subproblems. The first one is to showcase that GIFs (cf. supra) can be used to derive intrinsically interpretable models. The second consists of a comparative study between methods and types of models (namely decision forests and neural networks) for the specific purpose of quantifying how important each variable is in a given problem. After a preliminary study on benchmark datasets, the analysis turns to a concrete biological problem: inferring gene regulatory networks from data. An ambivalent conclusion is reached: neural networks can be made to perform better than decision forests at predicting in almost all instances but struggle to identify the relevant variables in some situations. It would seem that better (motivated) methods need to be proposed for neural networks, especially in the face of highly non-linear problems.
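
The out-of-distribution scenario in this abstract relies on indicators that can be computed alongside a network's prediction at little extra cost. The abstract does not say what those indicators are, so the sketch below substitutes a widely used baseline of the same flavor, the maximum softmax probability, purely to illustrate the idea. The function names, the threshold of 0.5, and the logits are assumptions for the example and are not taken from the thesis.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_score(logits):
    """Confidence indicator: probability assigned to the predicted class."""
    return max(softmax(logits))


def flag_ood(logits, threshold=0.5):
    """Flag a sample as possibly out-of-distribution when confidence is low.

    The threshold is a hypothetical default; in practice it would be tuned
    once samples become available.
    """
    return max_softmax_score(logits) < threshold


# Hypothetical logits from a 3-class image classifier.
confident = [8.0, 0.5, -1.0]   # one class clearly dominates
uncertain = [0.2, 0.1, 0.0]    # nearly flat output

print(flag_ood(confident), flag_ood(uncertain))
```

The score reuses the logits already produced for the prediction, so it adds almost no cost, and it needs no training data to compute, only to tune the threshold.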