
    Novel machine learning methods based on information theory

    [Abstract] Machine learning is the area of artificial intelligence and computer science that studies algorithms that can learn from data, make predictions, and produce behaviors based on examples. This thesis develops new machine learning methods based on information theory (IT) and information theoretic learning (ITL): (1) On the one hand, IT is used for feature selection. Specifically, two new algorithms are developed. The first one takes into account the cost (computational, economic, etc.) of each feature, in addition to its relevance. The second one makes use of the concept of ensemble, quite common in classification scenarios but very little explored in the feature selection literature. (2) On the other hand, IT and ITL concepts can be employed as an alternative error function, which allows the exploration of another field not well studied in the literature: the local modeling approach. Specifically, a new algorithm for classification is developed, based on the combination of neural networks by means of local modeling and ITL-based techniques.
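    As a rough illustration of the first feature selection idea, a filter method can rank features by estimated mutual information with the target minus a penalty on each feature's cost. The Python sketch below is a toy under stated assumptions: the trade-off parameter lam, the cost vector, and the top-k rule are illustrative choices, not the algorithm developed in the thesis.

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        def cost_aware_selection(X, y, costs, lam=0.5, k=10):
            """Cost-sensitive filter: MI relevance minus a cost penalty.

            A minimal sketch; lam, costs, and k are illustrative assumptions.
            """
            relevance = mutual_info_classif(X, y)          # estimate I(X_i; y)
            scores = relevance - lam * np.asarray(costs)   # penalize costly features
            return np.argsort(scores)[::-1][:k]            # indices of top-k features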

    Sensory fusion applied to power system state estimation considering information theory concepts

    With the increasing integration of synchronized phasor measurement units (PMUs) in power grids, the need arises for methods capable of merging the information obtained from different classes of sensors, namely PMUs and the conventional sensors already integrated in the SCADA system. This dissertation proposes a sensory fusion approach that meets this requirement. Beyond that, it applies concepts from information theory to the state estimation problem, aiming at a robust state estimator that does not require prior treatment of the acquired data.
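    One common way to bring information theoretic concepts into robust state estimation, possibly related to what this dissertation pursues, is the maximum correntropy criterion, which down-weights measurements carrying gross errors. The sketch below shows the generic idea for a linear measurement model z = Hx + e via iteratively reweighted least squares; the kernel bandwidth sigma and the linear model are assumptions, and this is not necessarily the dissertation's estimator.

        import numpy as np

        def mcc_state_estimate(H, z, sigma=1.0, iters=20):
            """Robust linear state estimation under the maximum correntropy criterion.

            Sketch only: iteratively reweighted least squares in which each
            residual receives a Gaussian-kernel weight, so measurements with
            gross errors are down-weighted (sigma is an assumed bandwidth).
            """
            x = np.linalg.lstsq(H, z, rcond=None)[0]        # ordinary LS start
            for _ in range(iters):
                r = z - H @ x                               # per-measurement residuals
                w = np.exp(-r**2 / (2 * sigma**2))          # correntropy kernel weights
                HW = H.T * w                                # weight each measurement
                x = np.linalg.solve(HW @ H, HW @ z)         # weighted normal equations
            return x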

    Learning Dynamic Systems for Intention Recognition in Human-Robot-Cooperation

    This thesis is concerned with intention recognition for a humanoid robot and investigates how the challenges of uncertain and incomplete observations, a high degree of detail in the models used, and real-time inference may be addressed by modeling the human rationale as hybrid dynamic Bayesian networks and performing inference with these models. The key focus lies on the automatic identification of the employed nonlinear stochastic dependencies and on situation-specific inference.
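    To make the inference task concrete, the sketch below shows a minimal discrete Bayesian filtering loop over a hidden intention variable, a toy stand-in for the hybrid dynamic Bayesian networks described above; the transition matrix T and the observation likelihoods are hypothetical.

        import numpy as np

        def filter_intentions(prior, T, obs_lik):
            """Discrete Bayesian filtering over a hidden intention variable.

            Toy sketch: T[i, j] = P(intention_t = j | intention_{t-1} = i),
            obs_lik[t, j] = likelihood of observation t under intention j
            (both hypothetical stand-ins for the learned hybrid DBN).
            """
            belief = np.asarray(prior, dtype=float)
            history = []
            for lik in obs_lik:
                belief = T.T @ belief        # predict through the dynamics
                belief = belief * lik        # update with the observation
                belief /= belief.sum()       # renormalize
                history.append(belief.copy())
            return np.array(history)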

    Adaptive Feature Engineering Modeling for Ultrasound Image Classification for Decision Support

    Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions due to the low-energy sound waves used. However, the visual interpretation of ultrasound images is time-consuming and usually produces many false alerts due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, this has not solved the problem, owing to the complex nature of the images and the exponential growth of biomedical datasets. Secondly, the target class in real-world biomedical datasets, that is, the focus of interest of a biopsy, is usually significantly underrepresented compared to the non-target class. This makes it difficult to train standard classification models such as Support Vector Machines (SVM), Decision Trees, and Nearest Neighbor techniques on biomedical datasets, because they assume an equal class distribution or an equal misclassification cost. Resampling techniques that either oversample the minority class or under-sample the majority class have been proposed to mitigate the class imbalance problem, but with minimal success. We propose to resolve the class imbalance problem with the design of a novel data-adaptive feature engineering model for extracting, selecting, and transforming textural features into a feature space that is inherently relevant to the application domain. We hypothesize that maximizing the variance and preserving as much variability as possible in well-engineered features before applying a classifier model will boost the differentiation of thyroid nodules (benign or malignant) through effective model building. We propose a hybrid approach that applies regression and rule-based techniques to build the feature engineering model and a Bayesian classifier, respectively. In the feature engineering model, we transform image pixel intensity values into a high-dimensional structured dataset and fit a regression analysis model to estimate relevant kernel parameters for the proposed filter method. We adopt an Elastic Net regularization path to control the maximum log-likelihood estimation of the regression model. Finally, we apply Bayesian network inference to estimate a subset of the textural features with a significant conditional dependency in the classification of the thyroid lesion. This is performed to establish the conditional influence of the textural features on the random factors generated through our feature engineering model and to evaluate the success criterion of our approach. The proposed approach was tested and evaluated on a public dataset obtained from thyroid cancer ultrasound diagnostic data. The analyses of the results showed that classification performance improved significantly overall, both in accuracy and in area under the curve, when the proposed feature engineering model was applied to the data. We show that a high performance of 96.00% accuracy, with a sensitivity of 99.64% and a specificity of 90.23%, was achieved for a filter size of 13 × 13.
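    The Elastic Net step described above can be sketched generically as regularized logistic regression whose penalized log-likelihood shrinks uninformative coefficients to zero, keeping only the surviving textural features. The hyperparameters and the nonzero-coefficient rule below are illustrative assumptions, not the dissertation's tuned pipeline.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.preprocessing import StandardScaler

        def elastic_net_feature_subset(X, y, l1_ratio=0.5, C=0.1):
            """Keep features whose elastic net coefficients survive shrinkage.

            Sketch only: the penalized log-likelihood is maximized by the saga
            solver; l1_ratio, C, and the selection rule are assumptions.
            """
            Xs = StandardScaler().fit_transform(X)          # penalties need scaled inputs
            model = LogisticRegression(penalty="elasticnet", solver="saga",
                                       l1_ratio=l1_ratio, C=C, max_iter=5000)
            model.fit(Xs, y)
            return np.flatnonzero(model.coef_.ravel() != 0) # indices of kept features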

    Predictive Modelling Approach to Data-driven Computational Psychiatry

    This dissertation contributes novel predictive modelling approaches to data-driven computational psychiatry and offers alternative analysis frameworks to the standard statistical analyses in psychiatric research. In particular, this document advances research in medical data mining, especially psychiatry, in two phases. In the first phase, it proposes synergistic machine learning and statistical approaches for detecting patterns and developing predictive models in clinical psychiatry data to classify diseases, predict treatment outcomes, or improve treatment selection. These data-driven approaches are built upon several machine learning techniques whose predictive models have been pre-processed, trained, optimised, post-processed, and tested in novel computationally intensive frameworks. In the second phase, this document advances research in medical data mining by proposing several novel extensions in the area of data classification, offering a novel decision tree algorithm, which we call PIDT, based on parameterised impurities and statistical pruning approaches, toward building more accurate decision tree classifiers and developing new ensemble-based classification methods. The experimental results show that predictive models built with the novel PIDT algorithm generally outperform those built with traditional decision trees in terms of accuracy and tree size. The contributions of this dissertation can be summarised as follows. Firstly, several statistical and machine learning algorithms, plus techniques to improve them, are explored. Secondly, prediction modelling and pattern detection approaches for first-episode psychosis associated with cannabis use are developed. Thirdly, a new computationally intensive machine learning framework for understanding the link between cannabis use and first-episode psychosis is introduced. Then, complementary and equally sophisticated prediction models for first-episode psychosis associated with cannabis use are developed using artificial neural networks and deep learning within the proposed novel computationally intensive framework. Lastly, an efficient novel decision tree algorithm (PIDT), based on novel parameterised impurities and statistical pruning approaches, is proposed and tested on several medical datasets. These contributions can be used to guide future theory, experiment, and treatment development in medical data mining, especially psychiatry.
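    The abstract does not spell out the PIDT impurity family, so the sketch below uses one plausible parameterised impurity, Tsallis entropy with parameter q (q = 2 recovers the Gini index; q -> 1 recovers Shannon entropy), together with a simple threshold search. It illustrates the idea of parameterising the impurity, not the actual PIDT algorithm or its statistical pruning.

        import numpy as np

        def tsallis_impurity(labels, q=2.0):
            """Parameterised impurity via Tsallis entropy (q=2 gives the Gini index)."""
            _, counts = np.unique(labels, return_counts=True)
            p = counts / counts.sum()
            if q == 1.0:
                return float(-np.sum(p * np.log(p)))        # Shannon limit
            return float((1.0 - np.sum(p**q)) / (q - 1.0))

        def best_split(x, y, q=2.0):
            """Threshold minimizing the weighted child impurity for one feature."""
            order = np.argsort(x)
            xs, ys = x[order], y[order]
            best_score, best_thr = np.inf, None
            for i in range(1, len(xs)):
                if xs[i] == xs[i - 1]:
                    continue                                # no threshold between ties
                left, right = ys[:i], ys[i:]
                score = (i * tsallis_impurity(left, q)
                         + (len(ys) - i) * tsallis_impurity(right, q)) / len(ys)
                if score < best_score:
                    best_score, best_thr = score, (xs[i - 1] + xs[i]) / 2.0
            return best_score, best_thr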

    Spectral Analysis for Signal Detection and Classification : Reducing Variance and Extracting Features

    Spectral analysis encompasses several powerful signal processing methods. The papers in this thesis present methods for finding good spectral representations, and methods for both stationary and non-stationary signals are considered. Stationary methods can be used for real-time evaluation, analysing shorter segments of an incoming signal, while non-stationary methods can be used to analyse the instantaneous frequencies of fully recorded signals. All the presented methods aim to produce spectral representations that have high resolution and are easy to interpret. Such representations allow for the detection of individual signal components in multi-component signals, as well as the separation of close signal components. This makes feature extraction in the spectral representation possible; relevant features include the frequency or instantaneous frequency of components, the number of components in the signal, and the time duration of the components. Two methods that extract some of these features automatically, for two types of signals, are presented in this thesis. One is adapted to signals with two longer-duration frequency-modulated components and detects the instantaneous frequencies and cross-terms in the Wigner-Ville distribution; the other targets signals with an unknown number of short-duration oscillations and detects the instantaneous frequencies in a reassigned spectrogram. This thesis also presents two multitaper methods that reduce the influence of noise on the spectral representations. One is designed for stationary signals and the other for non-stationary signals with multiple short-duration oscillations. Applications for the methods presented in this thesis include several within medicine, e.g. diagnosis from analysis of heart rate variability, improved ultrasound resolution, and interpretation of brain activity from the electroencephalogram.
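    The variance-reduction idea behind multitaper estimation can be shown in a few lines: average periodograms computed with orthogonal Slepian (DPSS) tapers. This is the textbook construction the thesis builds on; the thesis' own taper weightings for non-stationary signals are not reproduced here.

        import numpy as np
        from scipy.signal.windows import dpss

        def multitaper_psd(x, fs, NW=3.0, K=5):
            """Multitaper spectrum: average periodograms over K orthogonal DPSS tapers.

            Textbook sketch of the variance-reduction idea; NW and K are
            illustrative choices.
            """
            M = len(x)
            tapers = dpss(M, NW, K)                          # (K, M) unit-energy tapers
            spectra = np.abs(np.fft.rfft(tapers * x, axis=1))**2
            psd = spectra.mean(axis=0) / fs                  # averaging lowers variance
            freqs = np.fft.rfftfreq(M, d=1.0 / fs)
            return freqs, psd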

    Relevant data representation by a Kernel-based framework

    Nowadays, the analysis of large amounts of data has emerged as an issue of great interest, occupying an increasingly important place in the scientific community, especially in automation, signal processing, pattern recognition, and machine learning. In this sense, the identification, description, classification, visualization, and clustering of events or patterns are important problems for engineering developments and scientific fields such as biology, medicine, economics, computer vision, artificial intelligence, and industrial production. Nonetheless, it is difficult to interpret the available information due to its complexity and the large number of obtained features. In addition, the analysis of the input data requires the development of methodologies that reveal the relevant behaviors of the studied process, particularly when the signals contain hidden structures varying over a given domain, e.g., space and/or time. When the analyzed signal has such properties, directly applying signal processing and machine learning procedures, without a suitable model that deals with both the statistical distribution and the data structure, can lead to unstable performance. In this regard, kernel functions appear as an alternative approach to address these issues by providing flexible mathematical tools that enhance data representation in support of signal processing and machine learning systems. Moreover, kernel-based methods are powerful tools for developing better-performing solutions by adapting the kernel to a given problem, instead of learning data relationships from explicit raw vector representations. However, building suitable kernels requires some prior user knowledge about the input data, which is not available in most practical cases. Furthermore, directly using the definitions of traditional kernel methods poses a challenging estimation problem that often leads to strong simplifications restricting the kind of representation that can be used on the data. In this study, we propose a data representation framework based on kernel methods to automatically learn relevant sample relationships in learning systems. Namely, the proposed framework is divided into five kernel-based approaches, which aim to compute relevant data representations by adapting them according to both the imposed sample-relationship constraints and the learning scenario (unsupervised or supervised task). First, we develop a kernel-based representation approach that reveals the main input sample relations by including relevant data structures through graph-based sparse constraints. Salient data structures are thus highlighted to favor further unsupervised clustering stages. This approach can be viewed as a graph pruning strategy within a spectral clustering framework that enhances both the local and global data consistencies for a given input similarity matrix. Second, we introduce a kernel-based representation methodology that captures meaningful data relations in terms of their statistical distribution. An information theoretic learning (ITL) based penalty function is introduced to estimate a kernel-based similarity that maximizes the whole information potential variability. Thus, we seek a reproducing kernel Hilbert space (RKHS) that spans the widest information force magnitudes among data points to support further clustering stages.
    Third, an entropy-like functional on positive definite matrices based on Renyi's definition is adapted to develop a kernel-based representation approach that considers both the statistical distribution and the salient data structures. Thereby, relevant input patterns are highlighted in unsupervised learning tasks. In particular, the introduced approach is tested as a tool to encode relevant local and global input data relationships in dimensionality reduction applications. Fourth, a supervised kernel-based representation is introduced via a metric learning procedure in RKHS that takes advantage of prior user knowledge, when available, regarding the studied learning task. Such an approach incorporates the proposed ITL-based kernel functional estimation strategy to automatically adapt the relevant representation using both the supervised information and the statistical distribution of the input data. As a result, relevant sample dependencies are highlighted by weighting the input features that best encode the supervised learning task. Finally, a new generalized kernel-based measure is proposed by taking advantage of different RKHSs. In this way, relevant dependencies are highlighted automatically by considering the domain-varying behavior of the input data and prior user knowledge (supervised information) when available. The proposed measure is an extension of the well-known cross-correntropy function based on Hilbert space embeddings. Throughout the study, the proposed kernel-based framework is applied to biosignal and image data as an alternative to support aided diagnosis systems and image-based object analysis. Indeed, the introduced kernel-based framework improves, in most cases, unsupervised and supervised learning performance, aiding researchers in their quest to process and understand complex data.
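    The entropy-like functional on positive definite matrices mentioned above is commonly formulated as a matrix-based Renyi alpha-entropy evaluated on the eigenvalue spectrum of a unit-trace kernel Gram matrix. The Python sketch below renders that generic formulation under illustrative assumptions (Gaussian kernel, bandwidth sigma, alpha = 2); it is not the thesis implementation.

        import numpy as np

        def matrix_renyi_entropy(X, alpha=2.0, sigma=1.0):
            """Matrix-based Renyi alpha-entropy of a Gaussian Gram matrix.

            Generic sketch: build a kernel matrix, normalize it to unit trace,
            and evaluate Renyi's entropy on its eigenvalues (alpha and sigma
            are illustrative choices).
            """
            d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
            K = np.exp(-d2 / (2 * sigma**2))                 # Gaussian Gram matrix
            A = K / np.trace(K)                              # unit-trace normalization
            lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)  # PSD eigenvalues
            return float(np.log2(np.sum(lam**alpha)) / (1.0 - alpha))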