65 research outputs found

    Support matrix machine: A review

    Support vector machine (SVM) is one of the most studied paradigms in machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of real-world data exists in matrix format and is given as input to SVM by reshaping the matrices into vectors. This reshaping disrupts the spatial correlations inherent in the matrix data, and converting matrices into vectors produces high-dimensional input, which introduces significant computational complexity. To overcome these issues in classifying matrix input data, the support matrix machine (SMM) has been proposed. It represents one of the emerging methodologies tailored for handling matrix input data. The SMM method preserves the structural information of the matrix data by using the spectral elastic net property, which is a combination of the nuclear norm and the Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model and can serve as a thorough summary for both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class-imbalance, and multi-class classification models. We also analyze the applications of the SMM model and conclude the article by outlining potential future research avenues and possibilities that may motivate academics to advance the SMM algorithm.
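
    As a point of reference, a commonly used soft-margin SMM formulation (details differ across the variants surveyed in the article) combines a hinge loss over matrix inputs X_i with labels y_i in {+1, -1} and the spectral elastic net penalty described above:

        \min_{W,\, b} \;\; \frac{1}{2}\,\mathrm{tr}\!\left(W^{\top} W\right) \;+\; \tau \left\lVert W \right\rVert_{*} \;+\; C \sum_{i=1}^{n} \max\!\left(0,\; 1 - y_i \left[\mathrm{tr}\!\left(W^{\top} X_i\right) + b\right]\right)

    Here \mathrm{tr}(W^{\top}W) is the squared Frobenius norm, \lVert W \rVert_{*} is the nuclear norm that encourages a low-rank, structure-preserving regression matrix W, and \tau and C are trade-off hyperparameters.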

    Exploiting Universum data in AdaBoost using gradient descent

    Recently, Universum data, which does not belong to any class of the training data, has been applied to train better classifiers. In this paper, we propose a novel boosting algorithm called UAdaBoost that can improve the classification performance of AdaBoost with Universum data. UAdaBoost chooses a function by minimizing the loss for labeled data and Universum data. The cost function is minimized by a greedy, stagewise, functional gradient procedure. Each training stage of UAdaBoost is fast and efficient. The standard AdaBoost weights labeled samples during training iterations, while UAdaBoost provides an explicit weighting scheme for Universum samples as well. In addition, this paper describes practical conditions for the effectiveness of Universum learning. These conditions are based on the analysis of the distribution of ensemble predictions over training samples. Experiments on handwritten digit classification and gender classification problems are presented. As shown by our experimental results, the proposed method can achieve superior performance over standard AdaBoost by selecting proper Universum data. © 2014 Elsevier B.V.
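
    To make the stagewise idea concrete, the sketch below shows AdaBoost-style functional gradient boosting with an additional Universum term; the specific penalty (shrinking a stage weight when the ensemble pushes Universum points away from the decision boundary) and the hyperparameter lam are illustrative assumptions, not the published UAdaBoost update rule.

        # Illustrative sketch: AdaBoost-style boosting with a Universum penalty.
        # Labels are assumed to be in {-1, +1}; X_univ holds Universum samples.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def uadaboost_sketch(X, y, X_univ, n_rounds=50, lam=0.1):
            X, y, X_univ = map(np.asarray, (X, y, X_univ))
            n = len(y)
            w = np.full(n, 1.0 / n)              # weights on labeled samples
            F_lab = np.zeros(n)                  # ensemble score on labeled data
            F_univ = np.zeros(len(X_univ))       # ensemble score on Universum data
            learners, alphas = [], []
            for _ in range(n_rounds):
                stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
                pred, pred_u = stump.predict(X), stump.predict(X_univ)
                err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
                alpha = 0.5 * np.log((1.0 - err) / err)
                # illustrative Universum term: damp the stage weight if the new
                # learner drives Universum outputs far from zero (the boundary)
                alpha /= 1.0 + lam * np.mean(np.abs(F_univ + alpha * pred_u))
                learners.append(stump)
                alphas.append(alpha)
                F_lab += alpha * pred
                F_univ += alpha * pred_u
                w = np.exp(-y * F_lab)           # exponential-loss reweighting
                w /= w.sum()
            return learners, np.array(alphas)

        def ensemble_predict(learners, alphas, X):
            return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, learners)))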

    Epilepsy attacks recognition based on 1D octal pattern, wavelet transform and EEG signals

    Electroencephalogram (EEG) signals have been widely utilized in diagnostic systems. Nowadays, artificial intelligence-based systems have been proposed to classify EEG signals to ease the diagnosis process. However, deep learning based classification models have generally been used to reach high classification accuracies. This work focuses on classifying epilepsy attacks from EEG signals with a lightweight and simple classification model. Hence, an automated EEG classification model is presented. The phases of the presented automated EEG classification model are: (i) multileveled feature generation using the one-dimensional (1D) octal-pattern (OP) and the discrete wavelet transform (DWT). Here, the main feature generation function is the presented octal-pattern, and DWT is employed for level creation. DWT yields frequency coefficients of the EEG signal, and the octal-pattern generates texture features from the raw EEG signal and the wavelet coefficients. This DWT and octal-pattern based feature generator extracts 128 × 8 = 1024 features (the octal-pattern generates 128 features per signal, and 8 signals are used in the feature generation: 1 raw EEG signal and 7 wavelet low-pass filter coefficient vectors). (ii) To select the most useful features, neighborhood component analysis (NCA) is deployed and 128 features are selected. (iii) The selected features are fed to a k-nearest neighbor (kNN) classifier. To test this model, an epilepsy seizure dataset is used and 96.0% accuracy is attained for five categories. The results clearly demonstrate the success of the presented octal-pattern based epilepsy classification model.
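
    As a rough sketch, such a pipeline can be prototyped with PyWavelets and scikit-learn; the octal_pattern function below is a simplified stand-in (a histogram of sign-comparison codes), not the published descriptor, and NCA is used here as a learned transform rather than the paper's NCA-based feature selection.

        # Sketch of a DWT + local-pattern + NCA + kNN pipeline for EEG epochs.
        import numpy as np
        import pywt
        from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
        from sklearn.pipeline import Pipeline

        def octal_pattern(signal, n_bins=128):
            # Placeholder texture descriptor: 7 sign bits per position -> codes in
            # 0..127, summarized as a normalized 128-bin histogram (assumption).
            s = np.asarray(signal, dtype=float)
            codes = np.zeros(len(s) - 8, dtype=int)
            for k in range(7):
                codes = codes * 2 + (s[k:len(s) - 8 + k] > s[8:]).astype(int)
            hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
            return hist / max(hist.sum(), 1)

        def extract_features(eeg, wavelet="db4", levels=7):
            # 1 raw signal + 7 low-pass (approximation) coefficient vectors -> 8 x 128 = 1024
            feats = [octal_pattern(eeg)]
            current = np.asarray(eeg, dtype=float)
            for _ in range(levels):
                current, _ = pywt.dwt(current, wavelet)   # keep the low-pass branch
                feats.append(octal_pattern(current))
            return np.concatenate(feats)

        # X = np.vstack([extract_features(sig) for sig in eeg_epochs]); y = labels
        # clf = Pipeline([("nca", NeighborhoodComponentsAnalysis(n_components=128)),
        #                 ("knn", KNeighborsClassifier(n_neighbors=3))]).fit(X, y)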

    Class-Imbalanced Complementary-Label Learning via Weighted Loss

    Complementary-label learning (CLL) is widely used in weakly supervised classification, but it faces a significant challenge in real-world datasets when confronted with class-imbalanced training samples. In such scenarios, the number of samples in one class is considerably lower than in other classes, which consequently leads to a decline in prediction accuracy. Unfortunately, existing CLL approaches have not investigated this problem. To alleviate this challenge, we propose a novel problem setting that enables learning from class-imbalanced complementary labels for multi-class classification. To tackle this problem, we propose a novel CLL approach called Weighted Complementary-Label Learning (WCLL). The proposed method models a weighted empirical risk minimization loss by utilizing the class-imbalanced complementary labels, which is also applicable to multi-class imbalanced training samples. Furthermore, we derive an estimation error bound to provide theoretical assurance. To evaluate our approach, we conduct extensive experiments on several widely used benchmark datasets and a real-world dataset, and compare our method with existing state-of-the-art methods. The proposed approach shows significant improvement on these datasets, even in the case of multiple class-imbalanced scenarios. Notably, the proposed method not only utilizes complementary labels to train a classifier but also addresses the problem of class imbalance. Comment: 9 pages, 9 figures, 3 tables
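
    A minimal sketch of a class-weighted complementary-label loss is given below; the surrogate loss -log(1 - p_complementary) and the inverse-frequency class weights are illustrative assumptions, not the exact weighted empirical risk derived in the paper.

        # Sketch: the model should place little probability on the complementary
        # (known-wrong) class, and rare classes receive larger weights.
        import torch
        import torch.nn.functional as F

        def weighted_complementary_loss(logits, comp_labels, class_weights):
            # logits: (batch, n_classes); comp_labels[i]: a class sample i does NOT belong to
            probs = F.softmax(logits, dim=1)
            p_comp = probs.gather(1, comp_labels.view(-1, 1)).squeeze(1)
            losses = -torch.log(torch.clamp(1.0 - p_comp, min=1e-12))
            return (class_weights[comp_labels] * losses).mean()

        # Illustrative weighting: inverse complementary-label frequency.
        # counts = torch.bincount(comp_labels, minlength=n_classes).float()
        # class_weights = counts.sum() / (n_classes * counts.clamp(min=1.0))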

    Detection and prediction problems with applications in personalized health care

    The United States health-care system is considered to be unsustainable due to its unbearably high cost. Many of the resources are spent on acute conditions rather than on preventing them. Preventive medicine methods, therefore, are viewed as a potential remedy, since they can help reduce the occurrence of acute health episodes. The work in this dissertation tackles two distinct problems related to the prevention of acute disease. Specifically, we consider: (1) early detection of incorrect or abnormal postures of the human body and (2) the prediction of hospitalization due to heart-related diseases. The solution to the former problem could be used to prevent people from unexpected injuries or alert caregivers in the event of a fall. The latter study could possibly help improve health outcomes and save considerable costs due to preventable hospitalizations. For body posture detection, we place wireless sensor nodes on different parts of the human body and use the pairwise measurements of signal strength corresponding to all sensor transmitter/receiver pairs to estimate body posture. We develop a composite hypothesis testing approach which uses a Generalized Likelihood Test (GLT) as the decision rule. The GLT distinguishes between a set of probability density function (pdf) families constructed using a custom pdf interpolation technique. The GLT is compared with the simple Likelihood Test and Multiple Support Vector Machines. The measurements from the wireless sensor nodes are highly variable, and these methods have different degrees of adaptability to this variability. Moreover, these methods also handle multiple observations differently. Our analysis and experimental results suggest that the GLT is more accurate and better suited to the problem. For hospitalization prediction, our objective is to explore the possibility of effectively predicting heart-related hospitalizations based on the available medical history of the patients. We extensively explored ways of extracting information from patients' Electronic Health Records (EHRs) and organizing the information in a uniform way across all patients. We applied various machine learning algorithms, including Support Vector Machines, AdaBoost with Trees, and Logistic Regression, adapted to the problem at hand. We also developed a new classifier based on a variant of the likelihood ratio test. The new classifier has classification performance competitive with the more complex alternatives, but has the additional advantage of producing results that are more interpretable. Following this direction of increasing interpretability, which is important in the medical setting, we designed a new method that discovers hidden clusters and, at the same time, makes decisions. This new method introduces an alternating clustering and classification approach with guaranteed convergence and explicit performance bounds. Experimental results with actual EHRs from the Boston Medical Center demonstrate a prediction rate of 82% at a 30% false alarm rate, which could lead to considerable savings when used in practice.
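
    For reference, the GLT decision rule mentioned above compares likelihoods maximized over the competing pdf families; written for two families (the families themselves are built from the sensor measurements via the interpolation technique described in the dissertation), it reads

        \Lambda(x) \;=\; \frac{\max_{\theta \in \Theta_1} p(x;\theta)}{\max_{\theta \in \Theta_0} p(x;\theta)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta

    and, for multiple candidate postures, it generalizes to choosing \hat{j} = \arg\max_j \max_{\theta \in \Theta_j} p(x;\theta).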

    Semi-supervised machine learning techniques for classification of evolving data in pattern recognition

    The amount of data recorded and processed over recent years has increased exponentially. To create intelligent systems that can learn from this data, we need to be able to identify patterns hidden in the data itself, learn these patterns and predict future results based on our current observations. If we think about this system in the context of time, the data itself evolves and so does the nature of the classification problem. As more data become available, different classification algorithms become suitable for a particular setting. At the beginning of the learning cycle, when we have a limited amount of data, online learning algorithms are more suitable. When truly large amounts of data become available, we need algorithms that can handle data that might be only partially labeled as a result of the bottleneck that human labeling creates in the learning pipeline. An excellent example of evolving data is gesture recognition, and it is present throughout our work. We need a gesture recognition system to work fast and with very few examples at the beginning. Over time, we are able to collect more data and the system can improve. As the system evolves, the user expects it to work better and not to have to become involved when the classifier is unsure about decisions. This latter situation produces additional unlabeled data. Another example of an application is medical classification, where experts' time is a rare resource and the amount of received and labeled data disproportionately increases over time. Although the process of data evolution is continuous, we identify three main discrete areas of contribution in different scenarios. When the system is very new and not enough data are available, online learning is used to learn after every single example and to capture the knowledge very fast. With increasing amounts of data, offline learning techniques become applicable. Once the amount of data is overwhelming and the teacher cannot provide labels for all the data, we have another setup that combines labeled and unlabeled data. These three setups define our areas of contribution, and our techniques contribute to each of them with applications to pattern recognition scenarios, such as gesture recognition and sketch recognition. An online learning setup significantly restricts the range of techniques that can be used. In our case, the selected baseline technique is the Evolving TS-Fuzzy Model. The semi-supervised aspect we use is a relation between rules created by this model. Specifically, we propose a transductive similarity model that utilizes the relationship between the generated rules based on their decisions about a query sample at inference time. The activation of each of these rules is adjusted according to the transductive similarity, and the new decision is obtained using the adjusted activation. We also propose several new variations of the transductive similarity itself. Once the amount of data increases, we are no longer limited to the online learning setup, and we can take advantage of the offline learning scenario, which normally performs better than the online one because of its independence of sample ordering and global optimization with respect to all samples. We use generative methods to obtain data outside of the training set. Specifically, we aim to improve the previously mentioned TS Fuzzy Model by incorporating semi-supervised learning in the offline learning setup without unlabeled data.
We use the Universum learning approach and have developed a method called UFuzzy. This method relies on artificially generated examples with high uncertainty (the Universum set), and it adjusts the cost function of the algorithm to force the decision boundary to be close to the Universum data. We were able to confirm the hypothesis behind the design of the UFuzzy classifier, namely that Universum learning can improve the TS Fuzzy Model, and have achieved improved performance on more than two dozen datasets and applications. With increasing amounts of data, we use the last scenario, in which the data comprise both labeled data and additional unlabeled data. This setting is one of the most common ones for semi-supervised learning problems. In this part of our work, we aim to improve the widely popular techniques of self-training (and its successor, help-training), which are both meta-frameworks over regular classifier methods but require a probabilistic representation of the output, which can be hard to obtain in the case of discriminative classifiers. Therefore, we develop a new algorithm that uses the modified active learning technique Query-by-Committee (QbC) to sample data with high certainty from the unlabeled set and subsequently embed them into the original training set. Our new method allows us to achieve increased performance across both a range of datasets and a range of classifiers. These three works are connected by gradually relaxing the constraints on the learning setting in which we operate. Although our main motivation behind the development was to increase performance in various real-world tasks (gesture recognition, sketch recognition), we formulated our work as general methods in such a way that they can be used outside a specific application setup, the only restriction being that the underlying data evolve over time. Each of these methods can successfully exist on its own. The best setting in which they can be used is a learning problem where the data evolve over time and it is possible to discretize the evolutionary process. Overall, this work represents a significant contribution to the area of both semi-supervised learning and pattern recognition. It presents new state-of-the-art techniques that outperform baseline solutions, and it opens up new possibilities for future research.
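
    A compact sketch of the committee-based self-training idea described above is shown below; the bootstrap committee construction, the unanimity criterion, and the assumption of non-negative integer class labels are illustrative choices rather than the exact published procedure.

        # Sketch of Query-by-Committee style self-training: train a committee on
        # the labeled pool, move unlabeled samples the committee agrees on
        # (high certainty) into the training set, and retrain.
        import numpy as np
        from sklearn.base import clone
        from sklearn.utils import resample

        def qbc_self_training(base_clf, X_lab, y_lab, X_unlab,
                              n_committee=5, n_rounds=3):
            X_lab, y_lab, X_unlab = map(np.asarray, (X_lab, y_lab, X_unlab))
            for _ in range(n_rounds):
                if len(X_unlab) == 0:
                    break
                committee = []
                for _ in range(n_committee):
                    Xb, yb = resample(X_lab, y_lab)          # bootstrap replica
                    committee.append(clone(base_clf).fit(Xb, yb))
                votes = np.stack([m.predict(X_unlab) for m in committee])
                # majority vote per unlabeled sample (integer labels assumed)
                majority = np.apply_along_axis(
                    lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
                agree = (votes == majority).mean(axis=0) == 1.0   # unanimous only
                if not agree.any():
                    break
                X_lab = np.vstack([X_lab, X_unlab[agree]])
                y_lab = np.concatenate([y_lab, majority[agree]])
                X_unlab = X_unlab[~agree]
            return clone(base_clf).fit(X_lab, y_lab)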

    Network Parameterisation and Activation Functions in Deep Learning

    Deep learning, the study of multi-layered artificial neural networks, has received tremendous attention over the course of the last few years. Neural networks are now able to outperform humans in a growing variety of tasks and increasingly have an impact on our day-to-day lives. There is a wide range of potential directions to advance deep learning, two of which we investigate in this thesis: (1) One of the key components of a network are its activation functions. The activations have a big impact on the overall mathematical form of the network. The first paper studies generalisation of neural networks with rectified linear activation units (“ReLUs”). Such networks partition the input space into so-called linear regions, which are the maximally connected subsets on which the network is affine. In contrast to previous work, which focused on obtaining estimates of the number of linear regions, we propose a tropical algebra-based algorithm called TropEx to extract coefficients of the linear regions. Applied to fully-connected and convolutional neural networks, TropEx shows significant differences between the linear regions of these network types. The second paper proposes a parametric rational activation function called ERA, which is learnable during network training. Although ERA only adds about ten parameters per layer, the activation significantly increases network expressivity and lets small architectures perform close to large ones. ERA outperforms previous activations when used in small architectures. This is relevant because neural networks keep growing larger and larger, and the computational resources they require result in greater costs and electricity usage (which in turn increases the CO2 footprint). (2) For a given network architecture, each parameter configuration gives rise to a mathematical function. This functional realisation is far from unique, and many different parameterisations can give rise to the same function. Changes to the parameterisation that do not change the function are called symmetries. The third paper theoretically studies and classifies all the symmetries of 2-layer networks using the ReLU activation. Finally, the fourth paper studies the effect of network parameterisation on network training. We provide a theoretical analysis of the effect that scaling layers have on the gradient updates. This provides a motivation for us to propose a Cooling method, which automatically scales the network parameters during training. Cooling reduces the reliance of the network on specific tricks, in particular the use of a learning rate schedule.
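
    As a sketch of the idea behind a learnable rational activation with only a few parameters per layer, one possible parameterisation is shown below; the polynomial degrees, the pole-free denominator 1 + |Q(x)|, and the initialization are illustrative choices, not the ERA definition from the paper.

        # Sketch: a trainable rational activation y = P(x) / (1 + |Q(x)|).
        import torch
        import torch.nn as nn

        class RationalActivation(nn.Module):
            def __init__(self, num_degree=3, den_degree=2):
                super().__init__()
                p = torch.zeros(num_degree + 1)
                p[1] = 1.0                                      # start close to the identity map
                self.p = nn.Parameter(p)                        # numerator coefficients
                self.q = nn.Parameter(torch.zeros(den_degree))  # denominator coefficients

            def forward(self, x):
                num = sum(c * x ** i for i, c in enumerate(self.p))
                den = 1.0 + torch.abs(sum(c * x ** (i + 1) for i, c in enumerate(self.q)))
                return num / den

        # Usage: swap it in for a fixed activation in a small network, e.g.
        # net = nn.Sequential(nn.Linear(784, 64), RationalActivation(), nn.Linear(64, 10))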

    Fast and automatic solution of hypocentral parameters for seismic events using machine learning techniques

    Earthquake early warning alert generation is very useful, especially for the city of Bogotá, Colombia, given the social and economic importance of this city for the country. Based on the information from the seismological station El Rosal, a broadband, three-component station located very near the city and belonging to the Servicio Geológico Colombiano (SGC), a Support Vector Machine Regression (SVMR) model was developed using a normalized polynomial kernel, taking as input some characteristics of the initial portion of the P wave used in earlier works, such as the maximum amplitude, the linear regression coefficients of those amplitudes, the logarithmic adjustment parameters of the waveform envelope, and the eigenvalues of the relationship between the three seismogram components of each band. The model was trained and evaluated by applying a cross-correlation strategy, allowing the magnitude and location of a seismic event to be calculated with only five seconds of signal. With the proposed model it was possible to estimate the local magnitude with an accuracy of 0.19 magnitude units, the epicentral distance with an accuracy of about 11 km, the hypocentral depth with an accuracy of approximately 40 km, and the arrival back-azimuth with an accuracy of 45°. The accuracies obtained in magnitude and epicentral distance are better than those found in earlier works, where a large number of events were used for model determination, and those for the remaining hypocentral parameters are of the same order. This research work makes a considerable contribution to the generation of seismic early warning alerts, not only for the country but for any other place where the models proposed here can be applied, and it is a very good starting point for future research.
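
    The regression setup described above can be sketched with scikit-learn as below; the synthetic feature matrix, the target, and the hyperparameters are placeholders, and standardization plus a polynomial kernel stands in for the normalized polynomial kernel used in the original work.

        # Sketch: SVR with a polynomial kernel mapping early P-wave features
        # (from the first five seconds of signal) to one hypocentral parameter,
        # e.g. local magnitude. All data below are synthetic placeholders.
        import numpy as np
        from sklearn.svm import SVR
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 8))                    # P-wave feature vectors (placeholder)
        y = rng.normal(loc=3.5, scale=0.7, size=200)     # local magnitudes (placeholder)

        model = make_pipeline(StandardScaler(),
                              SVR(kernel="poly", degree=3, C=10.0, epsilon=0.1))
        scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
        print("cross-validated MAE:", -scores.mean())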