    Adversarial Samples on Android Malware Detection Systems for IoT Systems

    Many Internet of Things (IoT) systems run Android or Android-like operating systems. As machine learning algorithms have advanced, learning-based Android malware detection systems for IoT devices have become increasingly common. However, these learning-based detection models are often vulnerable to adversarial samples. An automated testing framework is needed to help such detection systems undergo security analysis. Most current methods of generating adversarial samples require the training parameters of the model, and most are aimed at image data. To solve this problem, we propose a testing framework for learning-based Android malware detection systems (TLAMD) for IoT devices. The key challenge is how to construct a suitable fitness function to generate an effective adversarial sample without affecting the features of the application. By introducing genetic algorithms and some technical improvements, our test framework can generate adversarial samples for IoT Android applications with a success rate of nearly 100% and can perform black-box testing on the system.
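
The abstract does not give the fitness function or the genetic operators, so the following is only a minimal sketch of the general idea, under stated assumptions: the detector is queried as a black box that returns a score, candidates may only add features (so the original application's features are preserved), and the fitness trades detector score against the number of additions. The detector, its weights, the penalty, and all function names are hypothetical and are not taken from TLAMD.

```python
import random

# Hypothetical black-box detector returning a malware score in [0, 1].
# It stands in for the system under test; the search only sees its output.
BENIGN_WEIGHTS = [0.0, 0.0, 0.0, 0.0, -0.3, -0.25, -0.2, -0.15]

def detector_score(features):
    raw = 0.9 + sum(w * f for w, f in zip(BENIGN_WEIGHTS, features))
    return max(0.0, min(1.0, raw))

def fitness(candidate, original, penalty=0.01):
    """Lower is better: detector score plus a small cost per added feature (assumed form)."""
    added = sum(c - o for c, o in zip(candidate, original))
    return detector_score(candidate) + penalty * added

def mutate(candidate, original, rate, rng):
    # Toggle only positions that are 0 in the original, so no original feature is removed.
    child = list(candidate)
    for i, o in enumerate(original):
        if o == 0 and rng.random() < rate:
            child[i] = 1 - child[i]
    return child

def crossover(a, b, rng):
    # Single-point crossover; both parents keep the original features, so the child does too.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(original, generations=30, pop_size=20, rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [list(original)]
    population += [mutate(original, original, rate, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, original))
        elite = population[: pop_size // 2]  # elitism: the best candidates always survive
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), original, rate, rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return min(population, key=lambda c: fitness(c, original))

original = [1, 1, 0, 1, 0, 0, 0, 0]  # toy binary feature vector of a malicious app
adversarial = evolve(original)
```

Because the original vector is part of the initial population and the best half always survives, the returned candidate is never worse than the original under this fitness, and every feature present in the original is still present in the result.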

    Standardizing catch per unit effort by machine learning techniques in longline fisheries: a case study of bigeye tuna in the Atlantic Ocean

    Support vector machines (SVMs) have been shown to perform better in catch per unit effort (CPUE) standardization than other methods. SVM performance depends strongly on parameter selection, which has not been discussed in the context of CPUE standardization. Analyzing the influence of parameter selection on SVM performance for CPUE standardization could improve model construction and performance, and thus provide useful information for stock assessment and management. We applied SVM to standardize longline CPUE from fishery data for bigeye tuna (Thunnus obesus) in the tropical fishing area of the Atlantic Ocean and evaluated three parameter optimization methods: a grid search method and two improved hybrid algorithms, namely SVM combined with particle swarm optimization (PSO-SVM) and with genetic algorithms (GA-SVM), in order to strengthen the SVM. The mean absolute error (MAE), mean square error (MSE), three types of correlation coefficients, and the normalized mean square error (NMSE) were computed to compare algorithm performance. The PSO-SVM and GA-SVM algorithms scored particularly well on these indicators in the training data and the dataset, with PSO-SVM marginally better than GA-SVM. The grid search algorithm scored best on these indicators in the testing data. In general, PSO was appropriate for optimizing the SVM parameters in CPUE standardization. The standardized CPUE was unstable and low from 2007 to 2011, increased during 2011-2013, then decreased from 2015 to 2017. The abundance index was lower than before 2000 and showed a decreasing trend in recent years.
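
For readers unfamiliar with the error measures named above, here is a small sketch of how they might be computed. MAE and MSE follow their standard definitions; the abstract does not define NMSE, so the version below assumes one common convention (MSE divided by the variance of the observations). The sample values are made up for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def nmse(y_true, y_pred):
    # Assumed convention: MSE normalized by the population variance of the observations.
    mean_true = sum(y_true) / len(y_true)
    variance = sum((t - mean_true) ** 2 for t in y_true) / len(y_true)
    return mse(y_true, y_pred) / variance

observed = [1.0, 2.0, 3.0, 4.0]    # hypothetical observed CPUE values
predicted = [1.5, 2.0, 2.5, 4.0]   # hypothetical model predictions

print(mae(observed, predicted), mse(observed, predicted), nmse(observed, predicted))
```

Under this convention an NMSE of 1 corresponds to a model no better than predicting the mean of the observations, which makes values comparable across datasets with different scales.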

    A New Terrain Classification Framework Using Proprioceptive Sensors for Mobile Robots

    Mobile robots that operate in real-world environments interact with their surroundings to generate complex acoustic and vibration signals, which carry rich information about the terrain. This paper presents a new terrain classification framework that utilizes both the acoustic and the vibration signals resulting from the robot-terrain interaction. As an alternative to handcrafted domain-specific feature extraction, a two-stage feature selection method combining the ReliefF and mRMR algorithms was developed to select optimal feature subsets that carry more discriminative information. As different data sources can provide complementary information, a multiclassifier combination method was proposed that considers a priori knowledge and fuses predictions from five data sources: one acoustic data source and four vibration data sources. In this study, four conceptually different classifiers were employed to perform the classification, each with a different number of optimal features. Signals were collected using a tracked robot moving at three different speeds on six different terrains. The new framework improved the classification performance of the different classifiers using the newly developed optimal feature subsets. The greatest improvement was observed when the robot traversed at lower speeds.
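
The abstract does not specify how the a priori knowledge enters the combination rule. One simple way such a fusion could work is a weighted average of per-source class probabilities, where the weights encode prior trust in each source. The sketch below assumes exactly that; the source names, terrain labels, probabilities, and weights are all hypothetical.

```python
def fuse_predictions(source_probs, source_weights):
    """Weighted average of per-source class probabilities; returns (label, fused scores)."""
    total = sum(source_weights.values())
    classes = next(iter(source_probs.values())).keys()
    fused = {
        c: sum(source_weights[s] * probs[c] for s, probs in source_probs.items()) / total
        for c in classes
    }
    return max(fused, key=fused.get), fused

# One acoustic source and four vibration sources, as in the abstract; values are invented.
source_probs = {
    "acoustic": {"grass": 0.80, "gravel": 0.20},
    "vib_1": {"grass": 0.40, "gravel": 0.60},
    "vib_2": {"grass": 0.45, "gravel": 0.55},
    "vib_3": {"grass": 0.50, "gravel": 0.50},
    "vib_4": {"grass": 0.30, "gravel": 0.70},
}
# Assumed prior weights: the acoustic source is trusted more than any single vibration source.
source_weights = {"acoustic": 0.4, "vib_1": 0.15, "vib_2": 0.15, "vib_3": 0.15, "vib_4": 0.15}

label, fused = fuse_predictions(source_probs, source_weights)
```

In this toy case a plain majority vote would favor "gravel", while the prior-weighted fusion selects "grass" because the more trusted source is confident. That illustrates why prior knowledge about source reliability can change the fused decision.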

    Improved relative discriminative criterion using rare and informative terms and ringed seal search-support vector machine techniques for text classification

    Text classification has become an important task for automatically assigning documents to their respective categories. In text classification, feature selection techniques are normally used to identify important features and to remove irrelevant and noisy features, reducing the dimensionality of the feature space. These techniques are expected to improve the efficiency, accuracy, and comprehensibility of classification models in text labeling problems. Most feature selection techniques use document and term frequencies to rank a term. Existing feature selection techniques (e.g. RDC, NRDC) consider frequently occurring terms and ignore the counts of rarely occurring terms in a class. This study proposes the Improved Relative Discriminative Criterion (IRDC) technique, which takes the counts of rarely occurring terms into account. It is argued that rarely occurring terms are as meaningful and important as frequently occurring terms in a class. The proposed IRDC is compared to the most recent feature selection techniques, RDC and NRDC. The results reveal a significant improvement by the proposed IRDC technique for feature selection in terms of precision (27%), recall (30%), macro-average (35%), and micro-average (30%). Additionally, this study proposes a hybrid algorithm named Ringed Seal Search-Support Vector Machine (RSS-SVM) to improve the generalization and learning capability of the SVM. The proposed RSS-SVM optimizes the kernel and penalty parameters with the help of the RSS algorithm. The proposed RSS-SVM is compared to the most recent techniques, GA-SVM and CS-SVM. The results show a significant improvement by the proposed RSS-SVM for classification in terms of accuracy (18.8%), recall (15.68%), precision (15.62%), and specificity (13.69%). In conclusion, the proposed IRDC performs better than existing techniques because it considers rare and informative terms. Additionally, the proposed RSS-SVM performs better than existing techniques because it improves the balance between exploration and exploitation.

    Feature selection model based on EEG signals for assessing the cognitive workload in drivers

    In recent years, research has focused on developing mechanisms to assess subjects' cognitive workload when performing activities that demand high concentration, such as driving a vehicle. These mechanisms have used several tools for analyzing cognitive workload, and electroencephalographic (EEG) signals have been used most frequently because of their high precision. However, one of the main challenges in using EEG signals is finding appropriate information for identifying cognitive states. Here, we present GALoRIS, a new feature selection model for pattern recognition that uses information from EEG signals and is based on machine learning techniques. GALoRIS combines Genetic Algorithms and Logistic Regression to create a new fitness function that identifies and selects the critical EEG features that contribute to recognizing high and low cognitive workloads, and it structures a new dataset capable of optimizing the model's predictive process. We found that GALoRIS identifies data related to high and low cognitive workloads of subjects while driving a vehicle using information extracted from multiple EEG signals, reducing the original dataset by more than 50% and maximizing the model's predictive capacity, achieving a precision rate greater than 90%. This work has been funded by the Ministry of Science, Innovation and Universities of Spain under grant number TRA2016-77012-R.
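
The abstract does not state the actual GALoRIS fitness function. As a rough illustration of how a genetic algorithm and logistic regression can be coupled for feature selection, the sketch below scores a candidate feature mask by training a small logistic regression on the selected columns and subtracting a penalty per selected feature. The penalty, the use of training accuracy instead of cross-validation, the toy data, and every function name are assumptions made for this example.

```python
import math

def train_logistic(rows, labels, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, v in enumerate(x):
                grad_w[j] += (p - y) * v
            grad_b += p - y
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def accuracy(rows, labels, weights, bias):
    hits = 0
    for x, y in zip(rows, labels):
        z = bias + sum(w * v for w, v in zip(weights, x))
        hits += int((z > 0) == (y == 1))
    return hits / len(rows)

def fitness(mask, rows, labels, penalty=0.01):
    """Higher is better: accuracy on the selected features minus a size penalty (assumed form)."""
    selected = [j for j, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0
    reduced = [[x[j] for j in selected] for x in rows]
    weights, bias = train_logistic(reduced, labels)
    return accuracy(reduced, labels, weights, bias) - penalty * len(selected)

# Toy data: feature 0 separates the two workload classes, feature 1 is unrelated to the label.
rows = [[-1, 1], [-1, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [1, 1], [1, -1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

scores = {mask: fitness(mask, rows, labels) for mask in [(1, 0), (1, 1), (0, 1)]}
```

A genetic algorithm would evolve such masks toward higher fitness. In this toy case the mask that keeps only the informative feature scores highest, which mirrors the goal of reducing the dataset while keeping predictive capacity.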

    Ensemble learning using multi-objective optimisation for arabic handwritten words

    Arabic handwriting recognition is a dynamic and stimulating field of study within pattern recognition, and it plays a significant part in today's global environment. It is a widespread and computationally costly task because of cursive writing, the massive number of words, and variation in writing style. According to the literature, existing features lack data-supportive techniques and the construction of geometric features. Most ensemble learning approaches assume a linear combination, which is not valid because of differences in data types. In addition, existing approaches to classifier generation do not support decision-making for selecting the most suitable classifier, which requires multi-objective optimisation to handle these differences in data types. This thesis proposes a new type of handwriting feature that uses Segments Interpolation (SI) to find the best-fitting line in each window, together with a model for finding the best operating-point window size for SI features. A Multi-Objective Ensemble Oriented (MOEO) approach is formulated to control the classifier topology and provide feedback support for changing the classifiers' topology and weights, based on an extension of the Non-dominated Sorting Genetic Algorithm (NSGA-II). The extension is designated Random Subset based Parents Selection (RSPS-NSGA-II) and handles neurons and accuracy. Evaluation metrics are drawn from two perspectives: classification and multi-objective optimisation. The experimental design is based on two subsets of the IFN/ENIT database, consisting of 10 classes (C10) and 22 classes (C22). The features were tested with a Support Vector Machine (SVM) and an Extreme Learning Machine (ELM). The SI feature improved the results: SI achieved a notable result with SVM, 88.53% for C22. RSPS for C10 at k=2 achieved 91% accuracy with fewer neurons than NSGA-II, and for C22 at k=10 accuracy rose to 81%, compared with 78% for NSGA-II. Future work may consider introducing more features to the system, applying them to other languages, and integrating the system with sequence learning for higher accuracy.
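
NSGA-II ranks candidates by Pareto dominance rather than by a single score. The sketch below shows only that core idea for two minimized objectives, number of neurons and classification error. It is not the RSPS-NSGA-II procedure from the thesis, and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the candidates that no other candidate dominates (the first front)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates if o is not c)]

# (hidden neurons, classification error) for hypothetical classifiers; both are minimized.
candidates = [(50, 0.12), (80, 0.09), (80, 0.15), (120, 0.09), (200, 0.07)]
front = pareto_front(candidates)
print(front)
```

Here (80, 0.15) and (120, 0.09) drop out because another candidate is at least as good on both objectives. The remaining three represent different trade-offs between network size and error, and a decision-maker would choose among them.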

    Development of cognitive workload models to detect driving impairment

    Thesis written in Spanish. Driving a vehicle is a complex activity exposed to continuous changes such as speed limits and vehicular traffic. Drivers require a high degree of concentration when performing this activity, which increases the amount of mental demand known as cognitive workload, so that the slightest negligence can cause a vehicular accident. In fact, human error is the leading contributing factor in over 90% of road accidents. In recent years, subjects' cognitive workload levels while driving a vehicle have been predicted using subjective and vehicle performance tools. Other research has emphasized the use and analysis of physiological information, where electroencephalographic (EEG) signals are the most widely used to identify cognitive states because of their high precision. Although significant progress has been made in this area, these investigations have been based on traditional techniques or on the analysis of data from a specific source, owing to the complexity of the information. A new trend has opened in the study of subjects' internal behavior by applying machine learning techniques to analyze information from various sources. However, several challenges remain in this new line of research. This doctoral thesis presents a new model, called GALoRSI-SVMRBF (Genetic Algorithms and Logistic Regression for the Structuring of Information-Support Vector Machine with Radial Basis Function Kernel), to predict the states of low and high cognitive workload of subjects facing vehicle driving scenarios. GALoRSI-SVMRBF is developed using machine learning algorithms based on information from EEG signals. The information collected from NASA-TLX, instant online self-assessment, and the error rate measure is also incorporated in the model. First, GALoRSI-SVMRBF proposes a new method for pattern recognition based on feature selection that combines statistical tests, genetic algorithms, and logistic regression.
This method consists mainly of selecting an EEG dataset and exploring the information to identify the key features that recognize cognitive states. The selected data are defined as an index for pattern recognition and used to structure a new dataset capable of optimizing the model's learning and classification process. Second, the methodology and development of a classifier for the prediction model are presented, implementing machine learning algorithms. The classifier is developed in two main phases, defined as training and testing. Once the prediction model has been developed, this thesis presents the validation phase of GALoRSI-SVMRBF. The validation consists of evaluating the model's adaptability to new datasets while maintaining a high prediction rate. Finally, an analysis of the performance of GALoRSI-SVMRBF is presented. The objective is to establish the model's scope and limitations by evaluating various performance metrics to find the optimal configuration for GALoRSI-SVMRBF. We found that GALoRSI-SVMRBF successfully predicts low and high cognitive workload of subjects while driving a vehicle. In general, the model is observed to use the information extracted from multiple EEG signals, reducing the original dataset by more than 50%, maximizing its predictive capacity, and achieving a precision rate of >90% in classifying the information. During this thesis, the experiments showed that obtaining a high prediction rate depends on several factors, from applying a useful data collection technique to the last step of the prediction model.
    Driving a vehicle is a complex activity exposed to demands that change continuously because of factors such as speed limits, obstacles on the road, and vehicular traffic, among others. When performing this activity, drivers require a high degree of concentration, which increases the amount of mental demand known as workload. In recent years, mechanisms have been proposed to monitor and/or predict subjects' cognitive workload levels while driving a vehicle, focusing on the use of subjective and vehicle performance tools. Other research has emphasized the use and analysis of physiological information, with electroencephalographic (EEG) signals being the most widely used to identify cognitive states because of their high precision. Despite the great progress made, these investigations have been based on traditional techniques or on the analysis of information from specific sources to identify the subject's internal state, yielding overtrained or robust models and increasing analysis time, which affects model performance. This doctoral thesis presents a new model, called GALoRSI-SVMRBF (Genetic Algorithms and Logistic Regression for the Structuring of Information-Support Vector Machine with Radial Basis Function Kernel), to predict the states of low and high cognitive workload of subjects facing vehicle driving scenarios. GALoRSI-SVMRBF was developed using machine learning algorithms and statistical techniques based on information from EEG signals. First, GALoRSI-SVMRBF creates a database by extracting, through statistical techniques, the features that will be used in the model. Next, it proposes a new method for pattern recognition based on feature selection that combines statistical tests, genetic algorithms, and logistic regression. This method consists mainly of selecting an EEG dataset and exploring combinations of the information to identify the key features that contribute to recognizing two cognitive states. The selected information is then defined as an index for pattern recognition and used to structure a new dataset that supports information from one or multiple channels to optimize the model's learning and classification process. Finally, the classifier of the prediction model is developed in two stages, defined as training and testing. We found that GALoRSI-SVMRBF successfully predicts low and high cognitive workload of subjects while driving a vehicle. In general, the model was observed to use information extracted from one or multiple EEG signals, achieving a precision rate of >90% in classifying the information.