
    Radar-based Feature Design and Multiclass Classification for Road User Recognition

    The classification of individual traffic participants is a complex task, especially in challenging scenarios with multiple road users or under bad weather conditions. Radar sensors provide a way of measuring such scenes that is orthogonal to well-established camera systems. To obtain accurate classification results, 50 different features are extracted from the measurement data and evaluated for their performance. From these features a suitable subset is chosen and passed to random forest and long short-term memory (LSTM) classifiers to obtain class predictions for the radar input. Moreover, it is shown why data imbalance is an inherent problem in automotive radar classification when the dataset is not sufficiently large. To overcome this issue, classifier binarization is used among other techniques in order to better account for underrepresented classes. A new method to couple the resulting probabilities is proposed and compared to existing coupling schemes. Final results show substantial improvements over ordinary multiclass classification. Comment: 8 pages, 6 figures
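    The abstract does not include code; as a rough illustration of the general pipeline it describes (feature subset selection, a random forest classifier, and classifier binarization for imbalanced classes), a minimal scikit-learn sketch might look like the following. The data, feature counts, and parameters are placeholders, and one-vs-rest is used as one common form of classifier binarization; the paper's probability-coupling method is not reproduced here.

```python
# Minimal sketch: feature selection + one-vs-rest (binarized) random forest
# for an imbalanced multiclass problem. X and y are synthetic placeholders,
# not the radar dataset used in the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Placeholder data: 50 extracted features, 5 road-user classes, imbalanced.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=20,
                           n_classes=5, weights=[0.5, 0.25, 0.15, 0.07, 0.03],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Select a feature subset, then binarize the multiclass problem (one classifier
# per class) so minority classes get their own decision boundary; class_weight
# further compensates for imbalance.
model = make_pipeline(
    SelectKBest(f_classif, k=25),
    OneVsRestClassifier(RandomForestClassifier(n_estimators=200,
                                               class_weight="balanced",
                                               random_state=0)),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```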

    LDEB -- Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues

    Emotion recognition in conversations (ERC) is vital to the advancement of conversational AI and its applications. The development of an automated ERC model using machine learning (ML) would therefore be beneficial. However, conversational dialogues present a unique problem: each dialogue depicts nested emotions that entangle the association between the emotional feature descriptors and the emotion type (or label). This entanglement, compounded by data paucity, is an obstacle for an ML model. To overcome this problem, we propose a novel approach called Label Digitization with Emotion Binarization (LDEB) that disentangles this association by utilizing text normalization and 7-bit digital encoding techniques, and constructs a meaningful feature space on which an ML model can be trained. We use the publicly available FETA-DailyDialog dataset for feature learning and develop a hierarchical ERC model using random forest (RF) and artificial neural network (ANN) classifiers. Simulations showed that the ANN-based ERC model predicted emotion with best accuracy and precision scores of about 74% and 76%, respectively, and reached a training accuracy of about 98% after 60 epochs. The RF-based ERC model predicted emotions with best accuracy and precision scores of about 78% and 75%, respectively. Comment: 10 pages, 3 figures, 4 tables
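    The exact LDEB encoding is not spelled out in the abstract; one plausible reading is that each of the seven DailyDialog emotion labels is represented as one binary indicator ("bit") after text normalization. The sketch below follows that assumed reading with placeholder utterances and a per-bit random forest, and is not the authors' code.

```python
# Hypothetical sketch: normalize dialogue text, represent each of the 7
# DailyDialog emotion labels as one binary "bit", and train one random forest
# per bit. This is an assumed reading of LDEB, not the authors' implementation.
import re
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

EMOTIONS = ["no_emotion", "anger", "disgust", "fear",
            "happiness", "sadness", "surprise"]  # DailyDialog label set

def normalize(text: str) -> str:
    """Small text-normalization step: lowercase and strip punctuation."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).strip()

# Placeholder utterances and labels, not the FETA-DailyDialog data.
utterances = ["I can't believe you did that!", "That is wonderful news.",
              "Leave me alone.", "Okay, see you tomorrow."]
labels = ["anger", "happiness", "sadness", "no_emotion"]

vec = TfidfVectorizer()
X = vec.fit_transform([normalize(u) for u in utterances])

# One binary column per emotion; one classifier per column ("bit").
Y = np.array([[int(lbl == emo) for emo in EMOTIONS] for lbl in labels])
bit_models = [RandomForestClassifier(n_estimators=50, random_state=0).fit(X, Y[:, i])
              for i in range(len(EMOTIONS))]

def bit_probability(model, x):
    """Probability of the positive bit; 0 if this bit never appeared in training."""
    classes = list(model.classes_)
    return model.predict_proba(x)[0][classes.index(1)] if 1 in classes else 0.0

def predict_emotion(text: str) -> str:
    x = vec.transform([normalize(text)])
    scores = [bit_probability(m, x) for m in bit_models]
    return EMOTIONS[int(np.argmax(scores))]

print(predict_emotion("That is wonderful!"))
```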

    Combining PCFG-LA models with dual decomposition: a case study with function labels and binarization

    It has recently been shown that different NLP models can be effectively combined using dual decomposition. In this paper we demonstrate that PCFG-LA parsing models are suitable for combination in this way. We experiment with the different models which result from alternative methods of extracting a grammar from a treebank (retaining or discarding function labels, left binarization versus right binarization) and achieve a labeled Parseval F-score of 92.4 on Wall Street Journal Section 23; this represents an absolute improvement of 0.7 and an error reduction rate of 7% over a strong PCFG-LA product-model baseline. Although we experiment only with binarization and function labels in this study, there is much scope for applying this approach to other grammar extraction strategies.
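    As a small illustration of the left versus right binarization the abstract refers to, the toy sketch below splits an n-ary treebank-style production into binary rules in both directions. The intermediate node labels follow a common treebank convention and this is not the paper's grammar extraction code.

```python
# Toy sketch of left vs right binarization of an n-ary production such as
# NP -> DT JJ JJ NN. Intermediate labels (e.g. "NP|<JJ-NN>") are illustrative.
from typing import List, Tuple

Rule = Tuple[str, List[str]]

def binarize(lhs: str, rhs: List[str], direction: str = "right") -> List[Rule]:
    """Split an n-ary rule into binary rules, factoring to the left or right."""
    rules: List[Rule] = []
    parent, symbols = lhs, rhs[:]
    while len(symbols) > 2:
        if direction == "right":
            head, rest = symbols[0], symbols[1:]          # factor out the right remainder
            new = f"{lhs}|<{'-'.join(rest)}>"
            rules.append((parent, [head, new]))
        else:
            rest, tail = symbols[:-1], symbols[-1]        # factor out the left prefix
            new = f"{lhs}|<{'-'.join(rest)}>"
            rules.append((parent, [new, tail]))
        parent, symbols = new, rest
    rules.append((parent, symbols))
    return rules

for rule in binarize("NP", ["DT", "JJ", "JJ", "NN"], direction="right"):
    print(rule)
# ('NP', ['DT', 'NP|<JJ-JJ-NN>']), ('NP|<JJ-JJ-NN>', ['JJ', 'NP|<JJ-NN>']), ...
```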

    Predicting unstable software benchmarks using static source code features

    Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, whether a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark's stability without having to execute it. Our approach relies on 58 statically-computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input/output (I/O). To assess our approach's effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best, with prediction performance from 0.79 to 0.90 in terms of AUC and from 0.43 to 0.68 in terms of MCC. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions, and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively utilize machine learning models to predict whether a benchmark will be stable or not ahead of execution. This enables spending precious testing time on reliable benchmarks, supporting developers in identifying unstable benchmarks during development, allowing unstable benchmarks to be repeated more often, estimating stability in scenarios where repeated benchmark execution is infeasible or impossible, and warning developers if new benchmarks or existing benchmarks executed in new environments will be unstable.
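    A rough sketch of the kind of evaluation described above (binary stable/unstable classification from static source-code features, scored with AUC and MCC) might look like this in scikit-learn. The feature matrix is synthetic and the hyperparameters are placeholders, not the study's 4,461 Go benchmarks.

```python
# Minimal sketch: predict benchmark (in)stability from statically computed
# source-code features and report AUC and MCC. Data is a synthetic stand-in.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

# 58 static features per benchmark; binary label: 1 = unstable, 0 = stable.
X, y = make_classification(n_samples=4000, n_features=58, n_informative=15,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

print("AUC:", round(roc_auc_score(y_test, proba), 3))
print("MCC:", round(matthews_corrcoef(y_test, clf.predict(X_test)), 3))

# Feature importances hint at which static features drive the predictions.
top = np.argsort(clf.feature_importances_)[::-1][:7]
print("Top feature indices:", top)
```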

    Time to land prediction in Barcelona-El Prat airport based on machine learning classification models

    This document presents a method to assess the landing time at Barcelona-El Prat airport using machine learning classification models. The goal of this project is not to predict a continuous variable but to determine whether the time to land falls into one of the following categories: advanced, delayed or planned. Additionally, two categories called very advanced and very delayed have been included, containing flights with abnormal time-to-land values caused by either very fast or very slow approaches. To obtain the data, an ADS-B antenna located on the rooftop of the aerospace and telecommunications engineering school of Castelldefels has been used. This antenna captures the signals of all arrivals into Barcelona, which are later decoded by an existing program written in C#. By means of a custom program written in Python, the most relevant characteristics of these flights are extracted and presented in matrix format so that the different models can process them. All the data has been scaled and divided into training and test samples; the former are used to train the different models and the latter to measure their performance. Six different models have been trained: four of them reached accuracy values over 60%, another reached 75%, and one could not go over 30%. The best three models have been tuned with two different techniques: random search and grid search. It has been verified that random search is significantly better, since it obtains the same results while requiring much less time and fewer computing resources. Also, different methods to enhance the results of the already tuned models, such as voting classifiers and boosting, have been used. Finally, oversampling techniques have been implemented to address the five-category problem, as the extreme cases are heavily underrepresented. Thanks to this technique, the accuracy of these two categories improved, but the overall accuracy decreased in turn. The performance of these models could be improved in the future if more flights or more characteristics of those flights were added.
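    To illustrate the tuning comparison described above (random search versus grid search over the same parameter space), a small scikit-learn sketch could look like this. The classifier, parameter ranges, and data are illustrative placeholders, not the ADS-B flight features used in the project.

```python
# Sketch comparing grid search and random search for hyperparameter tuning.
# Random search explores the same space with far fewer model fits.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=1500, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)

param_grid = {"n_estimators": [100, 200, 300], "max_depth": [5, 10, 20, None]}
param_dist = {"n_estimators": randint(100, 301), "max_depth": [5, 10, 20, None]}

grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_dist,
                          n_iter=6, cv=3, random_state=0)  # 6 fits vs 12 per fold

grid.fit(X, y)
rand.fit(X, y)
print("grid search   best:", grid.best_score_, grid.best_params_)
print("random search best:", rand.best_score_, rand.best_params_)
```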

    Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives

    Machine Learning has been steadily gaining traction for its use in Anomaly-based Network Intrusion Detection Systems (A-NIDS). Research into this domain is frequently performed using the KDD CUP 99 dataset as a benchmark. Several studies question its usability for constructing a contemporary NIDS, due to its skewed response distribution, non-stationarity, and failure to incorporate modern attacks. In this paper, we compare the performance of KDD-99 alternatives when trained using classification models commonly found in the literature: Neural Network, Support Vector Machine, Decision Tree, Random Forest, Naive Bayes and K-Means. Applying the SMOTE oversampling technique and random undersampling, we create a balanced version of NSL-KDD and show that skewed target classes in KDD-99 and NSL-KDD hamper the efficacy of classifiers on minority classes (U2R and R2L), leading to possible security risks. We explore UNSW-NB15, a modern substitute for KDD-99 with greater uniformity of pattern distribution. We benchmark this dataset before and after SMOTE oversampling to observe the effect on minority-class performance. Our results indicate that classifiers trained on UNSW-NB15 match or better the weighted F1-score of those trained on NSL-KDD and KDD-99 in the binary case, thus advocating UNSW-NB15 as a modern substitute for these datasets. Comment: Paper accepted into Proceedings of IEEE International Conference on Computing, Communication and Security 2018 (ICCCS-2018). Statistics: 8 pages, 7 tables, 3 figures, 34 references
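    As a rough illustration of the balancing step described above (SMOTE oversampling of minority classes combined with random undersampling of the majority), a sketch with imbalanced-learn might look like the following. The data is a synthetic stand-in for NSL-KDD and the per-class targets are illustrative choices, not the paper's exact configuration.

```python
# Sketch: SMOTE oversampling plus random undersampling to balance a skewed
# intrusion-detection-style dataset, then training a random forest.
from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Heavily skewed classes, mimicking rare attack categories such as U2R/R2L.
X, y = make_classification(n_samples=10000, n_features=30, n_informative=12,
                           n_classes=4, weights=[0.85, 0.10, 0.04, 0.01],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority classes up to ~2000 samples each, then undersample
# the majority class down to the same size (targets are illustrative).
smote = SMOTE(sampling_strategy={1: 2000, 2: 2000, 3: 2000}, random_state=0)
under = RandomUnderSampler(sampling_strategy={0: 2000}, random_state=0)
X_res, y_res = smote.fit_resample(X_train, y_train)
X_res, y_res = under.fit_resample(X_res, y_res)
print("before:", Counter(y_train), "after:", Counter(y_res))

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_res, y_res)
print("weighted F1:", round(f1_score(y_test, clf.predict(X_test), average="weighted"), 3))
```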

    Deep Learning Based Real Time Devanagari Character Recognition

    Advances in the technology behind optical character recognition (OCR) have made it one of the most widely used technologies across the industrial space. Today, OCR is available for several languages and can recognize characters in real time, but there are some languages for which this technology has not been developed much. These advancements have been made possible by the introduction of concepts such as artificial intelligence and deep learning. Deep neural networks have proven to be well suited to recognition tasks, and many algorithms and models can be used for this purpose. This project implements and optimizes a deep learning-based model that recognizes Devanagari script characters in real time by analyzing hand movements.
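    The report describes a deep learning model for Devanagari character recognition without showing code; a minimal convolutional network of the kind typically used for this task might be sketched as follows. The 32x32 grayscale input and the 46-class output are assumptions based on the commonly used public Devanagari handwritten character dataset, not necessarily this project's configuration.

```python
# Minimal sketch of a CNN for Devanagari character recognition. Input size and
# class count are assumptions (typical of the public Devanagari handwritten
# character dataset), not necessarily this project's exact setup.
from tensorflow.keras import layers, models

NUM_CLASSES = 46  # assumed: 36 characters + 10 digits

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),                      # grayscale character image
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training would use batches of labeled character images, e.g.:
# model.fit(train_images, train_labels, epochs=10, validation_split=0.1)
```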