
    Bayes classifiers for imbalanced traffic accidents datasets

    Traffic accident data sets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than the number classified under the slight injuries class (majority). This poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three data balancing techniques were used: undersampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the imbalanced and balanced data sets: Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks, in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created with oversampling, together with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. Furthermore, the following variables were found to contribute to the occurrence of a killed casualty or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first to analyze historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents.

    The authors are grateful to the Police Traffic Department in Jordan for providing the data necessary for this research. Griselda Lopez wishes to express her acknowledgement to the regional Ministry of Economy, Innovation and Science of the regional government of Andalusia (Spain) for their scholarship to train teachers and researchers in deficit areas, which has made this work possible. The authors appreciate the reviewers' comments and effort to improve the paper.

    Mujalli, R.; López-Maldonado, G.; Garach, L. (2016). Bayes classifiers for imbalanced traffic accidents datasets. Accident Analysis & Prevention, 88:37-51. https://doi.org/10.1016/j.aap.2015.12.003
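    The three balancing strategies described in the abstract are easy to illustrate. The sketch below is a generic reconstruction assuming simple random resampling, not the authors' exact procedure; the function name and the "mix" target are illustrative choices.

```python
# A generic sketch of the three balancing strategies (random resampling);
# the authors' exact procedure may differ. Assumes the minority class is
# the smaller one, as in the severity data described above.
import random

def balance(instances, labels, minority, majority, strategy="oversample"):
    minority_set = [x for x, y in zip(instances, labels) if y == minority]
    majority_set = [x for x, y in zip(instances, labels) if y == majority]

    if strategy == "undersample":
        # Remove majority-class instances until both classes match.
        majority_set = random.sample(majority_set, len(minority_set))
    elif strategy == "oversample":
        # Duplicate minority-class instances until both classes match.
        minority_set += random.choices(
            minority_set, k=len(majority_set) - len(minority_set))
    else:  # "mix": undersample the majority and oversample the minority
        target = (len(minority_set) + len(majority_set)) // 2
        majority_set = random.sample(majority_set, target)
        minority_set += random.choices(
            minority_set, k=target - len(minority_set))

    data = ([(x, minority) for x in minority_set]
            + [(x, majority) for x in majority_set])
    random.shuffle(data)
    return [x for x, _ in data], [y for _, y in data]
```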

    Recognition of Promoters in DNA Sequences Using Weightily Averaged One-dependence Estimators

    The completion of the human genome project in the last decade has generated a strong demand for computational analysis techniques in order to fully exploit the acquired human genome database. The human genome project produced a perplexing mass of genetic data which necessitates automatic genome annotation, and there is growing interest in the process of gene finding and gene recognition from DNA sequences. In genetics, a promoter is a segment of DNA that marks the starting point of transcription of a particular gene. Recognizing promoters is therefore one step towards gene finding in DNA sequences. Promoters also play a fundamental role in many other vital cellular processes, and aberrant promoters can cause a wide range of diseases including cancers. This paper describes a state-of-the-art machine learning approach called Weightily Averaged One-Dependence Estimators to tackle the problem of recognizing promoters in genetic sequences. To lower the computational complexity and to increase the generalization capability of the system, we employ an entropy-based feature extraction approach to select relevant nucleotides that are directly responsible for promoter recognition. We carried out experiments on a dataset extracted from the biological literature as a proof of concept. The proposed system achieved an accuracy of 97.17% in classifying promoters. The experimental results demonstrate the efficacy of our framework and encourage us to extend it to recognize promoter sequences in various species of higher eukaryotes.
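    The entropy-based feature extraction step can be sketched as standard information-gain ranking over nucleotide positions; the paper's exact variant may differ, and the function names below are illustrative.

```python
# A sketch of entropy-based selection of informative nucleotide positions
# (standard information gain); the paper's exact variant may differ.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(sequences, labels, position):
    # Entropy reduction obtained by splitting on the nucleotide at `position`.
    by_nucleotide = {}
    for seq, y in zip(sequences, labels):
        by_nucleotide.setdefault(seq[position], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys)
                    for ys in by_nucleotide.values())
    return entropy(labels) - remainder

def select_positions(sequences, labels, k):
    # Keep the k positions whose nucleotides carry the most class information.
    ranked = sorted(range(len(sequences[0])),
                    key=lambda p: information_gain(sequences, labels, p),
                    reverse=True)
    return sorted(ranked[:k])
```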

    Data Fusion for Real-time Multimodal Emotion Recognition through Webcams and Microphones in E-Learning

    The original article is available on the Taylor & Francis Online website: http://www.tandfonline.com/doi/abs/10.1080/10447318.2016.1159799?journalCode=hihc20

    This paper describes the validation study of our software that uses combined webcam and microphone data for real-time, continuous, unobtrusive emotion recognition as part of our FILTWAM framework. FILTWAM aims to deploy a real-time multimodal emotion recognition method that provides more adequate feedback to learners during an online communication skills training. Timely feedback is needed that reflects the learners' shown intended emotions and increases their awareness of their own behaviour. At a minimum, a reliable and valid software interpretation of the performed facial and vocal emotions is needed to warrant such adequate feedback. This validation study therefore calibrates our software. The study uses a multimodal fusion method. Twelve test persons performed computer-based tasks in which they were asked to mimic specific facial and vocal emotions. All test persons' behaviour was recorded on video, and two raters independently scored the shown emotions, which were contrasted with the software recognition outcomes. A hybrid method for multimodal fusion in our software achieves accuracies between 96.1% and 98.6% for the best-chosen WEKA classifiers over the predicted emotions. The software fulfils its requirements of real-time data interpretation and reliable results.

    The Netherlands Laboratory for Lifelong Learning (NELLL) of the Open University Netherlands.
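    The decision-level (late) fusion idea can be sketched as a weighted combination of per-channel emotion probabilities; the channel weights and emotion labels below are illustrative assumptions, not the study's tuned WEKA configuration.

```python
# A hedged sketch of decision-level fusion of the two channels; the weights
# and labels are illustrative, not the study's tuned WEKA configuration.
def fuse(face_probs, voice_probs, w_face=0.6, w_voice=0.4):
    """Combine per-emotion probabilities from webcam and microphone
    classifiers and return the most probable emotion label."""
    fused = {e: w_face * face_probs[e] + w_voice * voice_probs[e]
             for e in face_probs}
    return max(fused, key=fused.get)

# Example: both channels lean towards "happy", so the fused decision does too.
face = {"happy": 0.7, "sad": 0.1, "angry": 0.2}
voice = {"happy": 0.5, "sad": 0.3, "angry": 0.2}
print(fuse(face, voice))  # -> happy
```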

    SODE: Self-Adaptive One-Dependence Estimators for classification

    SuperParent-One-Dependence Estimators (SPODEs) represent a family of semi-naive Bayesian classifiers which relax the attribute independence assumption of Naive Bayes (NB) to allow each attribute to depend on a common single attribute (the superparent). SPODEs can effectively handle data with attribute dependency but still inherit NB's key advantages, such as computational efficiency and robustness for high-dimensional data. In reality, determining an optimal superparent for SPODEs is difficult. One common approach is to use a weighted combination of multiple SPODEs, each having a different superparent with a properly assigned weight value (i.e., a weight value is assigned to each attribute). In this paper, we propose a self-adaptive SPODE, namely SODE, which uses immunity theory from artificial immune systems to automatically and self-adaptively select the weight for each single SPODE. SODE does not need to know the importance of individual SPODEs nor the relevance among SPODEs, and can flexibly and efficiently search for optimal weight values for each SPODE during the learning process. Extensive experiments and comparisons on 56 benchmark data sets, and validations on image and text classification, demonstrate that SODE outperforms state-of-the-art weighted SPODE algorithms and is suitable for a wide range of learning tasks. Results also confirm that SODE provides an appropriate balance between runtime efficiency and accuracy.
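    How a weighted SPODE ensemble scores a class can be sketched from the standard formulation P(y, x) ≈ Σᵢ wᵢ · P(y, xᵢ) · Πⱼ P(xⱼ | y, xᵢ); the immune-inspired weight search that distinguishes SODE is omitted here, and the probability lookups are assumed to be given.

```python
# A minimal sketch of weighted SPODE ensemble scoring. The probability
# lookups would come from frequency counts on the training data; the
# immune-inspired search for the weights w is omitted.
def spode_score(y, x, w, p_joint, p_cond):
    """
    y       : candidate class label
    x       : attribute values, x[i] is the value of attribute i
    w       : w[i], weight of the SPODE whose superparent is attribute i
    p_joint : p_joint(y, i, xi)       -> P(y, x_i = xi)
    p_cond  : p_cond(j, xj, y, i, xi) -> P(x_j = xj | y, x_i = xi)
    """
    score = 0.0
    for i, xi in enumerate(x):
        term = p_joint(y, i, xi)
        for j, xj in enumerate(x):
            if j != i:
                term *= p_cond(j, xj, y, i, xi)
        score += w[i] * term
    return score

def classify(classes, x, w, p_joint, p_cond):
    # Predict the class with the highest weighted-ensemble score.
    return max(classes, key=lambda y: spode_score(y, x, w, p_joint, p_cond))
```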

    Efficient lazy elimination for averaged one-dependence estimators

    Semi-naive Bayesian classifiers seek to retain the numerous strengths of naive Bayes while reducing error by relaxing the attribute independence assumption. Backwards Sequential Elimination (BSE) is a wrapper technique for attribute elimination that has proved effective at this task. We explore a new technique, Lazy Elimination (LE), which eliminates highly related attribute values at classification time without the computational overheads inherent in wrapper techniques. We analyze the effect of LE and BSE on a state-of-the-art semi-naive Bayesian algorithm, Averaged One-Dependence Estimators (AODE). Our experiments show that LE significantly reduces bias and error without undue computation, while BSE significantly reduces bias but not error, and has high training time complexity. In the context of AODE, LE has a significant advantage over BSE in both computational efficiency and error.
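    The core of Lazy Elimination can be sketched as a classification-time check for attribute values that are (near-)deterministically implied by other values in the same instance; the count interface and the delta threshold below are illustrative assumptions, not the paper's API.

```python
# A minimal sketch of Lazy Elimination (LE): before computing the Bayesian
# product for a test instance, drop any attribute value that is implied by
# another value in the instance, since it adds no independent evidence.
def lazy_eliminate(values, count, delta=1.0):
    """
    values : list of (attribute, value) pairs of the test instance
    count  : callable on one or two attribute-value pairs, returning their
             (co-)occurrence frequency in the training data
    delta  : implication threshold; 1.0 demands strict implication
    """
    kept = []
    for av in values:
        implied = False
        for bv in values:
            if av == bv or count(bv) == 0:
                continue
            if count(av, bv) / count(bv) >= delta:
                # Whenever bv occurs, av (almost) always occurs too, so av
                # is redundant given bv for this instance. A full
                # implementation also tie-breaks mutually implying pairs
                # so that one of them is kept.
                implied = True
                break
        if not implied:
            kept.append(av)
    return kept
```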

    Data exploration by using the monotonicity property

    Dealing with different misclassification costs has been a significant problem in classification. Some algorithms, like most rule induction methods, can predict quite accurately when the misclassification costs for each class are assumed to be the same; however, when the misclassification costs change, which is a common phenomenon in reality, these algorithms are not capable of adjusting their results. Other algorithms, like the Bayesian methods, can yield the probability of an unclassified example belonging to each class, which helps adjust the results according to different misclassification costs; their shortcoming is that, when the misclassification costs for each class are the same, they do not generate the most accurate results. This thesis attempts to incorporate the merits of both kinds of algorithms into one, that is, to develop a new algorithm which predicts relatively accurately and adjusts to changes in misclassification costs. The strategy of the new algorithm is to create a weighted voting system. A weighted voting system evaluates the evidence for the new example belonging to each class, calculates an assessment of the probabilities for the example, and assigns the example to a class according to those probabilities as well as the misclassification costs. The main problem in creating a weighted voting system is deciding the optimal weights of the individual votes. To solve this problem, we refer mainly to the monotonicity property. The monotonicity property has been found to exist not only in pure monotone systems but also in non-monotone systems. Since the study of the monotonicity property has been a huge success for monotone systems, it is natural to apply it to non-monotone systems as well. This thesis deals only with binary systems. Though such systems hardly exist in practice, this treatment provides concrete ideas for the development of general solution algorithms. After the final algorithm was formulated, it was tested on a wide range of randomly generated synthetic datasets and compared with other existing classifiers. The results indicate that the algorithm performs both effectively and efficiently.
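    The weighted voting strategy with cost adjustment can be sketched as follows; the vote and weight representations are illustrative, and the thesis's monotonicity-based derivation of the weights is omitted.

```python
# A sketch of cost-sensitive weighted voting: weighted votes give class
# probability assessments, and the prediction minimises expected
# misclassification cost. The thesis derives the weights from the
# monotonicity property; here they are taken as given.
def weighted_vote(votes, weights, cost):
    """
    votes   : votes[k][c] in {0, 1}, vote of evidence source k for class c
    weights : weights[k], the weight assigned to source k
    cost    : cost[c], the cost of misclassifying a true member of class c
    """
    classes = list(votes[0].keys())
    total = sum(weights)
    # Weighted votes -> assessment of class-membership probabilities.
    prob = {c: sum(w * v[c] for w, v in zip(weights, votes)) / total
            for c in classes}
    # Predicting c incurs cost when the true class is some other class o.
    expected_cost = {c: sum(prob[o] * cost[o] for o in classes if o != c)
                     for c in classes}
    return min(expected_cost, key=expected_cost.get)
```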

    Social talk capabilities for dialogue systems

    Small talk capabilities are an important but very challenging extension to dialogue systems. Small talk (or “social talk”) refers to a kind of conversation that does not focus on the exchange of information but on the negotiation of social roles and situations. The goal of this thesis is to provide knowledge, processes and structures that can be used by dialogue systems to satisfactorily participate in social conversations. For this purpose the thesis presents research in the areas of natural-language understanding, dialogue management and error handling. Nine new models of social talk, based on a data analysis of small talk conversations, are described. The functionally motivated and content-abstract models can be used for small talk conversations on various topics. The basic elements of the models are dialogue acts for social talk, newly developed on the basis of social science theory. The thesis also presents conversation strategies for the treatment of so-called “out-of-domain” (OoD) utterances, which can be used to avoid errors in the input understanding of dialogue systems. Additionally, the thesis describes a new extension to dialogue management that flexibly manages interwoven dialogue threads. The small talk models as well as the strategies for handling OoD utterances are encoded as computational dialogue threads.