
    Multilayer Feedforward Neural Network for Internet Traffic Classification

    Efficient internet traffic classification has recently gained attention as a way to improve service quality in IP networks. Existing solutions, however, struggle with imbalanced datasets, in which flows are distributed very unevenly across the classes. In this paper, we propose a multilayer feedforward neural network architecture to handle highly imbalanced datasets. The proposed model is a variation of the multilayer perceptron with 4 hidden layers (called mountain mirror networks) that performs the feature transformation effectively. To check the efficacy of the proposed model, we used the Cambridge dataset, which consists of 248 features spread across 10 classes. Experiments were carried out on two variants of the same dataset: the standard one and a derived subset. The proposed model achieved an accuracy of 99.08% on the highly imbalanced (standard) dataset.

    A Novel Hybrid Dimensionality Reduction Method using Support Vector Machines and Independent Component Analysis

    Due to the increasing demand for high-dimensional data analysis in applications such as electrocardiogram signal analysis and gene expression analysis for cancer detection, dimensionality reduction has become a viable process for extracting essential information from data, so that high-dimensional data can be represented in a more condensed form with much lower dimensionality, both improving classification accuracy and reducing computational complexity. Conventional dimensionality reduction methods can be categorized into stand-alone and hybrid approaches. The stand-alone method utilizes a single criterion from either a supervised or an unsupervised perspective. The hybrid method, on the other hand, integrates both criteria. Compared with a variety of stand-alone dimensionality reduction methods, the hybrid approach is promising because it simultaneously takes advantage of the supervised criterion for better classification accuracy and the unsupervised criterion for better data representation. However, several issues challenge the efficiency of the hybrid approach, including (1) the difficulty of finding a subspace that seamlessly integrates both criteria in a single hybrid framework, (2) the robustness of the performance on noisy data, and (3) nonlinear data representation capability. This dissertation presents a new hybrid dimensionality reduction method that seeks a projection by optimizing both structural risk (the supervised criterion) from the Support Vector Machine (SVM) and data independence (the unsupervised criterion) from Independent Component Analysis (ICA). The projection from SVM directly contributes to classification performance from a supervised perspective, whereas maximum independence among features by ICA constructs a projection that indirectly improves classification accuracy through better intrinsic data representation from an unsupervised perspective.
    For the linear dimensionality reduction model, I introduce orthogonality to interrelate the projections from SVM and ICA, while a redundancy removal process eliminates part of the projection vectors from SVM, leading to more effective dimensionality reduction. The orthogonality-based linear hybrid dimensionality reduction method is then extended to an uncorrelatedness-based algorithm with nonlinear data representation capability. In the proposed approach, SVM and ICA are integrated into a single framework by an uncorrelated subspace based on a kernel implementation. Experimental results show that the proposed approaches give higher classification performance with better robustness in relatively lower dimensions than conventional methods for high-dimensional datasets.

    ASA 2021 Statistics and Information Systems for Policy Evaluation

    This book includes 25 peer-reviewed short papers submitted to the Scientific Opening Conference titled “Statistics and Information Systems for Policy Evaluation”, aimed at promoting new statistical methods and applications for the evaluation of policies and organized by the Association for Applied Statistics (ASA) and the Department of Statistics, Computer Science, Applications DiSIA “G. Parenti” of the University of Florence, jointly with the partners AICQ (Italian Association for Quality Culture), AICQ-CN (Italian Association for Quality Culture North and Centre of Italy), AISS (Italian Academy for Six Sigma), ASSIRM (Italian Association for Marketing, Social and Opinion Research), Comune di Firenze, the SIS – Italian Statistical Society, Regione Toscana and Valmon – Evaluation & Monitoring

    Sound-production Related Cognitive Tasks for Onset Detection in Self-Paced Brain-Computer Interfaces

    Objective. The main goal of this research is to propose a novel method of onset detection for Self-Paced (SP) Brain-Computer Interfaces (BCIs), in order to increase the usability and practicality of BCIs as they move from laboratory research settings towards real-world use. Approach. To achieve this goal, various Sound-Production Related Cognitive Tasks (SPRCTs) were tested against the idle state in offline and simulated-online experiments. An online experiment was then conducted in which executing the Sound Imagery (SI) onset detection task turned a messenger dialogue on when a new message arrived, in real-life scenarios (e.g. watching video, reading text). The SI task was chosen as the onset task because of its advantages over other tasks: 1) It is intuitive. 2) It is beneficial for people with motor disabilities. 3) It has no significant overlap with other common, spontaneous cognitive states, which makes it easier to use in daily-life situations. 4) It does not depend on the user’s mother language. Main results. The final online experimental results showed that the new SI onset task performed significantly better than the Motor Imagery (MI) approach: a TFP score of 84.04% (SI) vs 66.79% (MI) for the sliding image scenario, and 80.84% vs 61.07% for the watching video task. Furthermore, the onset response of the SI task was significantly faster than that of MI. In terms of usability, 75% of subjects answered that SI was easier to use. Significance. The new SPRCT outperforms typical MI for SP onset detection BCIs (significantly better performance, faster onset response and easier usability), so it would be more easily used in daily-life situations. Another contribution of this thesis is a novel method for selecting and handling EMG artefact-contaminated EEG channels, which showed a significant improvement in class separation over typical blind source separation techniques.
    A new performance evaluation metric for SP BCIs, called the true-false positive (TFP) score, was also proposed as a standardised performance assessment method that considers idle period length, which is not considered in other typical metrics.

    An investigation into the issues of multi-agent data mining

    Multi-agent systems (MAS) often deal with complex applications that require distributed problem solving. In many applications the individual and collective behaviour of the agents depends on the observed data from distributed sources. The field of Distributed Data Mining (DDM) deals with these challenges in analyzing distributed data and offers many algorithmic solutions to perform different data analysis and mining operations in a fundamentally distributed manner that pays careful attention to the resource constraints. Since multi-agent systems are often distributed and agents have proactive and reactive features, combining DM with MAS for data intensive applications is therefore appealing. This Chapter discusses a number of research issues concerned with the use of Multi-Agent Systems for Data Mining (MADM), also known as agent-driven data mining. The Chapter also examines the issues affecting the design and implementation of a generic and extendible agent-based data mining framework. An Extendible Multi-Agent Data mining System (EMADS) Framework for integrating distributed data sources is presented. This framework achieves high-availability and high performance without compromising the data integrity and security. © 2010 Nova Science Publishers, Inc. All rights reserved

    Growth Econometrics

    This paper provides a survey and synthesis of econometric tools that have been employed to study economic growth. While these tools range across a variety of statistical methods, they are united in the common goals of first, identifying interesting contemporaneous patterns in growth data and second, drawing inferences on long-run economic outcomes from cross-section and temporal variation in growth. We describe the main stylized facts that have motivated the development of growth econometrics, the major statistical tools that have been employed to provide structural explanations for these facts, and the primary statistical issues that arise in the study of growth data. An important aspect of the survey is attention to the limits that exist in drawing conclusions from growth data, limits that reflect model uncertainty and the general weakness of available data relative to the sorts of questions for which they are employed.

    Overview of BioCreative II gene mention recognition.

    Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions
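
For readers unfamiliar with the metric, the F1 (or F) score quoted in this abstract is the harmonic mean of precision and recall. The sketch below shows that calculation from raw counts. The function name and the example counts are illustrative assumptions, not figures or code from the BioCreative II evaluation, whose exact matching rules are not described here.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives   # mentions the system reported
    actual = true_positives + false_negatives      # mentions in the gold standard
    if predicted == 0 or actual == 0:
        return 0.0                                 # no predictions or no gold mentions
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0                                 # avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
score = f1_score(true_positives=80, false_positives=20, false_negatives=20)
print(round(score, 4))
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a system cannot reach a high F1 score by maximizing precision or recall alone.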