    A Resource Aware MapReduce Based Parallel SVM for Large Scale Image Classifications

    Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, support vector machines (SVMs) are used extensively because of their generalization properties. However, SVM training is computationally intensive, especially when the training dataset is large. This paper presents RASMO, a resource-aware, MapReduce-based parallel SVM algorithm for large-scale image classification, which partitions the training dataset into smaller subsets and optimizes SVM training in parallel on a cluster of computers. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of RASMO in heterogeneous computing environments. RASMO is evaluated in both experimental and simulation environments. The results show that the parallel SVM algorithm significantly reduces training time compared with the sequential SMO algorithm while maintaining a high level of classification accuracy.
    Funding: National Basic Research Program (973) of China under Grant 2014CB34040
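    The data partitioning idea in the RASMO abstract above can be illustrated with a small sketch. It is not the paper's genetic-algorithm load balancer: it swaps in a much simpler proportional heuristic, and the function names and the notion of a per-node "speed" value are assumptions made for illustration only.

```python
def proportional_partition_sizes(n_samples, node_speeds):
    """Split n_samples across nodes in proportion to their relative speeds.

    Assumption: each node is described by one positive number (hypothetical
    "speed"). This is a plain proportional heuristic, not a genetic algorithm.
    """
    if n_samples < 0 or not node_speeds or any(s <= 0 for s in node_speeds):
        raise ValueError("need n_samples >= 0 and positive node speeds")
    total = sum(node_speeds)
    raw = [n_samples * s / total for s in node_speeds]
    sizes = [int(r) for r in raw]
    # Hand out the samples lost to rounding, largest fractional part first.
    leftover = n_samples - sum(sizes)
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - sizes[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


def partition(samples, sizes):
    """Cut a list of samples into consecutive chunks of the given sizes."""
    parts, start = [], 0
    for size in sizes:
        parts.append(samples[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    sizes = proportional_partition_sizes(100, [1, 1, 2])
    print(sizes)
    print([len(p) for p in partition(list(range(100)), sizes)])
```

    In a MapReduce setting, each chunk would be handed to one mapper that trains a local SVM; how the local results are combined is left out here because the abstract does not describe it.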

    Personalized large scale classification of public tenders on hadoop

    This project was completed as part of an innovation partnership between Fujitsu Canada and Université Laval. The needs and objectives of the project were centered on a business problem defined jointly with Fujitsu. The project aimed to classify a corpus of electronic public tenders using state-of-the-art Hadoop big data technology. The objective was to identify, with high recall, public tenders relevant to the IT services business of Fujitsu Canada. A small-scale prototype based on the BNS (Bi-Normal Separation) algorithm was shown empirically to classify the public tender corpus with high recall (93%). The prototype was then re-implemented on a full-scale Hadoop cluster, using Apache Pig for the data preparation pipeline and Apache Mahout for classification. Our experiments show that the large-scale system not only maintains high recall (91%) on the classification task but can also readily take advantage of the massive scalability gains made possible by Hadoop's distributed architecture.
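    For readers unfamiliar with Bi-Normal Separation, the score it assigns to a feature is commonly defined as the absolute difference between the inverse standard normal CDF of the feature's true positive rate and of its false positive rate. The sketch below follows that common definition; the function name, the argument layout, and the clipping value of 0.0005 are assumptions, and the abstract does not say how the project itself computed or used the score.

```python
from statistics import NormalDist


def bns_score(tp, fp, pos, neg, eps=0.0005):
    """Bi-Normal Separation score of one feature.

    tp:  positive documents containing the feature
    fp:  negative documents containing the feature
    pos: total positive documents, neg: total negative documents
    eps: clipping bound keeping the rates away from 0 and 1, where the
         inverse normal CDF is undefined (assumed value)
    """
    inverse_cdf = NormalDist().inv_cdf
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inverse_cdf(tpr) - inverse_cdf(fpr))


if __name__ == "__main__":
    # A feature seen in 90% of positives and 10% of negatives separates
    # the classes better than one seen in 60% and 40%.
    print(bns_score(90, 10, 100, 100))
    print(bns_score(60, 40, 100, 100))
```

    A feature that appears equally often in both classes scores zero, and the score grows as the two rates move apart, which is why it is useful for ranking terms in a high-recall text classifier.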

    Big Data Analysis of Facebook Users Personality Recognition using Map Reduce Back Propagation Neural Networks

    Machine learning has been an effective tool for connecting networks of enormous information to predict personality. The identification of personality-related indicators encoded in Facebook profiles and activities is of special concern in most research efforts. This research modeled user personality based on a set of features extracted from Facebook data using a Map-Reduce Back Propagation Neural Network (MRBPNN). The performance of the MRBPNN classification model was evaluated on five basic personality dimensions: Extraversion (EXT), Agreeableness (AGR), Conscientiousness (CON), Neuroticism (NEU), and Openness to Experience (OPN), using true positives, false positives, accuracy, precision and F-measure as metrics at a threshold value of 0.32. The experimental results show that the MRBPNN model achieves accuracies of 91.40%, 93.89%, 91.33%, 90.43% and 89.13% for CON, OPN, EXT, NEU and AGR, respectively, and is more computationally efficient than a Back Propagation Neural Network (BPNN) and a Support Vector Machine (SVM). Therefore, personality recognition based on MRBPNN would produce a reliable prediction system for various personality traits on data with a very large number of instances.
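    The metrics named in the abstract above can be computed from thresholded network outputs as in the following minimal sketch. It assumes one binary label per personality trait and treats a score at or above the threshold as a positive prediction; both choices, along with the function name, are assumptions, since the abstract does not specify them.

```python
def threshold_metrics(scores, labels, threshold=0.32):
    """Compute counts and metrics for one trait from raw output scores.

    scores: model outputs in [0, 1]; labels: 0/1 ground truth.
    A score >= threshold counts as a positive prediction (assumed convention).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "accuracy": accuracy,
            "precision": precision, "f_measure": f_measure}


if __name__ == "__main__":
    print(threshold_metrics([0.9, 0.4, 0.2, 0.1], [1, 0, 1, 0]))
```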