
    Unsupervised Learning And Image Classification In High Performance Computing Cluster

    Feature learning and object classification in machine learning have become very active research areas in recent decades. Identifying good features benefits object classification by reducing the computational cost and increasing the classification accuracy. In addition, many studies have focused on the use of Graphics Processing Units (GPUs) to reduce the training time of machine learning algorithms. In this study, the use of an alternative platform, the High Performance Computing Cluster (HPCC), is proposed to handle unsupervised feature learning and image and speech classification while reducing the computational cost. HPCC is a Big Data processing and massively parallel processing (MPP) computing platform used for solving Big Data problems. Algorithms are implemented in HPCC in Enterprise Control Language (ECL), a declarative, data-centric programming language. It is a powerful, high-level, parallel programming language well suited to data-intensive Big Data applications. In this study, various databases are explored, such as the CALTECH-101 and AR databases, and a subset of the in-the-wild PubFig83 data to which multimedia content is added. Unsupervised learning algorithms are applied to extract low-level image features from unlabeled data using HPCC. A new object identification framework that operates as a multimodal learning and classification process is proposed. Coates et al. found that the K-Means clustering method outperformed several deep learning methods, such as the sparse autoencoder, for image classification. K-Means implemented in HPCC with various classifiers is compared against the classification results of Coates et al. Detailed image classification experiments in HPCC using Naive Bayes, Random Forest, and the C4.5 Decision Tree are performed and their results presented. The highest recognition rates are achieved with the C4.5 Decision Tree classifier on HPCC systems. For example, the classification accuracy of Coates et al. is improved from 74.3% to 85.2% using the C4.5 Decision Tree classifier in HPCC. It is observed that the deeper the decision tree, the better the model fits the data, resulting in higher accuracy. The most important contribution of this study is the exploration of image classification problems on the HPCC platform.
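    The pipeline the abstract attributes to Coates et al. — learn centroids from unlabeled data with K-Means, then describe each sample by its distance to every centroid — can be sketched in plain Python (a toy 1-D version with hypothetical function names, not the study's ECL implementation):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy 1-D K-Means: learn k centroids from unlabeled data."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[j].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

def encode(p, centroids):
    """Feature vector: distance from the sample to every centroid."""
    return [abs(p - c) for c in centroids]
```

    In the study itself this runs in ECL on HPCC over image patches; the sketch only shows the centroid-learning and feature-encoding steps.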

    Neural Network Guided Evolution of L-system Plants

    A Lindenmayer system (L-system) is a parallel rewriting system that generates graphic shapes using a set of rules. Genetic programming (GP) is an evolutionary algorithm that evolves expressions. A convolutional neural network (CNN) is a type of neural network well suited to image recognition and classification. The goal of this thesis is to generate different styles of L-system-based 2D tree images from scratch using genetic programming. The system uses a convolutional neural network to evaluate the trees and produce a fitness value for genetic programming. Different CNN architectures are explored. We analyze the performance of the system and show the capabilities of combining CNN and GP. We show that a variety of interesting tree images can be evolved automatically. We also found that the success of the system depends heavily on CNN training, as well as on the form of the GP's L-system language representation.
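    The parallel rewriting the abstract describes can be illustrated with a minimal string L-system interpreter (a Python sketch with a classic bracketed-plant rule for illustration; the thesis evolves the rules with GP rather than fixing them):

```python
def lsystem(axiom, rules, iterations):
    """Apply the rewriting rules to every symbol in parallel, `iterations` times."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Classic bracketed plant grammar: F draws a segment,
# + / - turn the drawing turtle, [ ] push / pop its state.
rules = {"F": "F[+F]F[-F]F"}
```

    Each iteration rewrites every `F` simultaneously, so the string (and the tree it draws) grows exponentially with depth.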

    Enhancing Binary Code Comment Quality Classification: Integrating Generative AI for Improved Accuracy

    This report focuses on enhancing a binary code comment quality classification model by integrating generated code and comment pairs to improve model accuracy. The dataset comprises 9048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful." Additionally, code and comment pairs are generated using a Large Language Model architecture, and these generated pairs are labeled to indicate their utility. The outcome of this effort consists of two classification models: one utilizing the original dataset and another incorporating the augmented dataset with the newly generated code comment pairs and labels. Comment: 11 pages, 2 figures, 2 tables; has been accepted for the Information Retrieval in Software Engineering track at the Forum for Information Retrieval Evaluation 202
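    The classification task described — labeling code/comment pairs as "Useful" or "Not Useful" — can be sketched with a toy bag-of-words Naive Bayes over the comment text (illustrative Python only; the report's actual models and features are not specified here, and the example data is invented):

```python
import math
from collections import Counter, defaultdict

def train(pairs):
    """Train a multinomial Naive Bayes model.
    `pairs` is a list of (comment_text, label) tuples."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in pairs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for `text`."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / total)
        n = sum(word_counts[label].values())
        for w in text.lower().split():
            # Laplace smoothing over the shared vocabulary.
            lp += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

    The report's augmentation step then amounts to extending `pairs` with LLM-generated, labeled examples before training the second model.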

    Sentiment Analysis of News Tweets

    Sentiment analysis is the process of extracting information from a large amount of data and classifying it into different classes called sentiments. Python is a simple yet powerful, high-level, interpreted, dynamic programming language, well known for processing natural language data using NLTK (Natural Language Toolkit). NLTK is a Python library that provides a base for building programs and classifying data. NLTK also provides graphical demonstrations for representing various results or trends, as well as sample data for training and testing various classifiers. Sentiment classification aims to automatically predict the sentiment polarity of data published by users. Although traditional classification algorithms can be used to train sentiment classifiers from manually labeled text data, the labeling work can be time-consuming and expensive. Meanwhile, users often use different words when they express sentiment in different domains. If we directly apply a classifier trained in one domain to other domains, performance will be very low due to the differences between these domains. In this work, we develop a general solution to sentiment classification when we have no labels in the target domain but do have some labeled data in a different domain, regarded as the source domain. The purpose of this study is to analyze the tweets of popular local and international news agencies and classify the tweeted news into positive, negative, or neutral categories.
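    A minimal lexicon-based version of the positive/negative/neutral tweet classification can be sketched as follows (toy word lists invented for illustration; in practice NLTK's trained classifiers, such as `nltk.classify.NaiveBayesClassifier`, would replace this):

```python
# Hypothetical toy lexicons; a real system would use a full
# sentiment lexicon or a trained classifier.
POSITIVE = {"gain", "win", "growth", "success", "improve"}
NEGATIVE = {"loss", "crash", "crisis", "fail", "decline"}

def classify_tweet(text):
    """Lexicon-based polarity: count positive vs. negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

    The cross-domain problem the abstract raises shows up here directly: a lexicon tuned to one domain (e.g. finance) will mis-score words in another.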

    Implementation of an Orchestration Language as a Haskell Domain Specific Language

    Even though concurrent programming has been a hot topic of discussion in Computer Science for the past 30 years, the community has yet to settle on one, or even a few, standard approaches to implementing concurrent programs. But as more and more cores inhabit our CPUs and more and more services are made available on the web, the problem of coordinating different tasks becomes increasingly relevant. The present paper addresses this problem with an implementation of the orchestration language Orc as a domain-specific language in Haskell. Orc was therefore realized as a combinator library using the lightweight threads and the communication and synchronization primitives of the Concurrent Haskell library. With this implementation it becomes possible to create orchestrations that reuse existing Haskell code and, conversely, to reuse orchestrations inside other Haskell programs. The complexity inherent in distributed computation entails the need for the classification of efficient, reusable, concurrent programming patterns. The paper discusses how the calculus of recursive schemes used in the derivation of functional programs scales up to a distributed setting. It is shown, in particular, how to parallelize the entire class of binary tree hylomorphisms. FCT - Fuel Cell Technologies Program (PTDC/EIA/73252/2006).
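    The binary tree hylomorphisms mentioned at the end can be illustrated with a short sketch (in Python rather than Haskell, for uniformity with the other examples here): an unfold (coalgebra) grows a virtual binary tree from a seed while a fold (algebra) consumes it, with merge sort as the standard instance. The version below is sequential; the paper's point is that the two recursive calls are independent and can be run in parallel.

```python
def hylo(algebra, coalgebra, seed):
    """Binary-tree hylomorphism: unfold `seed` with `coalgebra`, then fold
    the virtual tree with `algebra`, without ever materializing the tree."""
    tag, payload = coalgebra(seed)
    if tag == "leaf":
        return algebra("leaf", payload)
    left, right = payload
    # These two recursive calls are independent: this is the
    # parallelism opportunity the paper exploits.
    return algebra("node", (hylo(algebra, coalgebra, left),
                            hylo(algebra, coalgebra, right)))

# Merge sort as a hylomorphism: the coalgebra splits the list,
# the algebra merges the sorted halves.
def split(xs):
    if len(xs) <= 1:
        return ("leaf", xs)
    mid = len(xs) // 2
    return ("node", (xs[:mid], xs[mid:]))

def merge(tag, payload):
    if tag == "leaf":
        return payload
    a, b = payload
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]
```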

    A MATLAB-Based Interactive Environment for EMG Signal Decomposition Utilizing Matched Template Filters

    An interactive software package for analyzing and decomposing electromyographic (EMG) signals is designed, constructed, and implemented using the MATLAB high-level programming language and its interactive environment. EMG signal analysis, in the form of decomposing signals into their constituent motor unit potential trains (MUPTs), is treated as a classification task. Matched template filter methods are employed for the classification of motor unit potentials (MUPs), in which the assignment criterion for MUPs is based on a combination of MUP shapes and motor unit firing pattern information. The developed software package consists of several graphical user interfaces used to detect individual MUP waveforms in raw EMG signals, extract relevant features, and classify MUPs into MUPTs using matched template filter classifiers. The proposed software package is useful for enhancing analysis quality and provides a systematic approach to the EMG signal decomposition process. It also serves as a helpful environment for testing and evaluating algorithms developed for EMG signal decomposition research.
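    The shape-based part of the matched template filter assignment can be sketched as nearest-template classification by normalized correlation (illustrative Python, not the MATLAB package; the template names and waveforms are hypothetical, and the real system also uses firing-pattern information):

```python
def correlate(signal, template):
    """Normalized cross-correlation at zero lag (equal-length sequences)."""
    dot = sum(s * t for s, t in zip(signal, template))
    ns = sum(s * s for s in signal) ** 0.5
    nt = sum(t * t for t in template) ** 0.5
    return dot / (ns * nt) if ns and nt else 0.0

def classify_mup(waveform, templates):
    """Assign a detected MUP waveform to the best-matching MUPT template."""
    return max(templates, key=lambda name: correlate(waveform, templates[name]))
```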