
    Unsupervised Learning And Image Classification In High Performance Computing Cluster

    Feature learning and object classification have become very active research areas in machine learning over recent decades. Identifying good features benefits object classification both by reducing the computational cost and by increasing the classification accuracy. In addition, many research studies have focused on using Graphics Processing Units (GPUs) to reduce the training time of machine learning algorithms. This study proposes an alternative platform, the High Performance Computing Cluster (HPCC), to handle unsupervised feature learning and image and speech classification while reducing the computational cost. HPCC is a Big Data processing and massively parallel processing (MPP) computing platform used for solving Big Data problems. Algorithms are implemented in HPCC in Enterprise Control Language (ECL), a declarative, data-centric programming language: a powerful, high-level, parallel language well suited to data-intensive Big Data applications. Various databases are explored, such as the CALTECH-101 and AR databases and a subset of the wild PubFig83 data, to which multimedia content is added. Unsupervised learning algorithms are applied in HPCC to extract low-level image features from unlabeled data, and a new object identification framework that operates as a multimodal learning and classification process is proposed. Coates et al. found that the K-Means clustering method outperformed various deep learning methods, such as sparse autoencoders, for image classification. K-Means implemented in HPCC with various classifiers is compared against the classification results of Coates et al. Detailed image classification experiments in HPCC using Naive Bayes, Random Forest, and C4.5 Decision Tree classifiers are performed and their results presented. The highest recognition rates are achieved with the C4.5 Decision Tree classifier in HPCC: for example, the classification accuracy of Coates et al. is improved from 74.3% to 85.2%. It is observed that the deeper the decision tree, the better the model fits the data, resulting in higher accuracy. The most important contribution of this study is the exploration of image classification problems on the HPCC platform.
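
    A minimal sketch of the pipeline just described, assuming scikit-learn as a stand-in for the HPCC/ECL implementation (the data, patch size, cluster count, and tree depth below are all illustrative):

        # Coates-style K-Means feature learning followed by a decision-tree
        # classifier (illustrative stand-in for the HPCC/ECL version).
        import numpy as np
        from sklearn.cluster import MiniBatchKMeans
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)

        # Stand-in data: 200 grayscale 32x32 images with binary labels.
        images = rng.random((200, 32, 32))
        labels = rng.integers(0, 2, size=200)

        def extract_patches(img, patch=6, stride=4):
            """Slide a window over the image and return flattened patches."""
            h, w = img.shape
            return np.array([img[y:y+patch, x:x+patch].ravel()
                             for y in range(0, h - patch + 1, stride)
                             for x in range(0, w - patch + 1, stride)])

        # 1. Unsupervised step: learn a dictionary of patch centroids with K-Means.
        all_patches = np.vstack([extract_patches(im) for im in images])
        kmeans = MiniBatchKMeans(n_clusters=64, n_init=3, random_state=0).fit(all_patches)

        # 2. Encode each image as a histogram of its patches' nearest centroids.
        def encode(img):
            return np.bincount(kmeans.predict(extract_patches(img)), minlength=64)

        features = np.array([encode(im) for im in images])

        # 3. Supervised step: train the decision-tree classifier on the codes.
        clf = DecisionTreeClassifier(max_depth=12, random_state=0).fit(features, labels)
        print("training accuracy:", clf.score(features, labels))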

    Neural Network Guided Evolution of L-system Plants

    A Lindenmayer system (L-system) is a parallel rewriting system that generates graphic shapes using a set of rules. Genetic programming (GP) is an evolutionary algorithm that evolves expressions. A convolutional neural network (CNN) is a type of neural network that is useful for image recognition and classification. The goal of this thesis is to generate different styles of L-system-based 2D tree images from scratch using genetic programming. The system uses a convolutional neural network to evaluate the trees and produce a fitness value for genetic programming. Different CNN architectures are explored. We analyze the performance of the system and show the capabilities of the combination of CNN and GP, demonstrating that a variety of interesting tree images can be automatically evolved. We also found that the success of the system depends highly on CNN training, as well as on the form of the GP's L-system language representation.
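
    As a rough illustration of the two pieces being combined, a minimal sketch of a deterministic bracketed L-system expander with a GP-style fitness hook; the real system renders the string with graphics and scores the image with a trained CNN, for which score() below is only a placeholder:

        # Bracketed L-system expansion plus a stand-in fitness function.
        def expand(axiom, rules, iterations):
            """Apply the rewrite rules in parallel to every symbol, repeatedly."""
            s = axiom
            for _ in range(iterations):
                s = "".join(rules.get(ch, ch) for ch in s)
            return s

        def score(lsystem_string):
            # Placeholder fitness: the thesis feeds a rendering of the string
            # to a CNN; here we simply reward branching structure.
            return lsystem_string.count("[")

        # A classic tree-like L-system: F draws forward, +/- turn, [ ] push/pop.
        rules = {"F": "FF-[-F+F+F]+[+F-F-F]"}
        candidate = expand("F", rules, iterations=3)
        print(len(candidate), "symbols, fitness =", score(candidate))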

    Enhancing Binary Code Comment Quality Classification: Integrating Generative AI for Improved Accuracy

    This report focuses on enhancing a binary code comment quality classification model by integrating generated code and comment pairs, in order to improve model accuracy. The dataset comprises 9048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful." Additionally, code and comment pairs are generated using a Large Language Model architecture, and these generated pairs are labeled to indicate their utility. The outcome of this effort consists of two classification models: one utilizing the original dataset and another incorporating the augmented dataset with the newly generated code comment pairs and labels.
    Comment: 11 pages, 2 figures, 2 tables. Accepted for the Information Retrieval in Software Engineering track at the Forum for Information Retrieval Evaluation 202
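
    A minimal sketch of the two-model comparison, assuming scikit-learn, a "</s>" code/comment separator, and toy data standing in for the annotated C pairs and the generated pairs:

        # Same training recipe applied to the original pairs and to the
        # LLM-augmented set; all names and data here are illustrative.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline

        def train_and_score(texts, labels):
            """texts: 'code </s> comment' strings; labels: Useful / Not Useful."""
            X_tr, X_te, y_tr, y_te = train_test_split(
                texts, labels, test_size=0.25, random_state=0)
            model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                                  LogisticRegression(max_iter=1000))
            return model.fit(X_tr, y_tr).score(X_te, y_te)

        # Toy stand-ins for the 9048 annotated C pairs and the generated pairs.
        original = [(f"int v{i}; </s> declares counter {i}", "Useful") if i % 2
                    else (f"v{i}++; </s> increments v{i}", "Not Useful")
                    for i in range(40)]
        generated = [(f"free(p{i}); </s> releases buffer {i}", "Useful")
                     for i in range(10)]

        base_texts, base_labels = zip(*original)
        aug_texts, aug_labels = zip(*(original + generated))
        print("original :", train_and_score(list(base_texts), list(base_labels)))
        print("augmented:", train_and_score(list(aug_texts), list(aug_labels)))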

    Sentiment Analysis of News Tweets

    Sentiment analysis is the process of extracting information from a large amount of data and classifying it into classes called sentiments. Python is a simple yet powerful, high-level, interpreted, dynamic programming language well known for processing natural language data using NLTK (the Natural Language Toolkit). NLTK is a Python library that provides a basis for building programs and classifying data; it also provides graphical demonstrations for presenting results or trends, as well as sample data for training and testing classifiers. Sentiment classification aims to automatically predict the sentiment polarity of users publishing sentiment data. Although traditional classification algorithms can be used to train sentiment classifiers from manually labeled text data, the labeling work can be time-consuming and expensive. Meanwhile, users often use different words when they express sentiment in different domains, so directly applying a classifier trained in one domain to other domains performs poorly because of the differences between domains. In this work, we develop a general solution to sentiment classification for the case where no labels are available in the target domain but some labeled data exists in a different domain, regarded as the source domain. The purpose of this study is to analyze the tweets of popular local and international news agencies and classify the tweeted news into positive, negative, or neutral categories.
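
    A minimal sketch of the NLTK side of such a pipeline, using bag-of-words features and a Naive Bayes classifier; the labeled examples are illustrative stand-ins for the source-domain data:

        # Train on labeled source-domain text, then classify an unlabeled tweet.
        from nltk.classify import NaiveBayesClassifier

        def bag_of_words(text):
            return {word.lower(): True for word in text.split()}

        # Labeled source-domain data (placeholder examples).
        train = [
            ("great news for the economy", "positive"),
            ("markets rally after strong report", "positive"),
            ("terrible disaster strikes the city", "negative"),
            ("officials condemn the violent attack", "negative"),
            ("the meeting is scheduled for monday", "neutral"),
            ("the committee will vote next week", "neutral"),
        ]
        classifier = NaiveBayesClassifier.train(
            [(bag_of_words(text), label) for text, label in train])

        # An unlabeled target-domain tweet.
        print(classifier.classify(bag_of_words("strong earnings lift the markets")))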

    Implementation of an orchestration language as a haskell domain specific language

    Even though concurrent programming has been a hot topic of discussion in Computer Science for the past 30 years, the community has yet to settle on one, or even a few, standard approaches to implementing concurrent programs. But as more and more cores inhabit our CPUs and more and more services are made available on the web, the problem of coordinating different tasks becomes increasingly relevant. The present paper addresses this problem with an implementation of the orchestration language Orc as a domain specific language in Haskell. Orc was, therefore, realized as a combinator library using the lightweight threads and the communication and synchronization primitives of the Concurrent Haskell library. With this implementation it becomes possible to create orchestrations that re-use existing Haskell code and, conversely, to re-use orchestrations inside other Haskell programs. The complexity inherent to distributed computation entails the need for the classification of efficient, re-usable concurrent programming patterns. The paper discusses how the calculus of recursive schemes used in the derivation of functional programs scales up to a distributed setting. It is shown, in particular, how to parallelize the entire class of binary tree hylomorphisms.
    Funding: FCT (PTDC/EIA/73252/2006)
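
    The hylomorphism result can be conveyed in a few lines. The following is a minimal sketch (in Python rather than the paper's Haskell, to keep the examples here in a single language) of a binary-tree hylomorphism whose two recursive branches are evaluated in parallel near the root:

        # Binary-tree hylomorphism with the two recursive calls run in
        # parallel, conveying the scheme the paper derives for Orc/Haskell.
        from concurrent.futures import ThreadPoolExecutor

        def hylo(coalg, leaf, node, seed, pool, depth=0, par_depth=2):
            """coalg unfolds a seed into two sub-seeds (or None for a leaf);
            leaf/node fold the virtual tree back into a value."""
            split = coalg(seed)
            if split is None:
                return leaf(seed)
            l_seed, r_seed = split
            if depth < par_depth:  # fork the two branches near the root
                fl = pool.submit(hylo, coalg, leaf, node, l_seed, pool, depth + 1, par_depth)
                fr = pool.submit(hylo, coalg, leaf, node, r_seed, pool, depth + 1, par_depth)
                return node(fl.result(), fr.result())
            return node(hylo(coalg, leaf, node, l_seed, pool, depth + 1, par_depth),
                        hylo(coalg, leaf, node, r_seed, pool, depth + 1, par_depth))

        # Example: sum the integers in [lo, hi) by splitting the interval.
        def coalg(interval):
            lo, hi = interval
            if hi - lo <= 1:
                return None
            mid = (lo + hi) // 2
            return (lo, mid), (mid, hi)

        with ThreadPoolExecutor(max_workers=8) as pool:
            total = hylo(coalg, leaf=lambda iv: iv[0], node=lambda a, b: a + b,
                         seed=(0, 16), pool=pool)
            print(total)  # 0 + 1 + ... + 15 = 120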

    Programming language identification using machine learning

    Many developer tools, such as code search and source code highlighting, need to know the programming language each given file is written in. Generating statistics for code repositories also requires knowing the programming language of each source code file. The main aim of this project is to build a programming language classifier that can be used in the previously mentioned tasks, mainly by employing machine learning in combination with text classification techniques. We have developed a source code classifier that is first trained and then tested using source code from the Rosetta project dataset [44]. We also measure metrics such as the time it takes to train the classifier, its accuracy, and the time taken to classify individual source files, and we assess the economic impact in this social context, evaluating why a tool like this makes sense for developers.
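
    A minimal sketch of such a classifier, assuming character n-gram TF-IDF features and a linear model in scikit-learn; the snippets stand in for files drawn from the Rosetta project dataset:

        # Classify source files by language from character n-gram statistics.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        snippets = [
            ('#include <stdio.h>\nint main(void) { puts("hi"); }', "C"),
            ("def main():\n    print('hi')", "Python"),
            ('main :: IO ()\nmain = putStrLn "hi"', "Haskell"),
            ("public class Main { public static void main(String[] a) {} }", "Java"),
        ]
        texts, langs = zip(*snippets)

        # Character n-grams capture keywords and punctuation idioms alike.
        clf = make_pipeline(
            TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
            LogisticRegression(max_iter=1000),
        ).fit(texts, langs)

        print(clf.predict(['module Main where\nmain = print 42']))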