Unsupervised Learning And Image Classification In High Performance Computing Cluster
Feature learning and object classification have become very active research areas in machine learning over recent decades. Identifying good features benefits object classification by reducing computational cost and increasing classification accuracy. In addition, many studies have focused on using Graphics Processing Units (GPUs) to shorten the training time of machine learning algorithms. This study proposes an alternative platform, the High Performance Computing Cluster (HPCC), to handle unsupervised feature learning and image and speech classification while reducing computational cost.
HPCC is a Big Data processing and massively parallel processing (MPP) computing platform used for solving Big Data problems. Algorithms are implemented in HPCC in Enterprise Control Language (ECL), a declarative, data-centric programming language. It is a powerful, high-level, parallel programming language well suited to data-intensive Big Data applications.
In this study, various databases are explored, such as CALTECH-101 and AR, along with a subset of the PubFig83 wild-image data to which multimedia content is added. Unsupervised learning algorithms are applied in HPCC to extract low-level image features from unlabeled data. A new object identification framework that operates as a multimodal learning and classification process is proposed.
Coates et al. found that the K-Means clustering method outperformed various deep learning methods, such as the sparse autoencoder, for image classification. K-Means implemented in HPCC with various classifiers is compared against the classification results of Coates et al.
Detailed image classification results in HPCC using Naive Bayes, Random Forest, and C4.5 Decision Tree classifiers are presented. The highest recognition rates are achieved with the C4.5 Decision Tree classifier in HPCC systems. For example, the classification accuracy of Coates et al. is improved from 74.3% to 85.2% using the C4.5 Decision Tree classifier in HPCC. It is observed that the deeper the decision tree, the closer the model's fit, resulting in higher accuracy.
The most important contribution of this study is the exploration of image classification problems on the HPCC platform.
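The pipeline the abstract describes — learn centroids from unlabeled data with K-Means, then represent each image by its relation to those centroids before supervised classification — can be sketched in plain Python (the study itself uses ECL on HPCC; the toy data and function names below are illustrative, not from the paper):

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(xs) / n for xs in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's K-Means: learn k centroids from unlabeled points."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def encode(p, centroids):
    """Unsupervised feature vector: distance of p to every centroid."""
    return [math.sqrt(dist2(p, c)) for c in centroids]

# Two tight clusters of toy "image descriptors".
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids = kmeans(data, k=2)
features = encode((0.05, 0.05), centroids)
```

The resulting feature vectors would then be handed to a supervised learner such as a C4.5-style decision tree, which is the second stage the abstract evaluates.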
Neural Network Guided Evolution of L-system Plants
A Lindenmayer system is a parallel rewriting system that generates graphic shapes using a set of rules. Genetic programming (GP) is an evolutionary algorithm that evolves expressions. A convolutional neural network (CNN) is a type of neural network well suited to image recognition and classification. The goal of this thesis is to generate different styles of L-system-based 2D tree images from scratch using genetic programming. The system uses a convolutional neural network to evaluate the trees and produce a fitness value for genetic programming. Different CNN architectures are explored. We analyze the performance of the system and show the capabilities of combining CNN and GP. We show that a variety of interesting tree images can be evolved automatically. We also find that the success of the system depends strongly on CNN training, as well as on the form of the GP's L-system language representation.
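The parallel-rewriting core of a Lindenmayer system is tiny; a minimal sketch in Python (the bracketed rule below is a textbook tree grammar, not one evolved by the thesis):

```python
def lsystem(axiom, rules, steps):
    """Rewrite every symbol in parallel, `steps` times.
    Symbols with no matching rule are copied unchanged."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# A classic bracketed tree: F draws a segment, + and - turn the
# turtle, [ and ] push and pop the drawing state.
rules = {"F": "F[+F]F[-F]F"}
print(lsystem("F", rules, 1))  # F[+F]F[-F]F
```

A turtle-graphics interpreter then renders the string as a 2D tree; in the thesis's setup, a GP individual supplies the rules and a CNN scores the rendered image to produce the fitness value.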
Enhancing Binary Code Comment Quality Classification: Integrating Generative AI for Improved Accuracy
This report focuses on enhancing a binary code comment quality classification model by integrating generated code and comment pairs to improve model accuracy. The dataset comprises 9048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful." Additionally, code and comment pairs are generated using a Large Language Model architecture, and these generated pairs are labeled to indicate their utility. The outcome of this effort consists of two classification models: one utilizing the original dataset and another incorporating the augmented dataset with the newly generated code comment pairs and labels.
Comment: 11 pages, 2 figures, 2 tables. Accepted for the Information Retrieval in Software Engineering track at the Forum for Information Retrieval Evaluation 202
Sentiment Analysis of News Tweets
Sentiment analysis is a process of extracting information from a large amount of data and classifying it into different classes called sentiments. Python is a simple yet powerful, high-level, interpreted, dynamic programming language well known for processing natural language data using NLTK (Natural Language Toolkit). NLTK is a Python library that provides a base for building programs and classifying data. NLTK also provides graphical demonstrations for representing various results or trends, as well as sample data for training and testing classifiers. Sentiment classification aims to automatically predict the sentiment polarity of user-generated text. Although traditional classification algorithms can be used to train sentiment classifiers from manually labeled text data, the labeling work can be time-consuming and expensive. Meanwhile, users often use different words when expressing sentiment in different domains, so a classifier trained in one domain performs poorly when applied directly to another. In this work, we develop a general solution to sentiment classification when we have no labels in the target domain but do have some labeled data in a different domain, regarded as the source domain. The purpose of this study is to analyze tweets from popular local and international news agencies and classify the tweeted news into positive, negative, or neutral categories.
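The kind of Naive Bayes sentiment classifier NLTK provides can be written from scratch in a few lines; a minimal sketch in plain Python (the toy tweets are invented for illustration, and no cross-domain adaptation is attempted here):

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Multinomial Naive Bayes with add-one smoothing over word counts."""
    class_docs = defaultdict(int)     # label -> number of documents
    word_counts = defaultdict(Counter)  # label -> word frequencies
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    total = sum(class_docs.values())
    return class_docs, word_counts, vocab, total

def classify(model, text):
    """Pick the label with the highest smoothed log-probability."""
    class_docs, word_counts, vocab, total = model
    best, best_lp = None, -math.inf
    for label, ndocs in class_docs.items():
        lp = math.log(ndocs / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

tweets = [
    ("great news for the economy", "positive"),
    ("markets rally on strong earnings", "positive"),
    ("terrible storm damages homes", "negative"),
    ("stocks fall amid terrible outlook", "negative"),
]
model = train(tweets)
print(classify(model, "great earnings today"))       # positive
print(classify(model, "terrible fall for stocks"))   # negative
```

The domain-mismatch problem the abstract raises shows up directly here: a model trained on these news words would misjudge, say, movie-review vocabulary, which is what motivates transferring from a labeled source domain.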
Implementation of an Orchestration Language as a Haskell Domain Specific Language
Even though concurrent programming has been a hot topic of discussion in Computer Science for the past 30 years, the community has yet to settle on one, or even a few, standard approaches to implementing concurrent programs. But as more and more cores inhabit our CPUs, and more and more services are made available on the web, the problem of coordinating different tasks becomes increasingly relevant.
The present paper addresses this problem with an implementation of the orchestration language Orc as a domain specific language in Haskell. Orc was, therefore, realized as a combinator library using the lightweight threads and the communication and synchronization primitives of the Concurrent Haskell library. With this implementation it becomes possible to create orchestrations that re-use existing Haskell code and, conversely, re-use orchestrations inside other Haskell programs.
The complexity inherent in distributed computation entails the need for the classification of efficient, reusable, concurrent programming patterns. The paper discusses how the calculus of recursive schemes used in the derivation of functional programs scales up to a distributed setting. It is shown, in particular, how to parallelize the entire class of binary tree hylomorphisms.
FCT - Fuel Cell Technologies Program (PTDC/EIA/73252/2006)
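The paper's combinators are written in Haskell on top of Concurrent Haskell primitives, but Orc's central idea — parallel composition that merges whatever each site publishes, as it arrives — can be sketched with standard-library threads in Python (`par`, `site_a`, and `site_b` are invented stand-ins, not the paper's API):

```python
import queue
import threading

_DONE = object()  # sentinel marking that one site has finished

def par(*sites):
    """Orc-style parallel composition (f | g): run each site in its
    own thread and yield every value any of them publishes, in
    arrival order, until all sites terminate."""
    out = queue.Queue()

    def run(site):
        for value in site():
            out.put(value)
        out.put(_DONE)

    for site in sites:
        threading.Thread(target=run, args=(site,), daemon=True).start()
    finished = 0
    while finished < len(sites):
        v = out.get()
        if v is _DONE:
            finished += 1
        else:
            yield v

# Toy "sites" standing in for remote services.
def site_a():
    yield "a1"
    yield "a2"

def site_b():
    yield "b1"

# Arrival order is nondeterministic, so sort before inspecting.
print(sorted(par(site_a, site_b)))  # ['a1', 'a2', 'b1']
```

In the combinator-library style the paper describes, ordinary host-language functions slot in as sites, which is what lets orchestrations reuse existing code and vice versa.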
Programming language identification using machine learning
Many developer tools, such as code search and source code highlighting, require knowing the programming language each given file is written in. Another task that also requires knowing the programming language of a source code file is the generation of statistics for code repositories.
The main aim of this project is to build a programming language classifier that could be used in the previously mentioned tasks. This is done mainly by employing machine learning in combination with text classification techniques. We have developed a source code classifier that was trained and then tested using source code from the Rosetta project dataset [44]. We also measure metrics such as the time it takes to train the classifier, its accuracy, and the time to classify individual source files. We also assess the economic impact in this social context, evaluating why a tool like this makes sense for developers.
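A crude baseline for this classification task can be built from keyword signatures alone; a minimal sketch in Python (the token sets below are invented for illustration and are far weaker than the trained text classifier the project describes):

```python
# Hypothetical per-language signature tokens; a real system would
# learn features from a corpus such as the Rosetta project dataset.
SIGNATURES = {
    "python": {"def", "import", "self", "elif", "lambda"},
    "c": {"#include", "printf", "int", "void", "struct"},
    "haskell": {"module", "where", "::", "->", "let"},
}

def identify(source):
    """Score each language by how many of its signature tokens
    appear in the source text; return the best-scoring language."""
    tokens = set(source.split())
    scores = {lang: len(tokens & sig) for lang, sig in SIGNATURES.items()}
    return max(scores, key=scores.get)

print(identify("def main(): import sys"))  # python
```

Replacing the hand-written signatures with word frequencies learned from labeled files turns this heuristic into the machine-learning classifier the project actually builds.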