361 research outputs found

Improving Distributed Representations of Tweets - Present and Future

Author: J Ganesh
Publication date: 01/01/2017

Unsupervised representation learning for tweets is an important research field that supports several business applications, such as sentiment analysis, hashtag prediction, paraphrase detection and microblog ranking. A good tweet representation learning model must handle the idiosyncratic nature of tweets, which poses several challenges such as short length, informal words, unusual grammar and misspellings. However, there is a lack of prior work that surveys representation learning models with a focus on tweets. In this work, we organize the models by their objective functions, which aids the understanding of the literature. We also provide interesting future directions, which we believe are fruitful in advancing this field by building high-quality tweet representation learning models. Comment: To be presented in Student Research Workshop (SRW) at ACL 201

arXiv.org e-Print Archive

Crossref

Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey

Authors: Hamada Mohamed; Rahman Md. Mostafizer; Shirafuji Atsushi; Watanobe Yutaka
Publication date: 08/07/2023

Automated code evaluation systems (AESs) are mainly designed to reliably assess user-submitted code. Due to their extensive range of applications and the accumulation of valuable resources, AESs are becoming increasingly popular. Research on the applications of AESs and on the use of their real-world resources for diverse coding tasks is still lacking. In this study, we conducted a comprehensive survey of AESs and their resources. This survey explores the application areas of AESs, available resources, and resource utilization for coding tasks. AESs are categorized into programming contests, programming learning and education, recruitment, online compilers, and additional modules, depending on their application. We explore the available datasets and other resources of these systems for research, analysis, and coding tasks. Moreover, we provide an overview of machine learning-driven coding tasks, such as bug detection, code review, comprehension, refactoring, search, representation, and repair. These tasks are performed using real-life datasets. In addition, we briefly discuss the Aizu Online Judge (AOJ) platform as a real example of an AES from the perspectives of system design (hardware and software), operation (competition and education), and research. This choice is due to the scalability of the AOJ platform (programming education, competitions, and practice), open internal features (hardware and software), attention from the research community, open source data (e.g., solution codes and submission documents), and transparency. We also analyze the overall performance of this system and the perceived challenges over the years.

arXiv.org e-Print Archive

WELL: Applying Bug Detectors to Bug Localization via Weakly Supervised Learning

Authors: Jin Zhi; Li Ge; Li Zhuo; Zhang Huangzhao
Publication date: 27/05/2023

Bug localization is a key software development task, where a developer locates the portion of the source code that must be modified based on the bug report. It is label-intensive and time-consuming due to the increasing size and complexity of modern software. Effectively automating this task can greatly reduce costs by cutting down developers' effort. Researchers have already made efforts to harness the power of deep learning (DL) to automate bug localization. However, training DL models demands a large quantity of annotated training data, while buggy-location-annotated datasets of reasonable quality and quantity are difficult to collect. This becomes an obstacle to the effective use of DL for bug localization. We notice that the data pairs for bug detection, which provide weak buggy-or-not binary classification supervision, are much easier to obtain. Inspired by weakly supervised learning, this paper proposes WEakly supervised bug LocaLization (WELL), an approach to transform bug detectors into bug locators. Using a CodeBERT model fine-tuned for bug detection, WELL is capable of locating bugs in a weakly supervised manner based on the attention. Evaluations of WELL on three datasets show competitive performance with the existing strongly supervised DL solutions. WELL even outperforms current SOTA models in the tasks of variable misuse and binary operator misuse. Comment: (Preprint) Software Engineer; Deep Learning; Bug Detection & Localizatio

arXiv.org e-Print Archive

On the use of Machine Learning and Deep Learning for Text Similarity and Categorization and its Application to Troubleshooting Automation

Authors: Callegari Daniel; Couto Julia; Godoy Julia; Kniest Davi; Meneguzzi Felipe; Ruiz Duncan; Tomaz Laura
Publication venue: HICSS Conference Office
Publication date: 03/01/2022

Troubleshooting is a labor-intensive task that includes repetitive solutions to similar problems. This task can be partially or fully automated using text-similarity matching to find previous solutions, lowering the workload of technicians. We conduct a systematic literature review to identify the best approaches to solve the problem of troubleshooting automation and classify incidents effectively. We identify promising approaches and point in the direction of a comprehensive set of solutions that could be employed in solving the troubleshooting automation problem.

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)