Improving Distributed Representations of Tweets - Present and Future
Unsupervised representation learning for tweets is an important research
field that helps solve several business applications, such as sentiment
analysis, hashtag prediction, paraphrase detection and microblog ranking. A
good tweet representation learning model must handle the idiosyncratic nature
of tweets, which poses several challenges such as short length, informal words,
unusual grammar and misspellings. However, there is a lack of prior work that
surveys representation learning models with a focus on tweets. In this
work, we organize the models by their objective functions, which aids
understanding of the literature. We also propose future directions
that we believe will be fruitful in advancing this field by building high-quality
tweet representation learning models. Comment: To be presented in the Student Research Workshop (SRW) at ACL 201
Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey
An automated code evaluation system (AES) is designed primarily to assess
user-submitted code reliably. Because of their wide range of applications and
the valuable resources they accumulate, AESs are becoming increasingly popular.
However, research on the applications of AESs and on exploring their real-world
resources for diverse coding tasks is still lacking. In this study, we conduct a
comprehensive survey of AESs and their resources. This survey explores the
application areas of AESs, the available resources, and how those resources are
used for coding tasks. Depending on their application, AESs are categorized into
programming contests, programming learning and education, recruitment, online
compilers, and additional modules. We explore the available datasets and other
resources of these systems for research, analysis, and coding tasks. Moreover,
we provide an overview of machine learning-driven coding tasks, such as bug
detection, code review, comprehension, refactoring, search, representation, and
repair, which are performed using real-life datasets. In addition, we
briefly discuss the Aizu Online Judge (AOJ) platform as a real example of an AES
from the perspectives of system design (hardware and software), operation
(competition and education), and research. We chose AOJ because of its
scalability (programming education, competitions, and practice), its open
internal features (hardware and software), the attention it receives from the
research community, its open source data (e.g., solution codes and submission
documents), and its transparency. We also analyze the overall performance of
this system and the challenges it has faced over the years.
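At its core, an AES runs a submission against test cases and compares the produced output with the expected output to assign a verdict. The following is a minimal sketch under simplifying assumptions: the submission is modeled as a Python function from input text to output text (a real AES such as AOJ compiles and runs untrusted programs in a sandbox with time and memory limits), and the function name `judge` and the verdict labels are illustrative choices, not AOJ's implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical test case format: (input text, expected output text).
TestCase = Tuple[str, str]


def judge(solve: Callable[[str], str], tests: List[TestCase]) -> str:
    """Return a verdict for a submission modeled as an input -> output function.

    Assumption: correctness only. No sandboxing, time limit, or memory
    limit is enforced here, unlike a production judge.
    """
    for given, expected in tests:
        try:
            actual = solve(given)
        except Exception:
            # Any exception raised by the submission counts as a crash.
            return "Runtime Error"
        # Compare whitespace-separated tokens so trailing spaces/newlines are ignored.
        if actual.split() != expected.split():
            return "Wrong Answer"
    return "Accepted"


if __name__ == "__main__":
    tests = [("1 2", "3"), ("10 -4", "6")]
    correct = lambda s: str(sum(map(int, s.split())))
    buggy = lambda s: str(int(s.split()[0]) - int(s.split()[1]))
    print(judge(correct, tests))  # Accepted
    print(judge(buggy, tests))    # Wrong Answer
```

Token-wise comparison is one common leniency choice; judges that require exact byte-level output would compare the strings directly instead.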
WELL: Applying Bug Detectors to Bug Localization via Weakly Supervised Learning
Bug localization is a key software development task in which a developer
locates the portion of the source code that must be modified based on a bug
report. It is labor-intensive and time-consuming because of the increasing size
and complexity of modern software. Effectively automating this task can greatly
reduce costs by cutting down developer effort. Researchers have already
tried to harness deep learning (DL) to automate bug localization. However,
training DL models demands a large quantity of annotated training data, and a
buggy-location-annotated dataset of reasonable quality and size is difficult to
collect. This is an obstacle to the effective use of DL for bug localization.
We observe that data pairs for bug detection, which provide weak buggy-or-not
binary classification supervision, are much easier to obtain. Inspired by weakly
supervised learning, this paper proposes WEakly supervised bug LocaLization
(WELL), an approach that transforms bug detectors into bug locators. Using a
CodeBERT model finetuned for bug detection, WELL can locate bugs in a
weakly supervised manner based on the model's attention. Evaluations of WELL on
three datasets show performance competitive with existing strongly supervised DL
solutions. WELL even outperforms current SOTA models on the variable misuse and
binary operator misuse tasks. Comment: (Preprint) Software Engineering; Deep Learning; Bug Detection &
Localization
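The central idea of WELL is to read bug locations out of the attention of a model trained only for buggy-or-not classification. The sketch below illustrates just the final ranking step under strong assumptions: the per-token attention scores are taken as given (the numbers are hypothetical, standing in for scores extracted from a finetuned CodeBERT), and the function `localize` is illustrative. The abstract does not say how WELL aggregates attention across heads and layers, so this is not the paper's procedure.

```python
from typing import List, Tuple


def localize(tokens: List[str], attention: List[float],
             is_buggy: bool, k: int = 1) -> List[Tuple[int, str]]:
    """Return the k highest-attention tokens as candidate bug locations.

    Assumption: `attention` holds one score per token, e.g. the attention
    the classifier's sequence-level token pays to each input token.
    Localization is attempted only when the detector predicts "buggy".
    """
    if len(tokens) != len(attention):
        raise ValueError("need exactly one attention score per token")
    if not is_buggy:
        return []
    # Rank token positions by attention score, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: attention[i], reverse=True)
    return [(i, tokens[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical variable-misuse example: `j` should be `i`.
    tokens = ["for", "i", "in", "range", "(", "n", ")", ":",
              "total", "+=", "a", "[", "j", "]"]
    scores = [0.02, 0.05, 0.01, 0.02, 0.01, 0.04, 0.01, 0.01,
              0.08, 0.06, 0.07, 0.02, 0.55, 0.05]
    print(localize(tokens, scores, is_buggy=True))  # [(12, 'j')]
```

Gating on the detector's prediction reflects the weak-supervision setup: the only training signal is the binary label, and locations are a by-product of where the model attends.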
On the use of Machine Learning and Deep Learning for Text Similarity and Categorization and its Application to Troubleshooting Automation
Troubleshooting is a labor-intensive task that involves repeatedly solving similar problems. This task can be partially or fully automated by using text-similarity matching to find previous solutions, lowering the workload of technicians. We conduct a systematic literature review to identify the best approaches to troubleshooting automation and to classifying incidents effectively. We identify promising approaches and point toward a comprehensive set of solutions that could be employed to solve the troubleshooting automation problem.
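Text-similarity matching for troubleshooting amounts to retrieving the past incident whose description is closest to a new one, so its recorded solution can be reused. The review compares many approaches; as an illustrative baseline only (our choice, not a finding of the review), the sketch below uses TF-IDF weighting with cosine similarity, written in pure Python. The smoothed IDF formula mirrors scikit-learn's default `smooth_idf` setting; the function names and incident texts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    # Lowercase and split on any non-alphanumeric character.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def build_index(docs: List[List[str]]) -> Tuple[List[Vector], Vector]:
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, past_incidents: List[str]) -> Tuple[int, float]:
    """Return (index, score) of the past incident most similar to `query`."""
    vectors, idf = build_index([tokenize(d) for d in past_incidents])
    # Terms never seen in past incidents have no IDF weight and are dropped.
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


if __name__ == "__main__":
    past = [
        "Printer offline after driver update",
        "VPN connection drops every hour",
        "Email client cannot sync calendar",
    ]
    idx, score = most_similar("VPN keeps dropping connection", past)
    print(past[idx])  # VPN connection drops every hour
```

Note that exact-token matching misses "dropping" vs. "drops"; stemming, or the embedding-based models the review surveys, would address such variation.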