NLP2Code: Code Snippet Content Assist via Natural Language Tasks
Developers increasingly take to the Internet for code snippets to integrate
into their programs. To save developers the time required to switch from their
development environments to a web browser in the quest for a suitable code
snippet, we introduce NLP2Code, a content assist for code snippets. Unlike
related tools, NLP2Code integrates directly into the source code editor and
provides developers with a content assist feature to close the vocabulary gap
between developers' needs and code snippet meta data. Our preliminary
evaluation of NLP2Code shows that the majority of invocations lead to code
snippets rated as helpful by users and that the tool is able to support a wide
range of tasks.

Comment: tool demo video available at
https://www.youtube.com/watch?v=h-gaVYtCznI; to appear as a tool demo paper
at ICSME 2017 (https://icsme2017.github.io/).
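The core idea, mapping a natural-language task description to a snippet inside the editor, can be sketched as a keyword-overlap lookup. The catalog and scoring below are illustrative assumptions, not NLP2Code's actual implementation.

```python
# Hypothetical snippet catalog: task description -> snippet text.
SNIPPETS = {
    "read a text file line by line": "with open(path) as f:\n    for line in f: ...",
    "parse json from a string": "import json\ndata = json.loads(text)",
    "sort a list of tuples by second element": "items.sort(key=lambda t: t[1])",
}

def suggest(task: str) -> str:
    """Return the snippet whose task description shares the most words."""
    words = set(task.lower().split())
    best = max(SNIPPETS, key=lambda desc: len(words & set(desc.split())))
    return SNIPPETS[best]

print(suggest("how to parse json"))
```

A real content assist would rank many candidates and handle paraphrases; this toy version only shows the shape of the task-to-snippet mapping.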
Semantic Source Code Models Using Identifier Embeddings
The emergence of online open source repositories in the recent years has led
to an explosion in the volume of openly available source code, coupled with
metadata that relate to a variety of software development activities. As an
effect, in line with recent advances in machine learning research, software
maintenance activities are switching from symbolic formal methods to
data-driven methods. In this context, the rich semantics hidden in source code
identifiers provide opportunities for building semantic representations of code
which can assist tasks of code search and reuse. To this end, we deliver in the
form of pretrained vector space models, distributed code representations for
six popular programming languages, namely, Java, Python, PHP, C, C++, and C#.
The models are produced using fastText, a state-of-the-art library for learning
word representations. Each model is trained on data from a single programming
language; the code mined for producing all models amounts to over 13,000
repositories. We indicate dissimilarities between natural language and source
code, as well as variations in coding conventions in between the different
programming languages we processed. We describe how these heterogeneities
guided the data preprocessing decisions we took and the selection of the
training parameters in the released models. Finally, we propose potential
applications of the models and discuss their limitations.

Comment: 16th International Conference on Mining Software Repositories (MSR
2019): Data Showcase Track
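A key preprocessing step for identifier embeddings of this kind is splitting identifiers into subtokens before training. The sketch below shows one plausible splitter for snake_case, camelCase, and acronym runs; it is an illustrative assumption, not the authors' exact pipeline, and its output would then feed a fastText-style trainer.

```python
import re

def split_identifier(name: str) -> list[str]:
    """Split a source-code identifier into lowercase subtokens.

    Handles snake_case, camelCase, and acronym runs, e.g.
    'parseHTTPResponse' -> ['parse', 'http', 'response'].
    """
    parts = re.split(r"[_\W]+", name)
    tokens = []
    for part in parts:
        # Split camelCase and acronym boundaries: 'HTTPResponse' -> HTTP, Response
        tokens += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
    return [t.lower() for t in tokens if t]

print(split_identifier("parseHTTPResponse"))  # -> ['parse', 'http', 'response']
print(split_identifier("max_value"))          # -> ['max', 'value']
```

Handling acronym boundaries explicitly matters because naming conventions differ between the six languages the models cover, which is exactly the kind of heterogeneity the abstract says guided preprocessing.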
Speech Recognition Technology: Improving Speed and Accuracy of Emergency Medical Services Documentation to Protect Patients
Because hospital errors, such as mistakes in documentation, cause one in six of the deaths each year in the United States, the accuracy of health records in the emergency medical services (EMS) must be improved. One possible solution is to incorporate speech recognition (SR) software into current tools used by EMS first responders. The purpose of this research was to determine if SR software could increase the efficiency and accuracy of EMS documentation to improve the safety of EMS patients. An initial review of the literature on the performance of current SR software demonstrated that this software was not 99% accurate; therefore, errors in the medical documentation produced by the software could harm patients. The literature review also identified weaknesses of SR software that could be overcome so that the software would be accurate enough for use in EMS settings. These weaknesses included the inability to differentiate between similar phrases and the inability to filter out background noise. To find a solution, an analysis of natural language processing algorithms showed that the bag-of-words post-processing algorithm has the ability to differentiate between similar phrases. This algorithm is best suited for SR applications because it is simple yet effective compared to machine learning algorithms that require a large amount of training data. The findings suggested that if these weaknesses of current SR software are solved, then the software would potentially increase the efficiency and accuracy of EMS documentation. Further studies should integrate the bag-of-words post-processing method into SR software and field test its accuracy in EMS settings.
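The bag-of-words post-processing idea can be sketched as snapping a raw transcript to the closest phrase from a known phrase list. The EMS phrases below are hypothetical; the similarity here is cosine over word counts, one simple way to realize the approach the abstract describes.

```python
from collections import Counter
import math

# Hypothetical list of known EMS documentation phrases.
PHRASES = [
    "patient is responsive to verbal stimuli",
    "patient is responsive to painful stimuli",
    "administered aspirin 324 milligrams orally",
]

def bow(text: str) -> Counter:
    """Bag-of-words: word counts, ignoring order."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def correct(transcript: str) -> str:
    """Return the known phrase most similar to the raw transcript."""
    return max(PHRASES, key=lambda p: cosine(bow(transcript), bow(p)))

print(correct("patient responsive to painful stimuli"))
```

Because "verbal" and "painful" are distinct words in the bag, the two near-identical phrases are separated cleanly, which is exactly the similar-phrase weakness the abstract targets.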
The Synonym management process in SAREL
The specification phase is one of the most important and least supported
parts of the software development process. The SAREL system has been
conceived as a knowledge-based tool to improve the specification phase.
The purpose of SAREL (Assistance System for Writing Software
Specifications in Natural Language) is to assist engineers in the
creation of software specifications written in Natural Language (NL).
These documents are divided into several parts. We can distinguish the
Introduction and the Overall Description as parts that should be used in
the Knowledge Base construction. The information contained in the
Specific Requirements Section corresponds to the information represented
in the Requirements Base. In order to obtain a high-quality software
requirements specification, the writing norms that define the required
linguistic restrictions and the software engineering constraints related
to the quality factors have been taken into account. One of the controls
performed is a lexical analysis that verifies that the words belong to
the application-domain lexicon, which consists of the Required and the
Extended lexicons. In this sense a synonym management process is needed
in order to get a quality software specification. The aim of this paper
is to present the synonym management process performed during the
Knowledge Base construction. Such process makes use of the Spanish
Wordnet developed inside the Eurowordnet project. This process generates
both the Required lexicon and the Extended lexicon that will be used
during the Requirements Base construction.

Postprint (published version)
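The lexical check with synonym management can be sketched as follows: each word of a requirement is looked up in the Required lexicon, unknown words are mapped through a synonym table (standing in here for the Spanish WordNet SAREL uses) into the Extended lexicon, and whatever remains is flagged. Lexicon contents and synonym pairs below are hypothetical.

```python
# Hypothetical application-domain lexicons.
REQUIRED_LEXICON = {"system", "user", "store", "record"}
SYNONYMS = {"save": "store", "keep": "store", "entry": "record"}  # stand-in for WordNet

def check_requirement(sentence: str):
    """Classify each word as required, extended (via a synonym), or unknown."""
    extended, unknown = {}, []
    for word in sentence.lower().split():
        if word in REQUIRED_LEXICON:
            continue  # word is already in the Required lexicon
        elif word in SYNONYMS:
            extended[word] = SYNONYMS[word]  # synonym of a required term
        else:
            unknown.append(word)  # flag for the engineer to review
    return extended, unknown

ext, unk = check_requirement("system save user entry")
print(ext, unk)
```

In this toy run, "save" and "entry" join the Extended lexicon as synonyms of "store" and "record", and nothing is left unknown; in SAREL the synonym relation comes from the Spanish WordNet rather than a hand-written table.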
Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution
While modern parallel computing systems provide high performance resources,
utilizing them to the highest extent requires advanced programming expertise.
Programming for parallel computing systems is much more difficult than
programming for sequential systems. OpenMP is an extension of the C, C++,
and Fortran programming languages that enables expressing parallelism using
compiler directives. While
OpenMP alleviates parallel programming by reducing the lines of code that the
programmer needs to write, deciding how and when to use these compiler
directives is up to the programmer. Novice programmers may make mistakes that
can lead to performance degradation or unexpected program behavior. Cognitive
computing has shown impressive results in various domains, such as health or
marketing. In this paper, we describe the use of the IBM Watson cognitive
system for the education of novice parallel programmers. Using the dialogue
service of IBM Watson, we have developed a solution that assists the
programmer in avoiding
common OpenMP mistakes. To evaluate our approach we have conducted a survey
with a number of novice parallel programmers at the Linnaeus University, and
obtained encouraging results with respect to the usefulness of our approach.
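One common novice mistake such an assistant would warn about is accumulating into a shared variable inside `#pragma omp parallel for` without a `reduction` clause, which is a data race. The toy textual checker below illustrates that single rule; it is an assumption-laden sketch, not the Watson dialogue service described in the paper.

```python
import re

def check_reduction(code: str) -> list[str]:
    """Flag '+=' accumulation under a parallel-for pragma lacking reduction."""
    warnings = []
    lines = code.splitlines()
    for i, line in enumerate(lines):
        if "#pragma omp parallel for" in line and "reduction" not in line:
            # Look for 'x += ...' style accumulation in the loop body below.
            for body_line in lines[i + 1:i + 5]:
                m = re.search(r"(\w+)\s*\+=", body_line)
                if m:
                    warnings.append(
                        f"possible race on '{m.group(1)}': add reduction(+:{m.group(1)})"
                    )
    return warnings

example = """#pragma omp parallel for
for (int i = 0; i < n; i++)
    sum += a[i];
"""
print(check_reduction(example))
```

A production assistant would parse the code properly and cover many more directive misuses; the point here is only that this class of mistake is mechanically detectable, which is what makes a guided dialogue for novices feasible.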
Strategies for converting to a DBMS environment
The conversion to database management system (DBMS) processing techniques consists of three different strategies, one for each of the major stages in the development process. Each strategy was chosen for its approach in bringing about a smooth, evolutionary transition from one mode of operation to the next. The initial strategy of the indoctrination stage consisted of: (1) providing maximum access to current administrative data as soon as possible; (2) selecting and developing small prototype systems; (3) establishing a user information center as a central focal point for user training and assistance; and (4) developing a training program for programmers, management, and ad hoc users in DBMS application and utilization. Security, the role of the data dictionary, database tuning and capacity planning, and the development of a change of attitude in an automated office are issues meriting consideration.
Using NLP tools in the specification phase
Software quality control is one of the main topics in the Software
Engineering area. Investing effort in quality control during the
specification phase allows us to detect possible mistakes at an early
step and to correct them easily before the design and implementation
steps start. In this framework the goal of the SAREL system, a
knowledge-based system, is twofold. On one hand, to help software
engineers in the creation of quality Software Requirements
Specifications. On the other hand, to analyze the correspondence between
two different conceptual representations associated with two different
Software Requirements Specification documents.
For the first goal, a set of NLP and Knowledge management tools is
applied to obtain a conceptual representation that can be validated and
managed by the software engineer.
For the second goal we have established some correspondence measures in
order to get a comparison between two conceptual representations. This
information will be useful during the interaction.

Postprint (published version)
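A correspondence measure between two conceptual representations can, in the simplest case, be reduced to set overlap between their concept labels. The sketch below uses Jaccard similarity for this; the concept extraction itself (SAREL's NLP pipeline) is assumed, and the concept sets are hypothetical.

```python
def jaccard(concepts_a: set, concepts_b: set) -> float:
    """Overlap between two concept sets: |A ∩ B| / |A ∪ B|."""
    if not concepts_a and not concepts_b:
        return 1.0  # two empty representations correspond trivially
    return len(concepts_a & concepts_b) / len(concepts_a | concepts_b)

# Hypothetical concepts extracted from two versions of a specification.
spec_v1 = {"user", "login", "password", "session"}
spec_v2 = {"user", "login", "token", "session"}
print(jaccard(spec_v1, spec_v2))  # 3 shared of 5 total -> 0.6
```

SAREL's actual measures operate on richer conceptual structures than flat label sets, but a scalar score like this is the kind of signal an engineer could use when comparing two specification documents.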