NLP2Code: Code Snippet Content Assist via Natural Language Tasks
Developers increasingly take to the Internet for code snippets to integrate
into their programs. To save developers the time required to switch from their
development environments to a web browser in the quest for a suitable code
snippet, we introduce NLP2Code, a content assist for code snippets. Unlike
related tools, NLP2Code integrates directly into the source code editor and
provides developers with a content assist feature to close the vocabulary gap
between developers' needs and code snippet meta data. Our preliminary
evaluation of NLP2Code shows that the majority of invocations lead to code
snippets rated as helpful by users and that the tool is able to support a wide
range of tasks.

Comment: tool demo video available at
https://www.youtube.com/watch?v=h-gaVYtCznI; to appear as a tool demo paper
at ICSME 2017 (https://icsme2017.github.io/).
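The core idea, mapping a natural-language task description to a snippet inside the editor, can be sketched as a keyword-overlap lookup. The catalog and scoring below are illustrative assumptions, not NLP2Code's actual implementation.

```python
# Hypothetical snippet catalog: task description -> snippet text.
SNIPPETS = {
    "read a text file line by line": "with open(path) as f:\n    for line in f: ...",
    "parse json from a string": "import json\ndata = json.loads(text)",
    "sort a list of tuples by second element": "items.sort(key=lambda t: t[1])",
}

def suggest(task: str) -> str:
    """Return the snippet whose task description shares the most words."""
    words = set(task.lower().split())
    best = max(SNIPPETS, key=lambda desc: len(words & set(desc.split())))
    return SNIPPETS[best]

print(suggest("how to parse json"))
```

A real content assist would rank many candidates and handle paraphrases; this toy version only shows the shape of the task-to-snippet mapping.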
Semantic Source Code Models Using Identifier Embeddings
The emergence of online open source repositories in the recent years has led
to an explosion in the volume of openly available source code, coupled with
metadata that relate to a variety of software development activities. As an
effect, in line with recent advances in machine learning research, software
maintenance activities are switching from symbolic formal methods to
data-driven methods. In this context, the rich semantics hidden in source code
identifiers provide opportunities for building semantic representations of code
which can assist tasks of code search and reuse. To this end, we deliver in the
form of pretrained vector space models, distributed code representations for
six popular programming languages, namely, Java, Python, PHP, C, C++, and C#.
The models are produced using fastText, a state-of-the-art library for learning
word representations. Each model is trained on data from a single programming
language; the code mined for producing all models amounts to over 13,000
repositories. We indicate dissimilarities between natural language and source
code, as well as variations in coding conventions in between the different
programming languages we processed. We describe how these heterogeneities
guided the data preprocessing decisions we took and the selection of the
training parameters in the released models. Finally, we propose potential
applications of the models and discuss their limitations.

Comment: 16th International Conference on Mining Software Repositories (MSR
2019): Data Showcase Track
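A key preprocessing step for identifier embeddings of this kind is splitting identifiers into subtokens before training. The sketch below shows one plausible splitter for snake_case, camelCase, and acronym runs; it is an illustrative assumption, not the authors' exact pipeline, and its output would then feed a fastText-style trainer.

```python
import re

def split_identifier(name: str) -> list[str]:
    """Split a source-code identifier into lowercase subtokens.

    Handles snake_case, camelCase, and acronym runs, e.g.
    'parseHTTPResponse' -> ['parse', 'http', 'response'].
    """
    parts = re.split(r"[_\W]+", name)
    tokens = []
    for part in parts:
        # Split camelCase and acronym boundaries: 'HTTPResponse' -> HTTP, Response
        tokens += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
    return [t.lower() for t in tokens if t]

print(split_identifier("parseHTTPResponse"))  # -> ['parse', 'http', 'response']
print(split_identifier("max_value"))          # -> ['max', 'value']
```

Handling acronym boundaries explicitly matters because naming conventions differ between the six languages the models cover, which is exactly the kind of heterogeneity the abstract says guided preprocessing.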
Speech Recognition Technology: Improving Speed and Accuracy of Emergency Medical Services Documentation to Protect Patients
Because hospital errors, such as mistakes in documentation, cause one in six of the deaths each year in the United States, the accuracy of health records in the emergency medical services (EMS) must be improved. One possible solution is to incorporate speech recognition (SR) software into current tools used by EMS first responders. The purpose of this research was to determine if SR software could increase the efficiency and accuracy of EMS documentation to improve the safety of EMS patients. An initial review of the literature on the performance of current SR software demonstrated that this software was not 99% accurate; therefore, errors in the medical documentation produced by the software could harm patients. The literature review also identified weaknesses of SR software that could be overcome so that the software would be accurate enough for use in EMS settings. These weaknesses included the inability to differentiate between similar phrases and the inability to filter out background noise. To find a solution, an analysis of natural language processing algorithms showed that the bag-of-words post-processing algorithm has the ability to differentiate between similar phrases. This algorithm is best suited for SR applications because it is simple yet effective compared to machine learning algorithms that require a large amount of training data. The findings suggested that if these weaknesses of current SR software are solved, then the software would potentially increase the efficiency and accuracy of EMS documentation. Further studies should integrate the bag-of-words post-processing method into SR software and field test its accuracy in EMS settings.
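The bag-of-words post-processing idea can be sketched as snapping a raw transcript to the closest phrase from a known phrase list. The EMS phrases below are hypothetical; the similarity here is cosine over word counts, one simple way to realize the approach the abstract describes.

```python
from collections import Counter
import math

# Hypothetical list of known EMS documentation phrases.
PHRASES = [
    "patient is responsive to verbal stimuli",
    "patient is responsive to painful stimuli",
    "administered aspirin 324 milligrams orally",
]

def bow(text: str) -> Counter:
    """Bag-of-words: word counts, ignoring order."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def correct(transcript: str) -> str:
    """Return the known phrase most similar to the raw transcript."""
    return max(PHRASES, key=lambda p: cosine(bow(transcript), bow(p)))

print(correct("patient responsive to painful stimuli"))
```

Because "verbal" and "painful" are distinct words in the bag, the two near-identical phrases are separated cleanly, which is exactly the similar-phrase weakness the abstract targets.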
The Synonym management process in SAREL
The specification phase is one of the most important and least supported
parts of the software development process. The SAREL system has been
conceived as a knowledge-based tool to improve the specification phase.
The purpose of SAREL (Assistance System for Writing Software
Specifications in Natural Language) is to assist engineers in the
creation of software specifications written in Natural Language (NL).
These documents are divided into several parts. We can distinguish the
Introduction and the Overall Description as parts that should be used in
the Knowledge Base construction. The information contained in the
Specific Requirements Section corresponds to the information represented
in the Requirements Base. In order to obtain a high-quality software
requirements specification, the writing norms that define the required
linguistic restrictions and the software engineering constraints related
to the quality factors have been taken into account. One of the controls
performed is a lexical analysis that verifies that the words belong to
the application-domain lexicon, which consists of the Required and the
Extended lexicons. In this sense a synonym management process is needed
in order to get a quality software specification. The aim of this paper
is to present the synonym management process performed during the
Knowledge Base construction. Such process makes use of the Spanish
Wordnet developed inside the Eurowordnet project. This process generates
both the Required lexicon and the Extended lexicon that will be used
during the Requirements Base construction.

Postprint (published version)
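The lexical check with synonym management can be sketched as follows: each word of a requirement is looked up in the Required lexicon, unknown words are mapped through a synonym table (standing in here for the Spanish WordNet SAREL uses) into the Extended lexicon, and whatever remains is flagged. Lexicon contents and synonym pairs below are hypothetical.

```python
# Hypothetical application-domain lexicons.
REQUIRED_LEXICON = {"system", "user", "store", "record"}
SYNONYMS = {"save": "store", "keep": "store", "entry": "record"}  # stand-in for WordNet

def check_requirement(sentence: str):
    """Classify each word as required, extended (via a synonym), or unknown."""
    extended, unknown = {}, []
    for word in sentence.lower().split():
        if word in REQUIRED_LEXICON:
            continue  # word is already in the Required lexicon
        elif word in SYNONYMS:
            extended[word] = SYNONYMS[word]  # synonym of a required term
        else:
            unknown.append(word)  # flag for the engineer to review
    return extended, unknown

ext, unk = check_requirement("system save user entry")
print(ext, unk)
```

In this toy run, "save" and "entry" join the Extended lexicon as synonyms of "store" and "record", and nothing is left unknown; in SAREL the synonym relation comes from the Spanish WordNet rather than a hand-written table.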
Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution
While modern parallel computing systems provide high performance resources,
utilizing them to the highest extent requires advanced programming expertise.
Programming for parallel computing systems is much more difficult than
programming for sequential systems. OpenMP is an extension of the C, C++,
and Fortran programming languages that enables expressing parallelism using
compiler directives. While
OpenMP alleviates parallel programming by reducing the lines of code that the
programmer needs to write, deciding how and when to use these compiler
directives is up to the programmer. Novice programmers may make mistakes that
can lead to performance degradation or unexpected program behavior. Cognitive
computing has shown impressive results in various domains, such as health or
marketing. In this paper, we describe the use of the IBM Watson cognitive
system for the education of novice parallel programmers. Using the dialogue
service of IBM Watson, we have developed a solution that assists the
programmer in avoiding
common OpenMP mistakes. To evaluate our approach we have conducted a survey
with a number of novice parallel programmers at the Linnaeus University, and
obtained encouraging results with respect to the usefulness of our approach.
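One common novice mistake such an assistant would warn about is accumulating into a shared variable inside `#pragma omp parallel for` without a `reduction` clause, which is a data race. The toy textual checker below illustrates that single rule; it is an assumption-laden sketch, not the Watson dialogue service described in the paper.

```python
import re

def check_reduction(code: str) -> list[str]:
    """Flag '+=' accumulation under a parallel-for pragma lacking reduction."""
    warnings = []
    lines = code.splitlines()
    for i, line in enumerate(lines):
        if "#pragma omp parallel for" in line and "reduction" not in line:
            # Look for 'x += ...' style accumulation in the loop body below.
            for body_line in lines[i + 1:i + 5]:
                m = re.search(r"(\w+)\s*\+=", body_line)
                if m:
                    warnings.append(
                        f"possible race on '{m.group(1)}': add reduction(+:{m.group(1)})"
                    )
    return warnings

example = """#pragma omp parallel for
for (int i = 0; i < n; i++)
    sum += a[i];
"""
print(check_reduction(example))
```

A production assistant would parse the code properly and cover many more directive misuses; the point here is only that this class of mistake is mechanically detectable, which is what makes a guided dialogue for novices feasible.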
Strategies for converting to a DBMS environment
The conversion to database management system (DBMS) processing techniques consists of three different strategies, one for each of the major stages in the development process. Each strategy was chosen for its approach in bringing about a smooth, evolutionary transition from one mode of operation to the next. The initial strategy of the indoctrination stage consisted of: (1) providing maximum access to current administrative data as soon as possible; (2) selecting and developing small prototype systems; (3) establishing a user information center as a central focal point for user training and assistance; and (4) developing a training program for programmers, management, and ad hoc users in DBMS application and utilization. Security, the role of the data dictionary, database tuning and capacity planning, and the development of a change of attitude in an automated office are issues meriting consideration.
Using NLP tools in the specification phase
Software quality control is one of the main topics in the Software
Engineering area. Investing effort in quality control during the
specification phase allows us to detect possible mistakes at an early
step and to correct them easily before the design and implementation
steps start. In this framework the goal of the SAREL system, a
knowledge-based system, is twofold. On one hand, to help software
engineers in the creation of quality Software Requirements
Specifications. On the other hand, to analyze the correspondence between
two different conceptual representations associated with two different
Software Requirements Specification documents.
For the first goal, a set of NLP and Knowledge management tools is
applied to obtain a conceptual representation that can be validated and
managed by the software engineer.
For the second goal we have established some correspondence measures in
order to get a comparison between two conceptual representations. This
information will be useful during the interaction.

Postprint (published version)
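A correspondence measure between two conceptual representations can, in the simplest case, be reduced to set overlap between their concept labels. The sketch below uses Jaccard similarity for this; the concept extraction itself (SAREL's NLP pipeline) is assumed, and the concept sets are hypothetical.

```python
def jaccard(concepts_a: set, concepts_b: set) -> float:
    """Overlap between two concept sets: |A ∩ B| / |A ∪ B|."""
    if not concepts_a and not concepts_b:
        return 1.0  # two empty representations correspond trivially
    return len(concepts_a & concepts_b) / len(concepts_a | concepts_b)

# Hypothetical concepts extracted from two versions of a specification.
spec_v1 = {"user", "login", "password", "session"}
spec_v2 = {"user", "login", "token", "session"}
print(jaccard(spec_v1, spec_v2))  # 3 shared of 5 total -> 0.6
```

SAREL's actual measures operate on richer conceptual structures than flat label sets, but a scalar score like this is the kind of signal an engineer could use when comparing two specification documents.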