Search CORE


An information retrieval strategy for large multimodal data collections involving source code and natural language

Author: Baquero Vargas Juan Felipe
Publication date: 03/07/2019

Source code repositories store data from software products, including the evolution of the source code, requirements, bugs, and communication between developers. Source code repositories have grown rapidly in recent years, and with them the need to extract information from them. One interesting repository that is growing in both usage and information is Stack Overflow (SO), one of the largest question-answering sites, used by thousands of developers every day. On SO, developers can ask any question related to a programming issue, and other users answer it. SO is therefore a repository that contains both source code and natural language across thousands of samples, offering the possibility of combining both sources of information to extract useful information that is not apparent at first glance. In this thesis, we explore how to represent source code and natural language and how to combine these representations. We address the task of understanding how SO users talk about programming languages and how similar these languages are to one another based on how users talk about them, and, finally, we provide tools for building an information retrieval strategy by identifying duplicate posts.

Software repositories store data about software products: data related to the evolution of source code, software requirements, bug reports, and communication between developers. Software repositories have grown rapidly in recent years, and with them the need to extract meaningful information from them. An interesting software repository is Stack Overflow (SO), one of the largest question-answering sites, used by thousands of software developers in their daily work. On SO, developers can ask any question related to programming and software, and other users will answer it. Like SO, many software repositories contain source code and text, with millions of samples, and offer the possibility of combining both sources to extract information that is not visible at first glance. In this thesis, we explore how to represent source code and natural language and how to combine these representations. We attempt to understand how SO users talk about a programming language and how similar programming languages are based on how users talk about them, and, finally, we provide tools for building an information retrieval strategy to identify duplicate posts.

Maestrí

Fundamental Approaches to Software Engineering

Publication venue: Springer Science and Business Media LLC

This open access book constitutes the proceedings of the 25th International Conference on Fundamental Approaches to Software Engineering, FASE 2022, which was held on April 4–5, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 17 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. The proceedings also contain 3 contributions from the Test-Comp Competition. The papers deal with the foundations on which software engineering is built, including topics such as software engineering as an engineering discipline, requirements engineering, software architectures, software quality, model-driven development, software processes, software evolution, AI-based software engineering, and the specification, design, and implementation of particular classes of systems, such as (self-)adaptive, collaborative, AI, embedded, distributed, mobile, pervasive, cyber-physical, or service-oriented applications.

OAPEN Library

Expert Finding in Disparate Environments

Author: D'Amore Raymond
Publication date: 01/03/2008

Providing knowledge workers with access to experts and communities of practice is central to expertise sharing and crucial to effective organizational performance, adaptation, and even survival. However, in complex work environments, it is difficult to know who knows what across heterogeneous groups, disparate locations, and asynchronous work. Whereas expert finding has traditionally been a manual operation, there is increasing interest in policy and technical infrastructure that makes work visible and supports automated tools for locating expertise. Expert finding is a multidisciplinary problem that cuts across knowledge management, organizational analysis, and information retrieval. Recently, a number of expert finders have emerged; however, many of these tools are limited in that they are extensions of traditional information retrieval systems and primarily exploit artifact information. This thesis explores a new class of expert finders that use organizational context as a basis for assessing expertise and for conferring trust in the system. The hypothesis is that expertise can be inferred through assessments of work behavior and work derivatives (e.g., artifacts). The Expert Locator, developed within a live organizational environment, is a model-based prototype that exploits organizational work context. The system associates expertise ratings with experts' signaling behavior and is extensible, so that signaling behavior from multiple activity-space contexts can be fused into aggregate retrieval scores. Post-retrieval analysis supports evidence review and personal network browsing, aiding users in both detection and selection. During operational evaluation, the prototype generated high-precision searches across a range of topics and was sensitive to organizational role, ranking true experts (i.e., authorities) higher than brokers providing referrals. Precision increased with the number of activity spaces used in the model but varied across queries.
The highest-performing queries are characterized by high-specificity terms and low organizational diffusion among retrieved experts; essentially, the highest-rated experts are situated within organizational niches.