5,178 research outputs found

    Untangling Fine-Grained Code Changes

    After working for some time, developers commit their code changes to a version control system. When doing so, they often bundle unrelated changes (e.g., a bug fix and a refactoring) in a single commit, thus creating a so-called tangled commit. Sharing tangled commits is problematic because it makes review, reversion, and integration of these commits harder and historical analyses of the project less reliable. Researchers have worked on untangling existing commits, i.e., finding which part of a commit relates to which task. In this paper, we contribute to this line of work in two ways: (1) a publicly available dataset of untangled code changes, created with the help of two developers who accurately split their code changes into self-contained tasks over a period of four months; (2) a novel approach, EpiceaUntangler, to help developers share untangled commits (a.k.a. atomic commits) by using fine-grained code change information. EpiceaUntangler was developed and tested on this dataset, and further evaluated by deploying it to 7 developers, who used it for 2 weeks. We recorded a median success rate of 91% and an average of 75% in automatically creating clusters of untangled fine-grained code changes.
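    The abstract stays at a high level, so the following toy sketch (Python, standard library only) illustrates the general idea of grouping fine-grained change events into candidate task clusters with a simple similarity heuristic. It is an illustration under assumed names and thresholds, not the EpiceaUntangler algorithm.

        # Illustrative only: greedy single-link grouping of fine-grained change
        # events. ChangeEvent, similar() and the 300-second gap are assumptions
        # made for this sketch, not part of the paper.
        from dataclasses import dataclass

        @dataclass
        class ChangeEvent:
            file: str        # file touched by the change
            entity: str      # e.g. method or class touched
            timestamp: float # seconds since the start of the session

        def similar(a: ChangeEvent, b: ChangeEvent, max_gap: float = 300.0) -> bool:
            """Heuristic: events touching the same entity, or the same file
            within a short time window, are treated as part of the same task."""
            same_file_close = a.file == b.file and abs(a.timestamp - b.timestamp) <= max_gap
            return same_file_close or a.entity == b.entity

        def cluster(events: list[ChangeEvent]) -> list[list[ChangeEvent]]:
            """Each event joins the first cluster containing a similar event,
            otherwise it starts a new cluster."""
            clusters: list[list[ChangeEvent]] = []
            for ev in events:
                for c in clusters:
                    if any(similar(ev, other) for other in c):
                        c.append(ev)
                        break
                else:
                    clusters.append([ev])
            return clusters

        if __name__ == "__main__":
            events = [
                ChangeEvent("Parser.java", "parseExpr", 10.0),
                ChangeEvent("Parser.java", "parseStmt", 40.0),
                ChangeEvent("README.md", "docs", 5000.0),
            ]
            for i, c in enumerate(cluster(events)):
                print(f"cluster {i}: {[e.entity for e in c]}")

    Running it prints two clusters: the two Parser.java edits end up together, while the later documentation edit forms its own group. A real untangler would of course use much richer change information than this heuristic.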

    OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content

    The OpenMinTeD platform aims to bring full-text Open Access scholarly content from a wide range of providers together with Text and Data Mining (TDM) tools from various Natural Language Processing frameworks and TDM developers in an integrated environment. In this way, it gives users who want to mine scientific literature easy access to relevant content and allows them to run scalable TDM workflows in the cloud.

    Using Neural Networks for Relation Extraction from Biomedical Literature

    Using different sources of information to support the automated extraction of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is the biomedical literature. Several relation extraction approaches have been proposed to identify relations between concepts in biomedical literature, namely using neural network algorithms. The use of multichannel architectures composed of multiple data representations, as in deep neural networks, is leading to state-of-the-art results. The right combination of data representations can eventually lead us to even higher evaluation scores in relation extraction tasks. In this context, biomedical ontologies play a fundamental role by providing semantic and ancestry information about an entity. The incorporation of biomedical ontologies has already been shown to enhance previous state-of-the-art results.
    Comment: Artificial Neural Networks book (Springer), Chapter 1.
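    Since the abstract centres on multichannel architectures that mix sentence text with ontology information, a minimal sketch may help. The model below (PyTorch; all dimensions, names and the choice of LSTM encoders are assumptions for illustration, not the chapter's architecture) encodes the sentence tokens and the ontology ancestors of the candidate entity pair in separate channels and concatenates them before classification.

        # Illustrative two-channel relation-extraction model; not the chapter's model.
        import torch
        import torch.nn as nn

        class MultiChannelRE(nn.Module):
            def __init__(self, vocab_size=5000, onto_size=1000, emb_dim=64,
                         hidden=128, n_relations=3):
                super().__init__()
                self.word_emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)  # channel 1: tokens
                self.onto_emb = nn.Embedding(onto_size, emb_dim, padding_idx=0)   # channel 2: ontology ancestors
                self.word_enc = nn.LSTM(emb_dim, hidden, batch_first=True)
                self.onto_enc = nn.LSTM(emb_dim, hidden, batch_first=True)
                self.classifier = nn.Linear(2 * hidden, n_relations)

            def forward(self, token_ids, ancestor_ids):
                _, (h_words, _) = self.word_enc(self.word_emb(token_ids))
                _, (h_onto, _) = self.onto_enc(self.onto_emb(ancestor_ids))
                joint = torch.cat([h_words[-1], h_onto[-1]], dim=-1)  # fuse the two channels
                return self.classifier(joint)

        if __name__ == "__main__":
            model = MultiChannelRE()
            tokens = torch.randint(1, 5000, (2, 20))    # 2 sentences, 20 tokens each
            ancestors = torch.randint(1, 1000, (2, 8))  # 8 ancestor IDs per entity pair
            print(model(tokens, ancestors).shape)       # torch.Size([2, 3])

    The design choice illustrated is the one the abstract emphasises: each data representation gets its own channel, and the ontology channel injects ancestry information that the text channel alone does not carry.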

    The network-untangling problem : from interactions to activity timelines

    In this paper we study the problem of determining when entities are active based on their interactions with each other. We consider a set of entities V and a sequence of time-stamped edges E among the entities. Each edge (u, v, t) in E denotes an interaction between entities u and v at time t. We assume an activity model in which each entity is active during at most k time intervals. An interaction (u, v, t) can be explained if at least one of u or v is active at time t. Our goal is to reconstruct the activity intervals for all entities in the network so as to explain the observed interactions. This problem, the network-untangling problem, can be applied to discover event timelines from complex entity interactions. We provide two formulations of the network-untangling problem: (i) minimizing the total interval length over all entities (sum version), and (ii) minimizing the maximum interval length (max version). We study the two problems separately for k = 1 and k > 1 activity intervals per entity. For the case k = 1, we show that the sum problem is NP-hard, while the max problem can be solved optimally in linear time. For the sum problem we provide efficient algorithms motivated by realistic assumptions. For the case of k > 1, we show that both formulations are inapproximable. However, we propose efficient algorithms based on alternating optimization. We complement our study with an evaluation on synthetic and real-world datasets, which demonstrates the validity of our concepts and the good performance of our algorithms. Peer reviewed.
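    A symbolic restatement of the two formulations may make them easier to compare; the notation below is chosen for illustration and need not match the paper's. Let T_v be the union of the (at most k) activity intervals assigned to entity v, and |T_v| its total length.

        % Sum and max versions of the network-untangling problem, restated
        % from the abstract above (illustrative notation).
        \begin{align*}
          \text{(sum)}\quad & \min_{\{T_v\}} \sum_{v \in V} |T_v|
            && \text{s.t. } t \in T_u \cup T_v \text{ for every } (u, v, t) \in E, \\
          \text{(max)}\quad & \min_{\{T_v\}} \max_{v \in V} |T_v|
            && \text{s.t. } t \in T_u \cup T_v \text{ for every } (u, v, t) \in E.
        \end{align*}

    The covering constraint is identical in both versions; only the objective changes, which is why the two cases behave so differently in complexity (NP-hard sum versus linear-time max for k = 1).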

    Improving Software Project Health Using Machine Learning

    In recent years, systems that would previously live on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull requests, issue tracking and version history, its integration with other solutions such as Gerrit or Travis, and the response from competitors have created development environments that favour agile methodologies by increasingly automating non-coding tasks: automated build systems, automated issue triaging, etc. In essence, source-code hosting platforms shifted to continuous integration/continuous delivery (CI/CD) as a service. This facilitated a shift in development paradigms: adherents of agile methodology can now adopt a CI/CD infrastructure more easily. It has also created large, publicly accessible sources of source code together with related project artefacts: GHTorrent and similar datasets now offer programmatic access to the whole of GitHub.

    Project health encompasses traceability, documentation, and adherence to coding conventions, tasks that reduce maintenance costs and increase accountability but may not directly impact features. Overfocus on health can slow velocity (new feature delivery), so the Agile Manifesto suggests developers should travel light: forgo tasks focused on project health in favour of higher feature velocity. Obviously, injudiciously following this suggestion can undermine a project's chances for success.

    Simultaneously, this shift to CI/CD has allowed the proliferation of Natural Language, or mixed Natural Language and Formal Language, textual artefacts that are programmatically accessible: GitHub and its competitors allow API access to their infrastructure to enable the creation of CI/CD bots. This suggests that approaches from Natural Language Processing and Machine Learning are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks for this new paradigm and its attendant infrastructure by bringing to the foreground the relevant NLP and ML techniques.

    Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving issue-pull-request traceability, which can aid existing systems to automatically curate the issue backlog as pull requests are merged; (2) untangling commits in a version history, which can aid the aforementioned traceability task as well as improve the usability of determining a fault-introducing commit, or of cherry-picking via tools such as git bisect; (3) mixed-text parsing, to allow better API mining and open new avenues for project-specific code-recommendation tools.
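    As a concrete, purely illustrative example of the first task (issue/pull-request traceability), the sketch below ranks candidate issues for a pull request by lexical overlap. The Jaccard heuristic and all names are assumptions made for this sketch; the thesis's actual models are not shown here.

        # Illustrative only: rank issues by token overlap with a pull-request description.
        import re

        def tokens(text: str) -> set[str]:
            return set(re.findall(r"[a-z0-9]+", text.lower()))

        def jaccard(a: set[str], b: set[str]) -> float:
            return len(a & b) / len(a | b) if a | b else 0.0

        def rank_issues(pr_text: str, issues: dict[str, str]) -> list[tuple[str, float]]:
            pr = tokens(pr_text)
            scored = [(issue_id, jaccard(pr, tokens(body))) for issue_id, body in issues.items()]
            return sorted(scored, key=lambda pair: pair[1], reverse=True)

        if __name__ == "__main__":
            issues = {
                "#12": "Crash when parsing empty config file",
                "#34": "Add dark mode to settings page",
            }
            print(rank_issues("Fix parser crash on empty config", issues))

    Here the pull request shares three tokens with issue #12 and none with issue #34, so #12 is ranked first; a real traceability model would replace the lexical heuristic with learned representations.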