Machine learning as a service for high energy physics (MLaaS4HEP): a service for ML-based data analyses
With the CERN LHC program underway, data volumes in the High Energy Physics (HEP) field are growing rapidly, and the use of Machine Learning (ML) in HEP will be critical during the HL-LHC program, when the data produced will reach the exascale. ML techniques have been used successfully in many areas of HEP; nevertheless, developing an ML project and deploying it for production use is highly time-consuming and requires specific skills. This scenario is further complicated by the fact that HEP data is stored in the ROOT data format, which is largely unknown outside the HEP community.
The work presented in this thesis focuses on the development of an ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed with the MLaaS4HEP framework, which reads data, processes it, and trains ML models directly from ROOT files of arbitrary size stored in local or distributed data sources. This solution gives HEP users who are not ML experts a tool for applying ML techniques in their analyses in a streamlined manner.
Over the years, the MLaaS4HEP framework has been developed, validated, and tested, and new features have been added. A first MLaaS solution was developed by automating the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs was developed, so that an authenticated and authorized user can submit MLaaS4HEP workflows that produce trained ML models ready for the inference phase. A working prototype of this service currently runs on a virtual machine of INFN-Cloud and meets the requirements for inclusion in the INFN Cloud portfolio of services.
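Training on ROOT files of arbitrary size implies that the data never has to fit in memory all at once. The following Python sketch illustrates only that general idea. It is not the MLaaS4HEP API and does not read ROOT files. It assumes that events can be streamed as `(features, label)` rows from a hypothetical source function, and it trains a simple logistic-regression model with stochastic gradient descent, one fixed-size chunk at a time. The names `iter_chunks`, `train_streaming`, `predict_proba`, and `open_rows` are illustrative.

```python
import math
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[List[float], int]  # (feature vector, 0/1 label)


def iter_chunks(rows: Iterable, chunk_size: int) -> Iterator[list]:
    """Yield consecutive lists of at most chunk_size items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], x: List[float]) -> float:
    # The last weight is the bias term; zip stops after the feature weights.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_streaming(open_rows: Callable[[], Iterable[Row]], n_features: int,
                    chunk_size: int = 1000, lr: float = 0.1,
                    epochs: int = 1) -> List[float]:
    """Fit logistic-regression weights with SGD, holding one chunk in memory at a time.

    open_rows is called once per epoch so that a file-backed source can be
    re-read from the start instead of being kept in memory.
    """
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for chunk in iter_chunks(open_rows(), chunk_size):
            for x, y in chunk:
                err = predict_proba(w, x) - y  # derivative of log-loss w.r.t. z
                for i, xi in enumerate(x):
                    w[i] -= lr * err * xi
                w[-1] -= lr * err
    return w
```

Because `open_rows` is called again at the start of each epoch, memory use depends on `chunk_size` rather than on the size of the dataset. In a real pipeline, a ROOT reader would sit behind this callable.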
Code smells detection and visualization: A systematic literature review
Context: Code smells (CS) tend to compromise software quality and demand more effort from developers to maintain and evolve an application throughout its life cycle. They have long been cataloged with corresponding mitigating solutions called refactoring operations. Objective: This SLR has a twofold goal: first, to identify the main code smell detection techniques and tools discussed in the literature; second, to analyze to what extent visual techniques have been applied to support detection. Method: Our search string identified 83 primary studies indexed in major scientific repositories. Then, following existing best practices for secondary studies, we applied inclusion/exclusion criteria to select the most relevant works, extract their features, and classify them. Results: We found that the most commonly used approaches to code smell detection are search-based (30.1%) and metric-based (24.1%). Most of the studies (83.1%) use open-source software, with Java in first position (77.1%). In terms of code smells, God Class (51.8%), Feature Envy (33.7%), and Long Method (26.5%) are the most covered. Machine learning techniques are used in 35% of the studies. Around 80% of the studies only detect code smells, without providing visualization techniques. Visualization-based approaches use several methods, such as city metaphors and 3D visualization techniques. Conclusions: We confirm that CS detection is a non-trivial task, and much work remains to be done in terms of reducing the subjectivity associated with the definition and detection of CS; increasing the diversity of detected CS and of supported programming languages; and constructing and sharing oracles and datasets to facilitate the replication of validation experiments for CS detection and visualization techniques.
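Metric-based detection, the second most common approach identified by the review, flags a class or a method when a combination of software metrics crosses fixed thresholds. The sketch below assumes the metrics are already computed, and its threshold values are hypothetical, chosen only for illustration. For God Class, it combines complexity (WMC), cohesion (TCC), and coupling (ATFD). For Long Method, it combines size and cyclomatic complexity. This is one common style of rule, not a technique taken from any specific surveyed study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClassMetrics:
    name: str
    wmc: int     # Weighted Methods per Class (sum of method complexities)
    tcc: float   # Tight Class Cohesion, in [0, 1]
    atfd: int    # Access To Foreign Data (attributes of other classes used)


@dataclass
class MethodMetrics:
    name: str
    loc: int     # lines of code
    cyclo: int   # cyclomatic complexity


# Hypothetical thresholds for illustration; real tools calibrate these.
GOD_WMC, GOD_TCC, GOD_ATFD = 47, 1 / 3, 5
LONG_LOC, LONG_CYCLO = 50, 10


def is_god_class(c: ClassMetrics) -> bool:
    """High complexity, low cohesion, and heavy use of other classes' data."""
    return c.wmc >= GOD_WMC and c.tcc < GOD_TCC and c.atfd > GOD_ATFD


def is_long_method(m: MethodMetrics) -> bool:
    """Flag methods that are both long and highly branched."""
    return m.loc > LONG_LOC and m.cyclo >= LONG_CYCLO


def detect(classes: List[ClassMetrics], methods: List[MethodMetrics]) -> List[str]:
    """Return one human-readable finding per flagged class or method."""
    hits = [f"God Class: {c.name}" for c in classes if is_god_class(c)]
    hits += [f"Long Method: {m.name}" for m in methods if is_long_method(m)]
    return hits
```

The hard-coded thresholds are where the subjectivity discussed in the conclusions enters. Two tools with different cut-offs can disagree about the same class.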
Streamlining code smells: Using collective intelligence and visualization
Context. Code smells are seen as a major source of technical debt and, as such, should be detected and removed. Code smells have long been catalogued with corresponding mitigating solutions called refactoring operations. However, while the latter are supported in current IDEs (e.g., Eclipse), tool support for code smell detection still has many limitations. Researchers argue that the subjectivity of the code smell detection process is a major obstacle to mitigating the problem of smell-infected code.
Objective. This thesis presents a new approach to code smell detection, which we call CrowdSmelling, and the results of a validation experiment for this approach. CrowdSmelling is based on supervised machine learning techniques, in which the wisdom of the crowd (of software developers) is used to collectively calibrate code smell detection algorithms, thereby lessening the subjectivity issue.
Method. Over three consecutive years of a Software Engineering course, a “crowd” of around a hundred teams in total, with an average of three members each, classified the presence of three code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used to train six machine learning algorithms.
Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells.
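The abstract states that the teams' classifications formed the basis of the training oracles, but it does not say how individual votes were combined. One plausible aggregation, assumed here for illustration, is a majority vote for each code element that discards elements with too few votes or with a tie. `build_oracle` and `min_votes` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List


def build_oracle(votes: Dict[str, List[bool]], min_votes: int = 3) -> Dict[str, bool]:
    """Map each code element to its majority label, skipping sparse or tied items.

    votes maps an element identifier (e.g. a method name) to the True/False
    "smell present" labels given by different teams.
    """
    oracle: Dict[str, bool] = {}
    for element, labels in votes.items():
        if len(labels) < min_votes:
            continue  # too few team classifications to trust
        counts = Counter(labels)
        if counts[True] == counts[False]:
            continue  # tie: leave the element out of the training set
        oracle[element] = counts[True] > counts[False]
    return oracle
```

Raising `min_votes` trades oracle size for label reliability. This is one simple way to reduce the influence of any single team's subjective judgement.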
Results. Good performance was obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but performance was much lower for Feature Envy (ROC=0.570 for Random Forest).
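The ROC values reported above are read here as the area under the ROC curve (AUC). This is the probability that a classifier scores a randomly chosen smelly example above a randomly chosen clean one. A value of 0.5 corresponds to chance, so 0.570 for Feature Envy is only slightly better than a random ranking. The sketch computes the AUC directly from that pairwise definition, counting ties as one half. It runs in quadratic time in the number of examples and is meant only to make the metric concrete.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))
```

For example, labels `[1, 1, 0, 0]` with scores `[0.9, 0.2, 0.8, 0.1]` rank three of the four positive/negative pairs correctly, which gives an AUC of 0.75.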
Conclusions. The results obtained suggest that CrowdSmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity.
Programming Languages and Systems
This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5–7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.
Global Politics in a Post-Truth Age
This book brings together ten chapters that reflect upon the state of global, regional and national politics in the twenty-first century within the context of post-truth. The Oxford Dictionary’s definition of post-truth describes it as circumstances in which facts are less influential in shaping public opinion and political action than emotion, belief and distortion. What unites the chapters in this book, other than their focus on the meaning and nature of post-truth, is that they also consider the (supposed) erosion of many of the norms and patterns of political and social behaviour established in the second half of the twentieth century. This is especially pertinent given the rise of social media and the internet, political polarisation, and new patterns of state rivalries that harness post-truth politics. Each chapter is styled to engage with academic themes and leading-edge research, yet also to present complex ideas accessibly where possible.
RoboArch: Architectural Modelling for Robotic Applications
Robotic systems are being employed in a diverse range of applications, and both the scale and the complexity of their software are increasing as they must operate in unstructured environments and provide higher levels of autonomy. In addition, the need to verify robotic systems grows as robots are used in applications where they can have significant safety implications.
Verifying the software of even small robotic systems is a challenging problem. Therefore, additional techniques are required to enable practitioners to produce verified robotic systems. Model-driven engineering and domain-specific languages (DSLs) have proven useful in the development of complex systems in other areas, so applying them to robotics can contribute to the goal of building reliable and safe systems.
In this thesis we present RoboArch, a notation for describing the architectures and patterns of robotic systems software supported by the formally defined semantics of RoboChart. RoboChart is a DSL for modelling the behaviour of robot software controllers using state machines.
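RoboChart models controller behaviour as state machines with a formal semantics. As a rough intuition only, the plain-Python sketch below encodes a hypothetical two-state obstacle-avoidance controller as a transition table. It is not RoboChart notation, and the states, events, and action names are assumptions.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class State(Enum):
    MOVING = auto()
    TURNING = auto()


class Event(Enum):
    OBSTACLE = auto()
    CLEAR = auto()


# (current state, event) -> (next state, action). Pairs not listed leave the state unchanged.
TRANSITIONS: Dict[Tuple[State, Event], Tuple[State, str]] = {
    (State.MOVING, Event.OBSTACLE): (State.TURNING, "stop_and_turn"),
    (State.TURNING, Event.CLEAR): (State.MOVING, "move_forward"),
}


class Controller:
    def __init__(self) -> None:
        self.state = State.MOVING
        self.log: List[str] = []  # actions performed, in order

    def handle(self, event: Event) -> State:
        """Fire the transition for (state, event), if any, and record its action."""
        step = TRANSITIONS.get((self.state, event))
        if step is not None:
            self.state, action = step
            self.log.append(action)
        return self.state
```

Keeping the transitions in one table makes it easy to enumerate the reachable states and check them mechanically. This only loosely echoes the reason for giving RoboChart a formal semantics.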
We describe RoboArch top down. First, we examine the role of robotics software architectures in the development of robotic systems by reviewing five robotics architectures and five DSLs. Next, we introduce the RoboArch notation for the layered architectural pattern, providing a metamodel, well-formedness conditions, and transformation rules to RoboChart. Further, we characterise two patterns that can be used by a layer: reactive skills and subsumption.
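In a subsumption-style design, behaviours are stacked in layers, and an active layer suppresses the output of the layers it subsumes. The sketch below shows a simple priority-based arbitration in that spirit. It assumes that each hypothetical `Behaviour` exposes an applicability test over sensor readings and a single motor command. It is not RoboArch's formal characterisation of the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Sensors = Dict[str, float]


@dataclass
class Behaviour:
    name: str
    applies: Callable[[Sensors], bool]  # should this behaviour take control now?
    command: str                        # motor command it issues when active


def arbitrate(layers: List[Behaviour], sensors: Sensors) -> Optional[str]:
    """Return the command of the highest-priority applicable layer.

    Layers are ordered from highest to lowest priority, so an active layer
    suppresses every layer listed after it. Returns None if no layer applies.
    """
    for layer in layers:
        if layer.applies(sensors):
            return layer.command
    return None
```

Suppose an `avoid` layer that applies when the front distance is below 0.3 sits above an always-applicable `wander` layer. A reading of 0.1 then yields the avoid command, and a reading of 2.0 yields the wander command.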
Finally, we discuss a tool, and its implementation, for evaluating RoboArch and automating the rules as model transformations. We use a case study of a small obstacle avoidance system to demonstrate the application of the reactive skills pattern using RoboArch, and the expected properties of the architecture that can be proven using the CSP semantics of the generated RoboChart model.
Programming Languages and Systems
This open access book constitutes the proceedings of the 29th European Symposium on Programming, ESOP 2020, which was planned to take place in Dublin, Ireland, in April 2020, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The actual ETAPS 2020 meeting was postponed due to the COVID-19 pandemic. The papers deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.
Role of Inflammation in the Pathogenesis of Myeloproliferative Neoplasms
Myeloproliferative neoplasms (MPNs) are a group of diseases frequently caused by activating mutations in JAK2, CALR or MPL and characterized by aberrant proliferation of the erythroid, megakaryocytic and myeloid lineages. They represent clonal disorders of the hematopoietic stem cell (HSC) with an inherent tendency towards leukemic transformation. MPNs are subdivided into three disease entities: polycythemia vera (PV), essential thrombocythemia (ET) and primary myelofibrosis (PMF). JAK2-V617F is the most frequently recurring somatic mutation in MPN patients, but it can also be found in healthy individuals with clonal hematopoiesis of indeterminate potential (CHIP), at a frequency much higher than the incidence of MPN. This suggests that acquisition of JAK2-V617F is not the rate-limiting step and that other factors might be required for the expansion of the JAK2-mutated clone and the initiation of MPN disease. MPN is often linked with a chronic inflammatory state due to elevated production of inflammatory cytokines and chemokines by hematopoietic and non-hematopoietic cells. Interleukin-1β (IL-1β) is one of the master regulators of the inflammatory state, and its aberrant activity has been implicated in various diseases, including MPN. In the first part of this study, we focused on the early stages of MPN disease initiation and examined the role of IL-1β in this context. Our results showed that IL-1β secreted from mutant cells promoted the expansion of JAK2-V617F clones and that loss of IL-1β from mutant cells reduced the frequency of MPN disease initiation. Furthermore, our results indicated that IL-1β was required for optimal stem cell function and long-term repopulation capacity of JAK2-V617F HSCs. Moreover, we showed that early secretion of IL-1β from mutant cells caused neuronal damage in the bone marrow, resulting in loss of nestin-positive stromal cells.
Loss of nestin-positive stromal cells favored clonal expansion and MPN disease manifestation. In the second part of the study, we showed that the JAK2-V617F mutation correlated with increased IL-1 signaling in MPN patients. We showed that genetic deletion of IL-1β from mutant cells reduced the production of inflammatory cytokines, MPN symptom burden, and myelofibrosis. Notably, pharmacological inhibition of IL-1β or of the NLRP3 inflammasome complex reduced myelofibrosis. Combined targeting of IL-1β with the JAK1/2 inhibitor ruxolitinib resulted in complete reversal of myelofibrosis, reduced production of inflammatory cytokines, and normalization of MPN constitutional symptoms in vivo. Overall, our results showed that IL-1β is required for optimal MPN disease initiation and progression to myelofibrosis.
Software Analytics for Improving Program Comprehension
Title from PDF of title page viewed June 28, 2021. Dissertation advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 122–143). Thesis (Ph.D.), School of Computing and Engineering, University of Missouri–Kansas City, 2021.
Program comprehension is an essential part of software development and maintenance. Traditional methods of program comprehension, such as reviewing the codebase and documentation, still make it challenging to understand a software system's overall structure and implementation. In recent years, software static analysis studies have emerged to facilitate program comprehension, for example through call graphs, which represent a system's structure and implementation as a directed graph. Furthermore, some studies have focused on the semantic enrichment of software systems using systematic learning analytics, including machine learning and natural language processing (NLP).
While call graphs can enhance the program comprehension process, they still face three main challenges: (1) complex call graphs are very difficult for developers to visualize and interpret, which increases the overhead of program comprehension; (2) they are often limited to a single level of granularity, such as function calls; and (3) they lack interpretation semantics.
In this dissertation, we propose a novel framework, called CodEx, to facilitate and accelerate program comprehension. CodEx enables top-down and bottom-up analysis of a system's call graph and its execution paths for an enhanced program comprehension experience. Specifically, the framework combines the following techniques: multi-level graph abstraction using a coarsening technique, hierarchical clustering to partition the call graph into subgraphs (i.e., multiple levels of granularity), and interactive visual exploration of the graphs at different levels of abstraction. Moreover, we also worked on building semantics of software systems using NLP and machine learning, including topic modeling, to interpret the meaning of the call graph's abstraction levels.
Contents: Introduction -- Multi-Level Call Graph for Program Comprehension -- Static Trace Clustering: Single-Level Approach -- Static Trace Clustering: Multi-Level Approach -- Topic Modeling for Cluster Analysis -- Visual Exploration of Software Clustered Traces -- Conclusion and Future Work -- Appendix
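The abstract does not detail CodEx's coarsening algorithm. The sketch below assumes the simplest form of multi-level abstraction. Given a cluster assignment for each node (here, hypothetically, the module prefix of each function name), it collapses the call graph into a weighted graph between clusters and drops calls within a cluster. Applying it again with a coarser assignment produces the next level up. `from_calls` and `coarsen` are illustrative names.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def from_calls(calls: Iterable[Edge]) -> Dict[Edge, int]:
    """Weight each (caller, callee) edge by how many times the call occurs."""
    return dict(Counter(calls))


def coarsen(graph: Dict[Edge, int], cluster_of: Dict[str, str]) -> Dict[Edge, int]:
    """Collapse a weighted call graph into a weighted cluster-level graph.

    Calls inside one cluster disappear. Calls across clusters are merged into
    a single edge whose weight sums the underlying edge weights.
    """
    coarse: Dict[Edge, int] = {}
    for (caller, callee), weight in graph.items():
        a, b = cluster_of[caller], cluster_of[callee]
        if a != b:
            coarse[(a, b)] = coarse.get((a, b), 0) + weight
    return coarse
```

Because `coarsen` accepts and returns the same weighted-graph shape, levels can be chained (function to module to layer). In a clustering-based approach, the cluster assignment would come from the clustering step rather than from name prefixes.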