573 research outputs found

    Software Design Change Artifacts Generation through Software Architectural Change Detection and Categorisation

    Software is designed, implemented, tested, and inspected solely by experts, unlike most other engineering projects, which are largely built by non-expert workers after being designed by engineers. Researchers and practitioners have linked software bugs, security holes, problematic integration of changes, hard-to-understand codebases, unwarranted mental pressure, and similar problems in software development and maintenance to inconsistent and complex design, and to the lack of easy ways to understand what is happening in a software system and what to plan next. The unavailability of the information and insights that development teams need to make good decisions makes these challenges worse. Extracting software design documents and other insightful information is therefore essential to reduce the above-mentioned anomalies. Moreover, extracting architectural design artifacts is required to build developer profiles for the market in many crucial scenarios. To that end, architectural change detection, categorization, and change-description generation are crucial, because they are the primary artifacts from which other software artifacts are traced. However, it is not feasible for humans to analyze all the changes in a single release to detect change and impact: doing so is time-consuming, laborious, costly, and inconsistent. In this thesis, we conduct six studies addressing these challenges to automate architectural change information extraction and document generation that could assist development and maintenance teams. In particular, we (1) detect architectural changes using lightweight techniques that leverage textual and codebase properties, (2) categorize them from intelligent perspectives, and (3) generate design change documents by exploiting precise contexts of component relations and change purposes, which were previously unexplored.
Our experiment using 4000+ architectural change samples and 200+ design change documents suggests that our proposed approaches are promising in accuracy and are scalable enough to deploy frequently. Our change detection approach can detect up to 100% of architectural change instances and is very scalable. Our change classifier achieves an F1 score of 70%, which is promising given the challenges. Finally, our system can produce descriptive design change artifacts with 75% significance. Since most of our studies are foundational, our approaches and prepared datasets can serve as baselines for advancing research in design change information extraction and documentation.
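A lightweight, text-based change detector of the kind the abstract mentions can be illustrated with a small sketch. This is a hypothetical example, not the thesis's actual algorithm: it derives a crude component set from file paths in two releases and reports added and removed top-level components; the thesis's techniques also use codebase properties.

```python
# Hypothetical sketch of lightweight, textual architectural change
# detection: derive a component per file from its top-level directory,
# then diff the component sets of two releases.
from pathlib import PurePosixPath

def components(paths):
    """Map each source file to its top-level package (a crude 'component')."""
    return {PurePosixPath(p).parts[0] for p in paths if "/" in p}

def detect_changes(old_paths, new_paths):
    old, new = components(old_paths), components(new_paths)
    return {"added": sorted(new - old), "removed": sorted(old - new)}

release_1 = ["core/engine.py", "ui/window.py"]
release_2 = ["core/engine.py", "api/server.py", "ui/window.py"]
print(detect_changes(release_1, release_2))
# {'added': ['api'], 'removed': []}
```

Because it only compares path-derived sets, such a detector scales linearly with repository size, which is consistent with the scalability the abstract claims for lightweight textual techniques.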

    Tradition and Innovation in Construction Project Management

    This book is a reprint of the Special Issue 'Tradition and Innovation in Construction Project Management' that was published in the journal Buildings.

    Towards an integrated vulnerability-based approach for evaluating, managing and mitigating earthquake risk in urban areas

    Doctoral thesis in Civil Engineering. Strong seismic events like those in Türkiye-Syria (2023) or Mexico (2017) should draw our attention to the design and implementation of proactive actions aimed at identifying vulnerable assets. This thesis proposes a relatively simple, easy-to-implement workflow for performing large-scale seismic vulnerability assessments in historic environments by means of digital tools. A vulnerability model based on parameters is adopted given its affinity with the Mexican National Catalogue of Historical Monuments. A first large-scale implementation of the method in the historic city of Atlixco (Puebla, Mexico) demonstrated its applicability and some limitations, which led to the development of a strategy for quantifying and accounting for the epistemic uncertainties found during data acquisition. Given the volume of data these analyses involve, robust means of acquiring, storing, and managing information had to be developed. The use of Geographic Information Systems together with custom Python-based programs and cloud-based file distribution made it possible to assemble urban-scale databases that support field data acquisition, vulnerability and damage calculations, and the representation of results. This development was the basis for a second set of large-scale assessments in selected municipalities of the state of Morelos (Mexico). Characterising the seismic vulnerability of more than 160 buildings allowed the representativeness of the parametric approach to be assessed by comparing theoretical damage estimates against the damage observed after the 2017 Puebla-Morelos earthquakes.
This comparison is the basis for a Machine Learning-assisted process of calibration and adjustment, providing a feasible strategy for developing tailored vulnerability models supported by empirical evidence of damage from previous seismic events.

This work was partly financed by FCT/MCTES through national funds (PIDDAC) under the R&D Unit Institute for Sustainability and Innovation in Structural Engineering (ISISE), reference UIDB/04029/2020. This research had financial support provided by the Portuguese Foundation for Science and Technology (FCT) through the Analysis and Mitigation of Risks in Infrastructures (InfraRisk) program under the PhD grant PD/BD/150385/2019.
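Parameter-based vulnerability models of the kind the thesis adopts can be sketched with a small example. The parameters, weights, and class scores below are invented for illustration (in the spirit of GNDT-type index methods), not the thesis's calibrated values: each building parameter receives a qualification class, and the index is a normalised weighted sum.

```python
# Illustrative parametric vulnerability index. All numbers are
# hypothetical placeholders, not values from the thesis.
CLASS_SCORES = {"A": 0, "B": 1, "C": 2, "D": 3}          # qualification classes
WEIGHTS = {"structural_system": 1.5, "conservation_state": 1.0,
           "roof_type": 0.75, "plan_regularity": 0.5}    # made-up weights

def vulnerability_index(classes):
    """classes: dict mapping parameter -> qualification class ('A'..'D').
    Returns a value in [0, 1]; higher means more vulnerable."""
    raw = sum(WEIGHTS[p] * CLASS_SCORES[c] for p, c in classes.items())
    max_raw = sum(w * max(CLASS_SCORES.values()) for w in WEIGHTS.values())
    return raw / max_raw

building = {"structural_system": "C", "conservation_state": "B",
            "roof_type": "A", "plan_regularity": "D"}
print(round(vulnerability_index(building), 3))  # 0.489
```

A model of this shape is also convenient for the machine-learning calibration the abstract describes: the weights are free parameters that can be fitted against observed post-earthquake damage.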

    Fine-tuning language models to recognize semantic relations

    Transformer-based pre-trained Language Models (PLMs) are the foundation of the current state-of-the-art algorithms in most natural language processing tasks, particularly when applied to context-rich data such as sentences or paragraphs. However, their impact on tasks defined in terms of abstract, individual word properties, not necessarily tied to a word's use in a particular sentence, has been inadequately explored; this is a notable research gap, and addressing it is crucial for advancing our understanding of natural language processing. To fill this void, we concentrate on the classification of semantic relations: given a pair of concepts (words or word sequences), the aim is to identify the semantic label that describes their relationship. For example, for the pair green/colour, “is a” is a suitable relation while “part of”, “property of”, and “opposite of” are not. This classification is independent of any particular sentence in which these concepts might have been used. We are the first to incorporate a language model into both existing approaches to this task, namely path-based and distribution-based methods. Our transformer-based approaches exhibit significant improvements over the state of the art and come remarkably close to human-level performance on rigorous benchmarks. We are also the first to provide evidence that the standard datasets overstate performance due to the effect of “lexical memorisation”, and we reduce this effect by applying lexical separation. On the new benchmark datasets, algorithmic performance remains significantly below human level, highlighting that semantic relation classification is still unresolved, particularly for language models of the sizes commonly used at the time of our study.
We also identify additional challenges that PLM-based approaches face, and conduct extensive ablation studies and other experiments to investigate the sensitivity of our findings to specific modelling and implementation choices. Furthermore, we examine the specific relations that pose greater challenges and discuss the trade-offs between accuracy and processing time.
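The "lexical separation" idea can be made concrete with a small sketch: split the concept pairs so that no word occurs in both the train and test splits, so a classifier cannot score well by merely memorising which words carry which relations. The greedy assignment below is an illustrative strategy under that constraint, not necessarily the procedure used in the thesis.

```python
# Hedged sketch of lexical separation for semantic relation data.
# Pairs whose words would leak across both splits are discarded.
def lexical_split(pairs, test_fraction=0.3):
    """pairs: list of (word1, word2, relation) tuples."""
    train, test, train_vocab, test_vocab = [], [], set(), set()
    for pair in pairs:
        words = set(pair[:2])
        in_train, in_test = bool(words & train_vocab), bool(words & test_vocab)
        if in_train and in_test:
            continue  # would leak vocabulary across splits: drop the pair
        if in_train:
            train.append(pair); train_vocab |= words
        elif in_test:
            test.append(pair); test_vocab |= words
        elif len(test) < test_fraction * len(pairs):
            test.append(pair); test_vocab |= words
        else:
            train.append(pair); train_vocab |= words
    return train, test

data = [("green", "colour", "is a"), ("wheel", "car", "part of"),
        ("green", "red", "opposite of"), ("dog", "animal", "is a")]
train, test = lexical_split(data)
train_words = {w for p in train for w in p[:2]}
test_words = {w for p in test for w in p[:2]}
print(train_words & test_words)  # set() -- the splits share no vocabulary
```

Note the trade-off: enforcing disjoint vocabularies discards data and can skew split sizes, which is one reason lexically separated benchmarks are harder than the standard ones.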

    Observing LOD: Its Knowledge Domains and the Varying Behavior of Ontologies Across Them

    Linked Open Data (LOD) is the largest collaborative, distributed, publicly accessible Knowledge Graph (KG), uniformly encoded in the Resource Description Framework (RDF) and formally represented according to the semantics of the Web Ontology Language (OWL). LOD gives researchers a unique opportunity to study knowledge engineering as an empirical science: to observe existing modelling practices and possibly understand how to improve knowledge engineering methodologies and knowledge representation formalisms. Following this perspective, several studies have analysed LOD to identify (mis-)use of OWL constructs or other modelling phenomena, e.g., class or property usage, their alignment, or the average depth of taxonomies. A question that remains open is whether there is a relation between observed modelling practices and knowledge domains (natural science, linguistics, etc.): do certain practices or phenomena change as the knowledge domain varies? Answering this question requires an assessment of the domains covered by LOD as well as a classification of its datasets. Existing approaches to classifying LOD datasets provide partial and unaligned views, posing additional challenges. In this paper, we introduce a classification of knowledge domains and a method, based on it, for classifying LOD datasets and ontologies. We classify a large portion of LOD and investigate whether a set of observed phenomena have a domain-specific character.
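One of the modelling phenomena the paper lists, taxonomy depth, is easy to illustrate. The sketch below measures class depth from rdfs:subClassOf links on a toy taxonomy (the triples are invented, not actual LOD data), the kind of per-ontology statistic that could then be compared across knowledge domains.

```python
# Toy sketch: taxonomy depth from rdfs:subClassOf edges (child -> parent).
# The data is a made-up example, not real LOD content.
sub_class_of = {
    "Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal", "Animal": "Thing",
}

def depth(cls):
    """Number of subClassOf steps from cls up to a root class."""
    d = 0
    while cls in sub_class_of:
        cls = sub_class_of[cls]
        d += 1
    return d

# Leaf classes: those that never appear as a parent.
leaves = [c for c in sub_class_of if c not in sub_class_of.values()]
print({c: depth(c) for c in leaves})  # {'Dog': 3, 'Cat': 3}
```

On real RDF data the same statistic would be computed over triples loaded with a library such as rdflib, with cycle handling added; this sketch assumes a clean tree.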

    Security and Privacy of Resource Constrained Devices

    The thesis aims to present a comprehensive and holistic overview of cybersecurity and of privacy and data protection aspects related to IoT resource-constrained devices. Chapter 1 introduces the current technical landscape by providing a working definition and an architectural taxonomy of ‘Internet of Things’ and ‘resource-constrained devices’, coupled with a threat landscape in which each specific attack is linked to a layer of the taxonomy. Chapter 2 lays down the theoretical foundations for an interdisciplinary approach and a unified, holistic vision of cybersecurity, safety, and privacy, justified by the ‘IoT revolution’, through the so-called infraethical perspective. Chapter 3 investigates whether, and to what extent, the fast-evolving European cybersecurity regulatory framework addresses the security challenges brought about by the IoT by allocating legal responsibilities to the right parties. Chapters 4 and 5 focus, on the other hand, on ‘privacy’, understood by proxy to include EU data protection. In particular, Chapter 4 addresses three legal challenges that ubiquitous IoT data and metadata processing poses to the EU privacy and data protection legal frameworks, i.e., the ePrivacy Directive and the GDPR. Chapter 5 casts light on the risk management tool enshrined in EU data protection law, the Data Protection Impact Assessment (DPIA), and proposes an original DPIA methodology for connected devices, building on the model of the CNIL (the French data protection authority).

    Disinformation and Fact-Checking in Contemporary Society

    Funded by the European Media and Information Fund and by research project PID2022-142755OB-I00.

    Leveraging Evolutionary Changes for Software Process Quality

    Real-world software applications must constantly evolve to remain relevant. This evolution occurs when developing new applications or adapting existing ones to meet new requirements, make corrections, or incorporate future functionality. Traditional methods of software quality control involve software quality models and continuous code inspection tools, measures that focus on directly assessing the quality of the software. However, there is a strong correlation, and causation, between the quality of the development process and the resulting software product; improving the development process therefore indirectly improves the software product, too. Achieving this requires effective learning from past processes, often embraced through post-mortem organizational learning. While qualitative evaluation of large artifacts is common, the smaller quantitative changes captured by application lifecycle management are often overlooked. In addition to software metrics, these smaller changes can reveal complex phenomena related to project culture and management, and leveraging them can help detect and address such complex issues. Software evolution was previously measured by the size of changes, but the lack of consensus on a reliable and versatile quantification method prevents its use as a dependable metric, and the various size classifications fail to reliably describe the nature of evolution. While application lifecycle management data is rich, it remains uncertain which artifacts can model detrimental managerial practices. Approaches such as simulation modeling, discrete-event simulation, or Bayesian networks have only a limited ability to exploit continuous-time process models of such phenomena; worse, the accessibility of, and mechanistic insight into, such gray- or black-box models are typically very low. To address these challenges, we suggest leveraging objectively [...] Comment: Ph.D. thesis without appended papers, 102 pages.
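The consensus problem around change-size quantification can be illustrated with a minimal sketch: bucketing commits by lines changed. The thresholds below are arbitrary, which is precisely the point the abstract makes; a different, equally defensible choice of cut-offs yields a different description of the same evolution.

```python
# Illustrative (and deliberately arbitrary) change-size classification
# over application-lifecycle-management data: churn = added + deleted lines.
def size_class(lines_added, lines_deleted):
    churn = lines_added + lines_deleted
    if churn <= 10:
        return "tiny"
    if churn <= 100:
        return "small"
    if churn <= 1000:
        return "medium"
    return "large"

commits = [(3, 1), (120, 40), (900, 400)]
print([size_class(a, d) for a, d in commits])  # ['tiny', 'medium', 'large']
```

Because the buckets carry no information about the *nature* of a change (refactoring, fix, feature), two commits in the same bucket can be qualitatively unrelated, which is the limitation the thesis uses to motivate richer process models.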

    Computational acquisition of knowledge in small-data environments: a case study in the field of energetics

    The UK’s defence industry is accelerating its implementation of artificial intelligence, including expert systems and natural language processing (NLP) tools designed to supplement human analysis. This thesis examines the limitations of NLP tools in small-data environments (common in defence), in the defence-related energetic-materials domain. A literature review identifies the domain-specific challenges of developing an expert system (specifically an ontology); the absence of domain resources such as labelled datasets and, most significantly, the preprocessing of text resources are identified as challenges. To address the latter, a novel general-purpose preprocessing pipeline tailored to the energetic-materials domain is developed and its effectiveness evaluated. A study of the subjective concept of importance examines whether NLP tools in data-limited environments can supplement, or completely replace, human analysis. A methodology for directly comparing the ability of NLP tools and experts to identify the important points in a text is presented. The results show that the study participants exhibit little agreement, even on which points in the text are important; the NLP tools, the expert (the author of the text being examined), and the participants agree only on general statements, although as a group the participants agreed with the expert. In data-limited environments, the extractive-summarisation tools examined cannot identify the important points of a technical document as effectively as an expert. A methodology for classifying journal articles by the technology readiness level (TRL) of the described technologies in a data-limited environment is proposed. Techniques to overcome challenges with real-world data, such as class imbalance, are investigated. A methodology to evaluate the reliability of human annotations is presented.
Analysis identifies a lack of agreement and consistency in the expert evaluation of document TRL. Open Access.
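One standard way to evaluate the reliability of human annotations, as the thesis does for expert TRL labels, is an inter-annotator agreement statistic. The sketch below computes Cohen's kappa for two annotators; the TRL labels are invented for illustration, and the thesis's actual methodology may differ.

```python
# Hedged sketch: Cohen's kappa between two annotators' TRL labels.
# Kappa corrects raw agreement for the agreement expected by chance.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

annotator_1 = ["TRL3", "TRL4", "TRL4", "TRL6", "TRL3", "TRL6"]
annotator_2 = ["TRL3", "TRL4", "TRL3", "TRL6", "TRL3", "TRL4"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.5
```

A kappa well below the common rule-of-thumb thresholds for "substantial" agreement would quantify the lack of expert consistency the analysis reports; ordinal variants such as weighted kappa may suit TRLs better since the levels are ordered.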