
    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity with a broad range of applications, including code recommendation and the detection of duplicate code, plagiarism, malware, and code smells. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics across different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A detailed investigation reveals 80 software tools employing eight different techniques across five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. A noteworthy finding was the existence of 12 datasets related to source code similarity measurement and duplicated code, of which only eight are publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and attention to multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance. Comment: 49 pages, 10 figures, 6 tables
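
    The survey covers many technique families without detailing any single one; as a minimal, hypothetical illustration of the token-based similarity measures it classifies, the Python sketch below computes a Jaccard index over the token sets of two code fragments. The tokenizer and the snippets are illustrative assumptions, not a method taken from the reviewed studies.

        import re

        def tokenize(code: str) -> set[str]:
            # Crude lexical tokenizer: identifiers, numbers, and single-character operators.
            return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))

        def jaccard_similarity(a: str, b: str) -> float:
            # Jaccard index over the two token sets: |A & B| / |A | B|.
            ta, tb = tokenize(a), tokenize(b)
            return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0

        snippet1 = "int add(int a, int b) { return a + b; }"
        snippet2 = "int sum(int x, int y) { return x + y; }"
        print(f"Jaccard similarity: {jaccard_similarity(snippet1, snippet2):.2f}")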

    Boundary Spanner Corruption in Business Relationships

    Boundary spanner corruption—voluntary collaborative behaviour between individuals representing different organisations that violates their organisations’ norms—is a serious problem in business relationships. Drawing on insights from the literatures on general corruption perspectives, the dark side of business relationships, and deviance in sales and service organisations, this dissertation identifies boundary spanner corruption as a potential dark-side complication inherent in close business relationships. It builds research questions from these literature streams and proposes a research structure, based upon methods commonly used in corruption research, to address this new concept. In the first study, using an exploratory survey of boundary spanner practitioners, the dissertation finds that the nature of boundary spanner corruption is broad and encompasses both severe and non-severe types. The survey also finds that these deviance types are prevalent across a wide range of geographies and industries. This prevalence is particularly noticeable for less-severe corruption types, which may be an under-researched phenomenon in general corruption research. The consequences of boundary spanner corruption can be serious for both individuals and organisations; indeed, even less-severe types can generate long-term negative consequences. A second, interview-based study found that multi-level trust factors can also motivate the emergence of boundary spanner corruption. These findings were integrated into a theoretical model that illustrates how trust at the interpersonal, intraorganisational, and interorganisational levels enables corrupt behaviours by allowing deviance-inducing factors, stemming from the task environment or from the individual boundary spanner, to manifest in boundary spanner corruption. Interpersonal trust between representatives of different organisations, interorganisational trust between these organisations, and intraorganisational agency trust of management in their representatives foster the development of a boundary-spanning social cocoon—a mechanism that can inculcate deviant norms leading to corrupt behaviour. This conceptualisation and model of boundary spanner corruption highlight intriguing directions for future research to support practitioners engaged with a difficult problem in business relationships.

    Resilience and food security in a food systems context

    This open access book compiles a series of chapters written by internationally recognized experts known for their in-depth but critical views on questions of resilience and food security. The book rigorously and critically assesses the contribution of the concept of resilience to advancing our understanding and ability to design and implement development interventions in relation to food security and humanitarian crises. To do so, the book departs from the narrow, well-beaten tracks of agriculture and trade, which have shaped the mainstream debate on food security for nearly 60 years, and instead adopts a wider, more holistic perspective framed around food systems. The foundation for this new approach is the recognition that, in the current post-globalization era, the food and nutritional security of the world’s population no longer depends solely on the performance of agriculture and trade policies, but rather on the capacity of the entire (food) system to produce, process, transport, and distribute safe, affordable, and nutritious food for all, in ways that remain environmentally sustainable. In that context, adopting a food system perspective provides a more appropriate frame, as it invites a broadening of conventional thinking and an acknowledgement of the systemic nature of the different processes and actors involved. This book is written for a large audience, from academics to policymakers and from students to practitioners.

    Knowledge Graph Building Blocks: An easy-to-use Framework for developing FAIREr Knowledge Graphs

    Knowledge graphs and ontologies provide promising technical solutions for implementing the FAIR Principles for Findable, Accessible, Interoperable, and Reusable data and metadata. However, they also come with their own challenges. Nine such challenges are discussed and associated with the criterion of cognitive interoperability and the specific FAIREr principles (FAIR + Explorability raised) that they fail to meet. We introduce an easy-to-use, open-source knowledge graph framework that is based on knowledge graph building blocks (KGBBs). KGBBs are small information modules for knowledge processing, each based on a specific type of semantic unit. By interrelating several KGBBs, one can specify a KGBB-driven FAIREr knowledge graph. Besides implementing semantic units, the KGBB Framework clearly distinguishes and decouples an internal in-memory data model from data storage, data display, and data access/export models. We argue that this decoupling is essential for solving many problems of knowledge management systems. We discuss the architecture of the KGBB Framework as we envision it, comprising (i) an openly accessible KGBB-Repository for different types of KGBBs; (ii) a KGBB-Engine for managing and operating FAIREr knowledge graphs (including automatic provenance tracking, an editing changelog, and versioning of semantic units); (iii) a repository for KGBB-Functions; and (iv) a low-code KGBB-Editor with which domain experts can create new KGBBs and specify their own FAIREr knowledge graph without having to think about semantic modelling. We conclude by discussing the nine challenges and how the KGBB Framework provides solutions for the issues they raise. While most of what we discuss here is entirely conceptual, we can point to two prototypes that demonstrate the feasibility, in principle, of using semantic units and KGBBs to manage and structure knowledge graphs.
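
    The framework is described only conceptually in this abstract; as a purely hypothetical sketch of the core idea (small, reusable modules that each encode one type of semantic unit and can be chained into a graph), the Python below models a toy building block that emits RDF-style triples. All class and property names are illustrative assumptions, not the KGBB Framework's actual API.

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class ToyBuildingBlock:
            """Toy knowledge graph building block: a template for one semantic unit."""
            name: str
            predicate: str  # the relation this block models, e.g. "ex:hasWeightInKg"

            def instantiate(self, subject: str, obj: str) -> tuple[str, str, str]:
                # Produce a single subject-predicate-object triple for this semantic unit.
                return (subject, self.predicate, obj)

        # Two illustrative blocks chained into a tiny graph.
        weight_block = ToyBuildingBlock("WeightMeasurement", "ex:hasWeightInKg")
        provenance_block = ToyBuildingBlock("ProvenanceLink", "prov:wasAttributedTo")

        graph = [
            weight_block.instantiate("ex:specimen42", '"12.3"'),
            provenance_block.instantiate("ex:specimen42", "ex:curatorAlice"),
        ]
        for triple in graph:
            print(triple)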

    A citizen science approach to the characterisation and modelling of urban pluvial flooding

    Urban pluvial flooding (UPF), a growing challenge across cities worldwide that is expected to worsen due to climate change and urbanisation, requires comprehensive response strategies. However, the characterisation and simulation of UPF are more complex than traditional catchment hydrological modelling because UPF is driven by a complex set of interconnected factors and modelling constraints. Different integrated approaches have attempted to address UPF by coupling human and environmental systems and reflecting on the possible outcomes of interactions among varied disciplines. Nonetheless, it is argued that current integrated approaches are insufficient. To further improve the characterisation and modelling of UPF, this study advances a citizen science approach that integrates local knowledge into the understanding and interpretation of UPF. The proposed framework provides an avenue to couple quantitative and qualitative community-based observations with traditional sources of hydro-information. This approach allows researchers and practitioners to fill spatial and temporal data gaps in urban catchments and hydrologic/hydrodynamic models, thus yielding a more accurate characterisation of local catchment response and improving rainfall-runoff modelling of UPF. The results of applying this framework indicate how community-based practices provide a bi-directional learning context between experts and residents, which can contribute to resilience building by providing the UPF knowledge necessary for risk reduction and response to extreme flooding events.
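
    The framework itself is described qualitatively; as a hedged sketch of one step it implies (filling gaps in a sparse rain gauge record with community-reported intensities before rainfall-runoff modelling), the pandas example below merges two hypothetical time series. The column names, the intensity mapping, and the data are illustrative assumptions, not values from the study.

        import pandas as pd

        # Hypothetical hourly gauge record with missing readings (rainfall in mm).
        gauge = pd.Series(
            [2.0, None, None, 5.0],
            index=pd.date_range("2023-06-01 12:00", periods=4, freq="h"),
            name="rain_mm",
        )

        # Hypothetical citizen reports mapped to rough rainfall intensities.
        reports = pd.Series({"2023-06-01 13:00": "heavy", "2023-06-01 14:00": "moderate"})
        intensity = reports.map({"light": 1.0, "moderate": 3.0, "heavy": 8.0})
        intensity.index = pd.to_datetime(intensity.index)

        # Fill the gauge gaps with the community-derived estimates (aligned on timestamps).
        merged = gauge.fillna(intensity)
        print(merged)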

    Decoding spatial location of attended audio-visual stimulus with EEG and fNIRS

    When analyzing complex scenes, humans often focus their attention on an object at a particular spatial location in the presence of background noise and irrelevant visual objects. The ability to decode the attended spatial location would facilitate brain-computer interfaces (BCI) for complex scene analysis. Here, we tested two different neuroimaging technologies and investigated their capability to decode audio-visual spatial attention in the presence of competing stimuli from multiple locations. For functional near-infrared spectroscopy (fNIRS), we targeted the dorsal frontoparietal network, including the frontal eye field (FEF) and intraparietal sulcus (IPS), as well as the superior temporal gyrus/planum temporale (STG/PT). All of these regions were shown in previous functional magnetic resonance imaging (fMRI) studies to be activated by auditory, visual, or audio-visual spatial tasks. We found that fNIRS provides robust decoding of attended spatial locations for most participants and correlates with behavioral performance. Moreover, we found that FEF makes a large contribution to decoding performance. Surprisingly, the performance was significantly above chance level 1 s after cue onset, which is well before the peak of the fNIRS response. For electroencephalography (EEG), while there are several successful EEG-based algorithms, to date all of them have focused exclusively on the auditory modality, where eye-related artifacts are minimized or controlled. Successful integration into more ecologically typical usage requires careful consideration of eye-related artifacts, which are inevitable. We showed that fast and reliable decoding can be done with or without an ocular-artifact removal algorithm. Our results show that EEG and fNIRS are promising platforms for compact, wearable technologies that could be applied to decode attended spatial location and reveal contributions of specific brain regions during complex scene analysis.
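
    The decoding pipeline is not detailed in the abstract; as a minimal sketch of the general approach (classifying the attended location from trial-wise neural features and comparing cross-validated accuracy against chance), the scikit-learn example below uses synthetic features and linear discriminant analysis. The feature layout, trial counts, and classifier choice are assumptions made for illustration.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)

        # Synthetic data: 80 trials x 20 channel features, two attended locations.
        n_trials, n_features = 80, 20
        y = rng.integers(0, 2, size=n_trials)      # 0 = attend left, 1 = attend right
        X = rng.normal(size=(n_trials, n_features))
        X[y == 1, :5] += 0.8                       # weak class-dependent signal

        # 5-fold cross-validated decoding accuracy; chance level is 0.5 for two classes.
        scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
        print(f"decoding accuracy: {scores.mean():.2f} (chance = 0.50)")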

    Acoustic modelling, data augmentation and feature extraction for in-pipe machine learning applications

    Gathering measurements from infrastructure, private premises, and harsh environments can be difficult and expensive. From this perspective, the development of new machine learning algorithms is strongly affected by the availability of training and test data. We focus on audio archives for in-pipe events. Although several examples of pipe-related applications can be found in the literature, datasets of audio/vibration recordings are much scarcer, and the only references found relate to leakage detection and characterisation. Therefore, this work proposes a methodology to relieve the burden of data collection for acoustic events in deployed pipes. The aim is to maximise the yield of small sets of real recordings and demonstrate how to extract effective features for machine learning. The methodology developed requires the preliminary creation of a soundbank of audio samples gathered with simple weak annotations. For practical reasons, the case study is given by a range of appliances, fittings, and fixtures connected to pipes in domestic environments. The source recordings are low-reverberated audio signals enhanced through a bespoke spectral filter and containing the desired audio fingerprints. The soundbank is then processed to create an arbitrary number of synthetic augmented observations. The data augmentation improves the quality and the quantity of the metadata and automatically creates strong and accurate annotations that are both machine- and human-readable. In addition, the implemented processing chain allows precise control of properties such as the signal-to-noise ratio, the duration of the events, and the number of overlapping events. The inter-class variability is expanded by recombining source audio blocks and adding simulated artificial reverberation obtained through an acoustic model developed for the purpose. Finally, the dataset is synthesised to guarantee separability and balance. A few signal representations are optimised to maximise the classification performance, and the results are reported as a benchmark for future developments. The contribution to existing knowledge concerns several aspects of the processing chain implemented. A novel quasi-analytic acoustic model is introduced to simulate in-pipe reverberations, adopting a three-layer architecture particularly convenient for batch processing. The first layer includes two algorithms: one for the numerical calculation of the axial wavenumbers and one for the separation of the modes. The latter, in particular, provides a workaround for a problem not explicitly treated in the literature, related to the modal non-orthogonality caused by the solid-liquid interface in the analysed domain. A set of results for different waveguides is reported to compare the dispersive behaviour across different mechanical configurations. Two more novel solutions are included in the second layer of the model and concern the integration of the acoustic sources. Specifically, the amplitudes of the non-orthogonal modal potentials are obtained either by using a distance-minimisation objective function or by solving an analytical decoupling problem. In both cases, results show that sufficiently smooth sources can be approximated with a limited number of modes, keeping the error below 1%. The last layer proposes a bespoke approach for the integration of the acoustic model into the synthesiser as a reverberation simulator. Additional elements of novelty relate to the other blocks of the audio synthesiser.
The statistical spectral filter, for instance, is a batch-processing solution for the attenuation of the background noise of the source recordings. The signal-to-noise ratio analysis for both moderate and high noise levels indicates a clear improvement of several decibels over the closest filter example in the literature. The recombination of the audio blocks and the system of fully tracked annotations are also novel extensions of similar approaches recently adopted in other contexts. Moreover, a bespoke synthesis strategy is proposed to guarantee separable and balanced datasets. The last contribution concerns the extraction of convenient sets of audio features. Elements of novelty are introduced for the optimisation of the filter banks of the mel-frequency cepstral coefficients and the scattering wavelet transform. In particular, compared to the respective standard definitions, the average F-score performance of the optimised features is roughly 6% higher in the former case and 2.5% higher in the latter. Finally, the soundbank, the synthetic dataset, and the fundamental blocks of the software library developed are publicly available for further research.
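
    The synthesiser itself is bespoke and not reproduced here; as a hedged sketch of one of the augmentation steps described (mixing a clean source event into background noise at a controlled signal-to-noise ratio), the NumPy example below scales the noise to meet a target SNR. The signals and the target value are illustrative assumptions, not material from the thesis.

        import numpy as np

        def mix_at_snr(event: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
            """Mix an audio event into background noise at a target SNR (in dB)."""
            p_event = np.mean(event ** 2)
            p_noise = np.mean(noise ** 2)
            # Scale the noise so that 10*log10(p_event / p_noise_scaled) equals snr_db.
            scale = np.sqrt(p_event / (p_noise * 10 ** (snr_db / 10)))
            return event + scale * noise

        rng = np.random.default_rng(1)
        fs = 16_000
        event = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # 1 s synthetic "event"
        noise = rng.normal(size=fs)                            # 1 s background noise
        augmented = mix_at_snr(event, noise, snr_db=10.0)
        print(f"augmented length: {augmented.shape[0]} samples")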

    Analyzing Usage Conflict Situations in Localized Spectrum Sharing Scenarios: An Agent-Based Modeling and Machine Learning Approach

    As spectrum sharing matures, different approaches have been proposed for a more efficient allocation, assignment, and usage of spectrum resources. These approaches include cognitive radios, multi-level user definitions, and radio environment maps, among others. However, spectrum usage conflicts (e.g., "harmful" interference) remain a common challenge in spectrum sharing schemes, particularly in conflict situations where it is necessary to take action to ensure the sound operation of sharing agreements. A typical example of a usage conflict is one in which incumbents' tolerable levels of interference (i.e., interference thresholds) are exceeded. In this work, we present a new method to examine and study spectrum usage conflicts. A fundamental goal of this project is to capture local resource usage patterns in order to provide more realistic estimates of interference. For this purpose, we have defined two spectrum- and network-specific characteristics that directly impact the local interference assessment: the resource access strategy and the governance framework. Thus, we are able to test the viability of distributed or decentralized governance systems, including polycentric and self-governance, in spectrum sharing situations. In addition, we are able to design, model, and test a multi-tier spectrum sharing scheme that provides stakeholders with more flexible resource access opportunities. To perform this dynamic and localized study of spectrum usage and conflicts, we rely on Agent-Based Modeling (ABM) as our main analysis instrument. A crucial component for capturing local resource usage patterns is providing agents with local information about their spectrum situation. Thus, the environment of the models presented in this dissertation is given by the Interference Cartography (IC) map of the radio environment map (REM). Additionally, the agents' definitions and actions result from the interaction of the technical aspects of resource access and management, stakeholder interactions, and the underlying usage patterns as defined in the Common Pool Resource (CPR) literature. Finally, to capture local resource usage patterns and, consequently, provide more realistic estimates of conflict situations, we enhance the classical rule-based ABM approach with Machine Learning (ML) techniques. Via ML algorithms, we refine the internal models of agents in an ABM, allowing the agents to choose more suitable responses to changes in the environment.
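
    The dissertation's models, built on an interference cartography environment and CPR-derived rules, are not reproduced here; as a heavily simplified, hypothetical sketch of the core idea (an agent whose learned internal model, rather than a fixed rule, decides whether to transmit on shared spectrum), the Python below trains a logistic regression on a synthetic sensing history. All feature choices, thresholds, and names are assumptions made for illustration.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        class SpectrumAgent:
            """Toy secondary user whose learned internal model guides transmission decisions."""

            def __init__(self):
                self.model = LogisticRegression()

            def train(self, sensed_power_dbm: np.ndarray, conflict_occurred: np.ndarray):
                # Refine the internal model from past observations (sensed power -> conflict).
                self.model.fit(sensed_power_dbm.reshape(-1, 1), conflict_occurred)

            def decide(self, sensed_power_dbm: float) -> bool:
                # Transmit only if the predicted probability of a usage conflict is low.
                p_conflict = self.model.predict_proba([[sensed_power_dbm]])[0, 1]
                return p_conflict < 0.2

        # Synthetic history: stronger sensed incumbent power led to more past conflicts.
        rng = np.random.default_rng(2)
        power = rng.uniform(-110.0, -60.0, size=200)
        conflict = (power > -80.0).astype(int)

        agent = SpectrumAgent()
        agent.train(power, conflict)
        print("transmit at -100 dBm sensed power:", agent.decide(-100.0))
        print("transmit at  -70 dBm sensed power:", agent.decide(-70.0))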

    Anomaly detection as a tool for strengthening data analysis and prediction in education

    Educational institutions seek to design effective mechanisms that improve academic results, enhance the learning process, and avoid dropout. Analysing and predicting students' performance during their studies can reveal shortcomings in a programme of study and detect students with learning problems. This motivates the development of data-driven techniques and models that aim to enhance teaching and learning. Classical models usually ignore outlier students with uncommon and inconsistent characteristics, although such students may provide significant information to domain experts and affect prediction models. Outliers in education are barely explored, and their impact on prediction models has not yet been studied in the literature. This thesis therefore aims to investigate outliers in educational data and extend the existing knowledge about them. The thesis presents three case studies of outlier detection for different educational contexts and ways of representing data (a numerical dataset from a German university, a numerical dataset from a Russian university, and a sequential dataset from French nursing schools). For each case, a data preprocessing approach is proposed that accounts for the peculiarities of the dataset. The prepared data were used to detect outliers under conditions of unknown ground truth. The characteristics of the detected outliers were explored and analysed, which extended our understanding of students' behaviour in the learning process. One of the main tasks in the educational domain is to develop essential tools that will help to improve academic results and reduce attrition. Thus, many studies aim to build performance prediction models that can detect students with learning problems who need special help. The second goal of the thesis is to study the impact of outliers on prediction models. The two most common prediction tasks in the educational field have been considered: (i) dropout prediction and (ii) final score prediction. The prediction models were compared in terms of different prediction algorithms and the presence of outliers in the training data. This thesis opens new avenues for investigating students' performance in educational environments. Understanding outliers and the reasons for their appearance can help domain experts extract valuable information from the data. Outlier detection could become part of the pipeline in early warning systems for detecting students at high risk of dropout. Furthermore, the behavioural tendencies of outliers can serve as a basis for providing recommendations to students in their studies or for making decisions about improving the educational process.
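
    The thesis's datasets and algorithms are not reproduced in this abstract; as a hedged sketch of the overall workflow it describes (detecting outliers without ground truth, then comparing a dropout-prediction model trained with and without them), the scikit-learn example below uses synthetic student records. The feature names, the injected outliers, and the model choices are illustrative assumptions.

        import numpy as np
        from sklearn.ensemble import IsolationForest
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(3)

        # Synthetic student records: [average grade, credits earned]; label 1 = dropout.
        X = np.column_stack([rng.normal(70, 10, 300), rng.normal(90, 20, 300)])
        y = (X[:, 0] < 60).astype(int)
        X[:15] += rng.normal(0, 60, size=(15, 2))   # inject a few atypical profiles

        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        # Unsupervised outlier detection on the training data (no ground truth needed).
        inliers = IsolationForest(random_state=0).fit_predict(X_train) == 1

        # Dropout prediction trained with, and then without, the detected outliers.
        model_all = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        model_filtered = LogisticRegression(max_iter=1000).fit(X_train[inliers], y_train[inliers])
        print(f"accuracy with outliers:    {model_all.score(X_test, y_test):.2f}")
        print(f"accuracy without outliers: {model_filtered.score(X_test, y_test):.2f}")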