8,193 research outputs found

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper presents a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance. Comment: 49 pages, 10 figures, 6 tables

    Modular lifelong machine learning

    Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge. Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to reuse only the subset of modules which are useful for the task at hand. This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures. Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations. Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to an improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods. Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.

    Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives

    Deep learning has demonstrated remarkable performance across various tasks in medical imaging. However, these approaches primarily focus on supervised learning, assuming that the training and testing data are drawn from the same distribution. Unfortunately, this assumption may not always hold true in practice. To address this issue, unsupervised domain adaptation (UDA) techniques have been developed to transfer knowledge from a labeled domain to a related but unlabeled domain. In recent years, significant advancements have been made in UDA, resulting in a wide range of methodologies, including feature alignment, image translation, self-supervision, and disentangled representation methods, among others. In this paper, we provide a comprehensive literature review of recent deep UDA approaches in medical imaging from a technical perspective. Specifically, we categorize current UDA research in medical imaging into six groups and further divide them into finer subcategories based on the different tasks they perform. We also discuss the respective datasets used in the studies to assess the divergence between the different domains. Finally, we discuss emerging areas and provide insights on future research directions to conclude this survey. Comment: Under Review

    Comparative Multiple Case Study into the Teaching of Problem-Solving Competence in Lebanese Middle Schools

    This multiple case study investigates how problem-solving competence is integrated into teaching practices in private schools in Lebanon. Its purpose is to compare instructional approaches to problem-solving across three different programs: the American (Common Core State Standards and Next Generation Science Standards), French (Socle Commun de Connaissances, de Compétences et de Culture), and Lebanese, with a focus on middle school (grades 7, 8, and 9). The project was conducted in nine schools equally distributed among three categories based on the programs they offered: category 1 schools offered the Lebanese program, category 2 the French and Lebanese programs, and category 3 the American and Lebanese programs. Each school was treated as a separate case. Structured observation data were collected using observation logs that focused on lesson objectives and specific cognitive problem-solving processes. The two logs were created based on a document review of the requirements for the three programs. Structured observations were followed by semi-structured interviews that were conducted to explore teachers' beliefs and understandings of problem-solving competence. The comparative analysis of within-category structured observations revealed instruction ranging from teacher-led practices, particularly in category 1 schools, to more student-centered approaches in categories 2 and 3. The cross-category analysis showed a reliance on cognitive processes primarily promoting exploration, understanding, and demonstrating understanding, with less emphasis on planning and executing, monitoring and reflecting, thus uncovering a weakness in addressing these processes. The findings of the post-observation semi-structured interviews disclosed a range of definitions of problem-solving competence prevalent amongst teachers, with clear divergences across the three school categories.
This research is unique in that it compares problem-solving teaching approaches across three different programs and explores teachers' underlying beliefs and understandings of problem-solving competence in the Lebanese context. It is hoped that this project will inform curriculum developers about future directions and much-anticipated reforms of the Lebanese program, and practitioners about areas that need to be addressed to further improve the teaching of problem-solving competence.

    Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles

    This thesis presents the utilization of multiple modalities, such as image and lidar, to incorporate spatio-temporal information from sequence data into deep learning architectures for 3D object detection in instrumented vehicles. The race to autonomy in instrumented vehicles or self-driving cars has stimulated significant research in developing autonomous driver assistance systems (ADAS) technologies related explicitly to perception systems. Object detection plays a crucial role in perception systems by providing spatial information to its subsequent modules; hence, accurate detection is a significant task supporting autonomous driving. The advent of deep learning in computer vision applications and the availability of multiple sensing modalities such as 360° imaging, lidar, and radar have led to state-of-the-art 2D and 3D object detection architectures. Most current state-of-the-art 3D object detection frameworks consider single-frame reference. However, these methods do not utilize temporal information associated with the objects or scenes from the sequence data. Thus, the present research hypothesizes that multimodal temporal information can contribute to bridging the gap between 2D and 3D metric space by improving the accuracy of deep learning frameworks for 3D object estimations. First, the thesis examines multimodal data representations and the selection of hyper-parameters using public datasets such as KITTI and nuScenes, with Frustum-ConvNet as a baseline architecture. Secondly, an attention mechanism was employed along with convolutional-LSTM to extract spatial-temporal information from sequence data to improve 3D estimations and to aid the architecture in focusing on salient lidar point cloud features. Finally, various fusion strategies are applied to fuse the modalities and temporal information into the architecture to assess their effect on performance and computational complexity.
Overall, this thesis has established the importance and utility of multimodal systems for refined 3D object detection and proposed a complex pipeline incorporating spatial, temporal and attention mechanisms to improve specific and general class accuracy, demonstrated on key autonomous driving data sets.

    Pollution-induced community tolerance in freshwater biofilms – from molecular mechanisms to loss of community functions

    Exposure to herbicides poses a threat to aquatic biofilms by affecting their community structure, physiology and function. These changes make biofilms more tolerant, but community tolerance comes at an ecological cost. The concept of pollution-induced community tolerance (PICT) was introduced by Blanck and Wängberg (1988). Its basic principle is that microbial communities undergo pollution-induced succession when exposed to a pollutant over a long period of time, which changes the communities structurally and functionally and enhances their tolerance to the pollutant. However, the mechanisms of tolerance and the ecological consequences have hardly been studied to date. This thesis addresses the structural and functional changes in biofilm communities and applies modern molecular methods to unravel molecular tolerance mechanisms. Two different freshwater biofilm communities were cultivated for a period of five weeks, with one of the communities being contaminated with 4 μg L⁻¹ diuron. Subsequently, the communities were characterized for structural and functional differences, focusing especially on the crucial function of photosynthesis. The community structure of the autotrophs was assessed using HPLC-based pigment analysis, and their functional alterations were investigated using Imaging-PAM fluorometry to study photosynthesis and community oxygen profiling to determine net primary production. Then, the molecular fingerprints of the communities were measured with meta-transcriptomics (RNA-Seq) and GC-based community metabolomics approaches and analyzed with respect to changes in their molecular functions. The communities were acutely exposed to diuron for one hour in a dose-response design to reveal a potential PICT and uncover related adaptation to diuron exposure.
The combination of apical and molecular methods in a dose-response design enabled the linkage of functional effects of diuron exposure and underlying molecular mechanisms based on a sensitivity analysis. Chronic exposure to diuron impaired freshwater biofilms in their biomass accrual. The contaminated communities particularly lost autotrophic biomass, reflected by the decrease in specific chlorophyll a content. This loss was associated with a change in the molecular fingerprint of the communities, which substantiates structural and physiological changes. The decline in autotrophic biomass could be due to a primary loss of sensitive autotrophic organisms caused by the selection of better adapted species in the course of chronic exposure. Related to this hypothesis, an increase in diuron tolerance has been detected in the contaminated communities and molecular mechanisms facilitating tolerance have been found. It was shown that genes of the photosystem, reductive-pentose phosphate cycle and arginine metabolism were differentially expressed among the communities and that an increased amount of potential antioxidant degradation products was found in the contaminated communities. This led to the hypothesis that contaminated communities may have adapted to oxidative stress, making them less sensitive to diuron exposure. Moreover, the photosynthetic light harvesting complex was altered and the photoprotective xanthophyll cycle was increased in the contaminated communities. Despite these adaptation strategies, the loss of autotrophic biomass has been shown to impair primary production. This impairment persisted even under repeated short-term exposure, so that the tolerance mechanisms cannot safeguard primary production as a key function in aquatic systems.

    Table of contents:
    1. The effect of chemicals on organisms and their functions
    1.1 Welcome to the anthropocene
    1.2 From cellular stress responses to ecosystem resilience
    1.2.1 The individual pursuit for homeostasis
    1.2.2 Stability from diversity
    1.3 Community ecotoxicology - a step forward in monitoring the effects of chemical pollution?
    1.4 Functional ecotoxicological assessment of microbial communities
    1.5 Molecular tools – the key to a mechanistic understanding of stressor effects from a functional perspective in microbial communities?
    2. Aims and Hypothesis
    2.1 Research question
    2.2 Hypothesis and outline
    2.3 Experimental approach & concept
    2.3.1 Aquatic freshwater biofilms as model community
    2.3.2 Diuron as model herbicide
    2.3.3 Experimental design
    3. Structural and physiological changes in microbial communities after chronic exposure - PICT and altered functional capacity
    3.1 Introduction
    3.2 Methods
    3.2.1 Biofilm cultivation
    3.2.2 Dry weight and autotrophic index
    3.2.4 Pigment analysis of periphyton
    3.2.4.1 In-vivo pigment analysis for community characterization
    3.2.4.2 In-vivo pigment analysis based on Imaging-PAM fluorometry
    3.2.4.3 In-vivo pigment fluorescence for tolerance detection
    3.2.4.4 Ex-vivo pigment analysis by high-pressure liquid-chromatography
    3.2.5 Community oxygen metabolism measurements
    3.3 Results and discussion
    3.3.1 Comparison of the structural community parameters
    3.3.2 Photosynthetic activity and primary production of the communities after selection phase
    3.3.3 Acquisition of photosynthetic tolerance
    3.3.4 Primary production at exposure conditions
    3.3.5 Tolerance detection in primary production
    3.4 Summary and Conclusion
    4. Community gene expression analysis by meta-transcriptomics
    4.1 Introduction to meta-transcriptomics
    4.2 Methods
    4.2.1 Sampling and RNA extraction
    4.2.2 RNA sequencing analysis
    4.2.3 Data assembly and processing
    4.2.4 Prioritization of contigs and annotation
    4.2.5 Sensitivity analysis of biological processes
    4.3 Results and discussion
    4.3.1 Characterization of the meta-transcriptomic fingerprints
    4.3.2 Insights into community stress response mechanisms using trend analysis (DRomic’s)
    4.3.3 Response pattern in the isoform PS genes
    4.5 Summary and conclusion
    5. Community metabolome analysis
    5.1 Introduction to community metabolomics
    5.2 Methods
    5.2.1 Sampling, metabolite extraction and derivatisation
    5.2.2 GC-TOF-MS analysis
    5.2.3 Data processing and statistical analysis
    5.3 Results and discussion
    5.3.1 Characterization of the metabolic fingerprints
    5.3.2 Difference in the metabolic fingerprints
    5.3.3 Differential metabolic responses of the communities to short-term exposure of diuron
    5.4 Summary and conclusion
    6. Synthesis
    6.1 Approaches and challenges for linking molecular data to functional measurements
    6.2 Methods
    6.2.1 Summary on the data
    6.2.2 Aggregation of molecular data to index values (TELI and MELI)
    6.2.3 Functional annotation of contigs and metabolites using KEGG
    6.3 Results and discussion
    6.3.1 Results of aggregation techniques
    6.3.2 Sensitivity analysis of the different molecular approaches and endpoints
    6.3.3 Mechanistic view of the molecular stress responses based on KEGG functions
    6.4 Consolidation of the results – holistic interpretation and discussion
    6.4.1 Adaptation to chronic diuron exposure - from molecular changes to community effects
    6.4.2 Assessment of the ecological costs of Pollution-induced community tolerance based on primary production
    6.5 Outlook

    Countermeasures for the majority attack in blockchain distributed systems

    Blockchain technology is considered one of the most important computing paradigms since the Internet, owing to unique characteristics that make it ideal for recording, verifying, and managing information from different transactions. Despite this, Blockchain faces various security problems, one of the most important being the 51% attack, or majority attack. In this attack, one or more miners take control of at least 51% of the mined hash or of the computing power in a network, so that a miner can arbitrarily manipulate and modify the information recorded in this technology. This work focused on designing and implementing strategies to detect and mitigate majority attacks (51% attacks) in a Blockchain distributed system, based on characterizing the behavior of miners. To achieve this, the Hash Rate / Share of Bitcoin and Crypto Ethereum miners was analyzed and evaluated, followed by the design and implementation of a consensus protocol to control the computing power of miners. Subsequently, Machine Learning models were explored and evaluated for detecting malicious software of the Cryptojacking type. Doctorate. Doctor of Systems and Computing Engineering

    Binaural virtual auditory display for music discovery and recommendation

    Emerging patterns in audio consumption present renewed opportunity for searching or navigating music via spatial audio interfaces. This thesis examines the potential benefits and considerations for using binaural audio as the sole or principal output interface in a music browsing system. Three areas of enquiry are addressed. Specific advantages and constraints in spatial display of music tracks are explored in preliminary work. A voice-led binaural music discovery prototype is shown to offer a contrasting interactive experience compared to a mono smartspeaker. Results suggest that touch or gestural interaction may be more conducive input modes in the former case. The limit of three binaurally spatialised streams is identified from separate data as a usability threshold for simultaneous presentation of tracks, with no evident advantages derived from visual prompts to aid source discrimination or localisation. The challenge of implementing personalised binaural rendering for end-users of a mobile system is addressed in detail. A custom framework for assessing head-related transfer function (HRTF) selection is applied to data from an approach using 2D rendering on a personal computer. That HRTF selection method is developed to encompass 3D rendering on a mobile device. Evaluation against the same criteria shows encouraging results in reliability, validity, usability and efficiency. Computational analysis of a novel approach for low-cost, real-time, head-tracked binaural rendering demonstrates measurable advantages compared to first order virtual Ambisonics. Further perceptual evaluation establishes working parameters for interactive auditory display use cases. In summation, the renderer and identified tolerances are deployed with a method for synthesised, parametric 3D reverberation (developed through related research) in a final prototype for mobile immersive playlist editing. 
Task-oriented comparison with a graphical interface reveals high levels of usability and engagement, plus some evidence of enhanced flow state when using the eyes-free binaural system.