
    Dealing with clones in software : a practical approach from detection towards management

    Although duplicated fragments of code, also called code clones, are considered one of the prominent code smells that may exist in software, cloning is widely practiced in industrial development. The larger the system, the more people are involved in its development, and the more parts are developed by different teams, the greater the possibility of having cloned code in the system. While there are particular benefits of code cloning in software development, research shows that it might be a source of various troubles in evolving software. Therefore, investigating and understanding clones in a software system is important for managing them efficiently. However, when the system is fairly large, it is challenging to identify and manage those clones properly. Among the various types of clones that may exist in software, research shows that detection of near-miss clones, where there might be minor to significant differences (e.g., renaming of identifiers and additions/deletions/modifications of statements) among the cloned fragments, is costly in terms of time and memory. Thus, there is a great demand for state-of-the-art technologies for dealing with clones in software. Over the years, several tools have been developed to detect and visualize exact and similar clones. However, the tools are usually standalone and do not integrate well with a software developer's workflow. In this thesis, first, a study is presented on the effectiveness of a fingerprint-based data similarity measurement technique named 'simhash' in detecting clones in large-scale code bases. Based on the positive outcome of the study, a time-efficient detection approach is proposed to find exact and near-miss clones in software, especially in large-scale software systems.
The novel detection approach has been made available as a highly configurable and fully fledged standalone clone detection tool named 'SimCad', which can be configured for detection of clones in both source code and non-source-code data. Second, we show a robust use of the clone detection approach studied earlier by assembling its detection service as a portable library named 'SimLib'. This library can provide tightly coupled (integrated) clone detection functionality to other applications, as opposed to the loosely coupled service provided by a typical standalone tool. Because it is highly configurable and easily extensible, this library allows the user to customize its clone detection process for detecting clones in data having diverse characteristics. We performed a user study to get feedback on installation and use of the 'SimLib' API (Application Programming Interface) and to uncover its potential use as a third-party clone detection library. Third, we investigated what tools and techniques are currently in use to detect and manage clones and understand their evolution. The goal was to find how those tools and techniques can be made available to a developer's own software development platform for convenient identification, tracking and management of clones in the software. Based on that, we developed a clone-aware software development platform named 'SimEclipse' to promote the practical use of code clone research and to provide better support for clone management in software. Finally, we evaluated 'SimEclipse' by conducting a user study on its effectiveness, usability and information management. We believe that both researchers and developers would enjoy and benefit from using these tools in different aspects of code clone research and in managing cloned code in software systems.
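
The abstract above names simhash as the fingerprinting technique but gives no implementation details. The following is a minimal Python sketch of generic simhash fingerprinting applied to code fragments, not the SimCad or SimLib implementation. The tokenizer, the use of MD5 as the per-token hash, the 64-bit fingerprint width, and the function names `tokens`, `simhash`, and `hamming` are all assumptions made for illustration.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width; the abstract does not specify one


def tokens(code):
    """Split a code fragment into identifier, number, and symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def simhash(code):
    """Return a 64-bit simhash fingerprint of the fragment's tokens."""
    weights = [0] * BITS
    for tok in tokens(code):
        # Hash each token to 64 bits (MD5 truncated to 8 bytes, an assumption).
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are net positive.
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

In a sketch like this, two fragments would be reported as candidate clones when the Hamming distance between their fingerprints falls at or below a chosen threshold: a distance of zero suggests an exact clone after tokenization, and small non-zero distances suggest near-miss clones. The threshold itself is a tuning parameter that the abstract does not state.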

    Management Aspects of Software Clone Detection and Analysis

    Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. For many reasons, unintentional clones may also appear in the source code without the developer's awareness. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes to clone management with techniques realized in tools and with large-scale in-depth analyses of clones that inform the design of effective clone management techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strengths of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing focused search for clones of any code fragment of the developer's interest. A good understanding of the code cloning phenomenon is a prerequisite for devising efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecasts of the proportion of code clones in future versions of software projects.
The outcomes of these studies expose useful insights into the characteristics of evolving clones and their management implications. Once code clones have been identified, their management often necessitates careful refactoring, which is dealt with in the third phase of the thesis. Given a large number of clones, it is difficult to decide optimally what to refactor and what not, especially when there are dependencies among clones and the objective remains the minimization of refactoring efforts and risks while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for estimating the effort needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a versatile clone management system that we envision.

    Doctor of Philosophy

    HSPB5 (aka αB-crystallin or CRYAB) is a small molecular weight heat shock protein that functions as a key chaperone in striated muscle. Mutations in HSPB5 are linked with human disease including cardiomyopathy, skeletal myopathy, and cataracts. Abnormal accumulation or protein aggregation including the mutant form of HSPB5 in muscle is a hallmark of the majority of known disease-associated HSPB5 variants, though it is yet unclear mechanistically how this mutant chaperone contributes to myopathy (reviewed in Chapter 1). This dissertation focuses on molecular studies of one such mutation, 343delT, which is associated with severe early-onset skeletal myopathy requiring ventilation to sustain life. Induced pluripotent stem cells (iPSCs) derived from the patient harboring the 343delT mutation along with genome edited isogenic control cell lines are utilized in this work as a mechanism to study the endogenous form of the protein in cell types of interest (i.e., skeletal and cardiac muscle). Molecular studies of 343delT HSPB5 demonstrate extreme insolubility of the mutant protein and suggest a loss of function mechanism for disease, though gain-of-toxic function cannot be excluded (Chapter 3). Chapter 2 is included as an addendum to the Introduction on HSPB5 (Chapter 1) to provide relevant background information on iPSCs and genome editing. The fast-paced genome editing field is currently hampered by inefficient means of isolating cells containing modifications of interest. Previously published approaches employed in Chapter 3 were highly efficient, though required a two-step editing process and were not readily scalable. Chapter 4 presents a strategy for genome editing, termed cotargeting with selection (CTS) that involves simultaneous targeting of two loci, where selection for incorporation of a selection cassette into a safe-harbor locus enriches many fold for a separate modification at a gene of interest.
CTS streamlines the genome editing process compared with previous techniques. Chapter 5 of this dissertation includes a combined discussion and Chapter 6 presents future experiments planned for the study of 343delT HSPB5. Altogether, this dissertation affords molecular insights into myopathy-causing 343delT HSPB5 using the cutting-edge technologies of iPSCs and genome editing, and provides a technical advancement to the field of genome editing.

    Target Cell APOBEC3C Can Induce Limited G-to-A Mutation in HIV-1

    The evolutionary success of primate lentiviruses reflects their high capacity to mutate and adapt to new host species, immune responses within individual hosts, and, in recent years, antiviral drugs. APOBEC3G (A3G) and APOBEC3F (A3F) are host cell DNA-editing enzymes that induce extensive HIV-1 mutation that severely attenuates viral replication. The HIV-1 virion infectivity factor (Vif), expressed in vivo, counteracts the antiviral activity of A3G and A3F by inducing their degradation. Other APOBECs may contribute more to viral diversity by inducing less extensive mutations allowing viral replication to persist. Here we show that in APOBEC3C (A3C)-expressing cells infected with the patient-derived HIV-1 molecular clones 210WW, 210WM, 210MW, and 210MM, and the lab-adapted molecular clone LAI, viral G-to-A mutations were detected in the presence of Vif expression. Mutations occurred primarily in the GA context and were relatively infrequent, thereby allowing for spreading infection. The mutations were absent in cells lacking A3C but were induced after transient expression of A3C in the infected target cell. Inhibiting endogenous A3C by RNA interference in Magi cells prevented the viral mutations. Thus, A3C is necessary and sufficient for G-to-A mutations in some HIV-1 strains. A3C-induced mutations occur at levels that allow replication to persist and may therefore contribute to viral diversity. Developing drugs that inhibit A3C may be a novel strategy for delaying viral escape from immune or antiretroviral inhibition

    Improving fire blight resistance in susceptible apple cultivars by different biotechnological approaches

    Fire blight, caused by the bacterium Erwinia amylovora (E. amylovora), is one of the most economically important apple (Malus x domestica) diseases worldwide. Various chemical and biological approaches can be applied to deal with the disease, but none of these is decisive. Such strategies are also prohibited in many countries due to their potential impact on human health and the environment. To date, the most efficient strategy for controlling E. amylovora is thus to breed resistant/tolerant apple cultivars by manipulating one or multiple plant genes that are associated with resistance or susceptibility to the disease. Within this context, classical breeding or genetic engineering can be applied. While conventional breeding is still considered a time-consuming and laborious process, genetic engineering methodologies represent rapid, precise and powerful alternatives for inserting the desired trait into the crop of interest. In this thesis, we exploit different biotechnological approaches, on the one hand to improve the fire blight resistance trait by knocking out a known susceptibility gene and on the other hand to investigate potential disease-related key genes. First, we develop a CRISPR/Cas9-FLP/FRT-based gene editing system, mediated by Agrobacterium tumefaciens, to knock out the fire blight susceptibility gene MdDIPM4 and generate apple (‘Gala’ and ‘Golden Delicious’) cultivars with reduced susceptibility to the disease and a minimal trace of exogenous DNA. Several transgenic lines were screened by sequencing to identify mutations in MdDIPM4. An editing efficiency of 75% was observed. Candidate lines showing loss-of-function mutation were inoculated with E. amylovora and a significant reduction (of about 40%) in disease symptoms was observed compared to wild-type plants. No CRISPR/Cas9 off-targeting activity was detected in five potential off-target regions.
Thus, with the aim of removing the ‘entire’ T-DNA in those lines with reduced susceptibility to the pathogen, the FLP/FRT system was induced and the excision of the T-DNA was validated. This work demonstrates for the first time the development and application of a CRISPR/Cas9-FLP/FRT-based editing system for the production of ‘clean’ fire blight resistant apple cultivars. Secondly, we investigate the apple miRNA MdmiR285N, which is predicted to play a key role in the post-transcriptional regulation of 35 RNA transcripts coding for different disease resistance proteins. A complex network of potential transcriptional regulatory elements involved in plant growth and development, and in response to different hormones and stress conditions, has been identified in the MdmiR285N promoter in both apple and the model plant species Arabidopsis thaliana. Moreover, spatio-temporal expression of MdmiR285N has been assessed in plants at physiological growth conditions and in response to bacterial pathogens. Our results suggest that MdmiR285N is a multifunctional microRNA which may control different processes, such as biotic stress response, plant growth and development. In parallel, methodological work has been carried out for a precise and rapid characterization of the transgenic apple lines produced. A quantitative, rapid and cost-effective method based on real-time PCR has been developed to quantify the copy number of the nptII marker gene in apple lines and to evaluate its elimination after the activation of the recombinase system. This method may be valuable for those institutions committed to tracing ‘GMO’ apple products.

    Refactoring the architecture of a polyketide gene cluster enhances docosahexaenoic acid production in Yarrowia lipolytica through improved expression and genetic stability

    Background: Long-chain polyunsaturated fatty acids (LC-PUFAs), such as docosahexaenoic acid (DHA), are essential for human health and have been widely used in the food and pharmaceutical industries. However, the limited availability of natural sources, such as oily fish, has led to the pursuit of microbial production as a promising alternative. Yarrowia lipolytica can produce various PUFAs via genetic modification. A recent study upgraded Y. lipolytica for DHA production by expressing a four-gene cluster encoding a myxobacterial PKS-like PUFA synthase, reducing the demand for redox power. However, the genetic architecture of gene expression in Y. lipolytica is complex and involves various control elements, offering space for additional improvement of DHA production. This study was designed to optimize the expression of the PUFA cluster using a modular cloning approach. Results: Expression of the monocistronic cluster with each gene under the control of the constitutive TEF promoter led to low-level DHA production. By using the minLEU2 promoter instead and incorporating additional upstream activating UAS1B4 sequences, 5’ promoter introns, and intergenic spacers, DHA production was increased by 16-fold. The producers remained stable over 185 h of cultivation. Beneficially, the different genetic control elements acted synergistically: UAS1B elements generally increased expression, while the intron caused gene-specific effects. Mutants with UAS1B16 sequences within 2–8 kb distance, however, were found to be genetically unstable, which limited production performance over time, suggesting the avoidance of long repetitive sequence blocks in synthetic multigene clusters and careful monitoring of genetic stability in producing strains. Conclusions: Overall, the results demonstrate the effectiveness of synthetic heterologous gene clusters to drive DHA production in Y. lipolytica.
The combinatorial exploration of different genetic control elements allowed the optimization of DHA production. These findings have important implications for developing Y. lipolytica strains for the industrial-scale production of valuable polyunsaturated fatty acids.

    Analyzing Multigene Stacking and Genome Editing Strategies in Rice

    Crop improvement through biotechnology is an integrated effort, incorporating multiple approaches like integration of genes, editing of native genes, and removal of selection marker genes. Before streamlining the protocols, the efficiency and feasibility of the individual approaches and their components must be tested. This study evaluated the following approaches: 1) stacking an array of genes into a single locus by site-specific integration via Cre-lox recombination in rice, 2) determining the efficiency of I-SceI and the CCR5-ZFN in the targeted excisions of gene fragments in rice and Arabidopsis, and 3) determining the efficiency of CRISPR/Cas9 in generating targeted mutations for genome editing in rice. In gene stacking, >50% of site-specific integration lines contained full-length integration of five genes. All genes were properly regulated by their promoters as indicated by the correlation of expression levels of the three constitutively expressed genes with their allelic number, and heat- or cold-induction levels of the two inducible genes. Analysis of I-SceI and CCR5-ZFN in rice and Arabidopsis found that these overexpressing constructs were refractory to plant transformation. The heat-inducible I-SceI expression in Arabidopsis was effective in creating somatic excisions but ineffective in generating heritable excisions. The inducible expression of CCR5-ZFN in rice, although transmitted stably to the progeny, appeared ineffective in creating detectable excisions. Finally, the application of CRISPR/Cas9 in rice was found to induce mutations at a high rate, but point mutations occurred far more frequently than genomic deletions as determined in 114 rice lines including the primary transgenic lines and their progenies for 3 different genes. The heat-shock induced CRISPR/Cas9 was found to create heat-inducible targeted mutations that were inherited by the progeny.
Additionally, mutations in the predicted off-target sites were undetectable or found at a lower rate in the heat-shock CRISPR/Cas9 lines as compared to their frequency in the constitutive-overexpression CRISPR/Cas9 lines. In summary, while Cre-lox mediated site-specific integration and CRISPR/Cas9 mediated point mutagenesis were highly effective in the rice genome, application of I-SceI or CCR5-ZFN was problematic as tested in Arabidopsis and/or rice.