42,477 research outputs found

    An Empirical Study of Cohesion and Coupling: Balancing Optimisation and Disruption

    Get PDF
    Search based software engineering has been extensively applied to the problem of finding improved modular structures that maximise cohesion and minimise coupling. However, there has, hitherto, been no longitudinal study of developers’ implementations, over a series of sequential releases. Moreover, results validating whether developers respect the fitness functions are scarce, and the potentially disruptive effect of search-based remodularisation is usually overlooked. We present an empirical study of 233 sequential releases of 10 different systems; the largest empirical study reported in the literature so far, and the first longitudinal study. Our results provide evidence that developers do, indeed, respect the fitness functions used to optimise cohesion/coupling (they are statistically significantly better than arbitrary choices with p << 0.01), yet they also leave considerable room for further improvement (cohesion/coupling can be improved by 25% on average). However, we also report that optimising the structure is highly disruptive (on average more than 57% of the structure must change), while our results reveal that developers tend to avoid such disruption. Therefore, we introduce and evaluate a multi-objective evolutionary approach that minimises disruption while maximising cohesion/coupling improvement. This allows developers to balance reticence to disrupt existing modular structure, against their competing need to improve cohesion and coupling. The multi-objective approach is able to find modular structures that improve the cohesion of developers’ implementations by 22.52%, while causing an acceptably low level of disruption (within that already tolerated by developers)

    Gene expression in large pedigrees: analytic approaches.

    Get PDF
    BackgroundWe currently have the ability to quantify transcript abundance of messenger RNA (mRNA), genome-wide, using microarray technologies. Analyzing genotype, phenotype and expression data from 20 pedigrees, the members of our Genetic Analysis Workshop (GAW) 19 gene expression group published 9 papers, tackling some timely and important problems and questions. To study the complexity and interrelationships of genetics and gene expression, we used established statistical tools, developed newer statistical tools, and developed and applied extensions to these tools.MethodsTo study gene expression correlations in the pedigree members (without incorporating genotype or trait data into the analysis), 2 papers used principal components analysis, weighted gene coexpression network analysis, meta-analyses, gene enrichment analyses, and linear mixed models. To explore the relationship between genetics and gene expression, 2 papers studied expression quantitative trait locus allelic heterogeneity through conditional association analyses, and epistasis through interaction analyses. A third paper assessed the feasibility of applying allele-specific binding to filter potential regulatory single-nucleotide polymorphisms (SNPs). Analytic approaches included linear mixed models based on measured genotypes in pedigrees, permutation tests, and covariance kernels. To incorporate both genotype and phenotype data with gene expression, 4 groups employed linear mixed models, nonparametric weighted U statistics, structural equation modeling, Bayesian unified frameworks, and multiple regression.Results and discussionRegarding the analysis of pedigree data, we found that gene expression is familial, indicating that at least 1 factor for pedigree membership or multiple factors for the degree of relationship should be included in analyses, and we developed a method to adjust for familiality prior to conducting weighted co-expression gene network analysis. For SNP association and conditional analyses, we found FaST-LMM (Factored Spectrally Transformed Linear Mixed Model) and SOLAR-MGA (Sequential Oligogenic Linkage Analysis Routines -Major Gene Analysis) have similar type 1 and type 2 errors and can be used almost interchangeably. To improve the power and precision of association tests, prior knowledge of DNase-I hypersensitivity sites or other relevant biological annotations can be incorporated into the analyses. On a biological level, eQTL (expression quantitative trait loci) are genetically complex, exhibiting both allelic heterogeneity and epistasis. Including both genotype and phenotype data together with measurements of gene expression was found to be generally advantageous in terms of generating improved levels of significance and in providing more interpretable biological models.ConclusionsPedigrees can be used to conduct analyses of and enhance gene expression studies

    A multiple hill climbing approach to software module clustering

    Get PDF
    Automated software module clustering is important for maintenance of legacy systems written in a 'monolithic format' with inadequate module boundaries. Even where systems were originally designed with suitable module boundaries, structure tends to degrade as the system evolves, making re-modularization worthwhile. This paper focuses upon search-based approaches to the automated module clustering problem, where hitherto, the local search approach of hill climbing has been found to be most successful. In the paper we show that results from a set of multiple hill climbs can be combined to locate good 'building blocks' for subsequent searches. Building blocks are formed by identifying the common features in a selection of best hill climbs. This process reduces the search space, while simultaneously 'hard wiring' parts of the solution. The paper reports the results of an empirical study that show that the multiple hill climbing approach does indeed guide the search to higher peaks in subsequent executions. The paper also investigates the relationship between the improved results and the system size

    Transcriptome Analyses of Tumor-Adjacent Somatic Tissues Reveal Genes Co-Expressed with Transposable Elements

    Get PDF
    Background: Despite the long-held assumption that transposons are normally only expressed in the germ-line, recent evidence shows that transcripts of transposable element (TE) sequences are frequently found in the somatic cells. However, the extent of variation in TE transcript levels across different tissues and different individuals are unknown, and the co-expression between TEs and host gene mRNAs have not been examined. Results: Here we report the variation in TE derived transcript levels across tissues and between individuals observed in the non-tumorous tissues collected for The Cancer Genome Atlas. We found core TE co-expression modules consisting mainly of transposons, showing correlated expression across broad classes of TEs. Despite this co-expression within tissues, there are individual TE loci that exhibit tissue-specific expression patterns, when compared across tissues. The core TE modules were negatively correlated with other gene modules that consisted of immune response genes in interferon signaling. KRAB Zinc Finger Proteins (KZFPs) were over-represented gene members of the TE modules, showing positive correlation across multiple tissues. But we did not find overlap between TE-KZFP pairs that are co-expressed and TE-KZFP pairs that are bound in published ChIP-seq studies. Conclusions: We find unexpected variation in TE derived transcripts, within and across non-tumorous tissues. We describe a broad view of the RNA state for non-tumorous tissues exhibiting higher level of TE transcripts. Tissues with higher level of TE transcripts have a broad range of TEs co-expressed, with high expression of a large number of KZFPs, and lower RNA levels of immune genes

    Data-Driven Application Maintenance: Views from the Trenches

    Full text link
    In this paper we present our experience during design, development, and pilot deployments of a data-driven machine learning based application maintenance solution. We implemented a proof of concept to address a spectrum of interrelated problems encountered in application maintenance projects including duplicate incident ticket identification, assignee recommendation, theme mining, and mapping of incidents to business processes. In the context of IT services, these problems are frequently encountered, yet there is a gap in bringing automation and optimization. Despite long-standing research around mining and analysis of software repositories, such research outputs are not adopted well in practice due to the constraints these solutions impose on the users. We discuss need for designing pragmatic solutions with low barriers to adoption and addressing right level of complexity of problems with respect to underlying business constraints and nature of data.Comment: Earlier version of paper appearing in proceedings of the 4th International Workshop on Software Engineering Research and Industrial Practice (SER&IP), IEEE Press, pp. 48-54, 201

    A temporal precedence based clustering method for gene expression microarray data

    Get PDF
    Background: Time-course microarray experiments can produce useful data which can help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis where the data is grouped together according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures which may not be suitable for clustering of temporal microarray data where the sequential nature of time is important. We present a Granger causality based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time-series by statistically testing if one time-series can be used for forecasting the other time-series or not. Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is further analyzed using a graph-theoretic technique to detect highly connected components representing interesting biological modules. We test our approach on synthesized datasets and real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results using the existing biological literature. We also report interesting structural properties of the association network commonly desired in any biological system. Conclusions: Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple in implementation and is statistically traceable at each step. The method can produce sets of functionally related genes which can be further used for reverse-engineering of gene circuits
    • …
    corecore