201 research outputs found

    CompaGB: An open framework for genome browsers comparison

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Tools to visualize and explore genomes hold a central place in genomics and the diversity of genome browsers has increased dramatically over the last few years. It often turns out to be a daunting task to compare and choose a well-adapted genome browser, as multidisciplinary knowledge is required to carry out this task and the number of tools, functionalities and features are overwhelming.</p> <p>Findings</p> <p>To assist in this task, we propose a community-based framework based on two cornerstones: (i) the implementation of industry promoted software qualification method (QSOS) adapted for genome browser evaluations, and (ii) a web resource providing numerous facilities either for visualizing comparisons or performing new evaluations. We formulated 60 criteria specifically for genome browsers, and incorporated another 65 directly from QSOS's generic section. Those criteria aim to answer versatile needs, ranging from a biologist whose interest primarily lies into user-friendly and informative functionalities, a bioinformatician who wants to integrate the genome browser into a wider framework, or a computer scientist who might choose a software according to more technical features. We developed a dedicated web application to enrich the existing QSOS functionalities (weighting of criteria, user profile) with features of interest to a community-based framework: easy management of evolving data, user comments...</p> <p>Conclusions</p> <p>The framework is available at <url>http://genome.jouy.inra.fr/CompaGB</url>. It is open to anyone who wishes to participate in the evaluations. It helps the scientific community to (1) choose a genome browser that would better fit their particular project, (2) visualize features comparatively with easily accessible formats, such as tables or radar plots and (3) perform their own evaluation against the defined criteria. To illustrate the CompaGB functionalities, we have evaluated seven genome browsers according to the implemented methodology. A summary of the features of the compared genome browsers is presented and discussed.</p

    Identification of disease-causing genes using microarray data mining and gene ontology

    Get PDF
    Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with a high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers

    Psoriasis prediction from genome-wide SNP profiles

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>With the availability of large-scale genome-wide association study (GWAS) data, choosing an optimal set of SNPs for disease susceptibility prediction is a challenging task. This study aimed to use single nucleotide polymorphisms (SNPs) to predict psoriasis from searching GWAS data.</p> <p>Methods</p> <p>Totally we had 2,798 samples and 451,724 SNPs. Process for searching a set of SNPs to predict susceptibility for psoriasis consisted of two steps. The first one was to search top 1,000 SNPs with high accuracy for prediction of psoriasis from GWAS dataset. The second one was to search for an optimal SNP subset for predicting psoriasis. The sequential information bottleneck (sIB) method was compared with classical linear discriminant analysis(LDA) for classification performance.</p> <p>Results</p> <p>The best test harmonic mean of sensitivity and specificity for predicting psoriasis by sIB was 0.674(95% CI: 0.650-0.698), while only 0.520(95% CI: 0.472-0.524) was reported for predicting disease by LDA. Our results indicate that the new classifier sIB performs better than LDA in the study.</p> <p>Conclusions</p> <p>The fact that a small set of SNPs can predict disease status with average accuracy of 68% makes it possible to use SNP data for psoriasis prediction.</p

    Notes on wormhole existence in scalar-tensor and F(R) gravity

    Full text link
    Some recent papers have claimed the existence of static, spherically symmetric wormhole solutions to gravitational field equations in the absence of ghost (or phantom) degrees of freedom. We show that in some such cases the solutions in question are actually not of wormhole nature while in cases where a wormhole is obtained, the effective gravitational constant G_eff is negative in some region of space, i.e., the graviton becomes a ghost. In particular, it is confirmed that there are no vacuum wormhole solutions of the Brans-Dicke theory with zero potential and the coupling constant \omega > -3/2, except for the case \omega = 0; in the latter case, G_eff < 0 in the region beyond the throat. The same is true for wormhole solutions of F(R) gravity: special wormhole solutions are only possible if F(R) contains an extremum at which G_eff changes its sign.Comment: 7 two-column pages, no figures, to appear in Grav. Cosmol. A misprint corrected, references update

    The radial arrangement of the human chromosome 7 in the lymphocyte cell nucleus is associated with chromosomal band gene density

    Get PDF
    This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ Springer-Verlag 2008.In the nuclei of human lymphocytes, chromosome territories are distributed according to the average gene density of each chromosome. However, chromosomes are very heterogeneous in size and base composition, and can contain both very gene-dense and very gene-poor regions. Thus, a precise analysis of chromosome organisation in the nuclei should consider also the distribution of DNA belonging to the chromosomal bands in each chromosome. To improve our understanding of the chromatin organisation, we localised chromosome 7 DNA regions, endowed with different gene densities, in the nuclei of human lymphocytes. Our results showed that this chromosome in cell nuclei is arranged radially with the gene-dense/GC-richest regions exposed towards the nuclear interior and the gene-poorest/GC-poorest ones located at the nuclear periphery. Moreover, we found that chromatin fibres from the 7p22.3 and the 7q22.1 bands are not confined to the territory of the bulk of this chromosome, protruding towards the inner part of the nucleus. Overall, our work demonstrates the radial arrangement of the territory of chromosome 7 in the lymphocyte nucleus and confirms that human genes occupy specific radial positions, presumably to enhance intra- and inter-chromosomal interaction among loci displaying a similar expression pattern, and/or similar replication timing

    Defining genes: a computational framework

    Get PDF
    The precise elucidation of the gene concept has become the subject of intense discussion in light of results from several, large high-throughput surveys of transcriptomes and proteomes. In previous work, we proposed an approach for constructing gene concepts that combines genomic heritability with elements of function. Here, we introduce a definition of the gene within a computational framework of cellular interactions. The definition seeks to satisfy the practical requirements imposed by annotation, capture logical aspects of regulation, and encompass the evolutionary property of homology

    A comparative study on gene-set analysis methods for assessing differential expression associated with the survival phenotype

    Get PDF
    Abstract Background Many gene-set analysis methods have been previously proposed and compared through simulation studies and analysis of real datasets for binary phenotypes. We focused on the survival phenotype and compared the performances of Gene Set Enrichment Analysis (GSEA), Global Test (GT), Wald-type Test (WT) and Global Boost Test (GBST) methods in a simulation study and on two ovarian cancer data sets. We considered two versions of GSEA by allowing different weights: GSEA1 uses equal weights, yielding results similar to the Kolmogorov-Smirnov test; while GSEA2's weights are based on the correlation between genes and the phenotype. Results We compared GSEA1, GSEA2, GT, WT and GBST in a simulation study with various settings for the correlation structure of the genes and the association parameter between the survival outcome and the genes. Simulation results indicated that GT, WT and GBST consistently have higher power than GSEA1 and GSEA2 across all scenarios. However, the power of the five tests depends on the combination of correlation structure and association parameter. For the ovarian cancer data set, using the FDR threshold of q Conclusion Simulation studies and a real data example indicate that GT, WT and GBST tend to have high power, whereas GSEA1 and GSEA2 have lower power. We also found that the power of the five tests is much higher when genes are correlated than when genes are independent, when survival is positively associated with genes. It seems that there is a synergistic effect in detecting significant gene sets when significant genes have within-class correlation and the association between survival and genes is positive or negative (i.e., one-direction correlation).</p

    ANMM4CBR: a case-based reasoning method for gene expression data classification

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. The "curse of dimensionality" problem and noise in the data, however, undermines the performance of many algorithms.</p> <p>Method</p> <p>In order to obtain a robust classifier, a novel Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR) method is proposed in this article. ANMM4CBR employs a case-based reasoning (CBR) method for classification. CBR is a suitable paradigm for microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. Moreover, in order to select the most informative genes, we propose to perform feature selection via additively optimizing a nonparametric margin maximum criterion, which is defined based on gene pre-selection and sample clustering. Our feature selection method is very robust to noise in the data.</p> <p>Results</p> <p>The effectiveness of our method is demonstrated on both simulated and real data sets. We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and <it>k </it>nearest neighbor (<it>k</it>NN), especially when the data contains a high level of noise.</p> <p>Availability</p> <p>The source code is attached as an additional file of this paper.</p

    Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction

    Get PDF
    &lt;b&gt;Background&lt;/b&gt; The widely used k top scoring pair (k-TSP) algorithm is a simple yet powerful parameter-free classifier. It owes its success in many cancer microarray datasets to an effective feature selection algorithm that is based on relative expression ordering of gene pairs. However, its general robustness does not extend to some difficult datasets, such as those involving cancer outcome prediction, which may be due to the relatively simple voting scheme used by the classifier. We believe that the performance can be enhanced by separating its effective feature selection component and combining it with a powerful classifier such as the support vector machine (SVM). More generally the top scoring pairs generated by the k-TSP ranking algorithm can be used as a dimensionally reduced subspace for other machine learning classifiers.&lt;p&gt;&lt;/p&gt; &lt;b&gt;Results&lt;/b&gt; We developed an approach integrating the k-TSP ranking algorithm (TSP) with other machine learning methods, allowing combination of the computationally efficient, multivariate feature ranking of k-TSP with multivariate classifiers such as SVM. We evaluated this hybrid scheme (k-TSP+SVM) in a range of simulated datasets with known data structures. As compared with other feature selection methods, such as a univariate method similar to Fisher's discriminant criterion (Fisher), or a recursive feature elimination embedded in SVM (RFE), TSP is increasingly more effective than the other two methods as the informative genes become progressively more correlated, which is demonstrated both in terms of the classification performance and the ability to recover true informative genes. We also applied this hybrid scheme to four cancer prognosis datasets, in which k-TSP+SVM outperforms k-TSP classifier in all datasets, and achieves either comparable or superior performance to that using SVM alone. In concurrence with what is observed in simulation, TSP appears to be a better feature selector than Fisher and RFE in some of the cancer datasets.&lt;p&gt;&lt;/p&gt; &lt;b&gt;Conclusions&lt;/b&gt; The k-TSP ranking algorithm can be used as a computationally efficient, multivariate filter method for feature selection in machine learning. SVM in combination with k-TSP ranking algorithm outperforms k-TSP and SVM alone in simulated datasets and in some cancer prognosis datasets. Simulation studies suggest that as a feature selector, it is better tuned to certain data characteristics, i.e. correlations among informative genes, which is potentially interesting as an alternative feature ranking method in pathway analysis

    Detection of regulator genes and eQTLs in gene networks

    Full text link
    Genetic differences between individuals associated to quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated to differences in expression levels of nearby genes (they are "expression quantitative trait loci" or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and softwares to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico.Comment: minor revision with typos corrected; review article; 24 pages, 2 figure
    corecore