
    Mapping the Structure and Evolution of Chemistry Research

    How does our collective scholarly knowledge grow over time? What major areas of science exist and how are they interlinked? Which areas are major knowledge producers; which ones are consumers? Computational scientometrics – the application of bibliometric/scientometric methods to large-scale scholarly datasets – and the communication of results via maps of science might help us answer these questions. This paper represents the results of a prototype study that aims to map the structure and evolution of chemistry research over a 30-year time frame. Information from the combined Science (SCIE) and Social Science (SSCI) Citation Indexes from 2002 was used to generate a disciplinary map of 7,227 journals and 671 journal clusters. Clusters relevant to studying the structure and evolution of chemistry were identified using JCR categories and were further clustered into 14 disciplines. The changing scientific composition of these 14 disciplines and their knowledge exchange via citation linkages were computed. Major changes in the dominance, influence, and role of Chemistry, Biology, Biochemistry, and Bioengineering over these 30 years are discussed. The paper concludes with suggestions for future work.

    ExprAlign - the identification of ESTs in non-model species by alignment of cDNA microarray expression profiles

    Background: Sequence identification of ESTs from non-model species offers distinct challenges, particularly when these species have duplicated genomes and when they are phylogenetically distant from sequenced model organisms. For the common carp, an environmental model of aquacultural interest, large numbers of ESTs remained unidentified using BLAST sequence alignment. We have used the expression profiles from large-scale microarray experiments to suggest gene identities. Results: Expression profiles from ~700 cDNA microarrays describing responses of 7 major tissues to multiple environmental stressors were used to define a co-expression landscape. This was based on the Pearson correlation coefficient relating each gene with all other genes, from which a network description provided clusters of highly correlated genes as 'mountains'. We show that these contain genes with known identities and genes with unknown identities, and that the correlation constitutes evidence of identity in the latter. This procedure has suggested identities for 522 of 2701 unknown carp EST sequences. We also discriminate several common carp genes and gene isoforms that were not discriminated by BLAST sequence alignment alone. Precision in identification was substantially improved by use of data from multiple tissues and treatments. Conclusion: The detailed analysis of co-expression landscapes is a sensitive technique for suggesting an identity for the large number of BLAST-unidentified cDNAs generated in EST projects. It is capable of detecting even subtle changes in expression profiles, and thereby of separating genes that share a common BLAST identity into different identities. It benefits from the use of multiple treatments or contrasts, and from large-scale microarray data.
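
    The co-expression idea in this abstract can be sketched in a few lines. The example below is a simplified illustration, not the ExprAlign procedure: it assigns an unknown profile the name of its single best-correlated known gene, whereas the abstract describes a network of clusters. The gene labels, the toy profiles, and the `min_r` cutoff are all assumptions made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat profile: correlation undefined, treat as no evidence
    return sxy / math.sqrt(sxx * syy)


def suggest_identity(unknown_profile, known_profiles, min_r=0.9):
    """Return (gene_name, r) for the best-correlated known gene, or None.

    min_r is a hypothetical cutoff chosen for this sketch only.
    """
    best = None
    for name, profile in known_profiles.items():
        r = pearson(unknown_profile, profile)
        if r >= min_r and (best is None or r > best[1]):
            best = (name, r)
    return best


# Toy expression profiles across five conditions (invented values).
known = {
    "hsp70": [1.0, 2.0, 4.0, 8.0, 3.0],
    "actb": [5.0, 5.1, 4.9, 5.0, 5.2],
}
unknown_est = [0.9, 2.1, 4.2, 7.8, 3.1]
match = suggest_identity(unknown_est, known)
```

    With more arrays per profile, a high correlation becomes harder to obtain by chance, which is one way to read the abstract's remark that multiple tissues and treatments improve precision.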

    Predicting Prokaryotic Ecological Niches Using Genome Sequence Analysis

    Automated DNA sequencing technology is so rapid that analysis has become the rate-limiting step. Hundreds of prokaryotic genome sequences are publicly available, with new genomes uploaded at the rate of approximately 20 per month. As a result, this growing body of genome sequences will include microorganisms not previously identified, isolated, or observed. We hypothesize that evolutionary pressure exerted by an ecological niche selects for a similar genetic repertoire in those prokaryotes that occupy the same niche, and that this is due to both vertical and horizontal transmission. To test this, we have developed a novel method to classify prokaryotes, by calculating their Pfam protein domain distributions and clustering them with all other sequenced prokaryotic species. Clusters of organisms are visualized in two dimensions as ‘mountains’ on a topological map. When compared to a phylogenetic map constructed using 16S rRNA, this map more accurately clusters prokaryotes according to functional and environmental attributes. We demonstrate the ability of this map, which we term a “niche map”, to cluster according to ecological niche both quantitatively and qualitatively, and propose that this method be used to associate uncharacterized prokaryotes with their ecological niche as a means of predicting their functional role directly from their genome sequence.

    High-throughput genomic/proteomic studies : finding structure and meaning by similarity

    The post-genomic challenge was to develop high-throughput technologies for measuring genome-scale mRNA expression levels. Analyses of these data rely on computers in an unprecedented way to make the results accessible to researchers. My research in this area enabled the first compendium of microarray experiments for a multi-cellular eukaryote, Caenorhabditis elegans. Prior to this research, approximately 6% of the C. elegans genome had been studied, and little was known about global expression patterns in this organism. Here I cluster data from 553 different microarray experiments and show that the results are stable, statistically significant and highly enriched for specific biological functions. These enrichments allow identification of gene function for the majority of C. elegans genes. Tissue-specific expression patterns are discovered, suggesting the role of particular proteins in digestion, tumor suppression, protection from bacteria and from heavy metals. I report evidence that genome instability in males involves transposons, and find co-expression patterns between sperm proteins, protein kinases and phosphatases suggesting that sperm, which are transcriptionally inactive cells, commonly use phosphorylation to regulate protein activities. My subsequent research addresses protein concentrations and interactions, beginning with a simultaneous comparison of multiple data sets to analyze Saccharomyces cerevisiae gene-expression (cell cycle and exit from stationary phase/G0) and protein-interaction studies. Here, I find that G1-regulated genes are not co-regulated during exit from stationary phase, indicating that the cells are not synchronized. The tight clustering of other genes during exit from stationary phase does indicate that the physiological responses during G0 exit are separable from cell-cycle events.
    Subsequently, I report in vivo proteomic research investigating population phenotypes in stationary phase cultures using the yeast Green Fluorescent Protein-fusion library (4156 strains) together with flow cytometry. Stationary phase cultures consist of dense quiescent (Q) and less dense non-quiescent (NQ) fractions. The Q-cell fraction is generally composed of daughter cells with high concentrations of proteins involved in the citric acid cycle and the electron transport chain, for example Cit1p. The NQ fraction has subpopulations of cells that can be separated by the low and high concentrations of these mitochondrial proteins, i.e., NQ cells often have double intensity peaks: a bright fraction and a much dimmer fraction, which is the case for Cit1p. The Q fraction uses oxygen 6 times as rapidly as the NQ fraction, and 1.6 times as rapidly as exponentially growing cells. NQ cells are less reproductively capable than Q cells, and show evidence of reactive oxygen species stress. These phenotypes develop as early as 20-24 hours after the diauxic shift, which is as early as we can make a differentiating measurement using fluorescence intensities. Finally, I propose a new way to analyze multidimensional flow cytometry data, which may lead to better understanding of Q/NQ cell differentiation.

    A robust measure of correlation between two genes on a microarray

    Background: The underlying goal of microarray experiments is to identify gene expression patterns across different experimental conditions. Genes that are contained in a particular pathway or that respond similarly to experimental conditions could be co-expressed and show similar patterns of expression on a microarray. Using any of a variety of clustering methods or gene network analyses, we can partition genes of interest into groups, clusters, or modules based on measures of similarity. Typically, Pearson correlation is used to measure distance (or similarity) before implementing a clustering algorithm. Pearson correlation is quite susceptible to outliers, however, which is an unfortunate characteristic when dealing with microarray data (well known to be typically quite noisy). Results: We propose a resistant similarity metric based on Tukey's biweight estimate of multivariate scale and location. The resistant metric is simply the correlation obtained from a resistant covariance matrix of scale. We give results which demonstrate that our correlation metric is much more resistant than the Pearson correlation while being more efficient than other nonparametric measures of correlation (e.g., Spearman correlation). Additionally, our method gives a systematic gene flagging procedure which is useful when dealing with large amounts of noisy data. Conclusion: When dealing with microarray data, which are known to be quite noisy, robust methods should be used. Specifically, robust distances, including the biweight correlation, should be used in clustering and gene network analysis.
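
    To make the outlier problem concrete, the sketch below compares Pearson correlation with the biweight midcorrelation. This is a swapped-in, simpler relative of the method in the abstract: it weights each gene separately using the median and the median absolute deviation (MAD), and does not implement the paper's multivariate biweight estimate of scale and location. The tuning constant `c=9.0` is the conventional choice for the midcorrelation, and the toy data are invented.

```python
import math
from statistics import median


def pearson(x, y):
    """Ordinary Pearson correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _biweight_terms(x, c=9.0):
    """Median-centred values, down-weighted by Tukey's biweight function."""
    med = median(x)
    mad = median(abs(v - med) for v in x)
    if mad == 0:
        raise ValueError("MAD is zero; biweight weights are undefined")
    terms = []
    for v in x:
        u = (v - med) / (c * mad)
        # Points further than c MADs from the median get zero weight.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        terms.append((v - med) * w)
    return terms


def bicor(x, y):
    """Biweight midcorrelation of two equal-length profiles."""
    a = _biweight_terms(x)
    b = _biweight_terms(y)
    denom = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / denom


# Two profiles that agree except for one corrupted measurement.
clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noisy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, -50.0]
r_pearson = pearson(clean, noisy)
r_bicor = bicor(clean, noisy)
```

    On this toy input the single corrupted value drives the Pearson coefficient negative, while the biweight version stays clearly positive because that point receives zero weight.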

    Understanding Hierarchical Clustering Results by Interactive Exploration of Dendrograms: A Case Study with Genomic Microarray Data

    Hierarchical clustering is widely used to find patterns in multi-dimensional datasets, especially for genomic microarray data. Finding groups of genes with similar expression patterns can lead to better understanding of the functions of genes. Early software tools produced only printed results, while newer ones enabled some online exploration. We describe four general techniques that could be used in interactive explorations of clustering algorithms: (1) overview of the entire dataset, coupled with a detail view so that high-level patterns and hot spots can be easily found and examined, (2) dynamic query controls so that users can restrict the number of clusters they view at a time and show those clusters more clearly, (3) coordinated displays: the overview mosaic has a bi-directional link to 2-dimensional scattergrams, (4) cluster comparisons to allow researchers to see how different clustering algorithms group the genes. (UMIACS-TR-2002-50) (HCIL-TR-2002-10)

    The Entomopathogenic Bacterial Endosymbionts Xenorhabdus and Photorhabdus: Convergent Lifestyles from Divergent Genomes

    Members of the genus Xenorhabdus are entomopathogenic bacteria that associate with nematodes. The nematode-bacteria pair infects and kills insects, with both partners contributing to insect pathogenesis and the bacteria providing nutrition to the nematode from available insect-derived nutrients. The nematode provides the bacteria with protection from predators, access to nutrients, and a mechanism of dispersal. Members of the bacterial genus Photorhabdus also associate with nematodes to kill insects, and both genera of bacteria provide similar services to their different nematode hosts through unique physiological and metabolic mechanisms. We posited that these differences would be reflected in their respective genomes. To test this, we sequenced to completion the genomes of Xenorhabdus nematophila ATCC 19061 and Xenorhabdus bovienii SS-2004. As expected, both Xenorhabdus genomes encode many anti-insecticidal compounds, commensurate with their entomopathogenic lifestyle. Despite the similarities in lifestyle between Xenorhabdus and Photorhabdus bacteria, a comparative analysis of the Xenorhabdus, Photorhabdus luminescens, and P. asymbiotica genomes suggests genomic divergence. These findings indicate that evolutionary changes shaped by symbiotic interactions can follow different routes to achieve similar end points.

    Comparative Analysis of Thresholding Algorithms for Microarray-derived Gene Correlation Matrices

    The thresholding problem is important in today’s data-rich research scenario. A threshold is a well-defined point in the data distribution beyond which the data are highly likely to have scientific meaning. The selection of a threshold is crucial since it heavily influences any downstream analysis and the inferences made from it. A legitimate threshold is one that is not arbitrary but scientifically well grounded, data-dependent, and best segregates the information-rich and noisy sections of the data. Although the thresholding problem is not restricted to any particular field of study, little research has been done on it. This study investigates the problem in the context of network-based analysis of transcriptomic data. Six conceptually diverse algorithms – based on number of maximal cliques, correlations of control spots with genes, top 1% of correlations, spectral graph clustering, Bonferroni correction of p-values and statistical power – are used to threshold the gene correlation matrices of three time-series microarray datasets and tested for stability and validity. Stability or reliability of the first four algorithms towards thresholding is tested by block bootstrapping of arrays in the datasets and comparing the estimated thresholds against the bootstrap threshold distributions. Validity of the thresholding algorithms is tested by comparing the estimated thresholds against a threshold based on biological information. Thresholds based on the modular basis of gene networks are concluded to perform better in terms of both stability and validity. Future challenges for research on the problem have been identified. Although the study utilizes transcriptomic data for analysis, we assert its applicability to thresholding across various fields.
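
    Of the six approaches listed, the "top 1% of correlations" rule is the simplest to illustrate. The sketch below is one plausible reading of it and not the study's implementation: it assumes correlations are ranked by absolute value and that gene pairs at or above the cutoff are kept as network edges. The function names and the toy gene pairs are hypothetical.

```python
import math


def top_fraction_threshold(correlations, fraction=0.01):
    """Smallest |r| that still falls within the top `fraction` of values."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted((abs(r) for r in correlations), reverse=True)
    if not ranked:
        raise ValueError("no correlations supplied")
    # Always keep at least one value, even for very small inputs.
    keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[keep - 1]


def threshold_edges(pairs, fraction=0.01):
    """Keep gene pairs whose |r| reaches the top-fraction cutoff.

    pairs maps (gene_a, gene_b) tuples to correlation coefficients.
    """
    cutoff = top_fraction_threshold(pairs.values(), fraction)
    return {pair: r for pair, r in pairs.items() if abs(r) >= cutoff}


# Toy correlation values for three gene pairs (invented).
toy_pairs = {("a", "b"): 0.1, ("a", "c"): -0.9, ("b", "c"): 0.5}
edges = threshold_edges(toy_pairs, fraction=0.5)
```

    A fixed fraction like this is exactly the kind of data-independent choice the abstract contrasts with thresholds derived from network structure, so it serves here only as a baseline.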