An introduction to low-level analysis methods of DNA microarray data
This article gives an overview of the methods used in the low-level analysis of gene expression data generated using DNA microarrays. This type of experiment makes it possible to determine relative levels of nucleic acid abundance in a set of tissues or cell populations for thousands of transcripts or loci simultaneously. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. This includes the design of probes, the experimental design, the image analysis of scanned microarray images, the normalization of fluorescence intensities, the assessment of the quality of microarray data and the incorporation of quality information in subsequent analyses, the combination of information across arrays and across sets of experiments, the discovery and recognition of patterns in expression at the single-gene and multiple-gene levels, and the assessment of the significance of these findings, given the substantial noise and hence random features in the data. For all of these components, access to a flexible and efficient statistical computing environment is essential.
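As a concrete illustration of one such low-level step (not code from the article), below is a minimal quantile-normalization sketch in Python/NumPy; the probe-by-array intensity matrix is invented toy data.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize a (probes x arrays) intensity matrix so that
    every array shares the same empirical intensity distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # per-array ranks
    reference = np.sort(x, axis=0).mean(axis=1)        # mean of sorted columns
    return reference[ranks]

# Toy data: four probes measured on three arrays, log2-transformed first.
raw = np.array([[120.0,  95.0,  210.0],
                [ 40.0,  33.0,   70.0],
                [800.0, 610.0, 1500.0],
                [ 15.0,  12.0,   28.0]])
print(quantile_normalize(np.log2(raw)))
```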
A Mutagenetic Tree Hidden Markov Model for Longitudinal Clonal HIV Sequence Data
RNA viruses provide prominent examples of measurably evolving populations. In HIV infection, the development of drug resistance is of particular interest, because precise predictions of the outcome of this evolutionary process are a prerequisite for the rational design of antiretroviral treatment protocols. We present a mutagenetic tree hidden Markov model for the analysis of longitudinal clonal sequence data. Using HIV mutation data from clinical trials, we estimate the order and rate of occurrence of seven amino acid changes that are associated with resistance to the reverse transcriptase inhibitor efavirenz.
Comment: 20 pages, 6 figures
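As background only: likelihood computations in hidden Markov models of this kind rest on the standard forward recursion. The sketch below implements it for a generic discrete HMM, not the authors' mutagenetic tree model; the two states, the parameters, and the observation sequence are invented for illustration.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward recursion."""
    alpha = pi * B[:, obs[0]]          # joint prob. of state and first symbol
    log_lik = 0.0
    for t in range(1, len(obs)):
        c = alpha.sum()                # scaling constant
        log_lik += np.log(c)
        alpha = (alpha / c) @ A * B[:, obs[t]]
    return log_lik + np.log(alpha.sum())

# Toy two-state example: 0 = drug-susceptible, 1 = resistant (absorbing).
pi = np.array([0.9, 0.1])                  # initial state distribution
A  = np.array([[0.8, 0.2], [0.0, 1.0]])    # transition probabilities
B  = np.array([[0.95, 0.05], [0.1, 0.9]])  # emission probabilities
print(forward_log_likelihood(pi, A, B, [0, 0, 1, 1]))
```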
A short guide to increase FAIRness of atmospheric model data
The generation, processing and analysis of atmospheric model data are expensive, as atmospheric model runs are often computationally intensive and the cost of 'fast' disk space is rising. Moreover, atmospheric models are mostly developed by groups of scientists over many years, so only a few appropriate models exist for specific analyses, e.g. for urban climate. Atmospheric model data should therefore be made available for reuse by scientists, the public sector, companies and other stakeholders, which leads to an increasing need for swift, user-friendly adaptation of standards.

The FAIR data principles (Findable, Accessible, Interoperable, Reusable) were established to foster the reuse of data. Research data become findable and accessible if they are published in public repositories with general metadata and Persistent Identifiers (PIDs), e.g. DataCite DOIs. The use of PIDs should ensure that descriptive metadata remain persistently available. Nevertheless, PIDs and basic metadata do not guarantee that the data are indeed interoperable and reusable without project-specific knowledge, and the lack of standardised machine-readable metadata further reduces the FAIRness of data. Unfortunately, no common standards are available for non-climate models, e.g. mesoscale models.

This paper proposes a concept, developed within the AtMoDat project (Atmospheric Model Data), to improve the FAIRness of archived atmospheric model data. The approach consists of several aspects, each of which is easy to implement: requirements for rich metadata with controlled vocabulary, landing pages, file formats (netCDF) and the structure within the files. The landing pages are a core element of this concept: they should be human- and machine-readable, hold discipline-specific metadata, and present metadata at the simulation and variable level. This guide is meant to help data producers and curators prepare data for publication. Furthermore, it provides guidance on the choice of keywords, which supports data reusers in their search for data with search engines.
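As an illustration of the file-format aspect, the sketch below writes a netCDF file with rich, controlled-vocabulary metadata using the netCDF4 Python library. The file name, attribute values, and DOI are placeholders, not part of the guide.

```python
from netCDF4 import Dataset
import numpy as np

# All names, values and the DOI below are placeholders for illustration.
with Dataset("urban_temperature.nc", "w") as ds:
    # Rich, machine-readable global metadata in the spirit of FAIR.
    ds.title = "Near-surface air temperature, urban test domain"
    ds.institution = "Example Institute"
    ds.source = "hypothetical urban climate model v1.0"
    ds.Conventions = "CF-1.8"
    ds.references = "https://doi.org/10.5281/zenodo.0000000"  # placeholder PID

    ds.createDimension("time", None)
    time = ds.createVariable("time", "f8", ("time",))
    time.units = "hours since 2020-01-01 00:00:00"
    time.standard_name = "time"

    tas = ds.createVariable("tas", "f4", ("time",))
    tas.units = "K"
    tas.standard_name = "air_temperature"  # CF controlled vocabulary

    time[:] = np.arange(3)
    tas[:] = [285.2, 286.1, 284.9]
```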
The ATMODAT Standard enhances FAIRness of Atmospheric Model data
Within the AtMoDat project (Atmospheric Model Data, www.atmodat.de), a standard has been developed to improve the FAIRness of atmospheric model data published in repositories. Atmospheric model data form the basis for understanding and predicting natural events, including atmospheric circulation, local air quality patterns, and the planetary energy budget. Such data should be made available for evaluation and reuse by scientists, the public sector, and relevant stakeholders.

In many respects, atmospheric modelling is ahead of other fields on the way towards FAIR (Findable, Accessible, Interoperable, Reusable; see e.g. Wilkinson et al. (2016, doi:10.1038/sdata.2016.18)) data: many models write their output directly into netCDF or into file formats that can be converted to netCDF. NetCDF is a non-proprietary, binary, and self-describing format, ensuring interoperability and facilitating reusability. Nevertheless, consistent human- and machine-readable standards for discipline-specific metadata are also necessary. While the standardisation of file structure and metadata (e.g. the Climate and Forecast Conventions) is well established for some subdomains of the earth system modelling community (e.g. the Coupled Model Intercomparison Project, Juckes et al. (2020, https://doi.org/10.5194/gmd-13-201-2020)), other subdomains still lack such standardisation. For example, standardisation is not well advanced for obstacle-resolving atmospheric models (e.g. for urban-scale modelling).

The ATMODAT standard, presented here, includes concrete recommendations related to the maturity, publication, and enhanced FAIRness of atmospheric model data. The suggestions include requirements for rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF), and the structure within files. Human- and machine-readable landing pages are a core element of this standard and should hold and present discipline-specific metadata at the simulation and variable level.
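A repository or data producer could check such metadata requirements mechanically. The sketch below is a hypothetical minimal validator; the required-attribute list is illustrative, not the actual ATMODAT list.

```python
from netCDF4 import Dataset

# Hypothetical required global attributes; illustrative, not the ATMODAT list.
REQUIRED_GLOBALS = {"title", "institution", "source", "Conventions"}

def missing_global_attributes(path):
    """Return the required global attributes absent from a netCDF file."""
    with Dataset(path) as ds:
        return sorted(REQUIRED_GLOBALS - set(ds.ncattrs()))

print(missing_global_attributes("urban_temperature.nc"))  # [] if compliant
```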
Recommendations for Discipline-Specific FAIRness Evaluation Derived from Applying an Ensemble of Evaluation Tools
From a research data repository's perspective, offering research data management services in line with the FAIR principles is becoming increasingly important. However, no globally established and trusted approach to evaluating FAIRness exists to date. Here, we apply five available FAIRness evaluation approaches to selected data archived in the World Data Center for Climate (WDCC). Two approaches are purely automatic, two are purely manual, and one applies a hybrid method (manual and automatic combined).
Our evaluation yields an overall mean FAIR score for WDCC-archived (meta)data of 0.67 out of 1, with a range of 0.5 to 0.88. Manual approaches show higher scores than automated ones, and the hybrid approach shows the highest score. Computed statistics indicate that the tested approaches agree well overall at the data collection level.
We find that while none of the five evaluation approaches is fully fit-for-purpose to evaluate (discipline-specific) FAIRness, each has its individual strengths. Specifically, manual approaches capture contextual aspects of FAIRness relevant for reuse, whereas automated approaches focus on the strictly standardised aspects of machine actionability. Correspondingly, the hybrid method combines the advantages and eliminates the deficiencies of the manual and automatic evaluation approaches.
Based on our results, we recommend that future FAIRness evaluation tools be based on a mature hybrid approach. Especially the design and adoption of the discipline-specific aspects of FAIRness will have to be carried out in concerted community efforts.
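The kind of aggregation reported above (per-tool means and ranges, plus an overall mean) takes only a few lines of Python; the scores below are invented for the example and are not the WDCC results.

```python
# Invented per-dataset FAIR scores for five evaluators; not the WDCC results.
scores = {
    "automatic_1": [0.52, 0.55, 0.50],
    "automatic_2": [0.58, 0.60, 0.57],
    "manual_1":    [0.75, 0.80, 0.78],
    "manual_2":    [0.72, 0.76, 0.74],
    "hybrid":      [0.85, 0.88, 0.86],
}

for tool, vals in scores.items():
    print(f"{tool}: mean={sum(vals)/len(vals):.2f}, "
          f"range=({min(vals):.2f}, {max(vals):.2f})")

overall = [v for vals in scores.values() for v in vals]
print(f"overall mean FAIR score: {sum(overall)/len(overall):.2f}")
```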
Origin and pathogenesis of nodular lymphocyte–predominant Hodgkin lymphoma as revealed by global gene expression analysis
The pathogenesis of nodular lymphocyte–predominant Hodgkin lymphoma (NLPHL) and its relationship to other lymphomas are largely unknown. This is partly because of the technical challenge of analyzing its rare neoplastic lymphocytic and histiocytic (L&H) cells, which are dispersed in an abundant nonneoplastic cellular microenvironment. We performed a genome-wide expression study of microdissected L&H lymphoma cells in comparison to normal and other malignant B cells, which indicated that L&H cells are related to, and/or originate from, germinal center B cells at the transition to memory B cells. L&H cells show a surprisingly high similarity to the tumor cells of T cell–rich B cell lymphoma and classical Hodgkin lymphoma, a partial loss of their B cell phenotype, and deregulation of many apoptosis regulators and putative oncogenes. Importantly, L&H cells are characterized by constitutive nuclear factor κB activity and aberrant extracellular signal-regulated kinase signaling. Thus, these findings shed new light on the nature of L&H cells, reveal several novel pathogenetic mechanisms in NLPHL, and may help in differential diagnosis and lead to novel therapeutic strategies.
ATMODAT Standard v3.0
Within the AtMoDat project (Atmospheric Model Data), a standard has been developed to improve the FAIRness of atmospheric model data published in repositories. The ATMODAT standard includes concrete recommendations related to the maturity, publication and enhanced FAIRness of atmospheric model data. The suggestions include requirements for rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF) and the structure within files. Human- and machine-readable landing pages are a core element of this standard and should hold and present discipline-specific metadata at the simulation and variable level.
This standard is an updated and translated version of "Bericht über initialen Kernstandard und Kurationskriterien des AtMoDat Projektes" (v2.4), the report on the initial core standard and curation criteria of the AtMoDat project.
An Assessment of the Role of DNA Adenine Methyltransferase on Gene Expression Regulation in E. coli
N6-adenine methylation is an important epigenetic signal that regulates various processes, such as DNA replication, repair, and transcription. In γ-proteobacteria, Dam is a stand-alone enzyme that methylates GATC sites, which are non-randomly distributed in the genome; some of these sites overlap with transcription factor binding sites. This work describes a global computational analysis of a published Dam-knockout microarray alongside other publicly available data to provide insight into the extent to which Dam regulates transcription by interfering with protein binding. The results indicate that DNA methylation by Dam may not globally affect gene transcription by physically blocking the access of transcription factors to their binding sites. Down-regulation of Dam during stationary phase correlates with the activity of transcription factors whose binding sites are enriched for GATC sites.
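To make the GATC analysis concrete, here is a toy Python sketch that locates GATC sites and tests whether hypothetical transcription factor binding sites overlap them; the sequence and coordinates are invented.

```python
import re

# Invented toy sequence and binding-site coordinates.
genome = "ATGATCGGCTAGATCAGGATCTTGATC"

# GATC is its own reverse complement, so scanning one strand suffices.
gatc_positions = [m.start() for m in re.finditer("GATC", genome)]
print("GATC sites at:", gatc_positions)

# Hypothetical transcription factor binding sites as half-open intervals.
binding_sites = [(0, 8), (10, 16), (20, 27)]
for start, end in binding_sites:
    overlaps = any(p < end and p + 4 > start for p in gatc_positions)
    print(f"site [{start}, {end}): overlaps a GATC site = {overlaps}")
```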