223 research outputs found
The Degrees of Freedom of Partial Least Squares Regression
The derivation of statistical properties for Partial Least Squares regression
can be a challenging task. The reason is that the construction of latent
components from the predictor variables also depends on the response variable.
While this typically leads to good performance and interpretable models in
practice, it makes the statistical analysis more involved. In this work, we
study the intrinsic complexity of Partial Least Squares Regression. Our
contribution is an unbiased estimate of its Degrees of Freedom. It is defined
as the trace of the first derivative of the fitted values, seen as a function
of the response. We establish two equivalent representations that rely on the
close connection of Partial Least Squares to matrix decompositions and Krylov
subspace techniques. We show that the Degrees of Freedom depend on the
collinearity of the predictor variables: The lower the collinearity is, the
higher the Degrees of Freedom are. In particular, they are typically higher
than the naive approach that defines the Degrees of Freedom as the number of
components. Further, we illustrate how the Degrees of Freedom approach can be
used for the comparison of different regression methods. In the experimental
section, we show that our Degrees of Freedom estimate in combination with
information criteria is useful for model selection. Comment: to appear in the Journal of the American Statistical Association
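The definition in this abstract (Degrees of Freedom as the trace of the derivative of the fitted values with respect to the response) can be illustrated numerically. The following is a minimal sketch, not the authors' estimator: it assumes a one-component PLS fit without centering or intercept, written as least squares of y on the single direction X(X'y), and it approximates the derivative by central finite differences. The function names `pls1_fit` and `degrees_of_freedom` and the two toy design matrices are hypothetical and chosen only for illustration.

```python
def pls1_fit(X, y):
    """Fitted values of a one-component PLS fit (assumption: no centering).

    The single latent component is t = X s with s = X^T y, and y is
    regressed on t by least squares.
    """
    n, p = len(X), len(X[0])
    s = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]  # s = X^T y
    t = [sum(X[i][j] * s[j] for j in range(p)) for i in range(n)]  # t = X s
    tt = sum(v * v for v in t)
    if tt == 0.0:
        return [0.0] * n
    coef = sum(t[i] * y[i] for i in range(n)) / tt
    return [coef * v for v in t]


def degrees_of_freedom(fit, X, y, h=1e-5):
    """Trace of d(fitted values)/dy, approximated by central differences."""
    trace = 0.0
    for i in range(len(y)):
        up = list(y)
        up[i] += h
        down = list(y)
        down[i] -= h
        trace += (fit(X, up)[i] - fit(X, down)[i]) / (2 * h)
    return trace


y = [1.0, 2.0, 4.0]

# Orthonormal predictor columns: no collinearity at all.
X_orth = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
# Two identical predictor columns: perfect collinearity.
X_coll = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

dof_orth = degrees_of_freedom(pls1_fit, X_orth, y)
dof_coll = degrees_of_freedom(pls1_fit, X_coll, y)
print(dof_orth, dof_coll)
```

In this toy setting the orthonormal design gives a trace of about 2 with a single component, while the perfectly collinear design gives about 1. This is in line with the abstract's statement that lower collinearity goes with higher Degrees of Freedom, and that the number of components can understate them.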
Two intracellular and cell type-specific bacterial symbionts in the placozoan Trichoplax H2
Placozoa is an enigmatic phylum of simple, microscopic, marine metazoans(1,2). Although intracellular bacteria have been found in all members of this phylum, almost nothing is known about their identity, location and interactions with their host(3-6). We used metagenomic and metatranscriptomic sequencing of single host individuals, plus metaproteomic and imaging analyses, to show that the placozoan Trichoplax sp. H2 lives in symbiosis with two intracellular bacteria. One symbiont forms an undescribed genus in the Midichloriaceae (Rickettsiales)(7,8) and has a genomic repertoire similar to that of rickettsial parasites(9,10), but does not seem to express key genes for energy parasitism. Correlative image analyses and three-dimensional electron tomography revealed that this symbiont resides in the rough endoplasmic reticulum of its host's internal fibre cells. The second symbiont belongs to the Margulisbacteria, a phylum without cultured representatives and not known to form intracellular associations(11-13). This symbiont lives in the ventral epithelial cells of Trichoplax, probably metabolizes algal lipids digested by its host and has the capacity to supplement the placozoan's nutrition. Our study shows that one of the simplest animals has evolved highly specific and intimate associations with symbiotic, intracellular bacteria and highlights that symbioses can provide access to otherwise elusive microbial dark matter.
Morphology of obligate ectosymbionts reveals Paralaxus gen. nov.: A new circumtropical genus of marine stilbonematine nematodes
Stilbonematinae are a subfamily of conspicuous marine nematodes, distinguished by a coat of sulphur-oxidizing bacterial ectosymbionts on their cuticle. Like most nematodes, the worm hosts have a relatively simple anatomy and few taxonomically informative characters, and this has resulted in numerous taxonomic reassignments and synonymizations. Recent studies using a combination of morphological and molecular traits have helped to improve the taxonomy of Stilbonematinae but also raised questions on the validity of several genera. Here, we describe a new circumtropically distributed genus Paralaxus (Stilbonematinae) with three species: Paralaxus cocos sp. nov., P. bermudensis sp. nov. and P. columbae sp. nov. We used single worm metagenomes to generate host 18S rRNA and cytochrome c oxidase I (COI) as well as symbiont 16S rRNA gene sequences. Intriguingly, COI alignments and primer matching analyses suggest that the COI is not suitable for PCR-based barcoding approaches in Stilbonematinae as the genera have a highly diverse base composition and no conserved primer sites. The phylogenetic analyses of all three gene sets, however, confirm the morphological assignments and support the erection of the new genus Paralaxus as well as corroborate the status of the other stilbonematine genera. Paralaxus most closely resembles the stilbonematine genus Laxus in overlapping sets of diagnostic features but can be distinguished from Laxus by the morphology of the genus-specific symbiont coat. Our re-analyses of key parameters of the symbiont coat morphology as a character for all Stilbonematinae genera show that with amended descriptions, including the coat, highly reliable genus assignments can be obtained.
Paracatenula, an ancient symbiosis between thiotrophic Alphaproteobacteria and catenulid flatworms
Harnessing chemosynthetic symbionts is a recurring evolutionary strategy. Eukaryotes from six phyla as well as one archaeon have acquired chemoautotrophic sulfur-oxidizing bacteria. In contrast to this broad host diversity, known bacterial partners apparently belong to two classes of bacteria, the Gamma- and Epsilonproteobacteria. Here, we characterize the intracellular endosymbionts of the mouthless catenulid flatworm genus Paracatenula as chemoautotrophic sulfur-oxidizing Alphaproteobacteria. The symbionts of Paracatenula galateia are provisionally classified as "Candidatus Riegeria galateiae" based on 16S ribosomal RNA sequencing confirmed by fluorescence in situ hybridization together with functional gene and sulfur metabolite evidence. 16S rRNA gene phylogenetic analysis shows that all 16 Paracatenula species examined harbor host species-specific intracellular Candidatus Riegeria bacteria that form a monophyletic group within the order Rhodospirillales. Comparing host and symbiont phylogenies reveals strict cocladogenesis and points to vertical transmission of the symbionts. Between 33% and 50% of the body volume of the various worm species is composed of bacterial symbionts, by far the highest proportion among all known endosymbiotic associations between bacteria and metazoans. This symbiosis, which likely originated more than 500 Mya during the early evolution of flatworms, is the oldest known animal-chemoautotrophic bacteria association. The distant phylogenetic position of the symbionts compared with other mutualistic or parasitic Alphaproteobacteria promises to illuminate the common genetic predispositions that have allowed several members of this class to successfully colonize eukaryote cells.
The Substrate-Bound Crystal Structure of a Baeyer–Villiger Monooxygenase Exhibits a Criegee-like Conformation
The Baeyer–Villiger monooxygenases (BVMOs) are a family of bacterial flavoproteins that catalyze the synthetically useful Baeyer–Villiger oxidation reaction. This involves the conversion of ketones into esters or cyclic ketones into lactones by introducing an oxygen atom adjacent to the carbonyl group. The BVMOs offer exquisite regio- and enantiospecificity while acting on a wide range of substrates. They use only NADPH and oxygen as cosubstrates, and produce only NADP+ and water as byproducts, making them environmentally attractive for industrial purposes. Here, we report the first crystal structure of a BVMO, cyclohexanone monooxygenase (CHMO) from Rhodococcus sp. HI-31 in complex with its substrate, cyclohexanone, as well as NADP+ and FAD, to 2.4 Å resolution. This structure shows a drastic rotation of the NADP+ cofactor in comparison to previously reported NADP+-bound structures, as the nicotinamide moiety is no longer positioned above the flavin ring. Instead, the substrate, cyclohexanone, is found at this location, in an appropriate position for the formation of the Criegee intermediate. The rotation of NADP+ permits the substrate to gain access to the reactive flavin peroxyanion intermediate while preventing it from diffusing out of the active site. The structure thus reveals the conformation of the enzyme during the key catalytic step. CHMO is proposed to undergo a series of conformational changes to gradually move the substrate from the solvent, via binding in a solvent-excluded pocket that dictates the enzyme's chemospecificity, to a location above the flavin–peroxide adduct where catalysis occurs. Peer reviewed: Yes. NRC publication: Yes
Statistical analysis and significance testing of serial analysis of gene expression data using a Poisson mixture model
Background: Serial analysis of gene expression (SAGE) is used to obtain quantitative snapshots of the transcriptome. These profiles are count-based and are assumed to follow a Binomial or Poisson distribution. However, tag counts observed across multiple libraries (for example, one or more groups of biological replicates) have additional variance that cannot be accommodated by this assumption alone. Several models have been proposed to account for this effect, all of which utilize a continuous prior distribution to explain the excess variance. Here, a Poisson mixture model, which assumes excess variability arises from sampling a mixture of distinct components, is proposed and the merits of this model are discussed and evaluated. Results: The goodness of fit of the Poisson mixture model on 15 sets of biological SAGE replicates is compared to the previously proposed hierarchical gamma-Poisson (negative binomial) model, and a substantial improvement is seen. In further support of the mixture model, there is observed: 1) an increase in the number of mixture components needed to fit the expression of tags representing more than one transcript; and 2) a tendency for components to cluster libraries into the same groups. A confidence score is presented that can identify tags that are differentially expressed between groups of SAGE libraries. Several examples where this test outperforms those previously proposed are highlighted. Conclusion: The Poisson mixture model performs well as a) a method to represent SAGE data from biological replicates, and b) a basis to assign significance when testing for differential expression between multiple groups of replicates. Code for the R statistical software package is included to assist investigators in applying this model to their own data.
Identification of SERPINA1 as single marker for papillary thyroid carcinoma through microarray meta analysis and quantification of its discriminatory power in independent validation
Background: Several DNA microarray based expression signatures for the different clinically relevant thyroid tumor entities have been described over the past few years. However, reproducibility of these signatures is generally low, mainly due to study biases, small sample sizes and the highly multivariate nature of microarrays. While there are new technologies available for a more accurate high throughput expression analysis, we show that there is still a lot of information to be gained from data deposited in public microarray databases. In this study we aimed (1) to identify potential markers for papillary thyroid carcinomas through meta analysis of public microarray data and (2) to confirm these markers in an independent dataset using an independent technology. Methods: We adopted a meta analysis approach for four publicly available microarray datasets on papillary thyroid carcinoma (PTC) nodules versus nodular goitre (NG) from N2-frozen tissue. The methodology included merging of datasets, bias removal using distance weighted discrimination (DWD), feature selection/inference statistics, classification/crossvalidation and gene set enrichment analysis (GSEA). External validation was performed on an independent dataset using an independent technology, quantitative RT-PCR (RT-qPCR) in our laboratory. Results: From meta analysis we identified one gene (SERPINA1) which identifies papillary thyroid carcinoma against benign nodules with 99% accuracy (n = 99, sensitivity = 0.98, specificity = 1, PPV = 1, NPV = 0.98). In the independent validation data, which included not only PTC and NG, but all major histological thyroid entities plus a few variants, SERPINA1 was again markedly up regulated (36-fold, p = 1.3 × 10^-10) in PTC and identification of papillary carcinoma was possible with 93% accuracy (n = 82, sensitivity = 1, specificity = 0.90, PPV = 0.76, NPV = 1). We also show that the extracellular matrix pathway is strongly activated in the meta analysis data, suggesting an important role of tumor-stroma interaction in the carcinogenesis of papillary thyroid carcinoma. Conclusions: We show that valuable new information can be gained from meta analysis of existing microarray data deposited in public repositories. While single microarray studies rarely exhibit a sample number which allows robust feature selection, this can be achieved by combining published data using DWD. This approach is not only efficient, but also very cost-effective. Independent validation shows the validity of the results from this meta analysis and confirms SERPINA1 as a potent mRNA marker for PTC in a total (meta analysis plus validation) of 181 samples.
Microanatomy of the trophosome region of Paracatenula cf. polyhymnia (Catenulida, Platyhelminthes) and its intracellular symbionts
Marine catenulid platyhelminths of the genus Paracatenula lack mouth, pharynx and gut. They live in a symbiosis with intracellular bacteria which are restricted to the body region posterior to the brain. The symbiont-housing cells (bacteriocytes) collectively form the trophosome tissue, which functionally replaces the digestive tract. It constitutes the largest part of the body and is the most important synapomorphy of this group. While some other features of the Paracatenula anatomy have already been analyzed, an in-depth analysis of the trophosome region was missing. Here, we identify and characterize the composition of the trophosome and its surrounding tissue by analyzing series of ultra-thin cross-sections of the species Paracatenula cf. polyhymnia. For the first time, a protonephridium is detected in a Paracatenula species, but it is morphologically reduced and most likely not functional. Cells containing needle-like inclusions in the reference species Paracatenula polyhymnia Sterrer and Rieger, 1974 were thought to be sperm, and the inclusions interpreted as the sperm nucleus. Our analysis of similar cells and their inclusions by EDX and Raman microspectroscopy documents an inorganic spicule consisting of a unique magnesium-phosphate compound. Furthermore, we identify the neoblast stem cells located underneath the epidermis. Except for the modifications due to the symbiotic lifestyle and the enigmatic spicule cells, the organization of Paracatenula cf. polyhymnia conforms to that of the Catenulida in all studied aspects. Therefore, this species represents an excellent model system for further studies of host adaptation to an obligate symbiotic lifestyle.
Prototypes for Content-Based Image Retrieval in Clinical Practice
Content-based image retrieval (CBIR) has been proposed as a key technology for computer-aided diagnostics (CAD). This paper reviews the state of the art and future challenges in CBIR for CAD applied to clinical practice.