20 research outputs found

    A framework for significance analysis of gene expression data using dimension reduction methods

    Background: The most popular methods for significance analysis on microarray data are well suited to find genes differentially expressed across predefined categories. However, identification of features that correlate with continuous dependent variables is more difficult using these methods, and long lists of significant genes returned are not easily probed for co-regulations and dependencies. Dimension reduction methods are much used in the microarray literature for classification or for obtaining low-dimensional representations of data sets. These methods have an additional interpretation strength that is often not fully exploited when expression data are analysed. In addition, significance analysis may be performed directly on the model parameters to find genes that are important for any number of categorical or continuous responses. We introduce a general scheme for analysis of expression data that combines significance testing with the interpretative advantages of the dimension reduction methods. This approach is applicable both for explorative analysis and for classification and regression problems. Results: Three public data sets are analysed. One is used for classification, one contains spiked-in transcripts of known concentrations, and one represents a regression problem with several measured responses. Model-based significance analysis is performed using a modified version of Hotelling's T²-test, and a false discovery rate significance level is estimated by resampling. Our results show that underlying biological phenomena and unknown relationships in the data can be detected by a simple visual interpretation of the model parameters. It is also found that measured phenotypic responses may model the expression data more accurately than if the design-parameters are used as input. For the classification data, our method finds much the same genes as the standard methods, in addition to some extra which are shown to be biologically relevant. The list of spiked-in genes is also reproduced with high accuracy. Conclusion: The dimension reduction methods are versatile tools that may also be used for significance testing. Visual inspection of model components is useful for interpretation, and the methodology is the same whether the goal is classification, prediction of responses, feature selection or exploration of a data set. The presented framework is conceptually and algorithmically simple, and a Matlab toolbox (Mathworks Inc, USA) is supplemented.

    A Dense SNP-Based Linkage Map for Atlantic Salmon (Salmo salar) Reveals extended Chromosome Homeologies and Striking Differences in Sex-Specific Recombination Patterns

    Background: The Atlantic salmon genome is in the process of returning to a diploid state after undergoing a whole genome duplication (WGD) event between 25 and 100 million years ago. Existing data on the proportion of paralogous sequence variants (PSVs), multisite variants (MSVs) and other types of complex sequence variation suggest that the rediploidization phase is far from over. The aims of this study were to construct a high density linkage map for Atlantic salmon, to characterize the extent of rediploidization and to improve our understanding of genetic differences between sexes in this species. Results: A linkage map for Atlantic salmon comprising 29 chromosomes and 5650 single nucleotide polymorphisms (SNPs) was constructed using genotyping data from 3297 fish belonging to 143 families. Of these, 2696 SNPs were generated from ESTs or other gene associated sequences. Homeologous chromosomal regions were identified through the mapping of duplicated SNPs and through the investigation of syntenic relationships between Atlantic salmon and the reference genome sequence of the threespine stickleback (Gasterosteus aculeatus). The sex-specific linkage maps spanned a total of 2402.3 cM in females and 1746.2 cM in males, highlighting a difference in sex specific recombination rate (1.38:1) which is much lower than previously reported in Atlantic salmon. The sexes, however, displayed striking differences in the distribution of recombination sites within linkage groups, with males showing recombination strongly localized to telomeres. Conclusion: The map presented here represents a valuable resource for addressing important questions of interest to evolution (the process of re-diploidization), aquaculture and salmonid life history biology and not least as a resource to aid the assembly of the forthcoming Atlantic salmon reference genome sequence.

    The best of two worlds


    Genotype calling and mapping of multisite variants using an Atlantic salmon iSelect SNP array

    Motivation: Due to a genome duplication event in the recent history of salmonids, modern Atlantic salmon (Salmo salar) have a mosaic genome with roughly one-third being tetraploid. This is a complicating factor in genotyping and genetic mapping since polymorphisms within duplicated regions (multisite variants; MSVs) are challenging to call and to assign to the correct paralogue. Standard genotyping software offered by Illumina has not been written to interpret MSVs and will either fail or miscall these polymorphisms. For the purpose of mapping, linkage or association studies in non-diploid species, there is a pressing need for software that includes analysis of MSVs in addition to regular single nucleotide polymorphism (SNP) markers. Results: A software package is presented for the analysis of partially tetraploid genomes genotyped using Illumina Infinium BeadArrays (Illumina Inc.) that includes pre-processing, clustering, plotting and validation routines. More than 3000 salmon from an aquacultural strain in Norway, distributed among 266 full-sib families, were genotyped on a 15K BeadArray including both SNP and MSV markers. A total of 4268 SNPs and 1471 MSVs were identified, with average call accuracies of 0.97 and 0.86, respectively. A total of 150 MSVs polymorphic in both paralogs were dissected and mapped to their respective chromosomes, yielding insights about the salmon genome reversion to diploidy and improving marker genome coverage. Several retained homologies were found and are reported.

    Feasibility of In-Line Raman Spectroscopy for Quality Assessment in Food Industry: How Fast Can We Go?

    Raman spectroscopy is a viable tool within process analytical technologies due to recent technological advances. In this article, we evaluate the feasibility of Raman spectroscopy for in-line applications in the food industry by estimating the concentration of the fatty acids EPA + DHA in ground salmon samples (n = 63) and residual bone concentration in samples of mechanically recovered ground chicken (n = 66). The samples were measured under industry-like conditions: They moved on a conveyor belt through a dark cabinet where they were scanned with a wide area illumination standoff Raman probe. Such a setup should be able to handle relevant industrial conveyor belt speeds, and it was studied how different speeds (i.e., exposure times) influenced the signal-to-noise ratio (SNR) of the Raman spectra as well as the corresponding model performance. For all samples we applied speeds that resulted in 1 s, 2 s, 4 s, and 10 s exposure times. Samples were scanned in both heterogenous and homogenous state. The slowest speed (10 s exposure) yielded prediction errors (RMSECV) of 0.41% EPA + DHA and 0.59% ash for the salmon and chicken data sets, respectively. The more in-line relevant exposure time of 1 s resulted in increased RMSECV values, 0.84% EPA + DHA and 0.84% ash, respectively. The increase in prediction error correlated closely with the decrease in SNR. Further improvements of model performance were possible through different noise reduction strategies. Model performance for homogenous and heterogenous samples was similar, suggesting that the presented Raman scanning approach has the potential to work well also on intact heterogenous foods. The estimation errors obtained at these high speeds are likely acceptable for industrial use, but successful strategies to increase SNR will be key for widespread in-line use in the food industry.

    Feasibility of In-Line Raman Spectroscopy for Quality Assessment in Food Industry: How Fast Can We Go?

    Raman spectroscopy is a viable tool within process analytical technologies due to recent technological advances. In this article, we evaluate the feasibility of Raman spectroscopy for in-line applications in the food industry by estimating the concentration of the fatty acids EPA + DHA in ground salmon samples (n = 63) and residual bone concentration in samples of mechanically recovered ground chicken (n = 66). The samples were measured under industry-like conditions: They moved on a conveyor belt through a dark cabinet where they were scanned with a wide area illumination standoff Raman probe. Such a setup should be able to handle relevant industrial conveyor belt speeds, and it was studied how different speeds (i.e., exposure times) influenced the signal-to-noise ratio (SNR) of the Raman spectra as well as the corresponding model performance. For all samples we applied speeds that resulted in 1 s, 2 s, 4 s, and 10 s exposure times. Samples were scanned in both heterogenous and homogenous state. The slowest speed (10 s exposure) yielded prediction errors (RMSECV) of 0.41% EPA + DHA and 0.59% ash for the salmon and chicken data sets, respectively. The more in-line relevant exposure time of 1 s resulted in increased RMSECV values, 0.84% EPA + DHA and 0.84% ash, respectively. The increase in prediction error correlated closely with the decrease in SNR. Further improvements of model performance were possible through different noise reduction strategies. Model performance for homogenous and heterogenous samples was similar, suggesting that the presented Raman scanning approach has the potential to work well also on intact heterogenous foods. The estimation errors obtained at these high speeds are likely acceptable for industrial use, but successful strategies to increase SNR will be key for widespread in-line use in the food industry.

    A dense SNP-based linkage map for Atlantic salmon (Salmo salar) reveals extended chromosome homeologies and striking differences in sex-specific recombination patterns

    Background: The Atlantic salmon genome is in the process of returning to a diploid state after undergoing a whole genome duplication (WGD) event between 25 and 100 million years ago. Existing data on the proportion of paralogous sequence variants (PSVs), multisite variants (MSVs) and other types of complex sequence variation suggest that the rediploidization phase is far from over. The aims of this study were to construct a high density linkage map for Atlantic salmon, to characterize the extent of rediploidization and to improve our understanding of genetic differences between sexes in this species. Results: A linkage map for Atlantic salmon comprising 29 chromosomes and 5650 single nucleotide polymorphisms (SNPs) was constructed using genotyping data from 3297 fish belonging to 143 families. Of these, 2696 SNPs were generated from ESTs or other gene associated sequences. Homeologous chromosomal regions were identified through the mapping of duplicated SNPs and through the investigation of syntenic relationships between Atlantic salmon and the reference genome sequence of the threespine stickleback (Gasterosteus aculeatus). The sex-specific linkage maps spanned a total of 2402.3 cM in females and 1746.2 cM in males, highlighting a difference in sex specific recombination rate (1.38:1) which is much lower than previously reported in Atlantic salmon. The sexes, however, displayed striking differences in the distribution of recombination sites within linkage groups, with males showing recombination strongly localized to telomeres. Conclusion: The map presented here represents a valuable resource for addressing important questions of interest to evolution (the process of re-diploidization), aquaculture and salmonid life history biology and not least as a resource to aid the assembly of the forthcoming Atlantic salmon reference genome sequence.