    Preferred analysis methods for single genomic regions in RNA sequencing revealed by processing the shape of coverage

    The informational content of RNA sequencing is currently far from fully explored. Most analyses focus on processing count tables or on isoform deconvolution via exon junctions. This article compares several techniques for estimating differential expression of exons or other small expressed genomic regions based on the shapes of their coverage functions. The problem is defined as finding the exons that are differentially expressed between two samples, using local normalization of the expression profile and statistical measures that detect differences between two profile shapes. Initial experiments were performed on synthetic data and on real data modified with synthetically created differential patterns. In total, 160 pipelines (5 generator types × 4 normalizations × 8 difference measures) are compared, and the best analysis pipelines are selected based on the linearity of the differential expression estimate and the area under the ROC curve. These platform-independent techniques have been implemented in the Bioconductor package rnaSeqMap. They identify exons with differential expression or internal splicing even when read counts may not show it. Applications include searches for significant differences, splicing identification algorithms, and finding suitable regions for qPCR primers.
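
    As a rough illustration of the shape-based idea (this is not the rnaSeqMap API; the normalization and the difference measure below are assumptions, picked as simple representatives of the kinds compared in the study), the sketch scales each coverage profile to unit area, scores two profiles by the largest gap between their cumulative sums, and evaluates a set of scores with a rank-based ROC AUC.

```python
from typing import Sequence


def normalize_profile(coverage: Sequence[float]) -> list[float]:
    """Scale a per-base coverage profile so that it sums to 1 (area normalization)."""
    total = sum(coverage)
    if total == 0:
        return [0.0] * len(coverage)
    return [c / total for c in coverage]


def max_cumulative_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """KS-style shape difference: the largest gap between the cumulative
    sums of two area-normalized profiles covering the same region."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same region")
    cum_a = cum_b = best = 0.0
    for x, y in zip(normalize_profile(a), normalize_profile(b)):
        cum_a += x
        cum_b += y
        best = max(best, abs(cum_a - cum_b))
    return best


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Rank-based AUC: the probability that a truly differential region
    scores higher than a non-differential one (ties count one half)."""
    pos = [s for s, is_diff in zip(scores, labels) if is_diff]
    neg = [s for s, is_diff in zip(scores, labels) if not is_diff]
    if not pos or not neg:
        raise ValueError("need both differential and non-differential regions")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    Because both profiles are normalized first, a region whose coverage is uniformly doubled scores 0, while coverage shifted toward one end of the region (for example `[10, 10, 10, 10]` versus `[0, 0, 20, 20]`) scores 0.5; this is the kind of shape change that total counts alone do not reveal.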

    Modeling and optimization of algae growth

    The wastewater from greenhouses carries a high amount of mineral contamination, and an environmentally friendly way to remove it is to use algae to clean this runoff water. The algae consume the minerals as part of their growth process. In addition to cleaning the water, the resulting algal biomass has a variety of applications, including production of biodiesel, animal feed, and pharmaceutical and cosmetic products; it can even be used as a source of heat or electricity. The aim of this paper is to develop a model of algae production and to use this model to investigate how best to optimize algae farms to satisfy the dual goals of maximizing growth and removing mineral contaminants. With this aim in mind, the paper is split into five main sections. The first reviews the biological literature to determine which factors affect the growth of algae. The second reviews existing mathematical models from the literature, performs a steady-state analysis of each model, and discusses the strengths and weaknesses of each in detail. The third proposes a new two-stage model for algae production, carefully estimates its parameters, and presents numerical solutions. The fourth presents a new one-dimensional spatio-temporal model, solves it numerically, and discusses optimization strategies. Finally, these elements are brought together and recommendations for further work are drawn.
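
    The abstract does not give the equations of the proposed two-stage model, so the following is only a generic sketch of nutrient-limited growth of the kind such models typically build on: Monod kinetics for algal biomass X and a mineral nutrient S in a closed batch, integrated with explicit Euler steps. The class, function, parameter names, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonodParams:
    mu_max: float  # maximum specific growth rate [1/day] (hypothetical value)
    k_s: float     # half-saturation nutrient concentration [mg/L]
    yield_: float  # biomass produced per unit of nutrient consumed [mg/mg]


def simulate(x0: float, s0: float, p: MonodParams, days: float, dt: float = 0.01):
    """Explicit-Euler integration of Monod growth in a closed batch:
        dX/dt = mu_max * S / (k_s + S) * X
        dS/dt = -(1 / yield_) * dX/dt
    Returns (biomass, nutrient) after `days`. Nutrient uptake per step is
    capped at what is still available, so S never becomes negative and
    X + yield_ * S stays constant.
    """
    x, s = x0, s0
    for _ in range(int(round(days / dt))):
        growth = p.mu_max * s / (p.k_s + s) * x       # dX/dt at the current state
        consumed = min(dt * growth / p.yield_, s)     # nutrient removed this step
        s -= consumed
        x += consumed * p.yield_                      # converted into biomass
    return x, s
```

    For example, `simulate(1.0, 40.0, MonodParams(mu_max=1.0, k_s=5.0, yield_=0.5), days=20.0)` ends with the nutrient almost fully removed and the biomass close to 1 + 0.5 × 40 = 21. Optimizing a farm for both goals would mean choosing controllable inputs (such as residence time or inflow) against a model like this, which the spatial model in the paper extends.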

    Identification of a stable, non-canonically regulated Nrf2 form in lung cancer cells

    The transcription factor Nrf2 (nuclear factor erythroid 2 (NF-E2)-related factor 2) is recognized for its pro-survival and cytoprotective role upon exposure to oxidative, chemical, or metabolic stress. Nrf2 controls a number of cellular processes, such as proliferation, differentiation, apoptosis, autophagy, lipid synthesis and metabolism, and glucose metabolism, and it is a target for activation in chronic diseases such as diabetes and neurodegenerative and inflammatory diseases. The dark side of Nrf2 is revealed when its regulation is imbalanced (e.g., via oncogene activation or mutations): under such conditions, constitutively active Nrf2 promotes carcinogenesis, metastasis, and radio- and chemoresistance. In the absence of stress, Nrf2 is rapidly degraded via the Keap1-Cullin 3 (Cul3) pathway, yet cells still exhibit basal activation of Nrf2 target genes. It is not yet clear how Nrf2 maintains the expression of its targets under homeostatic conditions. Here, we found a stable 105 kDa Nrf2 form that is resistant to Keap1-Cul3-mediated degradation and translocates to the nucleus of lung cancer cells. RNA-Seq analysis indicates that it might originate from exon 2- or exon 3-truncated transcripts. This stable 105 kDa Nrf2 form might help explain the constitutive activity of Nrf2 under normal cellular conditions.

    Genes sharing the protein family domain decrease the performance of classification with RNA-seq genomic signatures

    Background: Our experience with running various types of classification on the CAMDA neuroblastoma dataset has led us to conclude that the results are not always obvious and may differ depending on the type of analysis and the selection of genes used for classification. This paper aims to point out several factors that may influence downstream machine learning analysis, in particular the type of primary analysis, the type of classifier, and the increased correlation between genes sharing a protein domain. These factors influence the analysis directly, but the interplay between them may also be important. We compiled a gene-domain database and used it to examine the differences between genes that share a domain and the rest of the genes in the datasets. Results: The major findings are: pairs of genes that share a domain have increased Spearman's correlation coefficients of counts; genes sharing a domain are expected to have lower predictive power due to increased correlation, which in most cases shows up as a higher number of misclassified samples; classifier performance may vary depending on the method, but in most cases using genes sharing a domain in the training set results in a higher misclassification rate; and the increased correlation in genes sharing a domain most often results in worse classifier performance regardless of the primary analysis tools used, even when the alignment yield of the primary analysis varies. Conclusions: The effect of sharing a domain is more likely a result of real biological co-expression than of mere sequence similarity and artifacts of mapping and counting, although this is harder to establish and needs further research. The effect is interesting in itself, but we also point out some practical ways in which it may influence RNA sequencing analysis and the use of RNA biomarkers. In particular, a gene signature biomarker set built from RNA-sequencing results should be depleted of genes sharing common domains, which may improve its classification performance.
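
    The correlation comparison described in this abstract can be sketched as follows: Spearman's rho is computed as the Pearson correlation of rank vectors (tied counts receive averaged ranks), and the mean rho is taken over a list of gene pairs, e.g. pairs sharing a domain versus other pairs. The gene names, count vectors, and pair lists are hypothetical inputs, and this is not the authors' pipeline.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least two samples")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant count vector has no defined correlation
    return cov / (vx * vy) ** 0.5


def mean_pair_correlation(counts: dict[str, list[float]],
                          pairs: list[tuple[str, str]]) -> float:
    """Average Spearman's rho of counts over the given (hypothetical) gene pairs."""
    rhos = [spearman(counts[g1], counts[g2]) for g1, g2 in pairs]
    return sum(rhos) / len(rhos)
```

    Comparing `mean_pair_correlation` for domain-sharing pairs against a matched set of other pairs gives the kind of contrast the Results describe; ranks are used rather than raw counts so that a few very highly expressed samples do not dominate the coefficient.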

    Ambiguous genes due to aligners and their impact on RNA-seq data analysis

    The main scope of the study is ambiguous genes, i.e. genes whose expression is difficult to estimate from data produced by next-generation sequencing technologies. We focused on RNA sequencing (RNA-Seq) experiments performed on the Illumina platform. It is crucial to identify such genes and understand the cause of their difficulty, as these genes may be involved in some diseases. By giving misleading results, they could contribute to a misunderstanding of the causes of certain diseases, which could lead to inappropriate treatment. We hypothesized that ambiguous genes would be difficult to map because of their complex structure, so we ran RNA-seq analyses with different mappers to find genes whose measurements differ between aligners. We were able to identify such genes using a generalized linear model with two factors: the mapper and the groups introduced by the experiment. A large proportion of ambiguous genes are pseudogenes, and the high sequence similarity of pseudogenes to functional genes may indicate problems in alignment procedures. In addition, predictive analysis was used to assess how difficult genes perform in classification: the effectiveness of classifying samples into specific groups was compared using the expression of difficult and non-difficult genes as covariates. In almost all cases considered, ambiguous genes have less predictive power. ISSN: 2045-232
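
    A simplified version of this two-factor screen can be sketched as below. It swaps the study's generalized linear model for an ordinary least-squares model on log-transformed counts, log(1 + count) ~ mapper + group, and reports the F statistic of the mapper factor for one gene; a large value suggests that the gene's measured expression depends on the aligner. The balanced design, the mapper and group labels, and any cut-off applied to F are assumptions for illustration.

```python
from math import log1p
from statistics import mean


def mapper_f_statistic(counts: dict[tuple[str, str], list[float]]) -> float:
    """F statistic of the mapper factor in log1p(count) ~ mapper + group.

    counts maps (mapper, group) -> replicate counts of one gene. The design
    must be balanced (same number of replicates in every cell), which makes
    the main-effect sums of squares simple functions of marginal means.
    """
    mappers = sorted({m for m, _ in counts})
    groups = sorted({g for _, g in counts})
    n = len(counts[(mappers[0], groups[0])])
    for m in mappers:
        for g in groups:
            if n == 0 or len(counts.get((m, g), [])) != n:
                raise ValueError("design must be balanced and non-empty")
    a, b = len(mappers), len(groups)
    df_resid = a * b * n - a - b + 1
    if a < 2 or df_resid < 1:
        raise ValueError("need two or more mappers and residual degrees of freedom")

    y = {key: [log1p(c) for c in vals] for key, vals in counts.items()}
    grand = mean(v for vals in y.values() for v in vals)
    mapper_means = {m: mean(v for g in groups for v in y[(m, g)]) for m in mappers}
    group_means = {g: mean(v for m in mappers for v in y[(m, g)]) for g in groups}

    ss_total = sum((v - grand) ** 2 for vals in y.values() for v in vals)
    ss_mapper = b * n * sum((mapper_means[m] - grand) ** 2 for m in mappers)
    ss_group = a * n * sum((group_means[g] - grand) ** 2 for g in groups)
    ss_resid = ss_total - ss_mapper - ss_group  # includes mapper x group interaction

    if ss_resid <= 1e-12:  # perfect additive fit: no residual noise to compare with
        return float("inf") if ss_mapper > 1e-12 else 0.0
    return (ss_mapper / (a - 1)) / (ss_resid / df_resid)
```

    In this sketch the group factor absorbs genuine biological differences between experimental groups, so only a systematic shift between mappers inflates F. A count model such as a negative binomial GLM, as the original analysis uses a GLM, would handle low counts better than the log transform assumed here.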

    Biomass Price Prediction Based on the Example of Poland

    The aim of the study was to test the applicability of forecasting to the analysis of the variability of wood prices and supply in Poland. It relies on the autoregressive integrated moving average (ARIMA) model, which accounts for cyclic, seasonal, and irregular fluctuations and the long-term trend, as a tool for assessing predictions of the prices of selected medium-sized wood assortments. Elements of the time series were determined taking into account the cyclical character of the quarterly distribution. The data included quarterly information on the supply (amount) and prices (value) of wood sold by the State Forests in the years 2018–2022. The analysis was conducted for the most popular assortments: logging slash (M2, M2ZE), firewood S4, and medium-sized wood S2AP. In the period studied (2018–2022), the average rate of price variation was widely scattered. It was 7% for the M2ZE assortment and 1% for the M2 assortment, while the medium-sized S2AP assortment displayed the greatest variation, 99%, meaning that between 2018 and the present its price increased by nearly 100%. No major fluctuations were observed for the S4 assortment, whose average rate of variation was 0%. Seasonal variation was observed only for S4 firewood, the price of which rose each year in October, November, and December; for this reason, the forecast was made with the seasonal autoregressive integrated moving average (SARIMA) version of the model. Forecasting the price of wood is difficult because of market variation and the impact of global factors related to fluctuations in supply.
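
    The seasonal modelling step can be illustrated with a deliberately small SARIMA(1,0,0)(0,1,0) model with period 4: quarterly prices are seasonally differenced at lag 4, an AR(1) coefficient is fitted to the differences by conditional least squares, and forecasts are rebuilt by adding predicted differences to the value one year earlier. The model orders, the function name, and the example series are assumptions; the abstract does not state the orders used in the study, and a real analysis would normally rely on a dedicated time-series library with proper order selection.

```python
from typing import Sequence


def seasonal_ar1_forecast(series: Sequence[float], horizon: int,
                          period: int = 4) -> list[float]:
    """Forecast with a SARIMA(1,0,0)(0,1,0)_period model without a constant.

    1. Seasonal differencing: d_t = y_t - y_{t-period}.
    2. Fit d_t = phi * d_{t-1} by conditional least squares.
    3. Forecast d forward with phi and undo the differencing.
    """
    y = list(series)
    if len(y) < period + 2:
        raise ValueError("need at least period + 2 observations")
    d = [y[t] - y[t - period] for t in range(period, len(y))]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    phi = num / den if den else 0.0  # no year-on-year change at all -> phi = 0

    history = y[:]  # observations, extended with forecasts as they are made
    last_d = d[-1]
    forecasts = []
    for _ in range(horizon):
        last_d = phi * last_d                 # AR(1) step on the seasonal difference
        value = history[-period] + last_d     # add back the value one season earlier
        history.append(value)
        forecasts.append(value)
    return forecasts
```

    For a hypothetical quarterly series with the same pattern every year, such as a fourth-quarter price rise like the one reported for S4 firewood (`[10, 10, 10, 14] * 5`, i.e. 20 quarters), the seasonal differences are all zero, phi falls back to 0, and the forecast simply repeats last year's quarters. A steady year-on-year increase instead gives phi = 1, so the trend is carried forward.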
