10 research outputs found

    Aging Shapes the Population-Mean and -Dispersion of Gene Expression in Human Brains

    Human aging is associated with cognitive decline and an increased risk of neurodegenerative disease. Our objective for this study was to evaluate potential relationships between age and variation in gene expression across different regions of the brain. We analyzed the Genotype-Tissue Expression (GTEx) data from 54 to 101 tissue samples across 13 brain regions in post-mortem donors of European descent aged between 20 and 70 years at death. After accounting for the effects of covariates and hidden confounding factors, we identified 1446 protein-coding genes whose expression in one or more brain regions is correlated with chronological age at a false discovery rate of 5%. These genes are involved in various biological processes including apoptosis, mRNA splicing, amino acid biosynthesis, and neurotransmitter transport. The distribution of these genes among brain regions is uneven, suggesting variable regional responses to aging. We also found that the aging response of many genes, e.g., TP37 and C1QA, depends on individuals' genotypic backgrounds. Finally, using dispersion-specific analysis, we identified genes such as IL7R, MS4A4E, and TERF1/TERF2 whose expression is differentially dispersed by aging, i.e., variances differ between age groups. Our results demonstrate that age-related gene expression is brain region-specific, genotype-dependent, and associated with both mean and dispersion changes. Our findings provide a foundation for more sophisticated gene expression modeling in studies of age-related neurodegenerative diseases.

    Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana

    Background: Messenger RNA polyadenylation is an essential step for the maturation of most eukaryotic mRNAs. Accurate determination of poly(A) sites helps define the 3’-ends of genes, which is important for genome annotation and gene function research. Genomic studies have revealed the presence of poly(A) sites in intergenic regions, which may be attributed to 3’-UTR extensions and novel transcript units. However, there has been no systematic evaluation of intergenic poly(A) sites in plants. Results: Approximately 16,000 intergenic poly(A) site clusters (IPACs) in Arabidopsis thaliana were discovered and evaluated at the whole-genome level. Based on the distributions of distance from IPACs to nearby sense and antisense genes, these IPACs were classified into three categories. About 70% of them were from previously unannotated 3’-UTR extensions to known genes, which would extend 6985 transcripts of the TAIR10 genome annotation beyond their 3’-ends, with a mean extension of 134 nucleotides. A total of 1317 IPACs originated from novel intergenic transcripts, 37 of which were likely to be associated with protein-coding transcripts. A total of 2957 IPACs corresponded to antisense transcripts for genes on the reverse strand, which might affect 2265 protein-coding genes and 39 non-protein-coding genes, including long non-coding RNA genes. The rest of the IPACs could have originated from transcriptional read-through or gene mis-annotations. Conclusions: The identified IPACs corresponding to novel transcripts, 3’-UTR extensions, and antisense transcription should be incorporated into the current Arabidopsis genome annotation. The comprehensive characterization of IPACs in this study provides insights into alternative polyadenylation and antisense transcription in plants. Funding support was in part from the US National Science Foundation (No. 1541737 to QQL) and the Hundred Talent Plans of Fujian Province and Xiamen City (to QQL). This project was also funded by the National Natural Science Foundation of China (Nos. 61201358 and 61174161), the Natural Science Foundation of Fujian Province of China (No. 2012J01154), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Nos. 20120121120038 and 20130121130004), and the Fundamental Research Funds for the Central Universities in China (Xiamen University: Nos. 2013121025, 201412G009, and 2014X0234).

    SRGS: sparse partial least squares-based recursive gene selection for gene regulatory network inference

    Background: The identification of gene regulatory networks (GRNs) facilitates the understanding of the molecular mechanisms underlying various biological processes and complex diseases. With the availability of single-cell RNA sequencing data, it is essential to infer GRNs from single-cell expression. Although some GRN methods originally developed for bulk expression data can be applied to single-cell data, and several single-cell-specific GRN algorithms have been developed, recent benchmarking studies have emphasized the need to develop more accurate and robust GRN modeling methods that are compatible with single-cell expression data. Results: We present SRGS, SPLS (sparse partial least squares)-based recursive gene selection, to infer GRNs from bulk or single-cell expression data. SRGS recursively selects and scores the genes that may regulate the target gene under consideration, based on SPLS. When dealing with gene expression data with dropouts, we randomly scramble samples, set some values in the expression matrix to zero, and generate multiple copies of the data through multiple iterations to make SRGS more robust. We test SRGS on different kinds of expression data, including simulated bulk data, simulated single-cell data without and with dropouts, and experimental single-cell data, and compare it with existing GRN methods, including those originally developed for bulk data, those developed specifically for single-cell data, and those recommended by recent benchmarking studies. Conclusions: SRGS is shown to be competitive with existing GRN methods and effective in gene regulatory network inference from bulk or single-cell gene expression data. SRGS is available at: https://github.com/JGuan-lab/SRGS

    Integration of Multi-Feature Fusion and PLS-DA in Protein Secondary Structure Prediction

    Protein structure prediction has become one of the central problems in the field of modern computational biology. Protein secondary structure prediction is the basis of the spatial structure prediction of proteins. This paper presents a novel method for protein secondary structure prediction, which integrates multi-feature fusion and partial least squares discriminant analysis (PLS-DA). Multi-feature fusion can make full use of the available information on proteins; however, it also leads to high-dimensional and redundant features. PLS-DA is then used to process the fused protein data, which can effectively extract features from the data and remove the redundant information. Several benchmark datasets are used to verify the performance of the proposed method. The experimental results show that the proposed method gives satisfactory predictions of protein secondary structure compared with existing methods. The integration of multi-feature fusion and PLS-DA can therefore fully utilize the available protein information, effectively reduce dimensionality, and achieve robust classification in the multi-category analysis of protein secondary structure.

    Extensive evaluation on the performance and behaviour of TCP congestion control protocols under varied network scenarios

    In recent decades, many TCP Congestion Control (CC) protocols have been proposed to improve the performance and reliability of TCP in various network scenarios. However, CC protocols are usually closely coupled with network conditions such as latency and packet loss. Considering that networks with different properties are common, e.g., wired/wireless LAN and Long Fat Networks (LFNs), investigating both the performance and the behavior of CC protocols under varied network scenarios becomes crucial for both network management and development. In this paper, we conduct a comprehensive measurement study on the goodput, RTT, retransmission, friendliness, fairness, convergence time and stability of the most widely used CC protocols over wired LAN/WAN and wireless LAN (both 2.4 GHz and 5 GHz Wi-Fi). We also conduct comparative studies with respect to transmission cost, congested reverse path and bottleneck queue size in a network simulator. Based on our analysis, we reveal several interesting and original observations. We found that the goodput of BBR is at least 22.5% lower than that of other CC protocols in wireless LAN due to an insufficient pacing rate, even though it can always fully utilize the bottleneck bandwidth with low RTT in wired networks. We also observed that the total on-wire data volume of BBR is higher than that of CUBIC (e.g., 2.37% higher when RTT = 100 ms and loss rate = 0.01%). In addition, BBR can fully utilize the bottleneck bandwidth at most queue sizes (≥ 20 packets). Surprisingly, we noticed that CUBIC, the default CC protocol in most modern operating systems, is too aggressive and unfriendly in both LAN and wireless LAN, greatly suppressing the goodput of other competing CC protocols. More specifically, in wireless LAN CUBIC generates 129% more retransmissions than other CC protocols. Nevertheless, we have also seen that, in scenarios with a heavily congested reverse path, CUBIC can fully utilize the bottleneck bandwidth. Lastly, we observed that BBR converges very quickly in all evaluated scenarios, while other CC protocols present varied results, e.g., Westwood+ and Veno converge faster in 5 GHz Wi-Fi networks than in 2.4 GHz networks.