102 research outputs found

    Inferring Bacterial Infiltration in Primary Colorectal Tumors From Host Whole Genome Sequencing Data

    Colorectal cancer is the third most common cancer worldwide with abysmal survival, thus requiring novel therapy strategies. Numerous studies have observed infiltrating bacteria within the primary tumor tissues derived from patients. These studies have implicated the relative abundance of these bacteria as a contributing factor in tumor progression. Infiltrating bacteria are believed to be among the major drivers of tumorigenesis, progression, and metastasis and, hence, promising targets for new treatments. However, measuring their abundance directly remains challenging. One potential approach is to use the unmapped reads of host whole genome sequencing (hWGS) data, which previous studies have considered as contaminants and discarded. Here, we developed rigorous bioinformatics and statistical procedures to identify tumor-infiltrating bacteria associated with colorectal cancer from such whole genome sequencing data. Our approach used the reads of whole genome sequencing data of colon adenocarcinoma tissues not mapped to the human reference genome, including unmapped paired-end read pairs and single-end reads, the mates of which were mapped. We assembled the unmapped read pairs, remapped all of those reads to the collection of human microbiome references, and then computed the relative abundance of microbes by maximum likelihood (ML) estimation. We analyzed and compared the relative abundance and diversity of infiltrating bacteria between primary tumor tissues and associated normal blood samples. Our results showed that primary tumor tissues contained far more diverse total infiltrating bacteria than normal blood samples. The relative abundance of Bacteroides fragilis, Bacteroides dorei, and Fusobacterium nucleatum was significantly higher in primary colorectal tumors. These three bacteria were among the top ten microbes in the primary tumor tissues, yet were rarely found in normal blood samples. As a validation step, most of these bacteria were also found to be closely associated with colorectal cancer in previous studies that used alternative approaches. In summary, our approach provides a new analytic technique for investigating the infiltrating bacterial community within tumor tissues. Our novel cloud-based bioinformatics and statistical pipelines to analyze the infiltrating bacteria in colorectal tumors using the unmapped reads of whole genome sequences can be freely accessed from GitHub at https://github.com/gutmicrobes/UMIB.git
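
The abstract above mentions estimating relative abundance by maximum likelihood without giving the model. As a rough illustration only, the sketch below uses a simple mixture model fitted by expectation-maximization: each read is assumed to be equally likely to come from any taxon it aligned to, and genome length is ignored. The function name `estimate_abundance` and the input layout are hypothetical and are not taken from the UMIB pipeline.

```python
def estimate_abundance(read_hits, n_iter=200, tol=1e-9):
    """EM estimate of taxon proportions from ambiguous read alignments.

    read_hits: iterable of collections; each holds the taxa one read aligned to.
    Assumption (not from the source): equal per-taxon read likelihoods.
    """
    reads = [set(hits) for hits in read_hits if hits]  # drop reads with no hit
    if not reads:
        return {}
    taxa = sorted({t for hits in reads for t in hits})
    theta = {t: 1.0 / len(taxa) for t in taxa}  # start from a uniform guess
    for _ in range(n_iter):
        # E-step: split each read across its candidate taxa in proportion to theta
        counts = dict.fromkeys(taxa, 0.0)
        for hits in reads:
            total = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: the new proportions are the normalized expected counts
        new_theta = {t: counts[t] / len(reads) for t in taxa}
        delta = max(abs(new_theta[t] - theta[t]) for t in taxa)
        theta = new_theta
        if delta < tol:
            break
    return theta


if __name__ == "__main__":
    toy_reads = [{"A"}, {"A"}, {"A", "B"}, {"B"}]
    print(estimate_abundance(toy_reads))
```

In the toy input, the one ambiguous read is shared between taxa A and B according to the current abundance estimate, which is what lets uniquely mapped reads inform the assignment of multi-mapped ones.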

    Identifying Gut Microbiota Associated With Colorectal Cancer Using a Zero-Inflated Lognormal Model

    Colorectal cancer (CRC) is the third most common cancer worldwide. Its incidence is still increasing, and the mortality rate is high. New therapeutic and prognostic strategies are urgently needed. It has become increasingly recognized that the gut microbiota composition differs significantly between healthy people and CRC patients. Thus, identifying the differences between the gut microbiota of healthy people and CRC patients is fundamental to understanding these microbes' functional roles in the development of CRC. We studied the microbial community structure of a CRC metagenomic dataset of 156 patients and healthy controls, and analyzed the diversity, differentially abundant bacteria, and co-occurrence networks. We applied a modified zero-inflated lognormal (ZIL) model for estimating the relative abundance. We found that the abundance of the genera Anaerostipes, Bilophila, Catenibacterium, Coprococcus, Desulfovibrio, Flavonifractor, Porphyromonas, Pseudoflavonifractor, and Weissella was significantly different between the healthy and CRC groups. We also found that bacteria such as Streptococcus, Parvimonas, Collinsella, and Citrobacter were uniquely co-occurring within the CRC patients. In addition, we found that the microbial diversity of healthy controls is significantly higher than that of the CRC patients, which indicated a significant negative correlation between gut microbiota diversity and the stage of CRC. Collectively, our results strengthened the view that individual microbes as well as the overall structure of gut microbiota were co-evolving with CRC
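
The abstract does not describe how the ZIL model was modified, so the sketch below shows only the plain, unmodified zero-inflated lognormal as a point of reference: a value is zero with probability pi_zero, and otherwise its logarithm is normally distributed with mean mu and variance sigma2. The function name `fit_zil` and the returned keys are hypothetical.

```python
import math


def fit_zil(values):
    """Maximum-likelihood fit of a plain zero-inflated lognormal.

    Model (standard ZIL, not the paper's modified version):
      P(x = 0) = pi_zero, and for x > 0, log(x) ~ Normal(mu, sigma2).
    """
    n = len(values)
    positives = [v for v in values if v > 0]
    if n == 0 or not positives:
        raise ValueError("need at least one positive observation")
    pi_zero = (n - len(positives)) / n          # share of structural zeros
    logs = [math.log(v) for v in positives]
    mu = sum(logs) / len(logs)                  # mean of the log-abundances
    sigma2 = sum((x - mu) ** 2 for x in logs) / len(logs)  # MLE variance
    # Expected abundance under the fitted model
    mean = (1.0 - pi_zero) * math.exp(mu + sigma2 / 2.0)
    return {"pi_zero": pi_zero, "mu": mu, "sigma2": sigma2, "mean": mean}


if __name__ == "__main__":
    print(fit_zil([0.0, 0.0, 0.5, 1.2, 3.4, 0.0, 0.9]))
```

Separating the zero fraction from the positive part matters for sparse microbiome tables, where many taxa are absent from most samples and a single lognormal fit would be distorted by the zeros.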

    Ksak: A high-throughput tool for alignment-free phylogenetics

    Phylogenetic tools are fundamental to the study of evolutionary relationships. In this paper, we present Ksak, a novel high-throughput tool for alignment-free phylogenetic analysis. Ksak computes the pairwise distance matrix between molecular sequences, using seven widely accepted k-mer based distance measures. Based on the distance matrix, Ksak constructs the phylogenetic tree with standard algorithms. When benchmarked with a gold-standard 16S rRNA dataset, Ksak was found to be the most accurate tool among all five tools compared and was 19% more accurate than ClustalW2, a high-accuracy multiple sequence aligner. Above all, Ksak was tens to hundreds of times faster than ClustalW2, which helps eliminate the computation limit currently encountered in large-scale multiple sequence alignment. Ksak is freely available at https://github.com/labxscut/ksak
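
The abstract does not name the seven distance measures. To illustrate the general idea of an alignment-free, k-mer based distance matrix, the sketch below uses the Euclidean distance between normalized k-mer frequency profiles, chosen here purely as an example. The function names are hypothetical and this is not Ksak's implementation.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_profile(seq, k):
    """Return the relative frequency of every k-mer in seq."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def euclidean_distance(p, q):
    """Euclidean distance between two sparse k-mer frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(seqs, k=3):
    """Pairwise distances for a dict of name -> sequence, with no alignment."""
    names = list(seqs)
    profiles = {name: kmer_profile(seqs[name], k) for name in names}
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = euclidean_distance(profiles[a], profiles[b])
    return dist


if __name__ == "__main__":
    toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TTTTGGGGCC"}
    for name, row in distance_matrix(toy).items():
        print(name, row)
```

Because each profile is built in a single pass over its sequence, the cost grows with total sequence length plus the number of pairs, rather than with the cost of a multiple sequence alignment. The resulting matrix can then be passed to a standard tree-building method such as neighbor joining.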

    Expression of CD82 in Human Trophoblast and Its Role in Trophoblast Invasion

    BACKGROUND: Well-controlled trophoblast invasion at the maternal-fetal interface is a critical event for the normal development of the placenta. CD82 is a member of the transmembrane 4 superfamily, which has shown an important role in inhibiting tumor cell invasion and migration. We surmised that CD82 participates in trophoblast differentiation during placenta development. METHODOLOGY/PRINCIPAL FINDINGS: CD82 was found to be strongly expressed in human first trimester placental villous and extravillous trophoblast cells as well as in trophoblast cell lines. To investigate whether CD82 plays a role in trophoblast invasion and migration, we further utilized a human villous explant culture model on matrigel and an invasion/migration assay of the trophoblast cell line HTR8/SVneo. CD82 siRNA significantly promoted outgrowth of villous explants in vitro (P<0.01), as well as invasion and migration of HTR8/SVneo cells (P<0.05), whereas trophoblast proliferation was not affected. The enhanced effect of CD82 siRNA on invasion and migration of trophoblast cells was found to be associated with increased gelatinolytic activities of matrix metalloproteinase MMP9, while over-expression of CD82 markedly decreased trophoblast cell invasion and migration as well as MMP9 activities. CONCLUSIONS/SIGNIFICANCE: These findings suggest that CD82 is an important negative regulator at the maternal-fetal interface during early pregnancy, inhibiting human trophoblast invasion and migration

    Design and baseline characteristics of the finerenone in reducing cardiovascular mortality and morbidity in diabetic kidney disease trial

    Background: Among people with diabetes, those with kidney disease have exceptionally high rates of cardiovascular (CV) morbidity and mortality and progression of their underlying kidney disease. Finerenone is a novel, nonsteroidal, selective mineralocorticoid receptor antagonist that has been shown to reduce albuminuria in type 2 diabetes (T2D) patients with chronic kidney disease (CKD) while revealing only a low risk of hyperkalemia. However, the effect of finerenone on CV and renal outcomes has not yet been investigated in long-term trials. Patients and Methods: The Finerenone in Reducing CV Mortality and Morbidity in Diabetic Kidney Disease (FIGARO-DKD) trial aims to assess the efficacy and safety of finerenone compared to placebo at reducing clinically important CV and renal outcomes in T2D patients with CKD. FIGARO-DKD is a randomized, double-blind, placebo-controlled, parallel-group, event-driven trial running in 47 countries with an expected duration of approximately 6 years. FIGARO-DKD randomized 7,437 patients with an estimated glomerular filtration rate >= 25 mL/min/1.73 m² and albuminuria (urinary albumin-to-creatinine ratio >= 30 to <= 5,000 mg/g). The study has at least 90% power to detect a 20% reduction in the risk of the primary outcome (overall two-sided significance level alpha = 0.05), the composite of time to first occurrence of CV death, nonfatal myocardial infarction, nonfatal stroke, or hospitalization for heart failure. Conclusions: FIGARO-DKD will determine whether an optimally treated cohort of T2D patients with CKD at high risk of CV and renal events will experience cardiorenal benefits with the addition of finerenone to their treatment regimen. Trial Registration: EudraCT number: 2015-000950-39; ClinicalTrials.gov identifier: NCT02545049
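
For readers who want to see how a power statement like the one above translates into a number of primary-outcome events in an event-driven design, the sketch below applies Schoenfeld's approximation for a two-arm log-rank comparison. It rests on assumptions that are not stated in the abstract: 1:1 randomization, and reading the 20% risk reduction as a hazard ratio of 0.80. The abstract does not report the trial's planned event count, so the output is illustrative only.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.90):
    """Approximate events needed for a two-sided log-rank test, 1:1 allocation.

    Schoenfeld's formula: d = 4 * (z_{1-alpha/2} + z_{power})^2 / ln(HR)^2
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


if __name__ == "__main__":
    # Assumed hazard ratio of 0.80, i.e. a 20% relative reduction in hazard
    print(math.ceil(required_events(0.80)))
```

The formula depends only on the effect size, the significance level, and the power. This is why an event-driven trial continues until a target number of events has accrued rather than for a fixed calendar time.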

    CNN-XG: A Hybrid Framework for sgRNA On-Target Prediction

    As a third-generation gene editing technology, CRISPR/Cas9 has a wide range of applications. The success of CRISPR depends on the editing of the target gene via a functional complex of sgRNA and Cas9 proteins. Therefore, sgRNAs with high specificity and high on-target cleavage efficiency can make this process more accurate and efficient. Although there are already many sophisticated machine learning or deep learning models to predict the on-target cleavage efficiency of sgRNA, prediction accuracy remains to be improved. XGBoost is good at classification because an ensemble model can overcome the deficiencies of a single classifier, and we would like to improve the prediction efficiency for sgRNA on-target activity by introducing XGBoost into the model. We present a novel machine learning framework which combines a convolutional neural network (CNN) and XGBoost to predict sgRNA on-target knockout efficacy. Our framework, called CNN-XG, is mainly composed of two parts: a feature extractor CNN is used to automatically extract features from sequences, and a predictor XGBoost is applied to the features extracted after convolution. Experiments on commonly used datasets show that CNN-XG performed significantly better than other existing frameworks in the predicted classification mode

    Cell Heterogeneity Analysis in Single-Cell RNA-seq Data Using Mixture Exponential Graph and Markov Random Field Model

    Advanced single-cell profiling technologies promote exploration of cell heterogeneity, and clustering of single-cell RNA sequencing (scRNA-seq) data enables discovery of coexpression genes and network relationships between genes. In particular, single-cell profiling of circulating tumor cells (CTCs) can provide unique insights into tumor heterogeneity (including in triple-negative breast cancer (TNBC)), while scRNA-seq leads to better understanding of subclonal architecture and biological function. Despite numerous reports suggesting a direct correlation between CTCs and poor clinical outcomes, few studies have provided a thorough heterogeneity characterization of CTCs. In addition, TNBC is a disease with not only intertumor but also intratumor heterogeneity and represents various biologically distinct subgroups that may have relationships with immune functions that are not clearly established yet. In this article, we introduce a new scheme for detecting genotypic characterization of single-cell heterogeneities and apply it to CTC and TNBC single-cell RNA-seq data. First, we use an existing mixture exponential family graph model to partition the cell-cell network; then, with the Markov random field model, we obtain more flexible network rewiring. Finally, we find the cell heterogeneity and network relationships according to different high coexpression gene modules in different cell subsets. Our results demonstrate that this scheme provides a reasonable and effective way to model different cell clusters and different biological enrichment gene clusters. Thus, using different internal coexpression genes of different cell clusters, we can infer the differences in tumor composition and diversity

    Predicting CRISPR/Cas9 Repair Outcomes by Attention-Based Deep Learning Framework

    As a simple and programmable nuclease-based genome editing tool, the CRISPR/Cas9 system has been widely used in target-gene repair and gene-expression regulation. The DNA mutation generated by CRISPR/Cas9-mediated double-strand breaks determines its biological and phenotypic effects. Experiments have demonstrated that CRISPR/Cas9-generated cellular-repair outcomes depend on local sequence features. Therefore, the repair outcomes after a DNA break can be predicted from sequences near the cleavage sites. However, existing prediction methods rely on manually constructed features or insufficiently detailed prediction labels. They cannot satisfy clinical-level prediction accuracy, which limits the performance of these models to existing knowledge about CRISPR/Cas9 editing. We predict 557 repair labels of DNA, covering the vast majority of Cas9-generated mutational outcomes, and build a deep learning model, called Apindel, to predict CRISPR/Cas9 editing outcomes. Apindel automatically trains the sequence features of DNA with the GloVe model, introduces location information through Positional Encoding (PE), and embeds the trained word-vector matrices into a deep learning model containing BiLSTM and the Attention mechanism. Apindel has better performance and more detailed prediction categories than the most advanced DNA-mutation-predicting models. It also reveals that nucleotides at different positions relative to the cleavage sites have different influences on CRISPR/Cas9 editing outcomes
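
The abstract does not specify which form of positional encoding Apindel uses. As one common possibility, the sketch below builds the sinusoidal encoding popularized by the Transformer architecture, in which each sequence position receives a fixed vector that can be added to the embedding of the nucleotide or k-mer at that position. The function name, and the choice of the sinusoidal form itself, are assumptions made for illustration.

```python
import math


def positional_encoding(length, dim):
    """Sinusoidal positional encoding table with shape (length, dim).

    PE[pos][2i]   = sin(pos / 10000^(2i / dim))
    PE[pos][2i+1] = cos(pos / 10000^(2i / dim))
    Assumption: this is the Transformer-style PE, which the abstract
    does not confirm.
    """
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even number")
    table = []
    for pos in range(length):
        row = []
        for even_index in range(0, dim, 2):
            angle = pos / (10000 ** (even_index / dim))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # following odd dimension
        table.append(row)
    return table


if __name__ == "__main__":
    # Encode the positions of a short, made-up 6-nt window around a cut site
    for pos, row in enumerate(positional_encoding(6, 4)):
        print(pos, [round(x, 3) for x in row])
```

An encoding of this kind gives the model explicit access to where each nucleotide sits relative to the cleavage site, which matches the abstract's finding that position influences editing outcomes.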

    A Sparse and Low-Rank Regression Model for Identifying the Relationships Between DNA Methylation and Gene Expression Levels in Gastric Cancer and the Prediction of Prognosis

    DNA methylation is an important regulator of gene expression that can influence tumor heterogeneity and shows weak and varying expression levels among different genes. Gastric cancer (GC) is a highly heterogeneous cancer of the digestive system with a high mortality rate worldwide. The heterogeneous subtypes of GC lead to different prognoses. In this study, we explored the relationships between DNA methylation and gene expression levels by introducing a sparse low-rank regression model based on a GC dataset with 375 tumor samples and 32 normal samples from The Cancer Genome Atlas database. Differences in the DNA methylation levels and sites were found to be associated with differences in the expressed genes related to GC development. Overall, 29 methylation-driven genes were found to be related to the GC subtypes, and in the prognostic model, we explored five prognoses related to the methylation sites. Finally, based on a low-rank matrix, seven subgroups were identified with different methylation statuses. These specific classifications based on DNA methylation levels may help to account for heterogeneity and aid in personalized treatments