76 research outputs found

    Deep Phenotyping of Non-Alcoholic Fatty Liver Disease Patients with Genetic Factors for Insights into the Complex Disease

    Non-alcoholic fatty liver disease (NAFLD) is a prevalent chronic liver disorder characterized by the excessive accumulation of fat in the liver in individuals who do not consume significant amounts of alcohol; risk factors include obesity, insulin resistance, and type 2 diabetes. We aim to identify subgroups of NAFLD patients based on demographic, clinical, and genetic characteristics for precision medicine. The genomic and phenotypic data (3,408 cases and 4,739 controls) for this study were gathered from participants in the Mayo Clinic Tapestry Study (IRB#19-000001) and their electronic health records, including their demographic, clinical, and comorbidity data, and the genotype information obtained through whole exome sequencing performed at Helix using the Exome+® Assay according to standard procedure (www.helix.com). Factors highly relevant to NAFLD were determined by the chi-square test and a stepwise backward-forward regression model. Latent class analysis (LCA) was performed on NAFLD cases using significant indicator variables to identify subgroups. The optimal clustering revealed 5 latent subgroups from 2,013 NAFLD patients (mean age 60.6 years and 62.1% women), while a polygenic risk score based on 6 single-nucleotide polymorphism (SNP) variants and disease outcomes were used to analyze the subgroups. The groups are characterized by metabolic syndrome, obesity, different comorbidities, psychoneurological factors, and genetic factors. Odds ratios were used to compare the risk of complex diseases, such as fibrosis, cirrhosis, and hepatocellular carcinoma (HCC), as well as liver failure, between the clusters. Cluster 2 has a significantly higher risk of complex disease outcomes than the other clusters.
Keywords: Fatty liver disease; Polygenic risk score; Precision medicine; Deep phenotyping; NAFLD comorbidities; Latent class analysis. Comment: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 11 pages
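
The abstract above compares clusters using odds ratios. As a rough illustration only, the sketch below computes an odds ratio and a Wald-type 95% confidence interval from a 2x2 table. The function name `odds_ratio` and all counts are hypothetical and are not taken from the study; the abstract does not state how its intervals, if any, were computed.

```python
import math

def odds_ratio(cluster_events, cluster_nonevents, other_events, other_nonevents):
    """Odds ratio of an outcome in one cluster versus all other clusters.

    Returns (odds_ratio, ci_low, ci_high) using the standard log-odds
    (Woolf) approximation for a 95% confidence interval.
    """
    a, b, c, d = cluster_events, cluster_nonevents, other_events, other_nonevents
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(ratio) - 1.96 * se)
    ci_high = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, ci_low, ci_high

# Hypothetical counts, for illustration only (not data from the study):
# 30 of 100 patients in one cluster have the outcome, versus 10 of 100 elsewhere.
ratio, ci_low, ci_high = odds_ratio(30, 70, 10, 90)
print(f"OR = {ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 would suggest a difference in risk between the cluster and the rest, under the assumptions of this approximation.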

    Identification of the Hemogenic Endothelial Progenitor and Its Direct Precursor in Human Pluripotent Stem Cell Differentiation Cultures

    Hemogenic endothelium (HE) has been recognized as a source of hematopoietic stem cells (HSCs) in the embryo. Access to human HE progenitors (HEPs) is essential for enabling the investigation of the molecular determinants of HSC specification. Here, we show that HEPs capable of generating definitive hematopoietic cells can be obtained from human pluripotent stem cells (hPSCs) and identified precisely by a VE-cadherin+CD73−CD235a/CD43− phenotype. This phenotype discriminates true HEPs from VE-cadherin+CD73+ non-HEPs and VE-cadherin+CD235a+CD41a− early hematopoietic cells with endothelial and FGF2-dependent hematopoietic colony-forming potential. We found that HEPs arise at the post-primitive-streak stage of differentiation directly from VE-cadherin-negative KDRbrightAPLNR+PDGFRαlow/− hematovascular mesodermal precursors (HVMPs). In contrast, hemangioblasts, which are capable of forming endothelium and primitive blood cells, originate from more immature APLNR+PDGFRα+ mesoderm. The demarcation of HEPs and HVMPs provides a platform for modeling blood development from endothelium with a goal of facilitating the generation of HSCs from hPSCs.

    A study of the relationships between oligonucleotide properties and hybridization signal intensities from NimbleGen microarray datasets

    Well-defined relationships between oligonucleotide properties and hybridization signal intensities (HSI) can aid chip design, data normalization and true biological knowledge discovery. We clarify these relationships using the data from two microarray experiments containing over three million probes from 48 high-density chips. We find that melting temperature (Tm) has the most significant effect on HSI, while length has very little effect for the long oligonucleotides studied. Analysis of positional effect using a linear model provides evidence that the protruding ends of probes contribute more than tethered ends to HSI, which is further validated by specifically designed match fragment sliding and extension experiments. The impact of sequence similarity (SeqS) on HSI is not significant in comparison with other oligonucleotide properties. Using regression and regression tree analysis, we prioritize these oligonucleotide properties based on their effects on HSI. The implications of our discoveries for the design of unbiased oligonucleotides are discussed. We propose that designing isothermal probes by varying the length is a viable strategy to reduce sequence bias, though imposing selection constraints on other oligonucleotide properties is also essential.

    Genome-wide analyses as part of the international FTLD-TDP whole-genome sequencing consortium reveals novel disease risk factors and increases support for immune dysfunction in FTLD

    Frontotemporal lobar degeneration with neuronal inclusions of the TAR DNA-binding protein 43 (FTLD-TDP) represents the most common pathological subtype of FTLD. We established the international FTLD-TDP whole genome sequencing consortium to thoroughly characterize the known genetic causes of FTLD-TDP and identify novel genetic risk factors. Through the study of 1,131 unrelated Caucasian patients, we estimated that C9orf72 repeat expansions and GRN loss-of-function mutations account for 25.5% and 13.9% of FTLD-TDP patients, respectively. Mutations in TBK1 (1.5%) and other known FTLD genes (1.4%) were rare, and the disease in 57.7% of FTLD-TDP patients was unexplained by the known FTLD genes. To unravel the contribution of common genetic factors to the FTLD-TDP etiology in these patients, we conducted a two-stage association study comprising the analysis of whole-genome sequencing data from 517 FTLD-TDP patients and 838 controls, followed by targeted genotyping of the most associated genomic loci in 119 additional FTLD-TDP patients and 1653 controls. We identified three genome-wide significant FTLD-TDP risk loci: one new locus at chromosome 7q36 within the DPP6 gene led by rs118113626 (pvalue=4.82e-08, OR=2.12), and two known loci: UNC13A, led by rs1297319 (pvalue=1.27e-08, OR=1.50) and HLA-DQA2 led by rs17219281 (pvalue=3.22e-08, OR=1.98). While HLA represents a locus previously implicated in clinical FTLD and related neurodegenerative disorders, the association signal in our study is independent from previously reported associations. Through inspection of our whole genome sequence data for genes with an excess of rare loss-of-function variants in FTLD-TDP patients (n≥3) as compared to controls (n=0), we further discovered a possible role for genes functioning within the TBK1-related immune pathway (e.g. DHX58, TRIM21, IRF7) in the genetic etiology of FTLD-TDP. 
Together, our study, based on the largest cohort of unrelated FTLD-TDP patients assembled to date, provides a comprehensive view of the genetic landscape of FTLD-TDP, nominates novel FTLD-TDP risk loci, and strongly implicates the immune pathway in FTLD-TDP pathogenesis.

    Identification Of Genetic Variation In Highly Divergent Regions Using Whole Exome Sequencing

    University of Minnesota Ph.D. dissertation. December 2016. Major: Biomedical Informatics and Computational Biology. Advisors: Susan Slager, Claudia Neuhauser. 1 computer file (PDF); iv, 170 pages. Whole exome sequencing is widely used for identifying disease-associated variants in both clinical and research settings. Using this technology to accurately identify genetic variants is essential, yet major challenges remain in highly divergent but medically important genomic regions. We developed an analytical workflow enabling sensitive and accurate variant discovery for highly divergent genomic regions from whole exome sequencing data. Our workflow combines both mapping- and de novo assembly-based approaches, for which the tools were selected and optimized through extensive evaluation of their performance across different coverage depths and divergence levels, the two key factors profoundly impacting variant detection. We used simulated exome reads for an initial assessment and then public exome data from a well-studied CEPH individual, NA12878, for more focused evaluations. Our analysis revealed that the 25 combinations of five mappers and five callers had comparable performance in the non-HLA regions, as expected, which have approximately 0.1-0.4% divergence. However, they differed markedly in the HLA region, in which different haplotypes can show up to 10-15% divergence. We also evaluated the effect of post-alignment processing and provide a practical guideline regarding the application of local realignment and base quality score recalibration in designing analytical workflows. We transferred our findings into a highly sensitive and computationally efficient workflow for mapping-based variant discovery. It excels in both sensitivity and speed through our two-tier mapping strategy, not only in regions of high divergence but also in regions of low divergence.
To utilize the local phasing information and identify transmitted variants, we also developed a de novo assembly-based variant calling workflow for whole exome data. It performs well over a wide range of coverage depths and divergence levels. In fact, for SNP detection from the HLA region, it is far superior to all other existing methods based on both simulated and multiple benchmarked exome datasets. Finally, we incorporated the mapping- and de novo assembly-based approaches into a single pipeline, providing the flexibility of variant detection through executing either or both methods. Our pipeline should be particularly useful for WES projects focusing on diseases that are associated with HLA or other highly divergent regions.

    Performance analysis of generalized block diagonal structured random matrices in compressive sensing

    In compressive sensing practice, the choice of compression matrix reflects the important tradeoffs between the reconstruction performance and the implementation cost. Motivated by practical signal processing applications, this paper advocates a family of generalized block diagonal (GBD) structured random matrices for the implementation simplicity and reduced memory requirements. The restricted isometry property of such structured matrices is established to reveal the minimum number of measurements required for the perfect reconstruction of a sparse signal with high probability. The reconstruction performance of GBD random matrices is compared with that of conventional dense random matrices via both the theoretical derivation and the empirical simulations. For moderate-size sparse signals, the GBD random matrices are shown to enjoy several nice structural benefits in practical implementations, at minimal extra cost in terms of the number of required measurements. © 2012 IEEE
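
To make the memory argument in the abstract above concrete, here is a minimal sketch of a plain block diagonal Gaussian measurement matrix. It is an assumption-laden illustration, not the paper's generalized (GBD) construction, whose exact definition is not given in the abstract; the function names, the equal block sizes, and the 1/sqrt(block_rows) scaling are all hypothetical choices.

```python
import random

def block_diagonal_gaussian(num_blocks, block_rows, block_cols, seed=0):
    """Build a block diagonal matrix with i.i.d. Gaussian diagonal blocks.

    Assumption: equal-size blocks and N(0, 1/block_rows) entries; the
    paper's GBD family may differ from this simple special case.
    """
    rng = random.Random(seed)
    rows = num_blocks * block_rows
    cols = num_blocks * block_cols
    scale = 1.0 / block_rows ** 0.5
    matrix = [[0.0] * cols for _ in range(rows)]
    for b in range(num_blocks):
        for i in range(block_rows):
            for j in range(block_cols):
                # Only entries inside the b-th diagonal block are nonzero.
                matrix[b * block_rows + i][b * block_cols + j] = rng.gauss(0.0, scale)
    return matrix

def measure(matrix, signal):
    """Compute compressed measurements y = A x."""
    return [sum(a * x for a, x in zip(row, signal)) for row in matrix]

def stored_entries(num_blocks, block_rows, block_cols):
    """Number of random entries that must be stored for the block diagonal matrix."""
    return num_blocks * block_rows * block_cols

# 4 blocks of size 2 x 8 give an 8 x 32 matrix (8 measurements of a length-32 signal).
A = block_diagonal_gaussian(num_blocks=4, block_rows=2, block_cols=8)
x = [0.0] * 32
x[20] = 1.0  # a 1-sparse signal whose support falls in the third block
y = measure(A, x)
dense_entries = len(A) * len(A[0])
print(stored_entries(4, 2, 8), "stored entries versus", dense_entries, "for a dense matrix")
```

In this sketch each block only touches its own segment of the signal, so the blocks can be stored and applied independently, which is the kind of implementation simplicity the abstract refers to.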