545 research outputs found

    NOVEL COMPUTATIONAL METHODS FOR CANCER GENOMICS DATA ANALYSIS

    Cancer is a genetic disease responsible for one in eight deaths worldwide. Advances in next-generation sequencing (NGS) technology have revolutionized cancer research, allowing the cancer genome to be profiled comprehensively at high resolution. Large-scale cancer genomics research has created a need for efficient and accurate bioinformatics methods to analyze the data. The research presented in this dissertation focuses on three areas of cancer genomics: somatic mutation detection, driver gene identification, and transcriptome profiling at the single-cell level. NGS data analysis involves a series of complicated data transformations that convert raw sequencing data into information that cancer researchers can interpret. The first project in the dissertation established a robust, reproducible and scalable workflow management system for cancer genomics data analysis that automates best-practice mutation calling pipelines to detect somatic single nucleotide polymorphisms, insertions, deletions and copy number variations from NGS data. It integrates mutation annotation, clinically actionable therapy prediction and data visualization, streamlining the sequence-to-report data transformation. To distinguish driver mutations from the vast pool of passenger mutations produced by a somatic mutation calling project, we developed MEScan in the second project, a novel method that enables genome-scale identification of driver mutations based on a mutual exclusivity test using cancer somatic mutation data. MEScan implements an efficient statistical framework that screens de novo for mutually exclusive patterns while accounting for patient-specific and gene-specific background mutation rates and adjusting for heterogeneous mutation frequencies. It outperforms several existing methods in simulation studies and on real-world datasets. Genome-wide screening of existing TCGA somatic mutation data discovers novel cancer-specific and pan-cancer mutually exclusive patterns. Bulk RNA sequencing (RNA-seq) has become one of the most commonly used techniques for transcriptome profiling across a wide spectrum of biomedical and biological research. Analyzing bulk RNA-seq reads to quantify expression at each gene locus is the first step towards identifying differentially expressed genes for downstream biological interpretation. Recent advances in single-cell RNA-seq (scRNA-seq) technology allow cancer biologists to profile gene expression at the higher resolution of individual cells. Preprocessing scRNA-seq data to quantify UMI-based gene counts is key to characterizing intra-tumor cellular heterogeneity and identifying rare cells that govern tumor progression, metastasis and treatment resistance. Despite its popularity, summarizing gene counts from raw sequencing reads remains one of the most time-consuming steps with existing tools. Current pipelines do not balance efficiency and accuracy in large-scale gene count summarization for both bulk and scRNA-seq experiments. In the third project, we developed a lightweight k-mer-based gene counting algorithm, FastCount, to accurately and efficiently quantify gene-level abundance from bulk RNA-seq or UMI-based scRNA-seq data. It achieves at least an order-of-magnitude speed improvement over the current gold-standard pipelines while providing competitive accuracy.
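The abstract does not describe how FastCount works internally, so the following is only a minimal sketch of the general idea of k-mer-based gene counting with UMI deduplication, not the dissertation's algorithm. The function names, the voting rule (gene-unique k-mers only, ties discarded) and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(gene_seqs, k):
    """Map each k-mer to the set of genes whose sequence contains it."""
    index = defaultdict(set)
    for gene, seq in gene_seqs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene)
    return index


def assign_read(read, index, k):
    """Vote over the read's k-mers; return the best gene, or None if unclear."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        genes = index.get(read[i:i + k])
        # Assumption: only k-mers found in exactly one gene cast a vote.
        if genes and len(genes) == 1:
            votes[next(iter(genes))] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between two genes: treat the read as ambiguous
    return ranked[0][0]


def count_genes(reads, index, k):
    """Count distinct UMIs per (cell, gene).

    reads is an iterable of (cell_barcode, umi, sequence) tuples.
    """
    seen = set()
    counts = Counter()
    for cell, umi, seq in reads:
        gene = assign_read(seq, index, k)
        if gene is None:
            continue
        key = (cell, gene, umi)
        if key not in seen:  # the same UMI is counted once per cell and gene
            seen.add(key)
            counts[(cell, gene)] += 1
    return counts


# Toy data (hypothetical sequences, not real transcripts).
genes = {"G1": "ACGTACGTTT", "G2": "GGGCCCAAAT"}
index = build_kmer_index(genes, k=4)
reads = [
    ("c1", "u1", "ACGTAC"),
    ("c1", "u1", "CGTACG"),  # same UMI as above, so a duplicate
    ("c1", "u2", "GGGCCC"),
    ("c1", "u3", "TTTTTT"),  # matches no indexed k-mer
]
counts = count_genes(reads, index, k=4)
```

For bulk RNA-seq the same sketch applies with the UMI step dropped, counting reads rather than distinct UMIs.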

    Novel polyelectrolyte hydrogel membrane for ethanol dehydration via pervaporation

    Pervaporation is an important membrane technology for separating azeotropic, close-boiling, and heat-sensitive mixtures because it is cost-effective, energy-saving, and environmentally benign. Recently, we fabricated a new composite membrane with a strong polyelectrolyte (PE) hydrogel as the active layer on a polyethersulfone (PES) ultrafiltration support membrane. The active layer was graft-polymerized by UV photo-initiation, using vinyl sulfonic acid (VSA) as the monomer and N,N′-methylenebisacrylamide (MBAA) as the cross-linker. Because the composite membrane has a high ion-exchange capacity, the goal of the current study was to investigate it for ethanol dehydration by pervaporation. The effects of the polymerization conditions on pervaporation performance were investigated: monomer concentration (6.25-42% VSA), cross-linker concentration (1-10% MBAA), modification time (2.5-15 min), and molecular weight cut-off of the PES membranes (4 and 150 kDa). The performance of the composite membrane in ethanol dehydration was studied, and the results were analyzed in terms of the membrane properties, namely ion-exchange capacity (IEC) and degree of grafting (DG; thickness). The DG increased with the VSA concentration, the MBAA fraction, and the irradiation time (at a constant irradiation intensity of 100 mW/cm²). However, the IEC was lower when the monomer concentration decreased or when a higher MBAA fraction was used. Membrane performance was tested with a 10% (water:ethanol) feed solution at 50 °C. The ethanol-water selectivity always went through a maximum and then decreased as the DG increased. This was attributed to the lower IEC at reduced monomer concentration or increased MBAA fraction. The selectivity changed non-linearly, ranging over 10-300, while the total flux was very high, 15-6 kg m⁻² h⁻¹. The new composite membrane has the highest selectivity among PE composite membranes, together with outstanding total permeability for a pervaporation (PV) membrane for ethanol dehydration. Furthermore, compared with other composite polymeric membranes, the performance of the new membrane is high (Figure 1). Further studies will focus on incorporating nanoparticles (NPs) into the PE hydrogel to improve the membrane selectivity.
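The abstract reports selectivity and total flux without defining them. A minimal sketch follows, assuming the conventional pervaporation definitions for a binary water/ethanol mixture: the separation factor is the water-to-ethanol ratio in the permeate divided by the same ratio in the feed, and the flux is permeate mass per membrane area per time. The function names and the example numbers are hypothetical and are not taken from the study.

```python
def separation_factor(x_water, y_water):
    """Water/ethanol separation factor for a binary mixture.

    x_water: water mass fraction in the feed.
    y_water: water mass fraction in the permeate.
    """
    return (y_water / (1.0 - y_water)) / (x_water / (1.0 - x_water))


def total_flux(permeate_mass_kg, area_m2, time_h):
    """Total permeate flux in kg per square metre per hour."""
    return permeate_mass_kg / (area_m2 * time_h)


# Hypothetical run: 10 wt% water feed, 90 wt% water permeate.
alpha = separation_factor(0.10, 0.90)
# Hypothetical run: 3 g of permeate through 10 cm^2 in 30 minutes.
flux = total_flux(0.003, 0.001, 0.5)
```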

    The role of sustainable development goals, financial knowledge and investment strategies on the organizational profitability: Moderating impact of government support

    Recently, sustainable development goals (SDGs), investment strategies, and financial knowledge have become the foremost factors in high organizational profitability and have captured the attention of researchers and policymakers. The current study therefore examines the impact of SDGs, investment strategies, and financial knowledge on the organizational profitability of manufacturing firms in China. It also examines the moderating role of government support in the relationships among investment strategies, financial knowledge, and organizational profitability. Data were collected with survey questionnaires, and smart-PLS was used to analyse reliability and correlations. The findings show that SDGs, investment strategies, and financial knowledge all play a substantial role in a company's profitability. The results also reveal that government support significantly moderates the relationships among investment strategies, financial knowledge, and organizational profitability. This study can guide regulators in developing policies on SDGs and investment strategies with respect to organizational profitability.

    Quantifying the Influence of Component Failure Probability on Cascading Blackout Risk

    The risk of cascading blackouts depends heavily on the failure probabilities of individual components in power grids. To quantify how component failure probabilities (CFP) influence blackout risk (BR), this paper proposes a sample-induced semi-analytic approach to characterize the relationship between CFP and BR. To this end, we first give a generic component failure probability function (CoFPF) to describe CFP with varying parameters or forms. Then the exact relationship between BR and CoFPFs is built on the abstract Markov-sequence model of cascading outages. Leveraging a set of samples generated by blackout simulations, we further establish a sample-induced semi-analytic mapping between the unbiased estimate of BR and the CoFPFs. Finally, we derive an efficient algorithm that directly calculates the unbiased estimate of BR when the CoFPFs change. Since no additional simulations are required, the algorithm is computationally scalable and efficient. Numerical experiments confirm the theory and the algorithm.
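The abstract does not give the estimator itself. One standard way to reuse existing simulation samples when failure probabilities change is likelihood-ratio re-weighting, sketched below. Whether it matches the paper's exact mapping is not stated in the abstract. The sketch assumes each component has a fixed failure probability (the paper's CoFPFs are more general), that each sample records every (component, failed) outcome drawn during the simulated cascade, and that all names are hypothetical.

```python
def likelihood(events, prob):
    """Probability of a recorded sequence of (component, failed) outcomes."""
    p = 1.0
    for component, failed in events:
        p *= prob[component] if failed else 1.0 - prob[component]
    return p


def reweighted_risk(samples, old_prob, new_prob):
    """Estimate expected loss under new_prob from samples drawn under old_prob.

    samples is a list of (events, loss) pairs. Each loss is weighted by the
    ratio of the sample's likelihood under the new and old probabilities,
    so no new simulations are needed.
    """
    total = 0.0
    for events, loss in samples:
        total += loss * likelihood(events, new_prob) / likelihood(events, old_prob)
    return total / len(samples)


# Toy samples (hypothetical): load loss occurs only when component "a" fails.
samples = [
    ([("a", True), ("b", False)], 10.0),
    ([("a", False), ("b", False)], 0.0),
]
old = {"a": 0.5, "b": 0.5}
new = {"a": 0.25, "b": 0.5}
risk_old = reweighted_risk(samples, old, old)  # plain sample mean
risk_new = reweighted_risk(samples, old, new)  # re-weighted, no new simulation
```

The estimator is unbiased only if every outcome that is possible under the new probabilities is also possible under the old ones, so that the likelihood ratio is well defined.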

    Nonabelian cohomology of compact Lie groups

    Given a Lie group $G$ with finitely many components and a compact Lie group $A$ which acts on $G$ by automorphisms, we prove that there always exists an $A$-invariant maximal compact subgroup $K$ of $G$, and that for every such $K$, the natural map $H^1(A,K)\to H^1(A,G)$ is bijective. This generalizes a classical result of Serre [6] and a recent result in [1]. Comment: 7 pages

    Deep Span Representations for Named Entity Recognition

    Span-based models are one of the most straightforward methods for named entity recognition (NER). Existing span-based NER systems shallowly aggregate the token representations to span representations. However, this typically results in significant ineffectiveness for long-span entities, a coupling between the representations of overlapping spans, and ultimately a performance degradation. In this study, we propose DSpERT (Deep Span Encoder Representations from Transformers), which comprises a standard Transformer and a span Transformer. The latter uses low-layered span representations as queries, and aggregates the token representations as keys and values, layer by layer from bottom to top. Thus, DSpERT produces span representations of deep semantics. With weight initialization from pretrained language models, DSpERT achieves performance higher than or competitive with recent state-of-the-art systems on eight NER benchmarks. Experimental results verify the importance of the depth for span representations, and show that DSpERT performs particularly well on long-span entities and nested structures. Further, the deep span representations are well structured and easily separable in the feature space
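The layer-by-layer aggregation described above can be illustrated with a heavily simplified sketch. It is not the DSpERT implementation: it uses a single attention head with no learned projections or layer normalization, initializes the span vector by max pooling, and restricts keys and values to the tokens inside the span. All of these choices, and the function names, are assumptions.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    norm = sum(weights)
    weights = [w / norm for w in weights]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def span_representation(token_layers, start, end):
    """Build a span vector for tokens[start:end], bottom layer to top layer.

    token_layers[l][t] is the vector of token t at Transformer layer l.
    """
    bottom = token_layers[0][start:end]
    span = [max(column) for column in zip(*bottom)]  # assumed max-pool init
    for layer in token_layers:
        tokens = layer[start:end]
        update = attend(span, tokens, tokens)  # span is query, tokens are K and V
        span = [s + u for s, u in zip(span, update)]  # residual connection
    return span


# Toy input (hypothetical): one layer, two tokens, two dimensions.
layers = [[[1.0, 2.0], [3.0, 4.0]]]
single = span_representation(layers, 0, 1)
```

Because each span is updated by its own query, two overlapping spans keep separate representations at every layer instead of sharing one pooled vector.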

    Recent Research in Japan on Japanese Kanshi (Chinese-Language Poetry)

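    In Japanese education, kanbun (classical Chinese writing) is regarded as one of the basic accomplishments of Japanese language and culture. In Japan's "national language" (kokugo) curriculum, Chinese poetry and prose, once read through the kundoku method, are naturally absorbed into Japanese-language culture. For this reason, research on Chinese poetry and prose composed by Japanese authors long struggled to find a proper place, and it began to develop gradually only in the 1980s. This paper surveys the history of Japanese scholarship on kanshi, identifies its current directions, and offers some leads for future kanshi research. It is one of the interim results of the author's 2016 Major Project of the Key Research Bases for Humanities and Social Sciences of the Ministry of Education, "Compilation and Study of Japanese Kanshi" (《日本汉诗汇编与研究》, grant no. 16JJD750021).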

    Causal Effect Estimation in Sequencing Studies: A Bayesian Method to Account for Confounder Adjustment Uncertainty

    Estimating the causal effect of a single nucleotide variant (SNV) on clinical phenotypes is of interest in many genetic studies. The effect estimation may be confounded by other SNVs as a result of linkage disequilibrium as well as demographic and clinical characteristics. Because a large number of these other variables, which we call potential confounders, are collected, it is challenging to select and adjust for the variables that truly confound the causal effect. The Bayesian adjustment for confounding (BAC) method has been proposed as a general method to estimate the average causal effect in the presence of a large number of potential confounders under the assumption of no unmeasured confounders. In this paper, we explore the application of BAC in genetic studies using Genetic Analysis Workshop 19 exome sequencing data. Our results show that BAC can efficiently estimate the causal effect of genetic variants with adjustment for confounding. Consequently, BAC may serve as a useful tool for genome-wide association studies data analysis to effectively assess the causal effect of genetic variants and the impact of potential interventions