
    Statistical Methods For Genomic And Transcriptomic Sequencing

    Part 1: High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but CNV profiling from whole-exome sequencing (WES) is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for WES data. CODEX includes a Poisson latent factor model, which includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based segmentation procedure that explicitly models the count-based WES data. CODEX is compared to existing methods on germline CNV detection in HapMap samples using microarray-based gold standard and is further evaluated on 222 neuroblastoma samples with matched normal, with focus on somatic CNVs within the ATRX gene. Part 2: Cancer is a disease driven by evolutionary selection on somatic genetic and epigenetic alterations. We propose Canopy, a method for inferring the evolutionary phylogeny of a tumor using both somatic copy number alterations and single nucleotide alterations from one or more samples derived from a single patient. Canopy is applied to bulk sequencing datasets of both longitudinal and spatial experimental designs and to a transplantable metastasis model derived from human cancer cell line MDA-MB-231. Canopy successfully identifies cell populations and infers phylogenies that are in concordance with existing knowledge and ground truth. Through simulations, we explore the effects of key parameters on deconvolution accuracy, and compare against existing methods. Part 3: Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. 
Single-cell RNA sequencing (scRNA-seq) allows the comparison of expression distributions between the two alleles of a diploid organism and thus the characterization of allele-specific bursting. We propose SCALE to analyze genome-wide allele-specific bursting, with adjustment for technical variability. SCALE detects genes exhibiting allelic differences in bursting parameters, and genes whose alleles burst non-independently. We apply SCALE to mouse blastocyst and human fibroblast cells and find that, globally, cis control in gene expression overwhelmingly manifests as differences in burst frequency.

    High-performance InAs-based interband cascade lasers

    Currently, there are only two types of mid-infrared lasers that are capable of continuous-wave (CW) operation above room temperature: quantum cascade (QC) lasers and interband cascade (IC) lasers. Both share the cascade feature for carrier recycling. The most successful QC lasers, based on the inter-subband transition and the well-established InGaAs/InAlAs/InP material system, are able to deliver several watts of optical power. In contrast, IC lasers, based on the interband transition and the unique InAs/GaSb/AlSb type-II broken-bandgap material system, have a threshold power density more than an order of magnitude lower than that of QC lasers (e.g., 0.3 kW/cm² vs. 11 kW/cm²). As a result, IC lasers are a better solution for low-power applications in the mid-infrared region. GaSb-based IC lasers have achieved the best performance around 3.7 μm with a threshold current density as low as 100 A/cm² at 300 K. However, their waveguide cladding layers, consisting of thick InAs/AlSb superlattice, have a low thermal conductivity and are challenging to grow by molecular beam epitaxy. These problems become more severe at longer lasing wavelengths due to the requirement of thicker cladding layers. InAs-based IC lasers, utilizing highly doped InAs as the optical cladding layer, have been developed to address these issues. The goal of this dissertation is to use modeling and experiments to explore several aspects of InAs-based IC lasers, including far-field patterns, high-temperature operation, long-wavelength operation, wide tunability, and single-frequency-mode operation. Beam quality is critical for laser applications. Higher-order spatial modes naturally appear when the laser ridge is wider than the lasing wavelength in the medium. For InAs-based IC lasers with a thin top cladding layer, the top contact configuration can have a major influence on the spatial modes, which are observed in the measurement of far-field patterns. 
The physical origin is identified by waveguide modeling based on an effective index method. Radical design innovations, including “shortened injector” and “carrier rebalancing,” have significantly improved the performance of both GaSb-based and InAs-based IC lasers. Furthermore, a hybrid waveguide, consisting of an inner cladding layer with InAs/AlSb superlattice and an outer cladding layer with highly doped InAs, has significantly increased the modal gain of InAs-based IC lasers. As a result, CW operation above room temperature has been achieved at wavelengths of 4.6 to 4.8 μm. The threshold current density, 247 A/cm² at 300 K in pulsed mode, is the lowest ever reported among mid-infrared semiconductor lasers at similar wavelengths. The pulsed operating temperature is as high as 377 K. Long-wavelength operation is vigorously explored. With the hybrid waveguide mentioned above, the lasing temperature reaches 324 K at a wavelength of 6.4 μm. Further design improvement and optimization are presented. In addition, the lasing wavelength is extended to 11.2 μm at 130 K. Several factors are found to hinder further progress. The waveguide loss is dramatically increased, mainly because the lasing wavelength approaches the plasmon wavelength of the heavily doped InAs. Negative differential resistance is also observed and may be related to the unexpectedly high threshold. A wide tuning range is highly desirable for many applications such as spectroscopy and biochemical analysis. A repeatable, large electrical tuning range of 180 cm⁻¹ (or 900 nm in wavelength near λ~7 μm) is achieved by a novel active region consisting of three InAs quantum wells. This challenges the conventional idea that carrier density pinning at the threshold level would not allow significant tuning by the Stark effect. The gain analysis, based on the calculation of the field-dependent wavefunction overlap, explains the physical mechanism well. 
This strategy is very useful for the design of tunable lasers. For sensitive detection of important gas molecules such as carbonyl sulfide (COS, 4.5 μm), single-mode distributed feedback IC lasers are highly desirable for tunable laser absorption spectroscopy. A grating is patterned using interference lithography to etch through the thin top cladding into the top spacer layer of the IC laser structure. Single-mode emission with a side mode suppression ratio of 30 dB is obtained in continuous-wave operation at temperatures up to 180 K near 4.5 μm. A total tuning range of 16 nm is achieved for a single device, with a temperature-tuning rate of 0.4 nm/K and a current-tuning rate of 0.016 nm/mA. The impact of the grating on device performance is evaluated and discussed in comparison with Fabry–Perot lasers.
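
The abstract above equates a 180 cm⁻¹ tuning range with roughly 900 nm near 7 μm. The short sketch below checks that conversion. It assumes, purely for illustration, that the tuning range is centered exactly at 7 μm; the function names are hypothetical and not taken from the dissertation.

```python
def tuning_span_nm(center_um, span_cm1):
    """Wavelength span (nm) of a wavenumber span centered at a given wavelength.

    Assumption: the span is symmetric in wavenumber around the center.
    """
    center_wn = 1.0 / (center_um * 1e-4)   # center wavenumber in cm^-1
    low_wn = center_wn - span_cm1 / 2.0
    high_wn = center_wn + span_cm1 / 2.0
    # Wavelength is the reciprocal of wavenumber; 1 cm = 1e7 nm.
    return (1.0 / low_wn - 1.0 / high_wn) * 1e7


def tuning_span_nm_linear(center_um, span_cm1):
    """First-order approximation: delta_lambda ~= lambda^2 * delta_wavenumber."""
    center_cm = center_um * 1e-4
    return center_cm ** 2 * span_cm1 * 1e7


if __name__ == "__main__":
    print(round(tuning_span_nm(7.0, 180.0), 1))         # exact reciprocal conversion
    print(round(tuning_span_nm_linear(7.0, 180.0), 1))  # small-span approximation
```

Both estimates come out a little under 900 nm, consistent with the rounded figure quoted in the abstract.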

    Scaling Research Support for Early-Stage Researchers with Crowdsourcing

    Support from peers and experts, such as feedback on research artefacts, is an important component of developing research skills. The support is especially helpful for early-stage researchers (ESRs), typically PhD students at the critical stage of learning research skills. Currently, such support mainly comes from a small circle of advisors and colleagues. Gaining access to quality and diverse support outside a research group is challenging for most ESRs. This thesis presents several studies to advance the fundamental and practical understanding of designing systems to scale support for research skills development for ESRs. First, we conduct a systematic literature review on crowdsourcing for education that summarizes existing efforts in the research and application domain. This study also highlights the need for studies on crowdsourcing support for research skills development. Then, based on findings from the first study, we conduct another systematic literature review on crowdsourcing support for project-based learning and research skills development. The third study qualitatively explores how ESRs leverage current socio-technical affordances for distributed support in their research activities. This study reveals opportunities afforded by socio-technical systems and challenges faced by ESRs when seeking and adopting support from online research communities. The fourth study quantitatively examines which types of feedback from external researchers are most desired and should be offered first, and which challenges should be solved first. Building on the findings from the four studies above, we propose a theoretical framework, Researchersourcing, that guides the understanding and design of socio-technical systems that scale the support for research skills development. 
Accordingly, in the fifth study, we design and evaluate a crowdsourcing pipeline and a system to scale feedback on research drafts and ease the burden of reviewing them.

    Aggregating Dependent Signals with Heavy-Tailed Combination Tests

    Combining dependent p-values to evaluate the global null hypothesis presents a longstanding challenge in statistical inference, particularly when aggregating results from diverse methods to boost signal detection. P-value combination tests using transformations based on heavy-tailed distributions, such as the Cauchy combination test and the harmonic mean p-value, have recently garnered significant interest for their potential to efficiently handle arbitrary p-value dependencies. Despite their growing popularity in practical applications, there is a gap in comprehensive theoretical and empirical evaluations of these methods. This paper conducts an extensive investigation, revealing that, theoretically, while these combination tests are asymptotically valid for pairwise quasi-asymptotically independent test statistics, such as bivariate normal variables, they are also asymptotically equivalent to the Bonferroni test under the same conditions. However, extensive simulations unveil their practical utility, especially in scenarios where stringent type-I error control is not necessary and signals are dense. Both the heaviness of the distribution and its support substantially impact the tests' non-asymptotic validity and power, and we recommend using a truncated Cauchy distribution in practice. Moreover, we show that under the violation of quasi-asymptotic independence among test statistics, these tests remain valid and, in fact, can be considerably less conservative than the Bonferroni test. We also present two case studies in genetics and genomics, showcasing the potential of the combination tests to significantly enhance statistical power while effectively controlling type-I errors.
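
For readers unfamiliar with the tests named in this abstract, the following is a minimal sketch of the standard Cauchy combination test, the raw harmonic mean p-value statistic, and the Bonferroni test. It assumes equal weights by default and p-values strictly between 0 and 1. It does not implement the truncated Cauchy variant the paper recommends, and the function names are illustrative rather than taken from any package.

```python
import math


def _equal_weights(n):
    return [1.0 / n] * n


def cauchy_combination(pvalues, weights=None):
    """Cauchy combination test: map each p-value to a Cauchy variate, average, map back."""
    weights = weights or _equal_weights(len(pvalues))
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, pvalues))
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


def harmonic_mean_statistic(pvalues, weights=None):
    """Raw weighted harmonic mean of the p-values (not calibrated as a p-value here)."""
    weights = weights or _equal_weights(len(pvalues))
    return 1.0 / sum(w / p for w, p in zip(weights, pvalues))


def bonferroni(pvalues):
    """Bonferroni global test: n times the smallest p-value, capped at 1."""
    return min(1.0, len(pvalues) * min(pvalues))


if __name__ == "__main__":
    ps = [0.001, 0.20, 0.45, 0.80]
    print(cauchy_combination(ps), harmonic_mean_statistic(ps), bonferroni(ps))
```

With one very small p-value among several large ones, all three outputs are driven mainly by the smallest p-value, which gives some intuition for the asymptotic equivalence to Bonferroni described in the abstract.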

    An Application Ontology for Reproducibility of Machine Learning Solutions

    With Artificial Intelligence and Machine Learning (ML) on the rise, organisations of different scales and nature are looking to utilise ML systems to support their day-to-day operations. Many enterprises find it difficult to adapt existing ML solutions to their organisations without huge investments in solution understanding, customisation, infrastructure enablement and workforce training. Some organisations utilise external service providers to provision their standard analytics services, and this often leads to solutions that either do not fit well with their organisational goals or may lead to the loss of expert knowledge behind the establishment of the AI system. This paper aims to address some of these challenges by proposing an ontology for ensuring the reproducibility of ML models in research as well as their integration within application environments. Our work will ensure that the knowledge about a developed ML system or process is accumulated and recorded within an organisation and can be used in the future, either by new employees or other teams within the organisation. This approach can also be utilised by researchers and developers of ML systems to record and publish metadata of their studies, ensuring that future researchers can reuse their work with minimal effort.

    CODEX: A Normalization and Copy Number Variation Detection Method for Whole Exome Sequencing

    High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures
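
To illustrate what a Poisson likelihood-based score for a candidate segment can look like, here is a generic sketch. It is not CODEX's actual implementation. It assumes that per-exon expected read counts under the null (normal copy number) are already available from some normalization step, and it scores a segment by the Poisson log-likelihood ratio of a single fitted copy ratio against a ratio of 1. The function name and inputs are hypothetical.

```python
import math


def poisson_segment_score(observed, expected):
    """Return (copy_ratio, log_likelihood_ratio) for one candidate segment.

    observed: read counts per exon in the segment
    expected: normalized expected counts per exon under the null
              (assumed to be supplied by a separate normalization step)
    """
    total_observed = sum(observed)
    total_expected = sum(expected)
    if total_expected <= 0:
        raise ValueError("expected counts must sum to a positive value")

    # Maximum-likelihood copy ratio when counts ~ Poisson(ratio * expected).
    ratio = total_observed / total_expected
    if total_observed == 0:
        # The limit of y*log(ratio) as y -> 0 is 0, so only the second term remains.
        return 0.0, total_expected

    llr = total_observed * math.log(ratio) - (ratio - 1.0) * total_expected
    return ratio, llr


if __name__ == "__main__":
    # Hypothetical heterozygous deletion: about half the expected coverage.
    print(poisson_segment_score([48, 52, 50], [100, 100, 100]))
```

A segmentation procedure would compare such scores across candidate breakpoints. The recursive search and the model-selection step are omitted here.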

    MULTI-VESSELS COLLISION AVOIDANCE STRATEGY FOR AUTONOMOUS SURFACE VEHICLES BASED ON GENETIC ALGORITHM IN CONGESTED PORT ENVIRONMENT

    This study proposes an improved genetic collision avoidance algorithm to address the requirement that Autonomous Surface Vehicles (ASVs) comply with the collision avoidance rules at sea in congested waters. Firstly, a collision risk index model for safe ASV encounters is established, taking into account the international rules for collision avoidance. The ASV collision risk index and the distance of safe encounters are taken as boundary values of the correlation membership function of the collision risk index model to calculate the optimal heading of the ASV in real time. Secondly, the genetic coding, fitness function, and basic parameters of the genetic algorithm are designed to construct the collision avoidance decision system. Finally, collision avoidance between the ASV and several obstacle vessels is simulated for three encounter situations: head-on, crossing, and overtaking. The results show that the proposed genetic algorithm, which accounts for the rules of collision avoidance at sea, can effectively avoid multiple other vessels in different situations.

    Differential damage and repair of DNA-adducts induced by anti-cancer drug cisplatin across mouse organs

    The platinum-based drug cisplatin is a widely used first-line therapy for several cancers. Cisplatin interacts with DNA mainly in the form of Pt-d(GpG) di-adduct, which stalls cell proliferation and activates the DNA damage response. Although cisplatin shows a broad spectrum of anticancer activity, its utility is limited due to acquired drug resistance and toxicity to non-targeted tissues. Here, by integrating genome-wide high-throughput Damage-seq, XR-seq, and RNA-seq approaches, along with publicly available epigenomic data, we systematically study the genome-wide profiles of cisplatin damage formation and excision repair in mouse kidney, liver, lung and spleen. We find different DNA damage and repair spectra across mouse organs, which are associated with tissue-specific transcriptomic and epigenomic profiles. The framework and the multi-omics data we present here constitute an unbiased foundation for understanding the mechanisms of cellular response to cisplatin. Our approach should be applicable for studying drug resistance and for tailoring cancer chemotherapy regimens.

    CODEX2: full-spectrum copy number variation detection by high-throughput DNA sequencing

    High-throughput DNA sequencing enables detection of copy number variations (CNVs) on the genome-wide scale with finer resolution compared to array-based methods but suffers from biases and artifacts that lead to false discoveries and low sensitivity. We describe CODEX2, a statistical framework for full-spectrum CNV profiling that is sensitive for variants with both common and rare population frequencies and that is applicable to study designs with and without negative control samples. We demonstrate and evaluate CODEX2 on whole-exome and targeted sequencing data, where biases are the most prominent. CODEX2 outperforms existing methods and, in particular, significantly improves sensitivity for common CNVs.

    Altered Functional Connectivity Density in Subtypes of Parkinson’s Disease

    Parkinson’s disease (PD) can be classified into tremor-dominant and akinetic-rigid subtypes, each of which exhibits a unique clinical course and prognosis. However, the neural basis for these disparate manifestations is not well understood. This study comprehensively investigated the altered functional connectivity patterns of these two subtypes. Twenty-five tremor-dominant patients, 25 akinetic-rigid patients and 26 normal control subjects participated in this study. Resting-state functional MRI data were analyzed using functional connectivity density (FCD) and seed-based functional connectivity approaches. Correlations between neuroimaging measures and clinical variables were also calculated. Compared with normal controls, increased global FCD occurred most extensively in the frontal lobe and cerebellum in both subtypes. Compared with akinetic-rigid patients, the tremor-dominant patients showed significantly increased global FCD in the cerebellum and decreased global FCD in portions of the bilateral frontal lobe. Furthermore, different subtypes demonstrated different cerebello-cortical functional connectivity patterns. Moreover, the identified FCD and functional connectivity correlated significantly with clinical variables in the PD patients, and in particular the FCD indices distinguished the different subtypes with high sensitivity (95%) and specificity (80%). These findings indicate that the functional connectivity patterns in the cerebellum and frontal lobe are altered in both subtypes of PD, and that alterations in the cerebellum in particular are highly related to tremor.