
    Statistical Methods for Two Problems in Cancer Research: Analysis of RNA-seq Data from Archival Samples and Characterization of Onset of Multiple Primary Cancers

    My dissertation focuses on the development and application of quantitative methodology for two important topics in translational and clinical cancer research. The first topic was motivated by the challenge of applying transcriptome sequencing (RNA-seq) to formalin-fixed, paraffin-embedded (FFPE) tumor samples for reliable diagnostic development. We designed a biospecimen study to directly compare gene expression results from different RNA-seq library preparation protocols applied to human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. To comprehensively evaluate the quality of FFPE RNA-seq data for expression profiling, we developed multiple computational assessment methods, covering the uniformity and continuity of coverage, the variance and correlation of overall gene expression, patterns of coding sequence expression measurement, phenotypic patterns of gene expression, and measurements from representative multi-gene signatures. Our results showed that the principal determinant of variance across these protocols was the use of exon capture probes, followed by the preservation condition (FF versus FFPE), and then phenotypic differences between breast cancers. We also identified one protocol, using RNase H-based ribosomal RNA (rRNA) depletion, that exhibited the least variability in gene expression measurements, showed the strongest correlation between FF and FFPE samples, and was generally representative of the transcriptome. The second topic focused on TP53 penetrance estimation for multiple primary cancers (MPC). The study was motivated by the high proportion of MPC patients observed in Li-Fraumeni syndrome (LFS) families, for whom no MPC risk estimates had so far been available to support clinical management. To this end, we proposed a Bayesian recurrent event model based on a non-homogeneous Poisson process to estimate a set of penetrance estimates for MPC related to LFS.
For the associated inference, we employed the familywise likelihood, which makes use of genetic information inherited through the family. Ascertainment bias, which is inevitable in rare disease studies, was adjusted for via an inverse probability weighting scheme. We applied the proposed method to the LFS data, a family cohort collected through pediatric sarcoma patients at MD Anderson Cancer Center from 1944 to 1982. Both internal and external validation studies show that the proposed model provides reliable penetrance estimates for MPC in LFS, which, to the best of our knowledge, have not previously been reported in the LFS literature. The research I conducted during my PhD study will be useful to translational scientists who want to obtain accurate gene expression measurements by applying RNA-seq technology to FFPE tumor tissue samples. It will also be helpful to genetic counselors and genetic epidemiologists who need high-resolution penetrance estimates for primary cancer risk assessment.
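    As a rough sketch of the modeling idea above: under a non-homogeneous Poisson process, the penetrance to age t is one minus the probability of zero events by t. The Weibull-type intensity and the parameter values below are illustrative assumptions, not the fitted TP53 parameters from the dissertation.

```python
import math

import numpy as np

# Illustrative power-law (Weibull-type) intensity; kappa and eta are
# made-up values, not fitted estimates.
def cumulative_intensity(t, kappa=1e-4, eta=2.5):
    """Cumulative intensity Lambda(t) of the non-homogeneous Poisson process."""
    return kappa * t ** eta

def penetrance_first_primary(t, kappa=1e-4, eta=2.5):
    """P(at least one primary cancer by age t) = 1 - exp(-Lambda(t))."""
    return 1.0 - np.exp(-cumulative_intensity(t, kappa, eta))

def prob_k_primaries(t, k, kappa=1e-4, eta=2.5):
    """P(exactly k primary cancers by age t) under the Poisson process."""
    lam = cumulative_intensity(t, kappa, eta)
    return math.exp(-lam) * lam ** k / math.factorial(k)
```

    The recurrent-event structure is what allows risk statements beyond the first cancer: prob_k_primaries gives the probability of exactly k primaries by age t, which a single-event survival model cannot express.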

    PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation

    Vision-and-Language Navigation (VLN) requires the agent to follow language instructions to navigate through 3D environments. One main challenge in VLN is the limited availability of photorealistic training environments, which makes it hard to generalize to new and unseen environments. To address this problem, we propose PanoGen, a generation method that can potentially create an infinite number of diverse panoramic environments conditioned on text. Specifically, we collect room descriptions by captioning the room images in existing Matterport3D environments, and leverage a state-of-the-art text-to-image diffusion model to generate the new panoramic environments. We use recursive outpainting over the generated images to create consistent 360-degree panorama views. Our new panoramic environments share similar semantic information with the original environments by conditioning on text descriptions, which ensures the co-occurrence of objects in the panorama follows human intuition, and creates enough diversity in room appearance and layout with image outpainting. Lastly, we explore two ways of utilizing PanoGen in VLN pre-training and fine-tuning. We generate instructions for paths in our PanoGen environments with a speaker built on a pre-trained vision-and-language model for VLN pre-training, and augment the visual observation with our panoramic environments during agents' fine-tuning to avoid overfitting to seen environments. Empirically, learning with our PanoGen environments achieves the new state-of-the-art on the Room-to-Room, Room-for-Room, and CVDN datasets. Pre-training with our PanoGen speaker data is especially effective for CVDN, which has under-specified instructions and needs commonsense knowledge. Lastly, we show that the agent can benefit from training with more generated panoramic environments, suggesting promising results for scaling up the PanoGen environments.
    Comment: Project Webpage: https://pano-gen.github.io
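    A minimal sketch of the recursive outpainting loop described above: each step conditions on the right edge of the panorama built so far and extends it. The outpaint_step stub below stands in for the diffusion-model call (here it just emits noise of the right shape); its name, the strip widths, and the overlap size are all illustrative assumptions.

```python
import numpy as np

def outpaint_step(context, new_width):
    """Placeholder for a text-conditioned diffusion outpainting call.
    A real model would extend `context` coherently; this stub only
    returns a noise strip with matching height."""
    h = context.shape[0]
    return np.random.rand(h, new_width, 3)

def build_panorama(seed_view, n_steps=4, overlap=32, step_width=128):
    """Recursively outpaint to the right, keeping `overlap` columns of
    the current panorama as conditioning so adjacent strips stay
    visually consistent."""
    pano = seed_view
    for _ in range(n_steps):
        context = pano[:, -overlap:, :]  # condition on the right edge
        new_strip = outpaint_step(context, step_width)
        pano = np.concatenate([pano, new_strip], axis=1)
    return pano
```

    The recursion is what yields a consistent 360-degree view: because every new strip is generated from the previous one, object boundaries can continue across strip seams.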

    Improving Vision-and-Language Navigation by Generating Future-View Image Semantics

    Vision-and-Language Navigation (VLN) is the task that requires an agent to navigate through the environment based on natural language instructions. At each step, the agent takes the next action by selecting from a set of navigable locations. In this paper, we aim to take one step further and explore whether the agent can benefit from generating the potential future view during navigation. Intuitively, humans have an expectation of what the future environment will look like, based on the natural language instructions and surrounding views, which aids correct navigation. Hence, to equip the agent with this ability to generate the semantics of future navigation views, we first propose three proxy tasks during the agent's in-domain pre-training: Masked Panorama Modeling (MPM), Masked Trajectory Modeling (MTM), and Action Prediction with Image Generation (APIG). These three objectives teach the model to predict missing views in a panorama (MPM), predict missing steps in the full trajectory (MTM), and generate the next view based on the full instruction and navigation history (APIG), respectively. We then fine-tune the agent on the VLN task with an auxiliary loss that minimizes the difference between the view semantics generated by the agent and the ground-truth view semantics of the next step. Empirically, our VLN-SIG achieves the new state-of-the-art on both the Room-to-Room dataset and the CVDN dataset. We further show that our agent learns to fill in missing patches in future views qualitatively, which brings more interpretability over agents' predicted actions. Lastly, we demonstrate that learning to predict future view semantics also enables the agent to have better performance on longer paths.
    Comment: CVPR 2023 (Project webpage: https://jialuli-luka.github.io/VLN-SIG)
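    The auxiliary fine-tuning objective above can be sketched as a simple penalty between generated and ground-truth next-view semantics. The mean-squared form, the (n_patches, n_classes) representation, and the weighting are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def view_semantic_loss(pred_semantics, gt_semantics, weight=1.0):
    """Auxiliary loss: mean squared difference between the agent's
    generated next-view semantics and the ground-truth semantics of
    the next step. Inputs are (n_patches, n_classes) arrays."""
    return weight * float(np.mean((pred_semantics - gt_semantics) ** 2))

def total_loss(nav_loss, pred_semantics, gt_semantics, aux_weight=0.1):
    """Navigation loss plus the weighted auxiliary semantic term."""
    return nav_loss + view_semantic_loss(pred_semantics, gt_semantics, aux_weight)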

    Complex Image Generation SwinTransformer Network for Audio Denoising


    Achieving high-performance audio denoising is still a challenging task in real-world applications. Existing time-frequency methods often ignore the quality of the generated frequency-domain images. This paper converts the audio denoising problem into an image generation task. We first develop a complex image generation SwinTransformer network to capture more information from the complex Fourier domain. We then impose structure similarity and detailed loss functions to generate high-quality images, and develop an SDR loss to minimize the difference between denoised and clean audio. Extensive experiments on two benchmark datasets demonstrate that our proposed model outperforms state-of-the-art methods.
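    The SDR loss mentioned above has a standard form: the negative signal-to-distortion ratio in decibels, so minimizing the loss maximizes the SDR of the denoised signal against the clean reference. A minimal sketch (the epsilon stabilizer is an assumption for numerical safety):

```python
import numpy as np

def sdr_loss(clean, denoised, eps=1e-8):
    """Negative signal-to-distortion ratio (dB). Lower loss means the
    denoised waveform is closer to the clean reference."""
    signal_power = np.sum(clean ** 2)
    distortion_power = np.sum((clean - denoised) ** 2)
    sdr_db = 10.0 * np.log10((signal_power + eps) / (distortion_power + eps))
    return -sdr_db
```

    Because the ratio is scale-sensitive, this loss also penalizes gain errors, not just residual noise, which is why it complements image-space similarity losses.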

    Research on the Marketing Strategy of Online Education -- Taking New Oriental as an Example

    In recent years, with the development of society and advances in science and technology, online learning has penetrated people's daily lives, and demand for high-quality course products has grown increasingly strong. From a macro perspective, continuous growth in national financial investment in education, the ongoing upgrading of China's consumption structure, the development of 5G technology, and the popularization of AI have made online teaching less constrained, and the online education industry is showing explosive growth. More and more online education institutions are listing to raise financing, and market values are soaring. However, in 2019, apart from GSX, the major online learning platforms such as New Oriental, Speak English Fluently, and Sunlands all operated at a loss. Most of these institutions seize market share by increasing advertising spend, but this brings huge marketing costs that weigh on the companies' financial performance. As the Matthew effect strengthens, large educational institutions capture much of the market through free and low-priced classes, while small and medium-sized institutions with weak capital strength often cannot afford high sales costs and face the risk of a broken capital chain. Taking New Oriental Online as an example, this paper analyzes the problems in the marketing strategies of online education institutions. It also puts forward suggestions in four areas, namely target market, differentiated value, marketing mix, and marketing mode, so that online education institutions can control marketing expenses and achieve profitability by improving course quality, expanding marketing channels, and implementing precise positioning.

    DCHT: Deep Complex Hybrid Transformer for Speech Enhancement

    Most current deep learning-based approaches for speech enhancement operate only in the spectrogram or waveform domain. Although a cross-domain transformer combining waveform- and spectrogram-domain inputs has been proposed, its performance can be further improved. In this paper, we present a novel deep complex hybrid transformer that integrates both spectrogram- and waveform-domain approaches to improve speech enhancement performance. The proposed model consists of two parts: a complex Swin-Unet in the spectrogram domain and a dual-path transformer network (DPTNet) in the waveform domain. We first construct a complex Swin-Unet network in the spectrogram domain and perform speech enhancement on the complex audio spectrum. We then introduce an improved DPTNet by adding memory-compressed attention. Our model is capable of learning multi-domain features to reduce existing noise in different domains in a complementary way. The experimental results on the BirdSoundsDenoising dataset and the VCTK+DEMAND dataset indicate that our method achieves better performance than state-of-the-art methods.
    Comment: IEEE DDP conference
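    Memory-compressed attention, as referenced above, shrinks the attention matrix by pooling keys and values along the time axis before computing attention. A minimal single-head sketch (average pooling and the stride value are illustrative assumptions; implementations often use strided convolutions instead):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_compressed_attention(q, k, v, stride=4):
    """Single-head attention with keys/values average-pooled along time
    by `stride`, reducing the score matrix from (T, T) to (T, T//stride)."""
    T, d = k.shape
    Tc = T // stride
    # Pool consecutive groups of `stride` timesteps into one memory slot.
    k_c = k[:Tc * stride].reshape(Tc, stride, d).mean(axis=1)
    v_c = v[:Tc * stride].reshape(Tc, stride, d).mean(axis=1)
    scores = q @ k_c.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v_c
```

    The memory saving is the point: for long waveforms the (T, T) score matrix dominates cost, and compressing keys/values by a factor of 4 cuts both memory and compute of attention roughly fourfold.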

    Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features

    Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that often emerges in early childhood. ASD assessment typically involves an observation protocol, including note-taking and ratings of the child's social behavior, conducted by a trained clinician. A robust machine learning (ML) model capable of labeling adult and child audio has the potential to save significant time and labor in manually coding children's behaviors. This may assist clinicians in capturing events of interest, communicating events to parents, and educating new clinicians. In this study, we leverage the self-supervised learning model Wav2Vec 2.0 (W2V2), pretrained on 4300 hours of home recordings of children under 5 years old, to build a unified system that performs both speaker diarization (SD) and vocalization classification (VC) tasks. We apply this system to two-channel audio recordings of brief 3-5 minute clinician-child interactions from the Rapid-ABC corpus. We propose a novel technique that introduces auxiliary features extracted from a W2V2-based automatic speech recognition (ASR) system for children under 4 years old to improve the children's VC task. We test our proposed method on two corpora (Rapid-ABC and BabbleCor) and observe consistent improvements. Furthermore, we reach, and perhaps outperform, the state-of-the-art performance on BabbleCor.
    Comment: Submitted to ICASSP 202
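    The auxiliary-feature idea above amounts to late feature fusion: ASR-derived features are concatenated with the W2V2 embeddings before the classifier head. A minimal sketch (the feature dimensions, the linear head, and its weights are all illustrative assumptions):

```python
import numpy as np

def fuse_features(w2v2_feats, asr_feats):
    """Late fusion: concatenate per-utterance W2V2 embeddings with
    auxiliary features from a children's ASR model along the feature axis."""
    return np.concatenate([w2v2_feats, asr_feats], axis=-1)

def classify(fused, weights, bias):
    """Toy linear head over the fused features; returns the argmax class
    per utterance (e.g. child vocalization category vs. adult speech)."""
    logits = fused @ weights + bias
    return logits.argmax(axis=-1)
```

    Fusion at the feature level lets the VC classifier exploit phonetic cues the ASR model has learned from child speech without retraining the W2V2 backbone.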