Statistical Methods for Two Problems in Cancer Research: Analysis of RNA-seq Data from Archival Samples and Characterization of Onset of Multiple Primary Cancers
My dissertation is focused on quantitative methodology development and application for two important topics in translational and clinical cancer research.
The first topic was motivated by the challenge of applying transcriptome sequencing (RNA-seq) to formalin-fixed, paraffin-embedded (FFPE) tumor samples for reliable diagnostic development. We designed a biospecimen study to directly compare gene expression results from different library preparation protocols for RNA-seq of human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. To comprehensively evaluate the quality of FFPE RNA-seq data for expression profiling, we developed multiple computational assessment methods, covering the uniformity and continuity of coverage, the variance and correlation of overall gene expression, patterns of coding sequence expression, phenotypic patterns of gene expression, and measurements from representative multi-gene signatures. Our results showed that the principal determinant of variance across these protocols was the use of exon capture probes, followed by the preservation condition (FF versus FFPE), and then phenotypic differences between breast cancers. We also identified one protocol, using RNase H-based ribosomal RNA (rRNA) depletion, that exhibited the least variability in gene expression measurements, showed the strongest correlation between FF and FFPE samples, and was generally representative of the transcriptome.
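As a minimal, hypothetical illustration of the variance and correlation assessments (not the dissertation's actual pipeline; the input file, column naming, and log2 transform are assumptions), per-gene variance and paired FF-versus-FFPE correlation could be computed as follows:

```python
import numpy as np
import pandas as pd

# Hypothetical input: a genes x samples matrix of normalized counts, with paired
# columns for fresh-frozen ("*_FF") and FFPE ("*_FFPE") libraries of each tumor.
expr = pd.read_csv("expression_matrix.csv", index_col="gene_id")
log_expr = np.log2(expr + 1)  # log-transform to stabilize variance

ff_cols = [c for c in log_expr.columns if c.endswith("_FF")]

# Per-gene variance within each preservation condition.
var_ff = log_expr[ff_cols].var(axis=1)
var_ffpe = log_expr[[c.replace("_FF", "_FFPE") for c in ff_cols]].var(axis=1)

# Sample-wise Pearson correlation between paired FF and FFPE libraries.
correlations = pd.Series({
    c: log_expr[c].corr(log_expr[c.replace("_FF", "_FFPE")]) for c in ff_cols
})
print(var_ff.median(), var_ffpe.median())
print(correlations.describe())
```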
In the second topic, we focused on TP53 penetrance estimation for multiple primary cancers (MPC). The study was motivated by the high proportion of MPC patients observed in Li-Fraumeni syndrome (LFS) families, for whom no MPC risk estimates had previously been available to guide clinical management. To this end, we proposed a Bayesian recurrent event model based on a non-homogeneous Poisson process to estimate a set of penetrances for MPC related to LFS. For inference, we employed the familywise likelihood, which utilizes genetic information inherited through the family. Ascertainment bias, which is inevitable in rare disease studies, was adjusted for with an inverse probability weighting scheme. We applied the proposed method to the LFS data, a family cohort collected through pediatric sarcoma patients at MD Anderson Cancer Center from 1944 to 1982. Both internal and external validation studies show that the proposed model provides reliable penetrance estimates for MPC in LFS, which, to the best of our knowledge, had not previously been reported in the LFS literature.
The research I conducted during my PhD study will be useful to translational scientists who want to obtain accurate gene expression measurements by applying RNA-seq technology to FFPE tumor tissue samples. It will also be helpful to genetic counselors and genetic epidemiologists who need high-resolution penetrance estimates for primary cancer risk assessment.
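As a loose illustration of the recurrent-event idea (not the dissertation's fitted model; the Weibull-type baseline and its parameter values are assumptions chosen only for demonstration), a non-homogeneous Poisson process with cumulative intensity Lambda(t) yields the probability of at least k primary cancers by age t as a Poisson tail probability:

```python
import math
import numpy as np

def cumulative_intensity(t, shape=2.0, scale=60.0):
    """Hypothetical Weibull-type cumulative intensity Lambda(t), for illustration only."""
    return (np.asarray(t, dtype=float) / scale) ** shape

def penetrance_at_least_k(t, k=1, **kwargs):
    """P(N(t) >= k) under a non-homogeneous Poisson process with mean Lambda(t)."""
    lam = cumulative_intensity(t, **kwargs)
    below_k = sum(lam**j * np.exp(-lam) / math.factorial(j) for j in range(k))
    return 1.0 - below_k

ages = np.array([30, 45, 60])
print(penetrance_at_least_k(ages, k=1))  # probability of at least one primary cancer
print(penetrance_at_least_k(ages, k=2))  # probability of multiple primary cancers
```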
PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires the agent to follow language
instructions to navigate through 3D environments. One main challenge in VLN is
the limited availability of photorealistic training environments, which makes
it hard to generalize to new and unseen environments. To address this problem,
we propose PanoGen, a generation method that can potentially create an infinite
number of diverse panoramic environments conditioned on text. Specifically, we
collect room descriptions by captioning the room images in existing
Matterport3D environments, and leverage a state-of-the-art text-to-image
diffusion model to generate the new panoramic environments. We use recursive
outpainting over the generated images to create consistent 360-degree panorama
views. Our new panoramic environments share similar semantic information with
the original environments by conditioning on text descriptions, which ensures
the co-occurrence of objects in the panorama follows human intuition, and
creates enough diversity in room appearance and layout with image outpainting.
Lastly, we explore two ways of utilizing PanoGen in VLN pre-training and
fine-tuning. We generate instructions for paths in our PanoGen environments
with a speaker built on a pre-trained vision-and-language model for VLN
pre-training, and augment the visual observation with our panoramic
environments during agents' fine-tuning to avoid overfitting to seen
environments. Empirically, learning with our PanoGen environments achieves the
new state-of-the-art on the Room-to-Room, Room-for-Room, and CVDN datasets.
Pre-training with our PanoGen speaker data is especially effective for CVDN,
which has under-specified instructions and needs commonsense knowledge. Lastly,
we show that the agent can benefit from training with more generated panoramic
environments, suggesting promising results for scaling up the PanoGen
environments.
Comment: Project Webpage: https://pano-gen.github.io
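As a rough sketch of the recursive-outpainting idea using an off-the-shelf inpainting pipeline from the diffusers library (the checkpoint, window size, stride, seed image, and prompt below are assumptions, not PanoGen's released code):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Assumed public inpainting checkpoint; PanoGen's actual text-to-image model may differ.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def outpaint_right(panorama: Image.Image, prompt: str, stride: int = 256) -> Image.Image:
    """Extend a 512-pixel-tall panorama to the right by `stride` pixels via inpainting."""
    w, h = panorama.size  # assumes h == 512 and w >= 512
    canvas = Image.new("RGB", (w + stride, h))
    canvas.paste(panorama, (0, 0))
    mask = Image.new("L", (w + stride, h), 0)
    mask.paste(255, (w, 0, w + stride, h))  # white marks the region to generate
    # Work on the rightmost 512x512 window so the model sees existing context.
    left = w + stride - 512
    window = canvas.crop((left, 0, left + 512, 512))
    window_mask = mask.crop((left, 0, left + 512, 512))
    generated = pipe(prompt=prompt, image=window, mask_image=window_mask).images[0]
    canvas.paste(generated, (left, 0))
    return canvas

# Start from a single generated room image and repeatedly outpaint toward a full panorama.
pano = Image.open("seed_room_image.png").convert("RGB")
for _ in range(8):
    pano = outpaint_right(pano, "a cozy living room with a sofa and large windows")
```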
Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
Vision-and-Language Navigation (VLN) is the task that requires an agent to
navigate through the environment based on natural language instructions. At
each step, the agent takes the next action by selecting from a set of navigable
locations. In this paper, we aim to take one step further and explore whether
the agent can benefit from generating the potential future view during
navigation. Intuitively, humans form an expectation of what the future
environment will look like, based on the natural language instructions and
surrounding views, and this expectation aids correct navigation. Hence, to equip the agent
with this ability to generate the semantics of future navigation views, we
first propose three proxy tasks during the agent's in-domain pre-training:
Masked Panorama Modeling (MPM), Masked Trajectory Modeling (MTM), and Action
Prediction with Image Generation (APIG). These three objectives teach the model
to predict missing views in a panorama (MPM), predict missing steps in the full
trajectory (MTM), and generate the next view based on the full instruction and
navigation history (APIG), respectively. We then fine-tune the agent on the VLN
task with an auxiliary loss that minimizes the difference between the view
semantics generated by the agent and the ground truth view semantics of the
next step. Empirically, our VLN-SIG achieves the new state-of-the-art on both
the Room-to-Room dataset and the CVDN dataset. We further show that our agent
learns to fill in missing patches in future views qualitatively, which adds
interpretability to the agent's predicted actions. Lastly, we demonstrate
that learning to predict future view semantics also enables the agent to have
better performance on longer paths.
Comment: CVPR 2023 (Project webpage: https://jialuli-luka.github.io/VLN-SIG)
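As a hedged sketch of the auxiliary-loss idea described above (the tensor shapes, the MSE choice, and the weighting are assumptions, not the paper's exact formulation), the fine-tuning objective could combine the navigation loss with a future-view semantics term:

```python
import torch
import torch.nn.functional as F

def vln_loss_with_future_view(action_logits, action_targets,
                              predicted_view_semantics, gt_view_semantics,
                              aux_weight=0.1):
    """Navigation loss plus an auxiliary future-view semantics term.

    action_logits: (batch, num_candidates) scores over navigable locations.
    predicted_view_semantics / gt_view_semantics: (batch, dim) semantic features
    of the agent's generated next view and of the true next view.
    """
    nav_loss = F.cross_entropy(action_logits, action_targets)
    aux_loss = F.mse_loss(predicted_view_semantics, gt_view_semantics)
    return nav_loss + aux_weight * aux_loss

# Example usage with dummy tensors.
loss = vln_loss_with_future_view(torch.randn(4, 6), torch.tensor([0, 2, 1, 5]),
                                 torch.randn(4, 768), torch.randn(4, 768))
```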
Complex Image Generation SwinTransformer Network for Audio Denoising
Achieving high-performance audio denoising is still a challenging task in
real-world applications. Existing time-frequency methods often ignore the
quality of generated frequency domain images. This paper converts the audio
denoising problem into an image generation task. We first develop a complex
image generation SwinTransformer network to capture more information from the
complex Fourier domain. We then impose structural similarity and detail loss
functions to generate high-quality images and develop an SDR loss to minimize
the difference between denoised and clean audio signals. Extensive experiments on two
benchmark datasets demonstrate that our proposed model outperforms
state-of-the-art methods.
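The SDR loss mentioned above is commonly implemented as the negative signal-to-distortion ratio in decibels; a minimal PyTorch version (the exact variant used in the paper may differ) is:

```python
import torch

def sdr_loss(denoised: torch.Tensor, clean: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative SDR (in dB) between denoised and clean waveforms of shape (batch, samples)."""
    signal_power = torch.sum(clean ** 2, dim=-1)
    noise_power = torch.sum((clean - denoised) ** 2, dim=-1)
    sdr = 10.0 * torch.log10((signal_power + eps) / (noise_power + eps))
    return -sdr.mean()  # minimizing the negative SDR maximizes the SDR

# Example usage with dummy one-second waveforms at 16 kHz.
loss = sdr_loss(torch.randn(2, 16000), torch.randn(2, 16000))
```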
Research on the Marketing Strategy of Online Education -- Taking New Oriental as an Example
In recent years, with the development of society and the progress of science and technology, online learning has penetrated people's daily lives, and demand for high-quality course products has grown ever stronger. From a macro perspective, the continuous growth of national financial investment in education, the continuous upgrading of China's consumption structure, the development of 5G technology and the popularization of AI make online teaching less constrained. The online education industry is showing explosive growth. More and more online education institutions have gone public for financing, and their market values are soaring. However, in 2019, except for GSX, leading online learning platforms such as New Oriental, Speak English Fluently and Sunlands were operating at a loss. Most of these institutions seize the market by increasing advertising investment, but this also brings huge marketing costs, which affect the companies' financial performance. As the Matthew effect intensifies, large educational institutions occupy a large share of the market through free and low-priced classes, while small and medium-sized institutions with weak capital strength are often unable to afford high sales costs and face the risk of a broken capital chain. Taking New Oriental Online as an example, this paper analyzes the problems existing in the marketing strategies of online education institutions. It also puts forward suggestions on four aspects, namely target market, differentiated value, marketing mix and marketing mode, so that online education institutions can control marketing expenses and achieve profitability by improving course quality, expanding marketing channels and implementing precise positioning.
DCHT: Deep Complex Hybrid Transformer for Speech Enhancement
Most of the current deep learning-based approaches for speech enhancement
only operate in the spectrogram or waveform domain. Although a cross-domain
transformer combining waveform- and spectrogram-domain inputs has been
proposed, its performance can be further improved. In this paper, we present a
novel deep complex hybrid transformer that integrates both spectrogram- and
waveform-domain approaches to improve the performance of speech enhancement.
The proposed model consists of two parts: a complex Swin-Unet in the
spectrogram domain and a dual-path transformer network (DPTnet) in the waveform
domain. We first construct a complex Swin-Unet network in the spectrogram
domain and perform speech enhancement in the complex audio spectrum. We then
introduce an improved DPTnet by adding memory-compressed attention. Our model is
capable of learning multi-domain features to reduce noise in different
domains in a complementary way. The experimental results on the
BirdSoundsDenoising dataset and the VCTK+DEMAND dataset indicate that our
method achieves better performance than state-of-the-art methods.
Comment: IEEE DDP conference
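A minimal sketch of one common form of memory-compressed attention (strided 1-D convolutions shorten the key/value sequence before standard attention); the dimensions, head count, and compression factor below are assumptions, not DCHT's exact configuration:

```python
import torch
import torch.nn as nn

class MemoryCompressedAttention(nn.Module):
    """Self-attention whose keys and values are compressed along time by a strided conv."""

    def __init__(self, dim: int, num_heads: int = 4, compression: int = 3):
        super().__init__()
        self.compress_k = nn.Conv1d(dim, dim, kernel_size=compression, stride=compression)
        self.compress_v = nn.Conv1d(dim, dim, kernel_size=compression, stride=compression)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); keys/values become (batch, time // compression, dim).
        k = self.compress_k(x.transpose(1, 2)).transpose(1, 2)
        v = self.compress_v(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.attn(query=x, key=k, value=v)
        return out

# Example: a batch of waveform-domain feature sequences.
features = torch.randn(2, 400, 128)
print(MemoryCompressedAttention(dim=128)(features).shape)  # torch.Size([2, 400, 128])
```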
Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that often
emerges in early childhood. ASD assessment typically involves an observation
protocol including note-taking and ratings of the child's social behavior conducted
by a trained clinician. A robust machine learning (ML) model that is capable of
labeling adult and child audio has the potential to save significant time and
labor in manually coding children's behaviors. This may help clinicians capture
events of interest, better communicate events to parents, and educate new
clinicians. In this study, we leverage the self-supervised learning model,
Wav2Vec 2.0 (W2V2), pretrained on 4300h of home recordings of children under 5
years old, to build a unified system that performs both speaker diarization
(SD) and vocalization classification (VC) tasks. We apply this system to
two-channel audio recordings of brief 3-5 minute clinician-child interactions
using the Rapid-ABC corpus. We propose a novel technique by introducing
auxiliary features extracted from a W2V2-based automatic speech recognition (ASR)
system for children under 4 years old to improve the children's VC task. We test
our proposed method of improving children's VC task on two corpora (Rapid-ABC
and BabbleCor) and observe consistent improvements. Furthermore, we match, and
perhaps outperform, the state-of-the-art performance on BabbleCor.
Comment: Submitted to ICASSP 202
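As a hedged sketch of extracting frame-level Wav2Vec 2.0 features with the Hugging Face transformers library (the checkpoint below is a generic public one, not the authors' model pretrained on 4300h of child home recordings):

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Assumed generic checkpoint for illustration only.
checkpoint = "facebook/wav2vec2-base"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2Model.from_pretrained(checkpoint).eval()

def w2v2_features(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Return frame-level hidden states of shape (1, frames, dim) for a mono waveform."""
    inputs = extractor(waveform.numpy(), sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state  # features to feed a downstream VC classifier

features = w2v2_features(torch.randn(16000))  # one second of (dummy) audio
print(features.shape)
```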