Storing Digital Information in Long-Read DNA
There is an urgent need for effective and cost-efficient data storage, as worldwide demand for data storage is growing rapidly. DNA offers a new tool for storing digital information. Recent studies have successfully stored digital information, such as text and GIF animations, in DNA. Previous studies tackled technical hurdles due to errors from DNA synthesis and sequencing. Studies have also focused on a strategy that uses 100‒150-bp read sizes in both synthesis and sequencing. In this paper, we suggest a novel data encoding/decoding scheme that makes use of long-read DNA (~1,000 bp). This enables accurate recovery of stored digital information with a smaller number of reads than the previous approach. This approach also reduces sequencing time
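As a rough illustration of what encoding digital data in DNA involves, the sketch below maps each byte to four nucleotides at two bits per base and back again. The fixed mapping and the function names `encode` and `decode` are assumptions made for illustration only. This is not the encoding/decoding scheme proposed in the paper, which the abstract does not specify, and it ignores synthesis and sequencing errors entirely.

```python
# Hypothetical 2-bits-per-base mapping (A=00, C=01, G=10, T=11).
# Illustrative only; not the scheme proposed in the paper.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Invert encode(); the strand length must be a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError on any character outside ACGT.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

Under this assumed mapping, a 1,000-base strand would carry 250 bytes of payload before any indexing or error-correction overhead.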
Prediction of a time-to-event trait using genome wide SNP data
BACKGROUND: A common objective of many high-throughput genome projects is to discover genomic markers associated with traits and to develop statistical models that predict the traits of future patients from marker values. RESULTS: In this paper, we present a prediction method for time-to-event traits using genome-wide single-nucleotide polymorphisms (SNPs). We also propose MaxTest, a test of association between a time-to-event trait and a SNP that accounts for the SNP's possible genetic models. The proposed MaxTest can help screen out nonprognostic SNPs and identify the genetic models of prognostic SNPs. The performance of the proposed method is evaluated through simulations. CONCLUSIONS: In conjunction with the MaxTest, the proposed method provides more parsimonious prediction models that nevertheless include more prognostic SNPs than some naive prediction methods. The proposed method is demonstrated with real GWAS data
Southern Hemisphere mid- and high-latitudinal AOD, CO, NO2, and HCHO: spatiotemporal patterns revealed by satellite observations
To assess air pollution emitted in Southern Hemisphere mid-latitudes and transported to Antarctica, we investigate the climatological mean and temporal trends in aerosol optical depth (AOD), carbon monoxide (CO), nitrogen dioxide (NO2), and formaldehyde (HCHO) columns using satellite observations. Generally, all these measurements exhibit sharp peaks over and near the three nearby inhabited continents: South America, Africa, and Australia. This pattern indicates the large emission effect of anthropogenic activities and biomass burning processes. High AOD is also found over the Southern Atlantic Ocean, probably because of the sea salt production driven by strong winds. Since the pristine Antarctic atmosphere can be polluted by transport of air pollutants from the mid-latitudes, we analyze the 10-day back trajectories that arrive at Antarctic ground stations in consideration of the spatial distribution of mid-latitudinal AOD, CO, NO2, and HCHO. We find that the influence of mid-latitudinal emission differs across Antarctic regions: western Antarctic regions show relatively more back trajectories from the mid-latitudes, while the eastern Antarctic regions do not show large intrusions of mid-latitudinal air masses. Finally, we estimate the long-term trends in AOD, CO, NO2, and HCHO during the past decade (2005-2016). While CO shows a significant negative trend, the others show overall positive trends. Seasonal and regional differences in trends are also discussed
Critical Boundary Sine-Gordon Revisited
We revisit the exact solution of the two space-time dimensional quantum field
theory of a free massless boson with a periodic boundary interaction and
self-dual period. We analyze the model by using a mapping to free fermions with
a boundary mass term originally suggested in ref.[22]. We find that the entire
SL(2,C) family of boundary states of a single boson are boundary sine-Gordon
states and we derive a simple explicit expression for the boundary state in
fermion variables and as a function of sine-Gordon coupling constants. We use
this expression to compute the partition function. We observe that the solution
of the model has a strong-weak coupling generalization of T-duality. We then
examine a class of recently discovered conformal boundary states for compact
bosons with radii which are rational numbers times the self-dual radius. These
have simple expressions in fermion variables. We postulate sine-Gordon-like
field theories with discrete gauge symmetries for which they are the
appropriate boundary states. Comment: 33 pages, 1 figure, references added, typos corrected
Ovarian Cancer Prognostic Prediction Model Using RNA Sequencing Data
Ovarian cancer is one of the leading causes of cancer-related deaths in gynecological malignancies. Over 70% of ovarian cancer cases are high-grade serous ovarian cancers and have high death rates due to their resistance to chemotherapy. Despite advances in surgical and pharmaceutical therapies, overall survival rates are not good, and making an accurate prediction of the prognosis is not easy because of the highly heterogeneous nature of ovarian cancer. To improve the patient’s prognosis through proper treatment, we present a prognostic prediction model by integrating high-dimensional RNA sequencing data with their clinical data through the following steps: gene filtration, pre-screening, gene marker selection, integrated study of selected gene markers and prediction model building. These steps of the prognostic prediction model can be applied to other types of cancer besides ovarian cancer
A procedure for the determination of a flow duration curve at an ungaged basin
The purpose of this study is to develop a method for predicting monthly flow duration curves for ungaged basins that are suitable for estimating average annual flow, and installed capacity and average annual energy generation at potential sites for hydropower development. The procedures were tested by developing monthly rainfall duration curves for five sample watersheds and then developing flow duration curves from the rainfall data. The methods were evaluated by comparing the predicted monthly flow duration curves to daily and monthly flow duration curves based on field data from the selected sites because a plant's potential energy output can be computed directly from a flow duration curve. The methods tested fit duration curves based on field data reasonably well and are suitable for preliminary evaluation of hydropower developments in ungaged basins
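For readers unfamiliar with the term, a flow duration curve pairs each flow value with the percentage of time that flow is equalled or exceeded. A minimal sketch of building one from a flow record follows. The Weibull plotting position m / (n + 1) and the function name `flow_duration_curve` are assumptions chosen for illustration, not the procedure developed in the study.

```python
def flow_duration_curve(flows):
    """Return (exceedance_percent, flow) pairs, largest flow first.

    Assumes the Weibull plotting position m / (n + 1), where m is the
    rank of a flow in descending order and n is the number of records.
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


# Toy record of three flows (units are whatever the input uses).
curve = flow_duration_curve([10.0, 30.0, 20.0])
```

The same function applies to a daily or a monthly record; only the input series changes.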
Pathway-Driven Discovery of Rare Mutational Impact on Cancer
Identifying driver mutations is important for understanding disease mechanisms and for the future application of custom-tailored therapeutic decisions. Functional analysis of mutational impact usually focuses on the expression level of the mutated gene itself. However, complex regulatory networks may cause differential gene expression among functional neighbors of the mutated gene. We suggest a new approach for discovering rare mutations that have real impact in the context of a pathway: our method iteratively combines rare mutations until no more can be added, under the condition that the combined mutational event statistically discriminates pathway-level mRNA expression between groups with and without the event. Breast cancer patients with somatic mutation and mRNA expression data were analyzed with our approach. Our approach is shown to sensitively capture mutations that change pathway-level mRNA expression, while also recovering important mutations previously reported in breast cancer, such as those in TP53, PIK3CA, and RB1. In addition, out of 15,819 genes considered in breast cancer, our approach identified mutational events in 32 genes showing pathway-level mRNA expression differences
Deep Learning for Integrated Analysis of Insulin Resistance with Multi-Omics Data
Technological advances in next-generation sequencing (NGS) have made it possible to uncover extensive and dynamic alterations in diverse molecular components and biological pathways across healthy and diseased conditions. The large amounts of multi-omics data originating from emerging NGS experiments require feature engineering, which is a crucial step in predictive modeling. The underlying relationship among multi-omics features in terms of insulin resistance is not well understood. In this study, we applied a data analytic approach to 10,783 features from the type II diabetes multi-omics data of the Integrative Human Microbiome Project to elucidate the relationship between insulin resistance and multi-omics features, including microbiome data. To better explain the impact of microbiome features on insulin classification, we used a developed deep neural network interpretation algorithm to estimate each microbiome feature’s contribution to the discriminative model output in the samples
Gene expression based prediction of prognostic outcome in ovarian cancer
Gene expression provides rich information. It has been applied successfully to predict the prognosis of several cancers, such as breast and colon cancer. However, although ovarian cancer is the fifth leading cause of cancer death in women, precise prediction of survival outcome is not yet available. Thus there is still an urgent need for optimized treatment decisions. Recent studies have used public gene expression data sources to predict the clinical outcome of ovarian cancer. Typically, a two-step approach has been tried. The first step identifies significant genes with a univariate Cox regression model. The second step provides a statistic that combines the effects of the selected genes in terms of survival risk. One drawback of the two-step approach is low reproducibility: a statistic for risk group classification built on the training set often fails to validate when it is applied to other data. We applied the scheme to RNA-seq data from The Cancer Genome Atlas (TCGA) to classify patients into higher- and lower-risk groups by prognosis. We applied a median cutoff to the classification of the existing scheme and suggest other schemes for subsequent work
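The second step and the median cutoff described above can be sketched as follows. The per-gene coefficients are taken as given here (in practice they would come from the univariate Cox models of the first step). The weighted-sum score, the function names `risk_scores` and `median_split`, and the toy numbers are all assumptions for illustration, not the statistic used in the paper.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Weighted sum of each patient's gene expression by per-gene coefficient.

    Genes without a coefficient (i.e. not selected in step one) are ignored.
    """
    return [
        sum(coefficients[g] * value for g, value in patient.items() if g in coefficients)
        for patient in expression
    ]


def median_split(scores):
    """Label scores above the cohort median 'high' and the rest 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Toy cohort: four patients, two selected genes (hypothetical values).
coefficients = {"g1": 1.0, "g2": -1.0}
expression = [
    {"g1": 1.0, "g2": 2.0},
    {"g1": 3.0, "g2": 0.0},
    {"g1": 0.0, "g2": 2.0},
    {"g1": 2.0, "g2": 1.0},
]
scores = risk_scores(expression, coefficients)
groups = median_split(scores)
```

Because the cutoff is the median of the cohort being scored, the split depends on which patients are included, which is one way a classification built on one data set can fail to carry over to another.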