MIBEN: Robust Multiple Imputation with the Bayesian Elastic Net
Correctly specifying the imputation model when conducting multiple imputation remains one of the most significant challenges in missing data analysis. This dissertation introduces a robust multiple imputation technique, Multiple Imputation with the Bayesian Elastic Net (MIBEN), as a remedy for this difficulty. A Monte Carlo simulation study was conducted to assess the performance of the MIBEN technique and compare it to several state-of-the-art multiple imputation methods.
Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level: A Monte Carlo Simulation to Assess the Tenability of the SuperMatrix Approach
A Monte Carlo simulation study was conducted to assess the tenability of a novel treatment of missing data. By aggregating multiply-imputed data sets prior to model estimation, the proposed technique allows researchers to reap the benefits of a principled missing data tool (i.e., multiple imputation) while maintaining the simplicity of complete case analysis. In terms of the accuracy of model fit indices derived from confirmatory factor analyses, the proposed technique was found to perform universally better than a naive ad hoc technique consisting of averaging the multiple estimates of model fit derived from a traditionally conceived implementation of multiple imputation. However, the proposed technique performed considerably worse in this task than did full information maximum likelihood (FIML) estimation. Absolute fit indices and residual-based fit indices derived from the proposed technique demonstrated an unacceptable degree of bias in assessing direct model fit, but incremental fit indices led to acceptable conclusions regarding model fit. Chi-squared difference values derived from the proposed technique were unbiased across all study conditions (except for those with very poor parameterizations) and were consistently more accurate than such values derived from the ad hoc comparison condition. It was also found that chi-squared difference values derived from FIML-based models were negatively biased to an unacceptable degree in all conditions with more than 10% missing data. Implications, limitations, and future directions of the current work are discussed.
US Cosmic Visions: New Ideas in Dark Matter 2017: Community Report
This white paper summarizes the workshop "U.S. Cosmic Visions: New Ideas in
Dark Matter" held at the University of Maryland on March 23-25, 2017.
Comment: 102 pages + reference
The Fourteenth Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the extended Baryon Oscillation Spectroscopic Survey and from the second phase of the Apache Point Observatory Galactic Evolution Experiment
The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) has been in
operation since July 2014. This paper describes the second data release from
this phase, and the fourteenth from SDSS overall (making this Data Release
Fourteen, or DR14). This release makes public data taken by SDSS-IV in its first
two years of operation (July 2014-2016). Like all previous SDSS releases, DR14
is cumulative, including the most recent reductions and calibrations of all
data taken by SDSS since the first phase began operations in 2000. New in DR14
is the first public release of data from the extended Baryon Oscillation
Spectroscopic Survey (eBOSS); the first data from the second phase of the
Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE-2),
including stellar parameter estimates from an innovative data-driven machine
learning algorithm known as "The Cannon"; and almost twice as many data cubes
from the Mapping Nearby Galaxies at APO (MaNGA) survey as were in the previous
release (N = 2812 in total). This paper describes the location and format of
the publicly available data from SDSS-IV surveys. We provide references to the
important technical papers describing how these data have been taken (both
targeting and observation details) and processed for scientific use. The SDSS
website (www.sdss.org) has been updated for this release, and provides links to
data downloads, as well as tutorials and examples of data use. SDSS-IV is
planning to continue to collect astronomical data until 2020, and will be
followed by SDSS-V.
Comment: SDSS-IV collaboration alphabetical author data release paper. DR14
happened on 31st July 2017. 19 pages, 5 figures. Accepted by ApJS on 28th Nov
2017 (this is the "post-print" and "post-proofs" version; minor corrections
only from v1, and most of the errors found in proofs corrected)
The Eighth Data Release of the Sloan Digital Sky Survey: First Data from SDSS-III
The Sloan Digital Sky Survey (SDSS) started a new phase in August 2008, with
new instrumentation and new surveys focused on Galactic structure and chemical
evolution, measurements of the baryon oscillation feature in the clustering of
galaxies and the quasar Ly alpha forest, and a radial velocity search for
planets around ~8000 stars. This paper describes the first data release of
SDSS-III (and the eighth counting from the beginning of the SDSS). The release
includes five-band imaging of roughly 5200 deg^2 in the Southern Galactic Cap,
bringing the total footprint of the SDSS imaging to 14,555 deg^2, or over a
third of the Celestial Sphere. All the imaging data have been reprocessed with
an improved sky-subtraction algorithm and a final, self-consistent photometric
recalibration and flat-field determination. This release also includes all data
from the second phase of the Sloan Extension for Galactic Understanding and
Evolution (SEGUE-2), consisting of spectroscopy of approximately 118,000 stars
at both high and low Galactic latitudes. All the more than half a million
stellar spectra obtained with the SDSS spectrograph have been reprocessed
through an improved stellar parameters pipeline, which has better determination
of metallicity for high metallicity stars.
Comment: Astrophysical Journal Supplements, in press (minor updates from
submitted version)
Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context
Long noncoding RNAs (lncRNAs) are commonly dysregulated in tumors, but only a handful are known to play pathophysiological roles in cancer. We inferred lncRNAs that dysregulate cancer pathways, oncogenes, and tumor suppressors (cancer genes) by modeling their effects on the activity of transcription factors, RNA-binding proteins, and microRNAs in 5,185 TCGA tumors and 1,019 ENCODE assays. Our predictions included hundreds of candidate onco- and tumor-suppressor lncRNAs (cancer lncRNAs) whose somatic alterations account for the dysregulation of dozens of cancer genes and pathways in each of 14 tumor contexts. To demonstrate proof of concept, we showed that perturbations targeting OIP5-AS1 (an inferred tumor suppressor) and TUG1 and WT1-AS (inferred onco-lncRNAs) dysregulated cancer genes and altered proliferation of breast and gynecologic cancer cells. Our analysis indicates that, although most lncRNAs are dysregulated in a tumor-specific manner, some, including OIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergistically dysregulate cancer pathways in multiple tumor contexts.
Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas
Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified. In contrast, the MYC antagonists MGA and MNT were the most frequently mutated or deleted members, proposing a role as tumor suppressors. MYC alterations were mutually exclusive with PIK3CA, PTEN, APC, or BRAF alterations, suggesting that MYC is a distinct oncogenic driver. Expression analysis revealed MYC-associated pathways in tumor subtypes, such as immune response and growth factor signaling; chromatin, translation, and DNA replication/repair were conserved pan-cancer. This analysis reveals insights into MYC biology and is a reference for biomarkers and therapeutics for cancers with alterations of MYC or the PMN.
Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas
This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing
molecular features of squamous cell carcinomas (SCCs) from five sites associated with smokin
Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images
Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images
of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL
maps are derived through computational staining using a convolutional neural network trained to
classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and
correlation with overall survival. TIL map structural patterns were grouped using standard
histopathological parameters. These patterns are enriched in particular T cell subpopulations
derived from molecular measures. TIL densities and spatial structure were differentially enriched
among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial
infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic
patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for
the TCGA image archives with insights into the tumor-immune microenvironment.