137 research outputs found

    Fit to Predict? Ecoinformatics for Predicting the Catchability of a Pelagic Fish in Near Real-Time

    The ocean is a dynamic environment inhabited by a diverse array of highly migratory species, many of which are under direct exploitation in targeted fisheries. The timescales of variability in the marine realm coupled with the extreme mobility of ocean-wandering species such as tuna and billfish complicate fisheries management. Developing ecoinformatics solutions that allow for near real-time prediction of the distributions of highly mobile marine species is an important step towards the maturation of dynamic ocean management and ecological forecasting. Using 25 years (1990-2014) of NOAA fisheries' observer data from the California drift gillnet fishery, we model relative probability of occurrence (presence-absence) and catchability (total catch) of broadbill swordfish Xiphias gladius in the California Current System (CCS). Using freely-available environmental datasets and open source software, we explore the physical drivers of regional swordfish distribution. Comparing models built upon remotely-sensed datasets with those built upon a data-assimilative configuration of the Regional Ocean Modelling System (ROMS), we explore trade-offs in model construction and address how physical data can affect predictive performance and operational capacity. Swordfish catchability was found to be highest in deeper waters (>1500 m) with surface temperatures in the 14-20 °C range, isothermal layer depth (ILD) of 20-40 m, positive sea surface height anomalies and during the new moon.

    Integrating Dynamic Subsurface Habitat Metrics Into Species Distribution Models

    Species distribution models (SDMs) have become key tools for describing and predicting species habitats. In the marine domain, environmental data used in modeling species distributions are often remotely sensed, and as such have limited capacity for interpreting the vertical structure of the water column, or are sampled in situ, offering minimal spatial and temporal coverage. Advances in ocean models have improved our capacity to explore subsurface ocean features, yet there has been limited integration of such features in SDMs. Using output from a data-assimilative configuration of the Regional Ocean Modeling System, we examine the effect of including dynamic subsurface variables in SDMs to describe the habitats of four pelagic predators in the California Current System (swordfish Xiphias gladius, blue sharks Prionace glauca, common thresher sharks Alopias vulpinus, and shortfin mako sharks Isurus oxyrinchus). Species data were obtained from the California Drift Gillnet observer program (1997-2017). We used boosted regression trees to explore the incremental improvement enabled by dynamic subsurface variables that quantify the structure and stability of the water column: isothermal layer depth and bulk buoyancy frequency. The inclusion of these dynamic subsurface variables significantly improved model explanatory power for most species. Model predictive performance also significantly improved, but only for species that had strong affiliations with dynamic variables (swordfish and shortfin mako sharks) rather than static variables (blue sharks and common thresher sharks). Geospatial predictions for all species showed the integration of isothermal layer depth and bulk buoyancy frequency contributed value at the mesoscale level (< 100 km) and varied spatially throughout the study domain.
    These results highlight the utility of including dynamic subsurface variables in SDM development and support the continuing ecological use of biophysical output from ocean circulation models.

    Identifier mapping performance for integrating transcriptomics and proteomics experimental results

    Background: Studies integrating transcriptomic data with proteomic data can illuminate the proteome more clearly than either separately. Integromic studies can deepen understanding of the dynamic complex regulatory relationship between the transcriptome and the proteome. Integrating these data dictates a reliable mapping between the identifier nomenclature resultant from the two high-throughput platforms. However, this kind of analysis is well known to be hampered by lack of standardization of identifier nomenclature among proteins, genes, and microarray probe sets. Therefore data integration may also play a role in critiquing the fallible gene identifications that both platforms emit. Results: We compared three freely available internet-based identifier mapping resources for mapping UniProt accessions (ACCs) to Affymetrix probeset identifications (IDs): DAVID, EnVision, and NetAffx. Liquid chromatography-tandem mass spectrometry analyses of 91 endometrial cancer and 7 noncancer samples generated 11,879 distinct ACCs. For each ACC, we compared the retrieval sets of probeset IDs from each mapping resource. We confirmed a high level of discrepancy among the mapping resources. On the same samples, mRNA expression was available. Therefore, to evaluate the quality of each ACC-to-probeset match, we calculated proteome-transcriptome correlations, and compared the resources presuming that better mapping of identifiers should generate a higher proportion of mapped pairs with strong inter-platform correlations. A mixture model for the correlations fitted well and supported regression analysis, providing a window into the performance of the mapping resources. The resources have added and dropped matches over two years, but their overall performance has not changed. Conclusions: The methods presented here serve to achieve concrete context-specific insight, to support well-informed decisions in choosing an ID mapping strategy for "omic" data merging.

    Predictive modeling for determination of microscopic residual disease at primary cytoreduction: An NRG Oncology/Gynecologic Oncology Group 182 Study

    Microscopic residual disease following complete cytoreduction (R0) is associated with a significant survival benefit for patients with advanced epithelial ovarian cancer (EOC). Our objective was to develop a prediction model for R0 to support surgeons in their clinical care decisions. Demographic, pathologic, surgical, and CA125 data were collected from GOG 182 records. Patients enrolled prior to September 1, 2003 were used for the training model while those enrolled after constituted the validation data set. Univariate analysis was performed to identify significant predictors of R0 and these variables were subsequently analyzed using multivariable regression. The regression model was reduced using backward selection and predictive accuracy was quantified using the area under the receiver operating characteristic curve (AUC) in both the training and the validation data sets. Of the 3882 patients enrolled in GOG 182, 1480 had complete clinical data available for the analysis. The training data set consisted of 1007 patients (234 with R0) while the validation set comprised 473 patients (122 with R0). The reduced multivariable regression model demonstrated several variables predictive of R0 at cytoreduction: Disease Score (DS) (p < 0.001), stage (p = 0.009), CA125 (p < 0.001), ascites (p < 0.001), and stage-age interaction (p = 0.01). Applying the prediction model to the validation data resulted in an AUC of 0.73 (0.67 to 0.78, 95% CI). Inclusion of DS enhanced the model performance to an AUC of 0.83 (0.79 to 0.88, 95% CI). We developed and validated a prediction model for R0 that offers improved performance over previously reported models for prediction of residual disease. The performance of the prediction model suggests additional factors (e.g., imaging, molecular profiling) should be explored in the future for a more clinically actionable tool.
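
    The AUC reported in this abstract can be read as the probability that a randomly chosen R0 patient receives a higher predicted score than a randomly chosen non-R0 patient. A minimal Python sketch of that rank-based calculation follows. The function name `auc` and the toy scores and labels are hypothetical illustrations, not data or code from GOG 182.

```python
def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of R0 and observed outcomes (1 = R0).
toy_scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
print(f"toy AUC = {toy_auc:.3f}")
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, which is fine for illustration but slow for large cohorts.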

    Global expression analysis of cancer/testis genes in uterine cancers reveals a high incidence of BORIS expression

    Purpose: Cancer/testis (CT) genes predominantly expressed in the testis (germ cells) and generally not in other normal tissues are aberrantly expressed in human cancers. This highly restricted expression provides a unique opportunity to use these CT genes for diagnostics, immunotherapeutic, or other targeted therapies. The purpose of this study was to identify those CT genes with the greatest incidence of expression in uterine cancers. Experimental Design: We queried the expression of known and putative CT gene transcripts (representing 79 gene loci) using whole genome gene expression arrays. Specifically, the global gene expressions of uterine cancers (n = 122) and normal uteri (n = 10) were determined using expression data from the Affymetrix HG-U133A and HG-U133B chips. Additionally, we examined the brother of the regulator of imprinted sites (BORIS) transcript by reverse transcription-PCR and quantitative PCR because its transcript was not represented on the array. Results: Global microarray analysis detected many CT genes expressed in various uterine cancers; however, no individual CT gene was expressed in more than 25% of all cancers. The expression of the two most commonly expressed CT genes on the arrays, MAGEA9 (24 of 122 cancers and 0 of 10 normal tissues) and Down syndrome critical region 8 (DSCR8)/MMA1 (16 of 122 cancers and 0 of 10 normal tissues), was confirmed by reverse transcription-PCR methods, validating the array screening approach. In contrast to the relatively low incidence of expression of the other CT genes, BORIS expression was detected in 73 of 95 (77%) endometrial cancers and 24 of 31 (77%) uterine mixed mesodermal tumors. Conclusions: These data provide the first extensive survey of multiple CT genes in uterine cancers. Importantly, we detected a high frequency of BORIS expression in uterine cancers, suggesting its potential as an immunologic or diagnostic target for these cancers.
    Given the high incidence of BORIS expression and its possible regulatory role, an examination of BORIS function in the etiology of these cancers is warranted.
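
    The incidence figures quoted in this abstract are simple proportions, and the following short Python sketch recomputes them from the stated counts. The dictionary keys and the helper name `incidence_pct` are hypothetical labels chosen for illustration; only the counts come from the abstract.

```python
def incidence_pct(expressing, total):
    """Percentage of samples in which the transcript was detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * expressing / total


# (samples expressing the gene, samples tested), as stated in the abstract.
counts = {
    "MAGEA9 (array)": (24, 122),
    "DSCR8/MMA1 (array)": (16, 122),
    "BORIS, endometrial cancers (PCR)": (73, 95),
    "BORIS, mixed mesodermal tumors (PCR)": (24, 31),
}

percentages = {name: incidence_pct(k, n) for name, (k, n) in counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```

    Both array-detected genes fall below the 25% ceiling mentioned in the Results, while both BORIS proportions round to the reported 77%.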

    Genetic variation in CFH predicts phenytoin-induced maculopapular exanthema in European-descent patients

    Objective: To characterize, among European and Han Chinese populations, the genetic predictors of maculopapular exanthema (MPE), a cutaneous adverse drug reaction common to antiepileptic drugs. Methods: We conducted a case-control genome-wide association study of autosomal genotypes, including Class I and II human leukocyte antigen (HLA) alleles, in 323 cases and 1,321 drug-tolerant controls from epilepsy cohorts of northern European and Han Chinese descent. Results from each cohort were meta-analyzed. Results: We report an association between a rare variant in the complement factor H–related 4 (CFHR4) gene and phenytoin-induced MPE in Europeans (p = 4.5 × 10^-11; odds ratio [95% confidence interval] 7 [3.2–16]). This variant is in complete linkage disequilibrium with a missense variant (N1050Y) in the complement factor H (CFH) gene. In addition, our results reinforce the association between HLA-A*31:01 and carbamazepine hypersensitivity. We did not identify significant genetic associations with MPE among Han Chinese patients. Conclusions: The identification of genetic predictors of MPE in CFHR4 and CFH, members of the complement factor H–related protein family, suggests a new link between regulation of the complement system alternative pathway and phenytoin-induced hypersensitivity in European-ancestral patients.

    [Comment] Redefine statistical significance

    The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on “statistically significant” findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (e.g., multiple testing, P-hacking, publication bias, and under-powered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: Statistical standards of evidence for claiming discoveries in many fields of science are simply too low. Associating “statistically significant” findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems. For fields where the threshold for defining statistical significance is P<0.05, we propose a change to P<0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called “significant” but do not meet the new threshold should instead be called “suggestive.” While statisticians have known the relative weakness of using P≈0.05 as a threshold for discovery and the proposal to lower it to 0.005 is not new (1, 2), a critical mass of researchers now endorse this change. We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (e.g., genomics and high-energy physics research; see Potential Objections below). We also restrict our recommendation to studies that conduct null hypothesis significance tests. 
    We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P-values. However, changing the P-value threshold is simple and might quickly achieve broad acceptance.
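
    The relabeling proposed in this comment can be expressed as a two-threshold rule. The Python sketch below is one possible reading of it, assuming strict inequalities as written in the abstract (P < 0.005 and P < 0.05). The function name `classify_p_value` and the label "not significant" are assumptions added for illustration; "significant" and "suggestive" are the terms used in the abstract.

```python
def classify_p_value(p, discovery=0.005, conventional=0.05):
    """Label a P-value for a claim of a new discovery under the proposed rule."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P-value must lie in [0, 1]")
    if p < discovery:
        return "significant"
    if p < conventional:
        # Would have been called significant under the P < 0.05 convention.
        return "suggestive"
    return "not significant"  # label assumed here, not taken from the abstract


# Hypothetical P-values, chosen only to exercise each branch.
examples = [0.001, 0.005, 0.03, 0.2]
labels = [classify_p_value(p) for p in examples]
for p, label in zip(examples, labels):
    print(f"P = {p}: {label}")
```

    As the abstract notes, the recommendation is restricted to claims of new effects in studies that use null hypothesis significance tests, so a rule like this would not apply to replications or to fields that already use stricter thresholds.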