    Sequence-Based Classification Using Discriminatory Motif Feature Selection

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the chance of overlooking important features. However, this strategy has shortcomings. First, practical constraints limit exhaustive feature generation to patterns of length ≤ k, so potentially important longer (> k) predictors are not considered. Second, the features so generated exhibit strong dependencies, which can complicate interpretation of the derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise both prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an accompanying software pipeline, predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework adds a third data partition, a discovery partition. A discriminatory motif finder is applied to the sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier fitted on the training partition. Finally, performance is assessed on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance, both with and without optimization of classifier tuning parameters.
A Python pipeline implementing the approach is available at http://www.epibiostat.ucsf.edu/biostat/sen/dmfs/
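The discovery, training, and validation workflow described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (whose interface is not described here): `find_discriminatory_motifs` is a hypothetical stand-in that ranks k-mers by how differently often they occur in positive and negative sequences, where the real approach would plug in an actual discriminatory motif finder. The nearest-centroid classifier, the function names, and the synthetic data are likewise illustrative choices.

```python
import random
from collections import Counter


def kmers(seq, k):
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_discriminatory_motifs(seqs, labels, k=6, top=5):
    """Hypothetical stand-in for a discriminatory motif finder: rank k-mers by the
    absolute difference between the fraction of positive and negative sequences
    that contain them, and keep the `top` highest-scoring ones."""
    pos = [s for s, y in zip(seqs, labels) if y == 1]
    neg = [s for s, y in zip(seqs, labels) if y == 0]
    pos_hits = Counter(m for s in pos for m in kmers(s, k))
    neg_hits = Counter(m for s in neg for m in kmers(s, k))

    def score(m):
        return abs(pos_hits[m] / len(pos) - neg_hits[m] / len(neg))

    return sorted(set(pos_hits) | set(neg_hits), key=lambda m: (-score(m), m))[:top]


def featurize(seq, motifs):
    """Presence/absence of each motif; needs no alignment or equal lengths."""
    return [1.0 if m in seq else 0.0 for m in motifs]


def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean feature vector per class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def run_pipeline(seqs, labels, seed=0):
    """Split into disjoint discovery / training / validation thirds and return
    (motifs, validation accuracy)."""
    idx = list(range(len(seqs)))
    random.Random(seed).shuffle(idx)
    third = len(idx) // 3
    disc, train, valid = idx[:third], idx[third:2 * third], idx[2 * third:]
    # 1. Discovery partition: derive a small feature set.
    motifs = find_discriminatory_motifs([seqs[i] for i in disc], [labels[i] for i in disc])
    # 2. Training partition: fit the classifier on motif features only.
    model = fit_centroids([featurize(seqs[i], motifs) for i in train],
                          [labels[i] for i in train])
    # 3. Validation partition: assess performance on untouched data.
    hits = sum(predict(model, featurize(seqs[i], motifs)) == labels[i] for i in valid)
    return motifs, hits / len(valid)


# Synthetic demo: unequal-length random DNA; odd-indexed (positive) sequences
# carry an inserted "TATAAT".
rng = random.Random(1)
seqs, labels = [], []
for i in range(300):
    s = "".join(rng.choice("ACGT") for _ in range(rng.randint(25, 40)))
    if i % 2 == 1:
        cut = rng.randint(0, len(s))
        s = s[:cut] + "TATAAT" + s[cut:]
    seqs.append(s)
    labels.append(i % 2)

motifs, accuracy = run_pipeline(seqs, labels)
print(motifs, f"validation accuracy = {accuracy:.2f}")
```

Because the features are presence/absence calls, unaligned sequences of unequal length need no special handling, and either component (motif finder or classifier) can be swapped out independently.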

    Pancreatic Cancer Malnutrition and Pancreatic Exocrine Insufficiency in the Course of Chemotherapy in Unresectable Pancreatic Cancer

    Background: Malnutrition and cachexia are common in patients with advanced pancreatic ductal adenocarcinoma (PDAC) and significantly influence treatment tolerance and response. If identified in time, malnourished PDAC patients could be treated to increase their capacity to complete the planned treatments and thereby, possibly, improve treatment efficacy. Aims: This study aims to assess the impact of nutritional status, pancreatic exocrine insufficiency (PEI), and other clinical factors on outcomes in patients with advanced PDAC. Methods: PAncreatic Cancer MAlnutrition and Pancreatic Exocrine INsufficiency in the Course of Chemotherapy in Unresectable Pancreatic Cancer (PAC-MAIN) is an international multicenter prospective observational cohort study. Nutritional status will be determined using the Mini-Nutritional Assessment score and laboratory blood tests. PEI will be defined by reduced fecal elastase levels. Main outcome: adherence to planned chemotherapy in the first 12 weeks after diagnosis, according to patients' baseline nutritional status, quantified and reported as "percent of standard chemotherapy dose delivered." Secondary outcomes: rate of chemotherapy-related toxicity, progression-free survival, survival at 6 months, overall survival, quality of life, and number of hospitalizations. Analysis: chemotherapy dosing over the first 12 weeks of therapy (i.e., the percent of chemotherapy received in the first 12 weeks, as defined above) will be compared between well-nourished and malnourished patients.
Sample size: assuming that 70% of planned chemotherapy is delivered in well-nourished patients, with a type I error of 0.05 and a type II error of 0.20, 93 patients per group will be required to detect a difference of 20 percentage points in chemotherapy delivered between well-nourished and malnourished patients, 163 per group for a 15-point difference, and 356 per group for a 10-point difference. Centers in Russia, Romania, Turkey, Spain, Serbia, and Italy will participate in the study after approval by their local ethics committees. Discussion: PAC-MAIN will provide insights into the role of malnutrition and PEI in PDAC outcomes. The study protocol was registered at clinicaltrials.gov as NCT04112836.
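The three sample sizes can be reproduced under our assumption (not stated in the protocol) that they come from the standard normal-approximation formula for comparing two proportions: percent of chemotherapy delivered treated as a proportion, 70% in well-nourished patients versus 50%, 55%, or 60% in malnourished patients, a two-sided test at α = 0.05, power 0.80, pooled variance under the null hypothesis, and no continuity correction. The function name `n_per_group` is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, beta=0.20):
    """Per-group sample size to compare two proportions with a two-sided test
    (normal approximation, pooled variance under H0, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_beta = NormalDist().inv_cdf(1 - beta)        # ~0.84, i.e. 80% power
    p_bar = (p1 + p2) / 2
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(root ** 2 / (p1 - p2) ** 2)


# 70% delivered in well-nourished patients; malnourished patients assumed lower.
sizes = {d: n_per_group(0.70, 0.70 - d) for d in (0.20, 0.15, 0.10)}
for d, n in sizes.items():
    print(f"difference of {d:.0%}: {n} per group")
# difference of 20%: 93 per group
# difference of 15%: 163 per group
# difference of 10%: 356 per group
```

Halving the detectable difference roughly quadruples the required sample, which is why the 10-point scenario needs almost four times as many patients as the 20-point one.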

    Risk and Protective Factors for the Occurrence of Sporadic Pancreatic Endocrine Neoplasms

    Pancreatic neuroendocrine neoplasms (PNENs) account for 10% of all pancreatic tumors by prevalence. Their incidence has reportedly increased over recent decades, in parallel with that of pancreatic adenocarcinoma. PNENs are relatively rare, and the few institutions that have published on potential risk factors have reported heterogeneous findings. Our objective was to investigate the association between potential risk and protective factors and the occurrence of sporadic PNENs in a European population drawn from several institutions. A multinational European case-control study was conducted to examine selected environmental, familial, and medical exposures, assessed with a standardized questionnaire in face-to-face interviews. At each study site, cases and controls were matched for sex and age at a ratio of 1:3. Adjusted univariate and multivariate logistic regression analyses were performed for statistically significant factors. Among 201 cases and 603 controls, non-recent onset diabetes (OR 2.09, CI 1.27-3.46) was associated with an increased occurrence of PNENs. The prevalence of non-recent onset diabetes was higher both in cases with metastatic disease (TNM stage III-IV) and in cases with advanced grade (G3) at the time of diagnosis. The use of metformin in combination with insulin was also associated with a more aggressive phenotype. Coffee drinking was more frequent in cases with localized disease at diagnosis. We conclude that non-recent onset diabetes was associated with an increased occurrence of PNENs and that the combination of metformin and insulin was consistent with a more aggressive PNEN phenotype. In contrast to previous studies, smoking, alcohol, and first-degree family history of cancer were not associated with PNEN occurrence.
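A reported odds ratio and confidence interval can be turned back into a standard error and an approximate p-value. The sketch below assumes (the abstract does not say) that the interval is a 95% Wald interval computed on the log-odds scale, as is typical for logistic regression output; `se_from_ci` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error of log(OR) implied by a symmetric Wald CI on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


odds_ratio, lower, upper = 2.09, 1.27, 3.46   # non-recent onset diabetes, as reported
se = se_from_ci(lower, upper)
z = math.log(odds_ratio) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))
midpoint = math.sqrt(lower * upper)  # a Wald CI is centred on the OR in log space

print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided p = {p:.4f}")
print(f"geometric midpoint of CI = {midpoint:.2f}")
```

The interval's geometric midpoint (about 2.10) is close to the reported OR of 2.09, which is consistent with the Wald assumption, and the implied two-sided p-value is below 0.01.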

    Resequencing Candidate Genes Implicates Rare Variants in Asthma Susceptibility

    Common variation in over 100 genes has been implicated in the risk of developing asthma, but the contribution of rare variants to asthma susceptibility remains largely unexplored. From among 53 candidate asthma-associated genes, we selected the nine that showed the strongest signatures of weak purifying selection, and we sequenced their coding exons and flanking noncoding regions in 450 asthmatic cases and 515 nonasthmatic controls. We observed an overall excess of p values < 0.05 (p = 0.02), and rare variants in four genes (AGT, DPP10, IKBKAP, and IL12RB1) contributed to asthma susceptibility among African Americans. Rare variants in IL12RB1 were also associated with asthma susceptibility among European Americans, even though most rare variants in IL12RB1 were specific to one of the two populations. The combined evidence of association with rare noncoding variants in IL12RB1 remained significant (p = 3.7 × 10⁻⁴) after correction for multiple testing. Overall, the contribution of rare variants to asthma susceptibility was predominantly due to noncoding variants in sequences flanking the exons, although nonsynonymous rare variants in DPP10 and in IL12RB1 were associated with asthma in African Americans and European Americans, respectively. This study provides evidence that rare variants contribute to asthma susceptibility. Additional studies are required to test whether prioritizing genes for resequencing on the basis of signatures of purifying selection is an efficient means of identifying novel rare variants that contribute to complex disease.
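The abstract does not name the rare-variant association test used. One common family of methods collapses all rare variants in a gene into a single carrier indicator and compares carrier frequencies between cases and controls; below is a minimal stdlib-only sketch of such a collapsing test using Fisher's exact test. The function names and genotype data are hypothetical; only the group sizes (450 cases, 515 controls) come from the abstract.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def prob(x):  # P(top-left cell == x) given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def collapsing_burden_test(case_genotypes, control_genotypes):
    """Collapse a gene's rare variants into one carrier indicator per person
    (carrier = at least one minor allele at any rare variant), then compare
    carrier frequencies between cases and controls with Fisher's exact test.

    Each genotype row holds minor-allele counts (0/1/2), one per rare variant;
    deciding which variants count as "rare" is assumed to happen upstream."""
    case_carriers = sum(any(g > 0 for g in row) for row in case_genotypes)
    ctrl_carriers = sum(any(g > 0 for g in row) for row in control_genotypes)
    return fisher_exact_two_sided(
        case_carriers, len(case_genotypes) - case_carriers,
        ctrl_carriers, len(control_genotypes) - ctrl_carriers,
    )


# Hypothetical carrier counts for one gene (not data from the study), using the
# study's group sizes of 450 cases and 515 controls.
cases = [[1, 0, 0]] * 30 + [[0, 0, 0]] * 420
controls = [[0, 1, 0]] * 15 + [[0, 0, 0]] * 500
print(f"burden test p = {collapsing_burden_test(cases, controls):.4f}")
```

Collapsing trades per-variant resolution for power: individually very rare variants that would never reach significance alone contribute jointly to one carrier count per gene.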

    Meta-analysis of genome-wide association studies of asthma in ethnically diverse North American populations

    Asthma is a common disease with a complex risk architecture including both genetic and environmental factors. We performed a meta-analysis of North American genome-wide association studies of asthma in 5,416 individuals with asthma (cases) of European American, African American or African Caribbean, and Latino ancestry, with replication in an additional 12,649 individuals from the same ethnic groups. We identified five susceptibility loci. Four were at previously reported loci (17q21, and near IL1RL1, TSLP, and IL33), but we report for the first time, to our knowledge, that these loci are associated with asthma risk in three ethnic groups. In addition, we identified a new asthma susceptibility locus at PYHIN1, with the association being specific to individuals of African descent (P = 3.9 × 10⁻⁹). These results suggest that some asthma susceptibility loci are robust to differences in ancestry when sufficiently large sample sizes are investigated, and that ancestry-specific associations also contribute to the complex genetic architecture of asthma.
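The abstract does not specify how study-level results were combined. A common choice for GWAS meta-analysis is an inverse-variance-weighted fixed-effect model, sketched below for a single SNP; the function name and the per-ancestry effect sizes are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios for one SNP);
    ses: their standard errors. Returns (pooled beta, pooled SE, two-sided p)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|)), stable for tiny p
    return beta, se, p


# Hypothetical per-ancestry estimates for one SNP (illustrative numbers only).
betas = [0.12, 0.18, 0.09]   # European American, African American/Caribbean, Latino
ses = [0.04, 0.05, 0.06]
beta, se, p = fixed_effect_meta(betas, ses)
print(f"pooled OR = {math.exp(beta):.2f}, SE = {se:.3f}, P = {p:.2e}")
```

A fixed-effect model assumes one shared effect across studies; an ancestry-specific signal such as the PYHIN1 association would appear as heterogeneity between the per-group estimates, which this sketch does not test.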

    Genome-wide meta-analysis identifies five new susceptibility loci for pancreatic cancer

    An estimated 146,063 deaths due to pancreatic cancer were expected to occur in Europe and the United States combined in 2020. To identify common susceptibility alleles, we performed the largest pancreatic cancer GWAS to date, including 9040 patients and 12,496 controls of European ancestry from the Pancreatic Cancer Cohort Consortium (PanScan) and the Pancreatic Cancer Case-Control Consortium (PanC4). Here, we find significant evidence of a novel association at rs78417682 (7p12/TNS3, P = 4.35 × 10⁻⁸). Replication of 10 promising signals in up to 2737 patients and 4752 controls from the PANcreatic Disease ReseArch (PANDoRA) consortium yields four new genome-wide significant loci: rs13303010 at 1p36.33 (NOC2L, P = 8.36 × 10⁻¹⁴), rs2941471 at 8q21.11 (HNF4G, P = 6.60 × 10⁻¹⁰), rs4795218 at 17q12 (HNF1B, P = 1.32 × 10⁻⁸), and rs1517037 at 18q21.32 (GRP, P = 3.28 × 10⁻⁸). rs78417682 is not statistically significantly associated with pancreatic cancer in PANDoRA. Expression quantitative trait locus analysis in three independent pancreatic datasets provides molecular support for NOC2L as a pancreatic cancer susceptibility gene.
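The abstract does not state its significance threshold; the conventional genome-wide threshold is 5 × 10⁻⁸, roughly a Bonferroni correction of α = 0.05 for about one million independent common-variant tests. The snippet below checks the P values reported above against that threshold, assuming the convention applies here; `GENOME_WIDE_ALPHA` and `reported` are illustrative names.

```python
import math

# Conventional genome-wide significance threshold: alpha = 0.05, Bonferroni-
# corrected for roughly one million independent common-variant tests.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000  # 5e-8

# P values as reported in the abstract above.
reported = {
    "rs78417682 (7p12/TNS3)": 4.35e-8,
    "rs13303010 (1p36.33/NOC2L)": 8.36e-14,
    "rs2941471 (8q21.11/HNF4G)": 6.60e-10,
    "rs4795218 (17q12/HNF1B)": 1.32e-8,
    "rs1517037 (18q21.32/GRP)": 3.28e-8,
}

for snp, p in sorted(reported.items(), key=lambda item: item[1]):
    margin = math.log10(GENOME_WIDE_ALPHA / p)  # orders of magnitude below threshold
    print(f"{snp}: P = {p:.2e}, -log10(P) = {-math.log10(p):.2f}, "
          f"{margin:.2f} orders of magnitude below 5e-8")
```

All five clear the threshold; rs78417682 does so by the narrowest margin (about 0.06 orders of magnitude).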

    Distance-based assessment of the localization of functional annotations in 3D genome reconstructions

    BACKGROUND: Recent studies have used Hi-C (chromosome conformation capture with next-generation sequencing) contact data, or three-dimensional (3D) genome reconstructions derived from it, to assess the co-localization of functional genomic annotations in the nucleus. These analyses dichotomized pairs of data points belonging to a functional annotation as close or far according to a distance threshold and then tested for enrichment of close pairs. We propose an alternative approach that avoids dichotomizing the data and instead directly estimates the significance of distances within the 3D reconstruction. RESULTS: We applied this approach to 3D genome reconstructions for Plasmodium falciparum, the causative agent of malaria, and Saccharomyces cerevisiae, and compared the results with those of previous approaches. We found significant 3D co-localization of centromeres, telomeres, virulence genes, and several sets of genes with developmentally regulated expression in P. falciparum, and significant 3D co-localization of centromeres and long terminal repeats in S. cerevisiae. Additionally, we tested the experimental observation that telomeres form three to seven clusters in P. falciparum and S. cerevisiae. Applying affinity propagation clustering to telomere coordinates in the 3D reconstructions yielded six telomere clusters for both organisms. CONCLUSIONS: Distance-based assessment replicated key findings while avoiding dichotomization of the data (which previously yielded threshold-sensitive results).
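The abstract does not give the exact test statistic. One threshold-free way to ask whether annotated loci are unusually close in a 3D reconstruction is a permutation test on their mean pairwise Euclidean distance, compared against random sets of loci of the same size. A minimal sketch under that assumption, with synthetic coordinates and hypothetical function names:

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of 3D points."""
    return sum(math.dist(p, q) for p, q in combinations(points, 2)) / math.comb(len(points), 2)


def colocalization_pvalue(coords, annotated, n_perm=999, seed=0):
    """Permutation test: are the annotated loci closer together in the 3D
    reconstruction than equally sized random sets of loci?
    One-sided p = (1 + #random sets at least as compact) / (1 + n_perm)."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance([coords[i] for i in annotated])
    as_compact = sum(
        mean_pairwise_distance(rng.sample(coords, len(annotated))) <= observed
        for _ in range(n_perm)
    )
    return observed, (1 + as_compact) / (1 + n_perm)


# Synthetic reconstruction: 10 "annotated" loci packed in a unit cube,
# 190 other loci spread through a 10 x 10 x 10 volume.
rng = random.Random(42)
coords = [tuple(rng.uniform(0, 1) for _ in range(3)) for _ in range(10)]
coords += [tuple(rng.uniform(0, 10) for _ in range(3)) for _ in range(190)]
observed, p = colocalization_pvalue(coords, annotated=range(10))
print(f"mean pairwise distance = {observed:.2f}, permutation p = {p:.3f}")
```

Because the statistic uses the distances themselves, no close/far cutoff is needed; a fuller treatment might constrain the random sets (for example, matching chromosome membership), which this sketch omits.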