    Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

    We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be applied to any unstructured text corpus. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.

    Uniform Approximation Is More Appropriate for Wilcoxon Rank-Sum Test in Gene Set Analysis

    Gene set analysis is widely used to facilitate biological interpretation in analyses of differential expression from high-throughput profiling data. The Wilcoxon rank-sum (WRS) test is one of the commonly used methods in gene set enrichment analysis. It compares the ranks of genes in a gene set against those of genes outside the gene set. This method is easy to implement, and it eliminates the dichotomization of genes into significant and non-significant in competitive hypothesis testing. Because of the large number of genes being examined, it is impractical to calculate the exact null distribution for the WRS test, so the normal distribution is commonly used as an approximation. However, as we demonstrate in this paper, the normal approximation is problematic when a gene set with a relatively small number of genes is tested against the large number of genes in the complementary set. In this situation, a uniform approximation is substantially more powerful, more accurate, and less computationally intensive. We demonstrate the advantage of the uniform approximation in Gene Ontology (GO) term analysis using simulations and real data sets.
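
For orientation, the normal approximation that this abstract criticizes can be sketched as follows. This is a minimal illustration of the textbook rank-sum z-score only; it does not implement the paper's uniform approximation, whose details are not given in the abstract. The function names and the toy scores are hypothetical, and no tie correction is applied to the variance.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wrs_normal_approx(scores, in_set):
    """Rank-sum W of the gene set, its z-score, and the upper-tail normal p-value.

    scores: one score per gene (hypothetical input, e.g. a test statistic).
    in_set: booleans of the same length, True for genes in the gene set.
    """
    ranks = average_ranks(scores)
    n = len(scores)          # all genes
    m = sum(in_set)          # genes in the set
    w = sum(r for r, s in zip(ranks, in_set) if s)
    mean = m * (n + 1) / 2
    var = m * (n - m) * (n + 1) / 12   # no tie correction (simplifying assumption)
    z = (w - mean) / math.sqrt(var)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return w, z, p_upper


if __name__ == "__main__":
    # Toy data: 10 genes, the 3 highest-scoring ones form the gene set.
    toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    toy_membership = [False] * 7 + [True] * 3
    print(wrs_normal_approx(toy_scores, toy_membership))
```

With a small set tested against a large complement, as in the situation the abstract describes, `m` is small relative to `n`, which is the regime where the authors report that this approximation becomes unreliable.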

    Tutte polynomial of pseudofractal scale-free web

    The Tutte polynomial of a graph is a two-variable polynomial that is important in both combinatorics and statistical physics. It encodes various numerical and polynomial invariants, such as the number of spanning trees, the number of spanning forests, the number of acyclic orientations, the reliability polynomial, the chromatic polynomial and the flow polynomial. In this paper, we derive recursive formulas for the Tutte polynomial of the pseudofractal scale-free web (PSW), which yield a logarithmic-complexity algorithm for calculating the Tutte polynomial of the PSW, although the problem is NP-hard for general graphs. We also obtain a rigorous solution for the number of spanning trees of the PSW by solving the recurrence relations derived from the Tutte polynomial, which gives an alternative approach for explicitly determining the number of spanning trees of the PSW. Furthermore, we analyze the all-terminal reliability of the PSW and compare the results with those for the Sierpinski gasket, which has the same number of nodes and edges as the PSW. In contrast with the well-known conclusion that scale-free networks are more robust against removal of nodes than homogeneous networks (e.g., exponential networks and regular networks), our results show that the Sierpinski gasket (a regular network) is more robust against random edge failures than the PSW (a scale-free network). Whether this holds for all regular networks and scale-free networks is still an unresolved problem. Comment: 19 pages, 7 figures. arXiv admin note: text overlap with arXiv:1006.533
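
The spanning-tree count mentioned in this abstract equals the Tutte polynomial evaluated at (1, 1), and for small graphs it can be checked by brute force. The sketch below is not the paper's recursive method. It assumes the usual construction of the pseudofractal scale-free web (start from a triangle, then at every step attach a new node to both endpoints of each existing edge) and counts spanning trees with Kirchhoff's matrix-tree theorem. The function names are hypothetical.

```python
from fractions import Fraction


def pseudofractal_web(t):
    """Build the PSW after t growth steps (assumed construction, see lead-in).

    Returns (number_of_nodes, edge_list) with nodes labelled 0..n-1.
    """
    edges = [(0, 1), (1, 2), (0, 2)]   # generation 0: a triangle
    n = 3
    for _ in range(t):
        new_edges = []
        for u, v in edges:
            # One new node per existing edge, joined to both endpoints.
            new_edges += [(u, n), (v, n)]
            n += 1
        edges += new_edges
    return n, edges


def count_spanning_trees(n, edges):
    """Matrix-tree theorem: determinant of the Laplacian with one row/column removed."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    minor = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    # Exact Gaussian elimination over the rationals.
    for i in range(size):
        pivot = next((r for r in range(i, size) if minor[r][i] != 0), None)
        if pivot is None:
            return 0   # singular minor: the graph is disconnected
        if pivot != i:
            minor[i], minor[pivot] = minor[pivot], minor[i]
            det = -det
        det *= minor[i][i]
        for r in range(i + 1, size):
            factor = minor[r][i] / minor[i][i]
            if factor:
                for c in range(i, size):
                    minor[r][c] -= factor * minor[i][c]
    return int(det)


if __name__ == "__main__":
    for t in range(3):
        n, edges = pseudofractal_web(t)
        print(t, n, len(edges), count_spanning_trees(n, edges))
```

Because the node count roughly triples at each step, this cubic-time check is only practical for the first few generations, which is the motivation for a recursive formula of the kind the abstract describes.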

    Improving gene-set enrichment analysis of RNA-Seq data with small replicates

    Deregulated pathways identified from transcriptome data of two sample groups have played a key role in many genomic studies. Gene-set enrichment analysis (GSEA) has been commonly used for pathway or functional analysis of microarray data, and it is also being applied to RNA-seq data. However, most RNA-seq data so far have only a small number of replicates. This forces the use of the gene-permuting GSEA method (or preranked GSEA), which results in a great number of false positives due to the inter-gene correlation in each gene-set. We demonstrate that incorporating the absolute gene statistic in one-tailed GSEA considerably improves the false-positive control and the overall discriminatory ability of the gene-permuting GSEA methods for RNA-seq data. To test the performance, a simulation method to generate correlated read counts within a gene-set was newly developed, and a dozen currently available RNA-seq enrichment analysis methods were compared; the proposed methods outperformed others that do not account for the inter-gene correlation. Analysis of real RNA-seq data also supported the proposed methods in terms of false-positive control, ranks of true positives and biological relevance. An efficient R package (AbsFilterGSEA) coded with C++ (Rcpp) is available from CRAN.
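
The idea of using the absolute gene statistic in a one-tailed enrichment score can be illustrated with a small sketch. This is a simplified reading of the abstract and not the AbsFilterGSEA implementation: it uses the standard weighted running-sum statistic of GSEA, ranks genes by the absolute value of their statistic, and keeps only the maximum positive deviation. The function name, the `weight` exponent and the toy gene statistics are all hypothetical.

```python
def enrichment_score(gene_stats, gene_set, use_absolute=True, weight=1.0):
    """One-tailed running-sum enrichment score (maximum positive deviation).

    gene_stats: dict mapping gene name -> statistic (e.g. a log fold change).
    gene_set: set of gene names.
    use_absolute: rank genes by |statistic| instead of the signed statistic.
    """
    if use_absolute:
        ranked = sorted(gene_stats, key=lambda g: abs(gene_stats[g]), reverse=True)
    else:
        ranked = sorted(gene_stats, key=lambda g: gene_stats[g], reverse=True)

    hits = [g for g in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    hit_norm = sum(abs(gene_stats[g]) ** weight for g in hits)
    if not hits or n_miss == 0 or hit_norm == 0:
        raise ValueError("need hits with non-zero statistics and at least one miss")

    running = 0.0
    best = 0.0
    for g in ranked:
        if g in gene_set:
            # Step up in proportion to the gene's (absolute) statistic.
            running += abs(gene_stats[g]) ** weight / hit_norm
        else:
            # Step down by a constant for every gene outside the set.
            running -= 1.0 / n_miss
        best = max(best, running)
    return best


if __name__ == "__main__":
    # Toy example: the set contains one strongly up- and one strongly down-regulated gene.
    toy_stats = {"a": -3.0, "b": 2.5, "c": 0.2, "d": -0.1, "e": 0.3}
    toy_set = {"a", "b"}
    print(enrichment_score(toy_stats, toy_set, use_absolute=True))
    print(enrichment_score(toy_stats, toy_set, use_absolute=False))
```

In the toy example, the signed ranking places the two set members at opposite ends of the list, while the absolute ranking places both at the top, so a set that is deregulated in both directions scores higher under the absolute variant.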

    Topoisomerase II alpha expression and the benefit of adjuvant chemotherapy for postoperative patients with non-small cell lung cancer

    Background: Adjuvant chemotherapy has been shown to improve survival rates of postoperative patients with non-small cell lung cancer (NSCLC). Biomarkers could help select an appropriate chemotherapy for NSCLC patients or predict the efficacy of chemotherapy. The objective of this study was to explore the possible prognostic and predictive role of topoisomerase II alpha (TopIIα) expression level in postoperative NSCLC patients who received adjuvant chemotherapy. Methods: Patients with stage I-III NSCLC who underwent surgery in our hospital from January 2004 to December 2007 and who also received adjuvant chemotherapy after surgery were analyzed in this study. Expression of TopIIα and Ki67 in paraffin-embedded tissues was detected by immunohistochemistry (IHC). The relationships between clinicopathological characteristics, chemotherapy regimens, the expression of biomarkers and disease-free survival (DFS) were analyzed. Results: TopIIα and Ki67 were highly expressed in 22.5% and 36.4% of the 151 patients, respectively. Univariate survival analysis showed that male sex (P = 0.036), non-adenocarcinoma (P = 0.004), earlier pathological TNM stage (P = 0.001) or pathological N stage (P < 0.001), and high expression of TopIIα (P = 0.012) were correlated with better DFS, whereas age, smoking history, different chemotherapy regimens, T stage and expression level of Ki67 were of no prognostic significance. Further stratified analysis showed that vinorelbine (NVB)-containing adjuvant regimens were generally associated with better DFS than regimens without NVB in patients with low TopIIα expression, though the difference was not statistically significant (P = 0.065). Pairwise comparisons for patients with low TopIIα expression indicated that the NVB-containing regimen was associated with better DFS than the docetaxel (TXT)-containing regimen (P = 0.047). Cox multivariate analysis showed that pathological TNM stage, histological subtype and expression level of TopIIα were independent risk factors affecting DFS in postoperative NSCLC patients who received chemotherapy. Conclusions: High TopIIα expression was found to be correlated with better DFS for postoperative NSCLC patients who received adjuvant chemotherapy. The NVB-containing chemotherapy regimen was more effective than the TXT-containing regimen in improving DFS in patients with low TopIIα expression. TopIIα could be considered an independent prognostic biomarker of DFS in postoperative NSCLC patients who received adjuvant chemotherapy.

    Genetic Variation of the Human α-2-Heremans-Schmid Glycoprotein (AHSG) Gene Associated with the Risk of SARS-CoV Infection

    Genetic background may play an important role in the process of SARS-CoV infection and SARS development. We found several proteins that could interact with the nucleocapsid protein of the SARS coronavirus (SARS-CoV). α-2-Heremans-Schmid Glycoprotein (AHSG), which is required for macrophage deactivation by endogenous cations, is associated with inflammatory regulation. Cytochrome P450 Family 3A (CYP4F3A) is an ω-oxidase that inactivates Leukotriene B4 (LTB4) in human neutrophils and the liver. We investigated the association between the polymorphisms of these two inflammation-associated genes and SARS development. The linkage disequilibrium (LD) maps of these two genes were built with Haploview using data on CHB+JPT (version 2) from the HapMap. A total of ten tag SNPs were selected and genotyped. In the Guangzhou cohort study, after adjusting for age and sex, two AHSG SNPs and one CYP4F3 SNP were found to be associated with SARS susceptibility: rs2248690 (adjusted odds ratio [AOR] 2.42; 95% confidence interval [CI] 1.30–4.51); rs4917 (AOR 1.84; 95% CI 1.02–3.34); and rs3794987 (AOR 2.01; 95% CI 1.10–3.68). To further validate the association, the ten tag SNPs were genotyped in the Beijing cohort. After adjusting for age and sex, only rs2248690 (AOR 1.63; 95% CI 1.30–2.04) was found to be associated with SARS susceptibility. The combined analysis of the two studies confirmed tag SNP rs2248690 in AHSG as a susceptibility variant (AOR 1.70; 95% CI 1.37–2.09). The statistical analysis of the rs2248690 genotype data among the patients and healthy controls in the HCW cohort, who were all similarly exposed to the SARS virus, also supported the findings. Further, the SNP rs2248690 affected the transcriptional activity of the AHSG promoter and thus regulated the AHSG serum level. Therefore, our study has demonstrated that the AA genotype of rs2248690, which leads to a higher AHSG serum concentration, was significantly associated with protection against SARS development.
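
The odds ratios and confidence intervals reported in this abstract can be read on the log scale. The sketch below assumes the intervals are Wald-type intervals from a logistic regression, exp(b ± 1.96·SE), which the abstract does not state. Under that assumption the point estimate is the geometric mean of the two bounds, and the standard error of the log odds ratio can be backed out from the interval width. The numbers are the ones quoted in the abstract, and the function name is hypothetical.

```python
import math

# (label, reported AOR, lower 95% bound, upper 95% bound), as quoted in the abstract.
REPORTED = [
    ("rs2248690, Guangzhou", 2.42, 1.30, 4.51),
    ("rs4917, Guangzhou", 1.84, 1.02, 3.34),
    ("rs3794987, Guangzhou", 2.01, 1.10, 3.68),
    ("rs2248690, Beijing", 1.63, 1.30, 2.04),
    ("rs2248690, combined", 1.70, 1.37, 2.09),
]


def wald_summary(lower, upper, z=1.96):
    """Back out the implied odds ratio and SE(log OR) from a Wald-type interval.

    Assumes the interval is exp(log_or - z*se), exp(log_or + z*se).
    """
    log_or = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.exp(log_or), se


if __name__ == "__main__":
    for label, aor, lo, hi in REPORTED:
        implied_or, se = wald_summary(lo, hi)
        print(f"{label}: reported {aor:.2f}, implied {implied_or:.2f}, SE(log OR) {se:.3f}")
```

The implied values agree with the reported ones to within rounding, and the narrower intervals for the Beijing and combined analyses correspond to smaller standard errors on the log scale.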

    A Dynamical Model of Oocyte Maturation Unveils Precisely Orchestrated Meiotic Decisions

    Maturation of vertebrate oocytes into haploid gametes relies on two consecutive meioses without intervening DNA replication. The temporal sequence of cellular transitions driving eggs from G2 arrest to meiosis I (MI) and then to meiosis II (MII) is controlled by the interplay between cyclin-dependent and mitogen-activated protein kinases. In this paper, we propose a dynamical model of the molecular network that orchestrates maturation of Xenopus laevis oocytes. Our model reproduces the core features of maturation progression, including the characteristic non-monotonic time course of cyclin-Cdks, and unveils the network design principles underlying a precise sequence of meiotic decisions, as captured by bifurcation and sensitivity analyses. Firstly, a coherent and sharp meiotic resumption is triggered by the concerted action of positive feedback loops post-translationally activating cyclin-Cdks. Secondly, meiotic transition is driven by the dynamic antagonism between positive and negative feedback loops controlling cyclin turnover. Our findings reveal a highly modular network in which the coordination of distinct regulatory schemes ensures both reliable and flexible cell-cycle decisions.