
    Landscape of somatic single nucleotide variants and indels in colorectal cancer and impact on survival

    Colorectal cancer (CRC) is a biologically heterogeneous disease. To characterize its mutational profile, we conduct targeted sequencing of 205 genes in 2,105 CRC cases with survival data. Beyond confirming existing knowledge of CRC, our data yield several new findings. We identify PRKCI, SPZ1, MUTYH, MAP2K4, FETUB, and TGFBR2 as additional genes significantly mutated in CRC. We find that among hypermutated tumors, an increased mutation burden is associated with improved CRC-specific survival (HR = 0.42, 95% CI: 0.21-0.82). Mutations in TP53 are associated with poorer CRC-specific survival, an effect most pronounced in cases carrying TP53 mutations with predicted 0% transcriptional activity (HR = 1.53, 95% CI: 1.21-1.94). Furthermore, we observe differences in the mutational frequency of several genes and pathways by tumor location, stage, and sex. Overall, this large study provides deep insights into somatic mutations in CRC and their potential relationships with survival and tumor features.

    Large-scale sequencing studies are of paramount importance for unravelling the heterogeneity of colorectal cancer. Here, the authors sequence 205 cancer genes in more than 2,000 tumours, identify additional mutated driver genes, and determine that mutational burden and specific mutations in TP53 are associated with survival odds.
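A hazard ratio reported with a Wald 95% confidence interval, as above, has a simple internal consistency: the bounds are exp(beta ± 1.96·se), so the point estimate is the geometric mean of the bounds and se can be recovered from their ratio. The sketch below checks the reported HR = 0.42 (95% CI: 0.21-0.82) against this relationship; it is an illustration of the arithmetic, not part of the study's analysis, and the rounded bounds reproduce the point estimate only approximately.

```python
import math

# Reported association between hypermutation burden and CRC-specific
# survival: HR = 0.42, 95% CI (0.21, 0.82).  For a Wald interval the
# bounds are exp(beta +/- 1.96*se), so the point estimate is the
# geometric mean of the bounds, and se(log HR) follows from their ratio.
lo, hi = 0.21, 0.82

hr_point = math.sqrt(lo * hi)                    # exp(beta) implied by the bounds
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # implied standard error of beta

print(f"implied HR ~ {hr_point:.2f}, se(log HR) ~ {se:.3f}")
```

The recovered point estimate (~0.41) matches the reported 0.42 up to the rounding of the published bounds.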

    On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

    Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on "inverting" the distribution of labels, e.g. answering mostly "yes" when the common training answer was "no". Second, the OOD test set is used for model selection. Third, a model's in-domain performance is assessed after retraining it on in-domain splits (VQA v2) that exhibit a more balanced distribution of labels. These three practices defeat the objective of evaluating generalization, and put into question the value of methods specifically designed for this dataset. We show that embarrassingly simple methods, including one that generates answers at random, surpass the state of the art on some question types. We provide short- and long-term solutions to avoid these pitfalls and realize the benefits of OOD evaluation.
    Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel
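The "generates answers at random" baseline mentioned above can be sketched in a few lines: a predictor that ignores both image and question content entirely, sampling an answer from a per-question-type vocabulary. The names and the toy vocabulary below are illustrative assumptions, not the paper's code; the point is that any benchmark such a function can top is not measuring generalization.

```python
import random

# Hypothetical sketch of an "embarrassingly simple" baseline: answer
# every question of a given type by sampling at random, using neither
# the image nor the question text.  The answer vocabularies are toy
# placeholders, not the actual VQA-CP answer sets.
ANSWERS_BY_QTYPE = {
    "yes/no": ["yes", "no"],
    "number": [str(n) for n in range(10)],
}

def random_baseline(question_type, rng=random):
    """Return a random answer for the question type, ignoring all inputs."""
    return rng.choice(ANSWERS_BY_QTYPE[question_type])

rng = random.Random(0)
print([random_baseline("yes/no", rng) for _ in range(5)])
```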

    V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices.

    AAAI-20 Technical Tracks 7 / AAAI Technical Track: Vision
    One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced. This is a critical concern because generalisation enables robust reasoning over unseen data, whereas leveraging superficial statistics is fragile to even small changes in data distribution. To illuminate the issue and drive progress towards a solution, we propose a test that explicitly evaluates abstract reasoning over visual data. We introduce a large-scale benchmark of visual questions that involve operations fundamental to many high-level vision tasks, such as comparisons of counts and logical operations on complex visual properties. The benchmark directly measures a method's ability to infer high-level relationships and to generalise them over image-based concepts. It includes multiple training/test splits that require controlled levels of generalization. We evaluate a range of deep learning architectures, and find that existing models, including those popular for vision-and-language tasks, are unable to solve seemingly-simple instances. Models using relational networks fare better but leave substantial room for improvement.
    Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel
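The relational networks the abstract refers to follow a simple pattern: score every ordered pair of object embeddings with a shared function g, pool the scores, and map the result to an answer with a readout f. The sketch below is a framework-free toy illustration of that control flow, with fixed stand-in functions where a real model would learn g and f.

```python
# Minimal sketch of the Relation Network pattern: a shared pairwise
# function g applied to all ordered pairs of object embeddings, pooled
# by summation, then passed through a readout f.  Both g and f are toy
# placeholders here; real models implement them as learned MLPs.
def g(obj_i, obj_j):
    # toy pairwise relation: dot product of the two embeddings
    return sum(a * b for a, b in zip(obj_i, obj_j))

def f(pooled):
    # toy readout: identity
    return pooled

def relation_network(objects):
    pooled = sum(g(oi, oj) for oi in objects for oj in objects if oi is not oj)
    return f(pooled)

objs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(relation_network(objs))
```

Pooling over all pairs makes the output invariant to the order of the input objects, which is the property that lets such models reason about relationships between visual elements.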

    Bottom-up and top-down attention for image captioning and visual question answering

    Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions, which form a natural basis over which attention can be computed. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
    Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang
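The top-down step described above reduces to a standard attention computation: score each bottom-up region feature against a task vector (question or caption state), normalize the scores with a softmax, and take the weighted sum of the features. This is a hedged sketch of that step only; the dot-product scorer stands in for the learned scoring network in the actual model.

```python
import math

# Sketch of top-down attention over bottom-up region features: score
# each region against a task vector, softmax the scores, and return the
# attention weights plus the attention-weighted feature vector.  The
# dot-product scorer is a stand-in for a learned network.
def attend(region_feats, task_vec):
    scores = [sum(r * q for r, q in zip(feat, task_vec)) for feat in region_feats]
    m = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(region_feats[0])
    attended = [sum(w * feat[d] for w, feat in zip(weights, region_feats))
                for d in range(dim)]
    return weights, attended

regions = [[1.0, 0.0], [0.0, 1.0]]            # two bottom-up region features
w, ctx = attend(regions, [2.0, 0.0])          # task vector aligned with region 0
```

Because the weights are a softmax output they sum to one, and the region most aligned with the task vector dominates the attended feature.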

    Medical data inquiry using a question answering model

    Access to hospital data is commonly a difficult, costly and time-consuming process requiring extensive interaction with network administrators. This leads to possible delays in obtaining insights from data, such as diagnosis or other clinical outcomes. Healthcare administrators, medical practitioners, researchers and patients could benefit from a system that could extract relevant information from healthcare data in real-time. In this paper, we present a question answering system that allows health professionals to interact with a large-scale database by asking questions in natural language. The system is built upon the BERT and SQLova models, which translate a user's request into an SQL query; the query is then passed to the data server to retrieve relevant information. We also propose a deep bilinear similarity model to improve the generated SQL queries by better matching terms in the user's query with the database schema and contents. This system was trained with only 75 real questions and 455 back-translated questions, and was evaluated over 75 additional real questions about a real health information database, achieving a retrieval accuracy of 78%.
    Zhibin Liao, Lingqiao Liu, Qi Wu, Damien Teney, Chunhua Shen, Anton van den Hengel, Johan Verjans
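A bilinear similarity model of the kind mentioned above scores a question-term embedding q against a schema-column embedding c as q^T W c, where W is a learned interaction matrix. The sketch below shows that scoring rule with toy hand-picked values; the actual system learns the embeddings and W jointly with the SQL generator, so none of the numbers here come from the paper.

```python
# Illustrative bilinear similarity: score a question-term embedding q
# against a schema-column embedding c as q^T W c.  W is the bilinear
# interaction matrix; all values below are toy placeholders.
def bilinear_score(q, W, c):
    # q^T W c, written out with plain lists
    return sum(q[i] * sum(W[i][j] * c[j] for j in range(len(c)))
               for i in range(len(q)))

q = [1.0, 0.0]          # embedding of a term in the user's question
W = [[2.0, 0.0],
     [0.0, 1.0]]        # learned bilinear interaction matrix (toy values)
col_match = [1.0, 0.0]  # embedding of a schema column that matches the term
col_other = [0.0, 1.0]  # embedding of an unrelated schema column

print(bilinear_score(q, W, col_match), bilinear_score(q, W, col_other))
```

Ranking columns by this score lets the SQL generator prefer schema elements that actually correspond to terms in the user's question, which is how the paper improves the generated queries.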