51 research outputs found

    Making accurate and interpretable treatment decisions for binary outcomes

    Optimal treatment rules can improve health outcomes on average by assigning a treatment associated with the most desirable outcome to each individual. Due to an unknown data generation mechanism, it is appealing to use flexible models to estimate these rules. However, such models often lead to complex and uninterpretable rules. In this article, we introduce an approach aimed at estimating optimal treatment rules that have higher accuracy, higher value, and lower loss from the same simple model family. We use a flexible model to estimate the optimal treatment rules and a simple model to derive interpretable treatment rules. We provide an extensible definition of interpretability and present a method that - given a class of simple models - can be used to select a preferred model. We conduct a simulation study to evaluate the performance of our approach compared to treatment rules obtained by fitting the same simple model directly to observed data. The results show that our approach has lower average loss, higher average outcome, and greater power in identifying individuals who can benefit from the treatment. We apply our approach to derive treatment rules of adjuvant chemotherapy in colon cancer patients using cancer registry data. The results show that our approach has the potential to improve treatment decisions

    Exploring cancer survivor needs and preferences for communicating personalized cancer statistics from registry data: Qualitative multimethod study

    Background: Disclosure of cancer statistics (eg, survival or incidence rates) based on a representative group of patients can help increase cancer survivors’ understanding of their own diagnostic and prognostic situation, and care planning. More recently, there has been an increasing interest in the use of cancer registry data for disclosing and communicating personalized cancer statistics (tailored toward personal and clinical characteristics) to cancer survivors and relatives. Objective: The aim of this study was to explore breast cancer (BCa) and prostate cancer (PCa) survivor needs and preferences for disclosing (what) and presenting (how) personalized statistics from a large Dutch population-based data set, the Netherlands Cancer Registry (NCR). Methods: To elicit survivor needs and preferences for communicating personalized NCR statistics, we created different (non)interactive tools visualizing hypothetical scenarios and adopted a qualitative multimethod study design. We first conducted 2 focus groups (study 1; n=13) for collecting group data on BCa and PCa survivor needs and preferences, using noninteractive sketches of what a tool for communicating personalized statistics might look like. Based on these insights, we designed a revised interactive tool, which was used to further explore the needs and preferences of another group of cancer survivors during individual think-aloud observations and semistructured interviews (study 2; n=11). All sessions were audio-recorded, transcribed verbatim, analyzed using thematic (focus groups) and content analysis (think-aloud observations), and reported in compliance with qualitative research reporting criteria. Results: In both studies, cancer survivors expressed the need to receive personalized statistics from a representative source, with especially a need for survival and conditional survival rates (ie, survival rate for those who have already survived for a certain period). 
Personalized statistics adjusted toward personal and clinical factors were deemed more relevant and useful to know than generic or average-based statistics. Participants also needed support for correctly interpreting the personalized statistics and putting them into perspective, for instance by adding contextual or comparative information. Furthermore, while thinking aloud, participants experienced a mix of positive (sense of hope) and negative emotions (feelings of distress) while viewing the personalized survival data. Overall, participants preferred simplicity and conciseness, and the ability to tailor the type of visualization and amount of (detailed) statistical information. Conclusions: The majority of our sample of cancer survivors wanted to receive personalized statistics from the NCR. Given the variation in patient needs and preferences for presenting personalized statistics, designers of similar information tools may consider potential tailoring strategies on multiple levels, as well as effective ways for providing supporting information to make sure that the personalized statistics are properly understood. This is encouraging for cancer registries to address this unmet need, but also for those who are developing or implementing personalized data-driven information tools for patients and relatives

    A Case Study on Information Extraction from the Internet: Populating a Movie Ontology

    A method to extract information from the internet is discussed. We choose to formally represent information by ontologies, and the information extraction problem is reformulated as an ontology population problem.

    Web-Based Artist Categorization

    We present a novel approach to categorizing artists into subjective categories such as genre. We base our method on co-occurrences on the web, found with the Google search engine. A direct mapping between artists and categories proved to be unreliable. Instead, we use the categories mapped to closely related artists to obtain a more reliable mapping. The method is tested on a genre classification test set with convincing results. Moreover, mood categorization is explored using the same techniques.

    Efficient Lyrics Extraction from the Web

    We present a novel method to extract lyrics from the Web. The aim is to extract a set of multiple versions of the lyrics to a song. Lyrics can be identified within a text by a regular expression. We use a projection of a document to efficiently identify lyrics within the document by mapping it to a regular expression. We describe a method to cluster the multiple versions of the lyrics by filtering out erroneous texts such as lyrics to other songs. For reasons of efficiency, we do this by comparing fingerprints instead of the texts themselves
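The abstract does not say how the fingerprints are built, so the following is only a minimal sketch of the general idea of comparing fingerprints instead of full texts. It assumes hashed word trigrams as the fingerprint, Jaccard overlap as the similarity measure, and a simple majority vote for filtering; the function names, the threshold, and the sample strings are all hypothetical and are not taken from the paper.

```python
import hashlib
import re


def fingerprint(text, k=3):
    """Return a set of short hashes of word k-grams (an assumed fingerprint scheme)."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < k:
        shingles = [" ".join(words)] if words else []
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.md5(s.encode("utf-8")).hexdigest()[:8] for s in shingles}


def similarity(fp_a, fp_b):
    """Jaccard similarity between two fingerprints (0.0 if either is empty)."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def filter_versions(candidates, threshold=0.5):
    """Keep candidates whose fingerprint resembles at least half of the others.

    Texts that match few other candidates (e.g. lyrics to another song)
    are dropped. The threshold is an illustrative value, not a tuned one.
    """
    fps = [fingerprint(c) for c in candidates]
    kept = []
    for i, fp in enumerate(fps):
        votes = sum(
            1
            for j, other in enumerate(fps)
            if j != i and similarity(fp, other) >= threshold
        )
        if votes >= (len(fps) - 1) / 2:
            kept.append(candidates[i])
    return kept


# Illustrative candidate texts: three versions of one song and one outlier.
VERSION_A = "twinkle twinkle little star how i wonder what you are"
VERSION_B = "Twinkle, twinkle, little star, how I wonder what you are!"
VERSION_C = "twinkle twinkle little star how i wonder what u are"
OTHER_SONG = "row row row your boat gently down the stream"
```

Calling `filter_versions([VERSION_A, VERSION_B, VERSION_C, OTHER_SONG])` keeps the three versions of the same song and drops the outlier, because set operations on small fingerprints stand in for pairwise comparison of the full texts.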

    Improving the Accessibility of a Thesaurus-Based Catalog by Web Content Mining

    In this work we focus on improving the accessibility of a catalog of radio and television productions. As productions can only be searched using the annotated meta-data, the use of a controlled vocabulary plays an important role in retrieval. However, users cannot be expected to have detailed knowledge of the terms defined within the vocabulary. We present a method that helps users find appropriate keywords for their queries, making the archive more accessible both for professional users and for the general public. The experimental results show that the algorithm developed can be of assistance to those working with the thesaurus.

    Automatic Ontology Population by Googling

    We discuss a method to populate ontologies with the use of googled text fragments. We populate an ontology by the use of hand-crafted domain-specific relation patterns, which can be seen as a generalization of Hearst patterns. The algorithm described uses instances of some class returned by Google to find instances of other classes. A case study on populating an ontology on the movie domain is presented as an illustration of the method. We present the algorithm in detail and discuss the results of our work
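As a rough illustration of pattern-based population, the sketch below substitutes a known instance of one class into hand-written relation patterns and captures candidate instances of another class from text fragments. It is not the algorithm from the paper: the two patterns, the `extract_instances` helper, and the sample fragments are hypothetical, and the fragments are hard-coded here rather than retrieved from a search engine.

```python
import re

# A capitalized word sequence, used as a crude guess at an instance name.
CAPITALIZED = r"(?P<found>[A-Z][\w']*(?: [A-Z][\w']*)*)"

# Hypothetical relation patterns between a movie and its director.
# "{known}" is replaced by an already known director instance.
PATTERNS = [
    CAPITALIZED + ", directed by {known}",
    "{known}'s film " + CAPITALIZED,
]

# Illustrative text fragments standing in for search-engine snippets.
FRAGMENTS = [
    "Critics still praise Jaws, directed by Steven Spielberg, as a classic.",
    "Steven Spielberg's film Jurassic Park was released in 1993.",
]


def extract_instances(known, fragments, patterns=PATTERNS):
    """Return candidate instances related to `known` according to the patterns."""
    found = set()
    for template in patterns:
        # str.replace is used instead of str.format so that any regex
        # braces in a pattern are left untouched.
        regex = re.compile(template.replace("{known}", re.escape(known)))
        for fragment in fragments:
            for match in regex.finditer(fragment):
                found.add(match.group("found"))
    return found


movies = extract_instances("Steven Spielberg", FRAGMENTS)
```

In an iterative setup, the newly found movie titles could in turn be substituted into patterns to look for further directors or actors, which is one way to read "instances of some class ... to find instances of other classes".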