    Contextualized Semantic Maps for Retrieval and Summarization of Biomedical Literature

    As the volume of biomedical literature increases, it can be challenging for clinicians to stay up to date with this massive store of knowledge. Graphical summarization systems condense knowledge into a more tractable form via "concept maps" -- networks of nodes (concepts) and edges (relations between concepts). Existing graphical summarization systems omit the context of the extracted relations, such as study design and study population. However, context is crucial for capturing the full meaning of a relation. With context, the user may pose more detailed queries than those accommodated by traditional, context-free maps. This dissertation describes Casama, a system for creating "contextualized semantic maps" to represent the current state of scientific knowledge in the domain of non-small cell lung cancer (NSCLC). A formalism for contextualized semantic maps is presented, including targeted relations, study design context, and study population context. An annotated gold standard conforming to this representation is produced, and methods for extracting these contexts are developed. Contextualized semantic maps are evaluated in an information retrieval task and a summarization usability study, showing significant improvement over PubMed and SemRep.
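
    To make the idea of attaching context to a relation concrete, here is a minimal sketch of a map edge that carries study design and study population, plus a filter over those fields. The class name, field names, the `query` helper, and all data values are hypothetical placeholders chosen for illustration; they are not Casama's actual schema or content.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContextualizedRelation:
    """One edge of the map plus the context it was reported in (hypothetical schema)."""
    source: str        # concept node the relation starts from
    relation: str      # relation type labelling the edge
    target: str        # concept node the relation points to
    study_design: str  # context: how the evidence was produced
    population: str    # context: who was studied


def query(relations: List[ContextualizedRelation],
          source: Optional[str] = None,
          study_design: Optional[str] = None) -> List[ContextualizedRelation]:
    """Return the relations that match every filter that is not None."""
    return [
        r for r in relations
        if (source is None or r.source == source)
        and (study_design is None or r.study_design == study_design)
    ]


# Placeholder data, invented purely for illustration.
relations = [
    ContextualizedRelation("mutation_A", "associated_with", "response_to_drug_X",
                           "randomized controlled trial", "stage IV patients"),
    ContextualizedRelation("mutation_A", "associated_with", "response_to_drug_X",
                           "case report", "single patient"),
    ContextualizedRelation("mutation_B", "associated_with", "resistance_to_drug_X",
                           "cohort study", "never-smokers"),
]

# A context-free query can only ask "what is known about mutation_A?".
context_free = query(relations, source="mutation_A")

# A contextualized query can additionally restrict by study design.
contextual = query(relations, source="mutation_A",
                   study_design="randomized controlled trial")
```

    In this toy setup the context-free query returns both `mutation_A` edges, while the contextualized query keeps only the one backed by a randomized controlled trial, which is the kind of more detailed query the abstract describes.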

    Local Trends in Global Music Streaming

    Audio streaming services have made it easier for countries around the world to listen to each other's music. This expansion in listeners' access to global content, however, has raised questions about streaming's impact on the import and export flows of music between countries and on listeners' preferences for local or global content. Here, we analyze five and a half years of all streaming data from Spotify, a global music streaming service, and find that preferences for local content have increased from 2014 through 2019, reversing previously noted trends. Perhaps correspondingly, both common official language and geographic proximity between countries increasingly shape listener consumption during this period, particularly for younger audiences. Further, we show that these trends persist across different genres, listener age groups, and early and late adopters of streaming, providing new insights into this newest phase in the continued evolution of music and its impact on listeners around the world.

    Representing and extracting lung cancer study metadata: study objective and study design.

    This paper describes the information retrieval step in Casama (Contextualized Semantic Maps), a project that summarizes and contextualizes current research papers on driver mutations in non-small cell lung cancer. Casama's representation of lung cancer studies aims to capture elements that will assist an end-user in retrieving studies and, importantly, judging their strength. This paper focuses on two types of study metadata: study objective and study design. A total of 430 abstracts on EGFR and ALK mutations in lung cancer were annotated manually. Casama's support vector machine (SVM) automatically classified the abstracts by study objective, with F-scores up to 129% higher than those of PubMed's built-in filters. A second SVM classified the abstracts by epidemiological study design, suggesting strength of evidence at a more granular level than in previous work. The classification results and the top features determined by the classifiers suggest that this scheme would be generalizable to other mutations in lung cancer, as well as to studies on driver mutations in other cancer domains.
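
    For readers unfamiliar with how a statement such as "129% higher F-score" is typically read, the sketch below computes a balanced F-score and the relative gain of one score over another. It assumes the balanced (F1) variant and a gain measured relative to the baseline score; the abstract does not state either, and the precision and recall values are invented for illustration, not results from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both precision and recall are zero
    return 2 * precision * recall / (precision + recall)


def relative_gain_percent(baseline: float, improved: float) -> float:
    """Percentage by which `improved` exceeds `baseline`, relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100


# Hypothetical numbers, not taken from the paper.
baseline_f = f_score(precision=0.25, recall=1.0)  # a broad filter: high recall, low precision
improved_f = f_score(precision=0.8, recall=0.8)   # a more balanced classifier
gain = relative_gain_percent(baseline_f, improved_f)
```

    Here the baseline F-score is 0.4 and the improved one is 0.8, so the gain is about 100%, meaning the score doubled. Under the same reading, a 129% gain would mean the improved F-score is roughly 2.29 times the baseline.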