
    Mapping the structure of science through clustering in citation networks : granularity, labeling and visualization

    The science system is large, and millions of research publications are published each year. Within the field of scientometrics, the features and characteristics of this system are studied using quantitative methods. Research publications constitute a rich source of information about the science system and a means to model and study science on a large scale. The classification of research publications into fields is essential to answer many questions about the features and characteristics of the science system. Comprehensive, hierarchical, and detailed classifications of large sets of research publications are not easy to obtain. A solution for this problem is to use network-based approaches to cluster research publications based on their citation relations. Clustering approaches have been applied to large sets of publications at the level of individual articles (in contrast to the journal level) for about a decade. Such approaches are addressed in this thesis. I call the resulting classifications “algorithmically constructed, publications-level classifications of research publications” (ACPLCs). The aim of the thesis is to improve interpretability and utility of ACPLCs. I focus on some issues that hitherto have not received much attention in the previous literature: (1) Conceptual framework. Such a framework is elaborated throughout the thesis. Using the social science citation theory, I argue that citations contextualize and position publications in the science system. Citations may therefore be used to identify research fields, defined as focus areas of research at various granularity levels. (2) Granularity levels corresponding to conceptual framework. In Articles I and II, a method is proposed on how to adjust the granularity of ACPLCs in order to obtain clusters corresponding to research fields at two granularity levels: topics and specialties. (3) Cluster labeling. 
Article III addresses labeling of clusters at different semantic levels, from broad and large to narrow and small, and compares the use of data from various bibliographic fields and different term weighting approaches. (4) Visualization. The methods resulting from Articles I-III are applied in Article IV to obtain a classification of about 19 million biomedical articles. I propose a visualization methodology that provides an overview of the classification, using clusters at coarse levels, as well as the possibility to zoom into details, using clusters at a granular level. In conclusion, I have improved interpretability and utility of ACPLCs by providing a conceptual framework, adjusting granularity of clusters, labeling clusters and, finally, by visualizing an ACPLC in a way that provides both overview and detail. I have demonstrated how these methods can be applied to obtain ACPLCs that are useful, for example, to identify and explore focus areas of research.

    Towards new information resources for public health: From WordNet to MedicalWordNet

    In the last two decades, WORDNET has evolved as the most comprehensive computational lexicon of general English. In this article, we discuss its potential for supporting the creation of an entirely new kind of information resource for public health, viz. MEDICAL WORDNET. This resource is not to be conceived merely as a lexical extension of the original WORDNET to medical terminology; indeed, there is already a considerable degree of overlap between WORDNET and the vocabulary of medicine. Instead, we propose a new type of repository, consisting of three large collections of (1) medically relevant word forms, structured along the lines of the existing Princeton WORDNET; (2) medically validated propositions, referred to here as medical facts, which will constitute what we shall call MEDICAL FACTNET; and (3) propositions reflecting laypersons’ medical beliefs, which will constitute what we shall call the MEDICAL BELIEFNET. We introduce a methodology for setting up the MEDICAL WORDNET. We then turn to the discussion of research challenges that have to be met in order to build this new type of information resource.

    Towards Understanding the Role of Environmental Risk Factors in Psychosis and Beyond: A Data-Driven Network Approach

    Psychotic disorders impose a high burden on both the affected individual and society. Despite extensive research efforts in recent decades, their etiology remains poorly understood, hindering progress in prevention and treatment. Two distinct developments in the field may represent ways forward: First, there is a growing recognition of the importance of several potentially malleable environmental risk factors, such as childhood trauma, stressful life events, or cannabis use, in the onset, progression, and maintenance of psychotic disorders. Second, the ubiquitous common cause model of psychotic disorders is increasingly challenged by alternative conceptualizations of mental disorders, such as the network approach to psychopathology. In the common cause model, symptoms are viewed as mere effects of a common cause (the disorder itself, e.g., ‘schizophrenia’), i.e., symptoms covary because of their joint dependence on an assumed latent disorder entity. This traditional view also assumes that environmental factors influence symptoms via the disorder entity. In contrast, the network approach to psychopathology views mental disorders as networks of directly interacting symptoms and other components, such as environmental risk factors. Patterns of covariation between symptoms and other components are assumed to reflect meaningful relationships and become the focus of analysis. Building upon these developments, this thesis proposes a network approach to disentangle potential pathways by which environmental risk factors increase the risk for psychotic disorders. Specifically, the five presented papers focus on individual symptoms and their associations with common environmental risk factors of psychotic disorders. 
Network structures were generated from empirical data by estimating unique pairwise relationships, i.e., the associations between any two variables that remain after controlling for all other variables under consideration; primarily in the form of undirected pairwise Markov random fields. The first paper built upon evidence for an affective pathway from childhood trauma to psychosis and demonstrated that a similar pathway applied to exposure to recent stressful life events in at-risk and recent onset psychosis patients. Specifically, results showed that burden of recent life events did not link to positive and negative psychotic symptoms directly, but only indirectly, via symptoms of general psychopathology, such as depression, guilt, and anxiety. The second paper zoomed into the proposed affective pathway via increased stress reactivity through which childhood trauma is thought to contribute to the liability for psychopathology at large, including psychotic disorders. The findings provide a detailed characterization of putative psychological stress processes underlying distinct types of childhood trauma in the general population: childhood trauma reflecting deprivation (i.e., neglect) was exclusively associated with stressful experiences representing low perceived self-efficacy, whereas childhood trauma reflecting threat (i.e., abuse) was specifically associated with stressful experiences reflecting perceived helplessness. The third paper then addressed another important risk factor for psychotic disorders, cannabis use. 
The results suggest that characteristics of cannabis use in the general population may contribute differentially to the risk for certain psychotic experiences and affective symptoms: Network associations were particularly pronounced between increased frequency of cannabis use and certain delusional experiences, i.e., persecutory delusions and thought broadcasting, on the one hand, and earlier onset of cannabis use and visual hallucinatory experiences and irritability, on the other. The fourth paper investigated which environmental and demographic factors explained heterogeneity in symptom networks of psychosis to highlight potential etiological divergence in risk for psychosis in the general population. Results point to distinct sex-specific etiological mechanisms contributing to psychosis risk: In women, an affective pathway to psychosis may have distinct importance, especially after interpersonal trauma. In men, an ethnic minority background was associated with strong interconnections between individual psychotic experiences, which has been linked to poor outcomes in previous research. The fifth and final paper presented the protocol for an experience sampling study in the help-seeking population of the Early Recognition Center for Mental Disorders of the University Hospital Cologne. A central goal in this project will be to elucidate how personalized symptom networks derived from intensive longitudinal data differ as a function of environmental exposure. In sum, findings from this thesis illustrate that environmental risk factors increase psychosis risk through diverse, potentially sex-specific pathways that often involve affective psychopathology. This confirms the notion that the etiology of psychosis is complex and best understood from a broad, transdiagnostic perspective. The results presented are also relevant for clinical practice as they pave the way for a better selection of appropriate interventions and treatments. 
In particular, this thesis highlights affective disturbances and negative beliefs as potential intervention targets in the affective pathway to psychosis, especially following trauma and stressful life events. Looking ahead, the use of personalized network approaches may improve the ability to tailor therapeutic strategies based on the dynamics of a patient’s symptoms and environmental risk factors as captured in daily life. Recently proposed multilayered network approaches have the potential to further advance our understanding of psychosis etiology by linking psychological and biological levels of analysis.
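
The abstract above describes network edges as "unique pairwise relationships", i.e., associations between two variables that remain after controlling for the others. The following is a minimal sketch of that idea using a first-order partial correlation on simulated data. The variable names (`life_events`, `affect`, `psychosis`) and the data-generating chain are hypothetical illustrations, not the thesis's data or estimation procedure, which may differ (for example, regularized estimation of a pairwise Markov random field).

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, control):
    """Correlation between a and b after controlling for one third variable."""
    r_ab = pearson(a, b)
    r_ac = pearson(a, control)
    r_bc = pearson(b, control)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


# Simulated (hypothetical) chain: life_events -> affect -> psychosis.
random.seed(0)
n = 5000
life_events = [random.gauss(0, 1) for _ in range(n)]
affect = [e + random.gauss(0, 1) for e in life_events]
psychosis = [a + random.gauss(0, 1) for a in affect]

# The marginal association is clearly positive ...
marginal = pearson(life_events, psychosis)
# ... but the unique association, given affect, is close to zero.
unique = partial_corr(life_events, psychosis, affect)

print(f"marginal correlation: {marginal:.2f}")
print(f"partial correlation given affect: {unique:.2f}")
```

With only three variables, controlling for the single remaining variable is the same as controlling for "all other variables". In this simulated chain the direct edge between `life_events` and `psychosis` vanishes once `affect` is accounted for, which mirrors the kind of indirect pathway the abstract reports for recent life events.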

    Text Mining for Systems Biology and MetNet

    The rapidly expanding volume of biological and biomedical literature motivates demand for more user-friendly access. Better automated mining of this literature can help find useful and desired citations and can extract new knowledge from the massive biological literaturome. The research objectives presented here, when met, will provide comprehensive text mining utilities within the MetNet (Metabolic Network Exchange) platform (Wurtele et al., 2007) to help biologists visualize, explore, and analyze the biological literaturome. The overarching research question to be addressed is how to automatically extract biomolecular interactions from numerous biomedical texts. The specific aims of this work are as follows. 1. Research the text empirics of interaction-indicating terms to find more clues for improving the current algorithm applied in PathBinder, so that it judges more precisely whether biomolecular interaction descriptions are present in sentences from the biological literature. 2. Based on these research results, extract interacting biomolecule pairs from literature and use those pairs to construct a biomolecule interaction database and network. 3. Integrate biomolecular interaction-indicating term extraction into MetNet's existing metabolomic network database. 4. Apply all of the above results in PathBinder software. 5. Quantitatively evaluate the success of algorithms developed based on the text empirics results. This work is expected to advance systems biology by answering scientific questions about biological text empirics, by contributing to the engineering task of building MetNet and key constituent subsystems of MetNet, and by supporting the MetNet project through selected maintenance tasks.

    Exploiting Latent Features of Text and Graphs

    As the size and scope of online data continue to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings, semantically rich vectors of latent features, to convert human constructs for machine understanding. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers, as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system, and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristically based ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs, and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state-of-the-art in hypergraph partitioning quality. Having newfound intuition of graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns a data-driven ranking criterion derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation.

    Dimensionality Assessment of the Measure of Mundane Meaning

    Meaning in life is an important aspect of psychological wellbeing. The Measure of Mundane Meaning (MMM) measures the presence of meaning in life and is unique in its development among participants with experience of trauma. The MMM was hypothesised to comprise four factors: sense of purpose, high-level action identification, integration of circumstance, and coherence of self-narrative. The aim of the current study is to conduct a dimensionality assessment of the MMM in a general population sample. The study utilised a novel psychometric technique, Exploratory Graph Analysis (EGA), to analyse the 36-item MMM. The 893 participants were a combination of clinical and non-clinical samples. Redundancy was assessed using Unique Variable Analysis (UVA), and the stability of the items was assessed. Random intercept EGA (riEGA), a modified version of EGA that can account for wording effects, was utilised. In this process, twelve items were removed, and the remaining 24 items formed four dimensions. Using confirmatory factor analysis, this model was found to exhibit good fit, and a multidimensional model was favoured. The final model consisted of four dimensions, which represented the four hypothesised dimensions. The relationship between the MMM and other measures of meaning, depression, and underlying assumptions was evaluated to assess convergent validity and to develop an understanding of the relationship between the MMM and other underlying constructs. This validation of the MMM provides a measure of meaning in life which corresponds to an underlying theoretical structure of the construct. This can enable more precise measurement of meaning in life, both in research and for clinical purposes.

    Conceptual Ambiguity Surrounding Gamification and Serious Games in Health Care: Literature Review and Development of Game-Based Intervention Reporting Guidelines (GAMING)

    Background: In health care, the use of game-based interventions to increase motivation, engagement, and overall sustainability of health behaviors is steadily becoming more common. The most prevalent types of game-based interventions in health care research are gamification and serious games. Various researchers have discussed substantial conceptual differences between these 2 concepts, supported by empirical studies showing differences in the effects on specific health behaviors. However, researchers also frequently report cases in which terms related to these 2 concepts are used ambiguously or even interchangeably. It remains unclear to what extent existing health care research explicitly distinguishes between gamification and serious games and whether it draws on existing conceptual considerations to do so. Objective: This study aims to address this lack of knowledge by capturing the current state of conceptualizations of gamification and serious games in health care research. Furthermore, we aim to provide tools for researchers to disambiguate the reporting of game-based interventions. Methods: We used a 2-step research approach. First, we conducted a systematic literature review of 206 studies, published in the Journal of Medical Internet Research and its sister journals, containing terms related to gamification, serious games, or both. We analyzed their conceptualizations of gamification and serious games, as well as the distinctions between the two concepts. Second, based on the literature review findings, we developed a set of guidelines for researchers reporting on game-based interventions and evaluated them with a group of 9 experts from the field. Results: Our results show that less than half of the concept mentions are accompanied by an explicit definition. 
To distinguish between the 2 concepts, we identified four common approaches: implicit distinction, synonymous use of terms, serious games as a type of gamified system, and distinction based on the full game dimension. Our Game-Based Intervention Reporting Guidelines (GAMING) consist of 25 items grouped into four topics: conceptual focus, contribution, mindfulness about related concepts, and individual concept definitions. Conclusions: Conceptualizations of gamification and serious games in health care literature are strongly heterogeneous, leading to conceptual ambiguity. Following the GAMING guidelines can support authors in rigorous reporting on study results of game-based interventions.