
    Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings

    In this paper we present a novel interactive multimodal learning system that facilitates search and exploration in large networks of social multimedia users. It allows the analyst to identify and select users of interest and to find similar users in an interactive learning setting. Our approach is based on novel multimodal representations of users, words and concepts, which we learn simultaneously by deploying a general-purpose neural embedding model. We show these representations to be useful not only for categorizing users, but also for automatically generating user and community profiles. Inspired by traditional summarization approaches, we create the profiles by selecting diverse and representative content from all available modalities, i.e., the text, image and user modalities. The usefulness of the approach is evaluated using artificial actors, which simulate user behavior in a relevance feedback scenario. Multiple experiments were conducted to evaluate the quality of our multimodal representations, to compare different embedding strategies, and to determine the importance of different modalities. We demonstrate the capabilities of the proposed approach on two multimedia collections originating from the violent online extremism forum Stormfront and the microblogging platform Twitter, which are particularly interesting due to the high semantic level of the discussions they feature.
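
    The relevance feedback scenario mentioned in the abstract can be illustrated with a generic sketch. It assumes each user is already represented by an embedding vector and ranks users by cosine similarity to a query vector, then refines the query with a classic Rocchio update. Rocchio is a stand-in chosen for illustration; the abstract does not say the paper uses it. The function names, weights (`alpha`, `beta`, `gamma`) and the toy 3-d vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors, dim):
    """Component-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update (assumed weights): pull the query toward relevant users, away from non-relevant ones."""
    dim = len(query)
    rel = centroid(relevant, dim)
    non = centroid(nonrelevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


def rank_users(query, users):
    """User ids sorted by decreasing cosine similarity to the query vector."""
    return sorted(users, key=lambda uid: cosine(query, users[uid]), reverse=True)


# Hypothetical 3-d user embeddings; a real system would use learned multimodal vectors.
users = {
    "u1": [0.9, 0.1, 0.0],
    "u2": [0.8, 0.3, 0.1],
    "u3": [0.0, 0.2, 0.9],
    "u4": [0.1, 0.9, 0.2],
}

query = users["u1"]                 # the analyst starts from one user of interest
ranking = rank_users(query, users)  # ['u1', 'u2', 'u4', 'u3']

# Simulated feedback, as an artificial actor might give it: u2 relevant, u3 not relevant.
query = rocchio(query, [users["u2"]], [users["u3"]])
new_ranking = rank_users(query, users)
print(ranking, new_ranking)
```

    Replacing the human analyst with a scripted source of feedback, as in the last step, is what makes such a loop repeatable for evaluation.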

    Automated Name Selection for the Network Scale-Up Method

    The distribution of the number of acquaintances among members of a society is a relevant feature of its social structure. Furthermore, the number of acquaintances (or "degree") is used to estimate other societal features, such as the size of hard-to-count subpopulations or social cohesion. To estimate the degree, the Network Scale-Up Method (NSUM) asks survey respondents how many people they know with each of a set of first names for which name statistics are available. For this method to be precise, the names selected for the survey need to jointly represent the population on a smaller scale in terms of relevant traits such as gender or age. Finding the optimal set of names is a combinatorial problem, for which this paper provides a solution approach. The approach can serve other NSUM users and can be applied to any population for which name statistics distributed over different categories are available. We empirically show that our approach successfully provides subsets of names replicating the population distribution for six countries with very different name statistics.
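
    To make the role of the name set concrete, here is a minimal sketch of the basic scale-up estimator: a respondent's degree is approximated as the population size times the number of acquaintances reported for the chosen names, divided by the number of people carrying those names. The sketch also computes the category shares (here, gender) of a name set, which is the kind of distribution the selected names should match against the population. It does not implement the paper's name-selection optimisation. The helper names, names, counts and population size are hypothetical.

```python
def scale_up_degree(known_counts, name_counts, population):
    """Basic network scale-up estimate of one respondent's degree.

    known_counts: number of acquaintances the respondent reports for each name.
    name_counts:  number of people in the population who carry each name.
    population:   total population size.
    """
    reported = sum(known_counts.get(name, 0) for name in name_counts)
    carriers = sum(name_counts.values())
    return population * reported / carriers


def category_shares(name_counts, name_category):
    """Share of name carriers falling into each category (e.g. gender)."""
    totals = {}
    for name, count in name_counts.items():
        category = name_category[name]
        totals[category] = totals.get(category, 0) + count
    grand_total = sum(totals.values())
    return {category: count / grand_total for category, count in totals.items()}


# Hypothetical name statistics for a population of 1,000,000 people.
name_counts = {"Anna": 5_000, "Peter": 4_000, "Maria": 6_000, "Jonas": 5_000}
name_gender = {"Anna": "f", "Peter": "m", "Maria": "f", "Jonas": "m"}
answers = {"Anna": 2, "Peter": 1, "Maria": 3, "Jonas": 0}

degree = scale_up_degree(answers, name_counts, 1_000_000)  # 6 / 20,000 * 1,000,000 = 300.0
shares = category_shares(name_counts, name_gender)          # {'f': 0.55, 'm': 0.45}
print(degree, shares)
```

    If the shares of the selected names differ from the population's own gender or age shares, the estimate inherits that bias, which is why the selection step matters.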

    Low-temperature zircon growth related to hydrothermal alteration of siderite concretions in Mississippian shales, Scotland

    Zircon occurs in voids and cracks in phosphatic coprolites enclosed in siderite concretions in Mississippian shales near Edinburgh, Scotland. The zircon formed during hydrothermal alteration of early-diagenetic concretions and occurs as spherical aggregates of prismatic crystals, sometimes radiating. Vitrinite reflectance measurements indicate temperatures of ~270°C for both the zircon-bearing concretions and the host shales. Molecular parameter values based on the distribution of dibenzothiophene and phenanthrene, together with the occurrence of di- and tetrahydro-products of polycyclic aromatic compounds, suggest that the rocks experienced relatively high-temperature aqueous conditions related to hydrothermal fluids, perhaps associated with neighboring mafic intrusions. The zircon was dissolved from the concretions, transported in fluids, and reprecipitated in voids. This is the first record of authigenic zircon precipitating in sedimentary rock as a new phase rather than as outgrowths.

    Mitochondrial dysfunction and oxidative stress in patients with chronic kidney disease

    Mitochondrial abnormalities in skeletal muscle may contribute to frailty and sarcopenia, which are common in patients with chronic kidney disease (CKD). Dysfunctional mitochondria are also a major source of oxidative stress and may contribute to cardiovascular disease in CKD. We tested the hypothesis that mitochondrial structure and function worsen with the severity of CKD. Mitochondrial volume density, mitochondrial DNA (mtDNA) copy number, and BNIP3 and PGC1α protein expression were evaluated in skeletal muscle biopsies obtained from 27 subjects (17 controls and 10 with CKD stage 5 on hemodialysis). We also measured mtDNA copy number in peripheral blood mononuclear cells (PBMCs), plasma isofurans, and plasma F2-isoprostanes in 208 subjects divided into three groups: non-CKD (eGFR > 60 mL/min), CKD stage 3-4 (eGFR 15-60 mL/min), and CKD stage 5 (on hemodialysis). Muscle biopsies from patients with CKD stage 5 revealed lower mitochondrial volume density, lower mtDNA copy number, and higher BNIP3 content than controls. mtDNA copy number in PBMCs decreased with increasing severity of CKD: non-CKD (6.48, 95% CI 4.49-8.46), CKD stage 3-4 (3.30, 95% CI 0.85-5.75, P = 0.048 vs. non-CKD), and CKD stage 5 (1.93, 95% CI 0.27-3.59, P = 0.001 vs. non-CKD). Isofurans were higher in patients with CKD stage 5 (median 59.21 pg/mL, IQR 41.76-95.36) than in non-CKD subjects (median 49.95 pg/mL, IQR 27.88-83.46, P = 0.001), whereas F2-isoprostanes did not differ among groups. Severity of CKD is associated with mitochondrial dysfunction and markers of oxidative stress. Mitochondrial abnormalities, which are common in skeletal muscle from patients with CKD stage 5, may explain the muscle dysfunction associated with frailty and sarcopenia in CKD. Further studies are required to evaluate mitochondrial function in vivo in patients with different CKD stages.

    Word Embeddings for Entity-annotated Texts

    Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but the performance of the non-entity word embeddings also degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to generating such embeddings: training state-of-the-art embeddings on raw-text and annotated versions of the corpus, and computing node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance. Comment: This paper is accepted at the 41st European Conference on Information Retrieval
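
    The co-occurrence graph representation can be pictured with a small sketch: linked entities are merged into single tokens, and every pair of distinct tokens appearing within a fixed window becomes a weighted, undirected edge. The `ENT:` token prefix, the window size and the example sentence are assumptions for illustration; the paper's exact graph construction and the node-embedding step that would follow are not shown.

```python
from collections import Counter


def cooccurrence_graph(tokens, window=2):
    """Count undirected co-occurrences of distinct tokens within `window` positions.

    Edges are keyed by a sorted (token, token) pair; the value is the edge weight.
    """
    edges = Counter()
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            right = tokens[j]
            if left != right:
                edges[tuple(sorted((left, right)))] += 1
    return edges


# Hypothetical annotated sentence: linked entities are merged into single tokens.
tokens = ["ENT:Marie_Curie", "won", "the", "ENT:Nobel_Prize", "in", "ENT:Physics"]
graph = cooccurrence_graph(tokens, window=2)
for (a, b), weight in sorted(graph.items()):
    print(a, b, weight)
```

    Over a whole corpus the same counting runs sentence by sentence, so frequent pairs accumulate larger weights that a graph-embedding method can then exploit.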

    Localizing the Common Action Among a Few Videos

    This paper strives to localize the temporal extent of an action in a long untrimmed video. Whereas existing work relies on many training examples annotated with the start, end, and/or class of the action, we propose few-shot common action localization: the start and end of an action in a long untrimmed video are determined from just a handful of trimmed video examples containing the same action, without knowing their common class label. To address this task, we introduce a new 3D convolutional network architecture that aligns representations from the support videos with the relevant query video segments. The network contains: (i) a mutual enhancement module that simultaneously complements the representations of the few trimmed support videos and the untrimmed query video; (ii) a progressive alignment module that iteratively fuses the support videos into the query branch; and (iii) a pairwise matching module that weighs the importance of different support videos. Evaluation of few-shot common action localization in untrimmed videos containing single or multiple action instances demonstrates the effectiveness and general applicability of our proposal. Comment: ECCV 202

    Satellite-based trends of solar radiation and cloud parameters in Europe

    Solar radiation is the main driver of the Earth's climate. Measuring solar radiation and analysing its interaction with clouds are essential for understanding the climate system. The EUMETSAT Satellite Application Facility on Climate Monitoring (CM SAF) generates satellite-based, high-quality climate data records, with a focus on the energy balance and water cycle. Here, several of these data records are analyzed in a common framework to assess the consistency in trends and spatio-temporal variability of surface solar radiation, top-of-atmosphere reflected solar radiation and cloud fraction. This multi-parameter analysis focuses on Europe and covers the period from 1992 to 2015. A high correlation between these three variables has been found over Europe. The climate data records consistently show an increase in surface solar radiation and a decrease in top-of-atmosphere reflected radiation, and these trends are confirmed by negative trends in cloud cover. This consistency documents the high quality and stability of the CM SAF climate data records, which are mostly derived independently of each other. The results of this study indicate that one of the main reasons for the positive trend in surface solar radiation since the 1990s is a decrease in cloud coverage, although an aerosol contribution cannot be completely ruled out.
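
    As a rough illustration of the quantities involved, the sketch below fits an ordinary least-squares trend to an annual series and computes the Pearson correlation between two variables. The series are synthetic, idealised values chosen only to mimic the directions reported above (rising surface radiation, falling cloud fraction); they are not CM SAF data, and the abstract does not specify the trend method, any deseasonalisation, or significance testing used in the study.

```python
import math


def linear_trend(xs, ys):
    """Ordinary least-squares slope of ys against xs (units of y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic annual means for 1992-2015 (not CM SAF data): surface radiation in W/m^2
# increasing, cloud fraction decreasing.
years = list(range(1992, 2016))
surface_radiation = [130.0 + 0.3 * (y - 1992) for y in years]
cloud_fraction = [0.60 - 0.001 * (y - 1992) for y in years]

trend_per_decade = 10 * linear_trend(years, surface_radiation)  # about 3.0 W/m^2 per decade
r = pearson(surface_radiation, cloud_fraction)                  # about -1.0 for this idealised series
print(trend_per_decade, r)
```

    Real records are noisy, so an observed correlation would be well away from -1; the point here is only how a per-decade trend and a cross-variable correlation are computed.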

    Predicting Multiple ICD-10 Codes from Brazilian-Portuguese Clinical Notes

    ICD coding from electronic clinical records is a manual, time-consuming and expensive process. Code assignment is, however, an important task for billing purposes and database organization. While many works have studied automated ICD coding from free text using machine learning techniques, most use English-language records, especially from the public MIMIC-III dataset. This work presents results for a dataset of Brazilian Portuguese clinical notes. We develop and optimize a Logistic Regression model, a Convolutional Neural Network (CNN), a Gated Recurrent Unit Neural Network and a CNN with Attention (CNN-Att) for predicting diagnosis ICD codes. We also report results on the MIMIC-III dataset, which outperform previous work among models of the same families, as well as the state of the art. Compared to MIMIC-III, the Brazilian Portuguese dataset contains far fewer words per document when only discharge summaries are used. We experiment with concatenating additional documents available in this dataset, which substantially improves performance. The CNN-Att model achieves the best results on both datasets, with a micro-averaged F1 score of 0.537 on MIMIC-III and 0.485 on our dataset with additional documents. Comment: Accepted at BRACIS 202
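
    Because each note can carry several ICD codes, the reported micro-averaged F1 pools true positives, false positives and false negatives over all documents and codes before computing precision and recall. A minimal sketch of that metric follows; the code sets are hypothetical examples, not records from either dataset.

```python
def micro_f1(gold_sets, pred_sets):
    """Micro-averaged F1 for multi-label predictions given as sets of codes per document."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)   # codes predicted and present
        fp += len(pred - gold)   # codes predicted but absent
        fn += len(gold - pred)   # codes present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted ICD-10 code sets for three notes.
gold = [{"I10", "E11.9"}, {"J18.9"}, {"N18.5", "I10"}]
pred = [{"I10"}, {"J18.9", "R05"}, {"N18.5", "I10", "E78.5"}]
score = micro_f1(gold, pred)  # tp=4, fp=2, fn=1 -> 8/11, about 0.727
print(round(score, 3))
```

    Micro-averaging lets frequent codes dominate the score, whereas macro-averaging would weight every code equally regardless of how often it occurs.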

    COVID-19 therapy target discovery with context-aware literature mining

    The volume of literature related to the widespread COVID-19 pandemic is beyond what a single expert can inspect manually. Developing systems capable of automatically processing tens of thousands of scientific publications, with the aim of enriching existing empirical evidence with literature-based associations, is both challenging and relevant. We propose a system for contextualizing empirical expression data by approximating relations between entities whose representations were learned from one of the largest COVID-19-related literature corpora. To exploit a larger scientific context through transfer learning, we propose a novel embedding generation technique that leverages the SciBERT language model, pretrained on a large multi-domain corpus of scientific publications and fine-tuned for domain adaptation on the CORD-19 dataset. Manual evaluation by a medical expert and quantitative evaluation based on therapy targets identified in related work suggest that the proposed method can be successfully employed for COVID-19 therapy target discovery and that it outperforms the baseline FastText method by a large margin. Comment: Accepted to the 23rd International Conference on Discovery Science (DS 2020)