10 research outputs found

    A Method for the Automated, Reliable Retrieval of Publication-Citation Records

    BACKGROUND: Publication records and citation indices are often used to evaluate academic performance. For this reason, obtaining or computing them accurately is important. This can be difficult, largely due to a lack of complete knowledge of an individual's publication list and/or a lack of time available to manually obtain or construct the publication-citation record. While online publication search engines have somewhat addressed these problems, using raw search results can yield inaccurate estimates of publication-citation records and citation indices. METHODOLOGY: In this paper, we present a new, automated method that produces estimates of an individual's publication-citation record from the individual's name and a set of domain-specific vocabulary that may occur in the individual's publication titles. Because this vocabulary can be harvested directly from a research web page or an online (partial) publication list, our method delivers an easy way to obtain estimates of a publication-citation record and the relevant citation indices. Our method works by applying a series of stringent name and content filters to the raw publication search results returned by an online publication search engine. In this paper, our method is run using Google Scholar, but the underlying filters can be easily applied to any existing publication search engine. When compared against a manually constructed data set of individuals and their publication-citation records, our method provides significant improvements over raw search results. The estimated publication-citation records returned by our method have an average sensitivity of 98% and specificity of 72% (in contrast to a raw search result specificity of less than 10%). When citation indices are computed using these records, the estimated indices are within 10% of the true value, compared to raw search results, which overestimate by 75% on average.
CONCLUSIONS: These results confirm that our method provides significantly improved estimates over raw search results, and these can either be used directly for large-scale (departmental or university) analysis or further refined manually to quickly give accurate publication-citation records.
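
The abstract above describes name and content filters applied to raw search results before citation indices are computed, but it does not spell out the filters themselves. The following is a minimal sketch of the general idea, not the authors' implementation. The `Record` structure, the surname-plus-initial name rule, and the substring-based vocabulary match are all assumptions made for illustration; the h-index is used as one example of a citation index.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Record:
    """One raw search hit (hypothetical structure, not taken from the paper)."""
    title: str
    authors: List[str]
    citations: int


def name_matches(authors: List[str], surname: str, initial: str) -> bool:
    """Assumed name filter: some author has this surname and first initial.

    Author strings are assumed to look like "J. Smith" or "Jane Smith".
    """
    for author in authors:
        parts = author.lower().replace(".", " ").split()
        if (len(parts) >= 2 and parts[-1] == surname.lower()
                and parts[0].startswith(initial.lower())):
            return True
    return False


def content_matches(title: str, vocabulary: Set[str]) -> bool:
    """Assumed content filter: the title contains a vocabulary term (substring match)."""
    lowered = title.lower()
    return any(term.lower() in lowered for term in vocabulary)


def filter_records(records: List[Record], surname: str, initial: str,
                   vocabulary: Set[str]) -> List[Record]:
    """Keep only records that pass both the name filter and the content filter."""
    return [r for r in records
            if name_matches(r.authors, surname, initial)
            and content_matches(r.title, vocabulary)]


def h_index(citations: List[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
    return h


if __name__ == "__main__":
    # Made-up records for illustration only.
    raw = [
        Record("Gap gene network models", ["J. Smith"], 10),
        Record("Cooking with herbs", ["J. Smith"], 50),
        Record("Gene regulation in Drosophila", ["A. Jones"], 7),
        Record("Drosophila gene expression", ["Jane Smith", "B. Lee"], 3),
    ]
    kept = filter_records(raw, "Smith", "J", {"gene", "drosophila"})
    print(h_index([r.citations for r in raw]), h_index([r.citations for r in kept]))
```

In this toy data the unfiltered list inflates the index because it includes a same-name author from another field and a different author entirely, which mirrors the overestimation the abstract attributes to raw search results.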

    Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

    Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of statistical inference tools towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almost entirely monolingual and fail to work in a global environment. We address these challenges by combining multilingual demographic inference with post-stratification to create a more representative population sample. To learn demographic attributes, we create a new multimodal deep neural architecture for joint classification of age, gender, and organization-status of social media users that operates in 32 languages. This method substantially outperforms the current state of the art while also reducing algorithmic bias. To correct for sampling biases, we propose fully interpretable multilevel regression methods that estimate inclusion probabilities from inferred joint population counts and ground-truth population counts. In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media. Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19)
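
The bias-correction step above estimates inclusion probabilities from inferred counts and ground-truth counts, then reweights the sample. The paper uses multilevel regression for this; the sketch below swaps in the simplest cell-wise version of post-stratification to show the underlying arithmetic only. The stratum labels, the counts, and the function names are hypothetical.

```python
from typing import Dict


def inclusion_probabilities(sample_counts: Dict[str, int],
                            census_counts: Dict[str, int]) -> Dict[str, float]:
    """Per-stratum share of the true population that appears in the sample.

    Cell-wise ratio only; an assumption standing in for the regression
    models described in the abstract.
    """
    return {s: sample_counts[s] / census_counts[s]
            for s in sample_counts if census_counts.get(s, 0) > 0}


def naive_mean(stratum_means: Dict[str, float],
               sample_counts: Dict[str, int]) -> float:
    """Average weighted by how many sampled users fall in each stratum."""
    total = sum(sample_counts[s] for s in stratum_means)
    return sum(stratum_means[s] * sample_counts[s] for s in stratum_means) / total


def poststratified_mean(stratum_means: Dict[str, float],
                        census_counts: Dict[str, int]) -> float:
    """Average weighted by each stratum's true population size."""
    total = sum(census_counts[s] for s in stratum_means)
    return sum(stratum_means[s] * census_counts[s] for s in stratum_means) / total


if __name__ == "__main__":
    # Made-up numbers: younger users are over-represented in the sample.
    sample = {"young": 80, "old": 20}
    census = {"young": 400, "old": 600}
    outcome = {"young": 0.9, "old": 0.3}
    print(inclusion_probabilities(sample, census))
    print(naive_mean(outcome, sample), poststratified_mean(outcome, census))
```

With these invented numbers the naive estimate is pulled towards the over-represented stratum, while the post-stratified estimate weights each stratum by its share of the true population.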

    Relating the expression-based and sequence-based estimates of regulation in the gap gene system of Drosophila melanogaster

    Quantitative analysis of Drosophila melanogaster gap gene expression data reveals valuable information about the nature and strengths of interactions in the gap gene network. We first explore different models for fitting the spatiotemporal gene expression data of the Drosophila gap gene system and validate our results by computational analysis and comparison with the existing literature. A fundamental problem in systems biology is to associate these results with the inherent cause of gene regulation, namely the binding of transcription factors (TFs) to their respective binding sites. In order to relate these expression-based estimates of gap gene regulation to the sequence-based information on TF binding site composition, we also explore two related problems: (i) finding a set of regulatory weights that is proportional to the binding site occupancy matrix of the transcription factors in the current literature, and (ii) finding a set of position weight matrices of the TFs that produces a new binding site occupancy matrix showing a greater level of proportionality with our regulatory weights. Our solution to the first problem yielded a regulatory weight matrix incapable of explaining the true causes of the gene expression profile, despite its relative numerical accuracy in predicting the gene expressions. On the other hand, the second optimization problem could be solved to a reasonable level of accuracy, but further analysis of the result demonstrated that this optimization problem may be under-constrained. We devise a simple regularization strategy that helps us to reduce the under-constrained nature of the problem.
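
Both problems above turn on how proportional a regulatory weight matrix is to a binding site occupancy matrix. The abstract does not state which measure of proportionality is used, so the following is only a sketch under an assumption: it treats both matrices as flat vectors, finds the least-squares scale factor between them, and reports cosine similarity as the degree of proportionality. The function names and the example matrices are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _flatten(m: Matrix) -> List[float]:
    """Row-major list of all entries."""
    return [x for row in m for x in row]


def best_scale(weights: Matrix, occupancy: Matrix) -> float:
    """Scalar c minimising the squared error between weights and c * occupancy."""
    w, o = _flatten(weights), _flatten(occupancy)
    denom = sum(x * x for x in o)
    if denom == 0:
        raise ValueError("occupancy matrix is all zeros")
    return sum(a * b for a, b in zip(w, o)) / denom


def proportionality(weights: Matrix, occupancy: Matrix) -> float:
    """Cosine similarity of the flattened matrices.

    A magnitude of 1.0 means one matrix is an exact scalar multiple of the other.
    """
    w, o = _flatten(weights), _flatten(occupancy)
    norm = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in o))
    if norm == 0:
        raise ValueError("one of the matrices is all zeros")
    return sum(a * b for a, b in zip(w, o)) / norm


if __name__ == "__main__":
    # Made-up 2x2 matrices: the weights are exactly twice the occupancy.
    occupancy = [[1.0, 2.0], [3.0, 4.0]]
    weights = [[2.0, 4.0], [6.0, 8.0]]
    print(best_scale(weights, occupancy), proportionality(weights, occupancy))
```

A measure of this kind gives problem (ii) an objective to increase by adjusting the position weight matrices. Because many different position weight matrices can yield similar occupancy values, such a search can be under-constrained, which is consistent with the need for regularization noted in the abstract.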

    Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors

    In this paper, we extend existing work on latent attribute inference by leveraging the principle of homophily: we evaluate the inference accuracy gained by augmenting a user's features with features derived from the Twitter profiles and postings of her friends. We consider three attributes which have varying degrees of assortativity: gender, age, and political affiliation. Our approach yields a significant and robust increase in accuracy for both age and political affiliation, indicating that it boosts performance for attributes with moderate to high assortativity. Furthermore, different neighborhood subsets yielded optimal performance for different attributes, suggesting that different subsamples of the user's neighborhood characterize different aspects of the user herself. Finally, inferences using only the features of a user's neighbors outperformed those based on the user's features alone. This suggests that the neighborhood context alone carries substantial information about the user.
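
The augmentation described above combines a user's own features with features derived from her neighbors. The abstract does not say how neighbor features are aggregated, so this sketch assumes the simplest option: average the neighbors' feature vectors and concatenate the result with the user's vector. The function names and the feature values are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; a zero vector of length dim if there are no vectors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def augment(user: Vector, neighbors: List[Vector]) -> Vector:
    """User features followed by the averaged neighbor features (assumed scheme)."""
    return user + mean_vector(neighbors, len(user))


def neighbors_only(user: Vector, neighbors: List[Vector]) -> Vector:
    """Averaged neighbor features alone, for the neighbor-only comparison."""
    return mean_vector(neighbors, len(user))


if __name__ == "__main__":
    # Made-up two-dimensional features for one user and two friends.
    user = [1.0, 0.0]
    friends = [[0.0, 1.0], [1.0, 1.0]]
    print(augment(user, friends))
    print(neighbors_only(user, friends))
```

Passing a different subset of `friends` (for example, only the most active accounts) changes the averaged half of the vector, which is one way to explore the abstract's observation that different neighborhood subsets suit different attributes.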