15,978 research outputs found

Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network

Author: Buntine Wray
Lim Kar Wai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/09/2016

Bibliographic analysis considers the author's research areas, the citation network and the paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents, using a nonparametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. This gives rise to the Citation Network Topic Model (CNTM). We propose a novel and efficient inference algorithm for the CNTM to explore subsets of research publications from CiteSeerX. The publication datasets are organised into three corpora, totalling about 168k publications with about 62k authors. The queried datasets are made available online. On three publicly available corpora in addition to the queried datasets, our proposed model demonstrates improved performance in both model fitting and document clustering, compared to several baselines. Moreover, our model allows extraction of additional useful knowledge from the corpora, such as the visualisation of the author-topics network. Additionally, we propose a simple method to incorporate supervision into topic modelling to achieve further improvement on the clustering task. Comment: Preprint for Journal Machine Learnin

arXiv.org e-Print Archive

The Australian National University

Predicting Successful Memes using Network and Community Structure

Author: Ahn Yong-Yeol
Menczer Filippo
Weng Lilian
Publication date: 16/05/2014

We investigate the predictability of successful memes using their early spreading patterns in the underlying social networks. We propose and analyze a comprehensive set of features and develop an accurate model to predict future popularity of a meme given its early spreading patterns. Our paper provides the first comprehensive comparison of existing predictive frameworks. We categorize our features into three groups: influence of early adopters, community concentration, and characteristics of adoption time series. We find that features based on community structure are the most powerful predictors of future success. We also find that early popularity of a meme is not a good predictor of its future popularity, contrary to common belief. Our methods outperform other approaches, particularly in the task of detecting very popular or unpopular memes. Comment: 10 pages, 6 figures, 2 tables. Proceedings of 8th AAAI Intl. Conf. on Weblogs and social media (ICWSM 2014

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Quantifying single nucleotide variant detection sensitivity in exome sequencing

Author: A McKenna
AJ Coffey
Alison M Meynert
AM Sulonen
Andrew P Jackson
B Lehne
B Timmermann
DN Cooper
E Kalay
H Li
J Parla
JF Degner
JK Teer
K Fransen
KK Mantripragada
Louise S Bicknell
M Choi
MA Depristo
Martin S Taylor
Matthew E Hurles
MD Mailman
MJ Clark
MN Bainbridge
MW Hahn
R Leinonen
RA Harte
RE Thurman
SB Ng
SS Ajay
The International HapMap 3 Consortium
Y Li
Y Xu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

BACKGROUND: The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage and this is expected to systematically impact our sensitivity to detect genuine polymorphisms. To fully interpret the polymorphisms identified in a genetic study it is often essential to both detect polymorphisms and to understand where and with what probability real polymorphisms may have been missed. RESULTS: Using down-sampling of 30 deeply sequenced exomes and a set of gold-standard single nucleotide variant (SNV) genotype calls for each sample, we developed an empirical model relating the read depth at a polymorphic site to the probability of calling the correct genotype at that site. We find that measured sensitivity in SNV detection is substantially worse than that predicted from the naive expectation of sampling from a binomial. This calibrated model allows us to produce single nucleotide resolution SNV sensitivity estimates which can be merged to give summary sensitivity measures for any arbitrary partition of the target sequences (nucleotide, exon, gene, pathway, exome). These metrics are directly comparable between platforms and can be combined between samples to give “power estimates” for an entire study. We estimate a local read depth of 13X is required to detect the alleles and genotype of a heterozygous SNV 95% of the time, but only 3X for a homozygous SNV. At a mean on-target read depth of 20X, commonly used for rare disease exome sequencing studies, we predict 5–15% of heterozygous and 1–4% of homozygous SNVs in the targeted regions will be missed. CONCLUSIONS: Non-reference alleles in the heterozygote state have a high chance of being missed when commonly applied read coverage thresholds are used despite the widely held assumption that there is good polymorphism detection at these coverage levels. Such alleles are likely to be of functional importance in population based studies of rare diseases, somatic mutations in cancer and explaining the “missing heritability” of quantitative traits.
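
As a rough illustration of the "naive expectation of sampling from a binomial" mentioned in the abstract above, the sketch below computes the chance of seeing enough non-reference reads at a heterozygous site for a given read depth. It is not the paper's calibrated model. The function name `naive_het_detection_prob`, the 50% non-reference read fraction, and the two-read detection threshold are all assumptions chosen for illustration.

```python
from math import comb


def naive_het_detection_prob(depth: int,
                             min_alt_reads: int = 2,
                             alt_fraction: float = 0.5) -> float:
    """Naive binomial chance of observing at least `min_alt_reads`
    non-reference reads at a heterozygous site.

    Assumptions (not taken from the paper): each read independently
    carries the non-reference allele with probability `alt_fraction`,
    and a site counts as detected once `min_alt_reads` such reads
    are seen.
    """
    if depth < min_alt_reads:
        return 0.0
    # Probability of seeing fewer than `min_alt_reads` non-reference reads.
    p_miss = sum(
        comb(depth, k) * alt_fraction ** k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_miss


if __name__ == "__main__":
    # Print the naive expectation at a few illustrative read depths.
    for depth in (3, 8, 13, 20):
        print(f"{depth}X: {naive_het_detection_prob(depth):.4f}")
```

Under these assumed settings the naive figure can be set beside the empirical estimates quoted in the abstract; a different threshold or read fraction would shift the numbers.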

Crossref

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer