Search CORE


Mining Top-K Frequent Itemsets Through Progressive Sampling

Author: Andrea Pietracaprina
E Cohen
Eli Upfal
Fabio Vandin
J Wang
M Charikar
M Mitzenmacher
Matteo Riondato
RC-W Wong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/06/2010

We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at most w. To this purpose, we define an approximation to the top-K frequent itemsets to be a family of itemsets which includes (resp., excludes) all very frequent (resp., very infrequent) itemsets, together with an estimate of these itemsets' frequencies with a bounded error. Our first result is an upper bound on the sample size which guarantees that the top-K frequent itemsets mined from a random sample of that size approximate the actual top-K frequent itemsets, with probability larger than a specified value. We show that the upper bound is asymptotically tight when w is constant. Our main algorithmic contribution is a progressive sampling approach, combined with suitable stopping conditions, which on appropriate inputs is able to extract approximate top-K frequent itemsets from samples whose sizes are smaller than the general upper bound. In order to test the stopping conditions, this approach maintains the frequency of all itemsets encountered, which is practical only for small w. However, we show how this problem can be mitigated by using a variation of Bloom filters. A number of experiments conducted on both synthetic and real benchmark datasets show that using samples substantially smaller than the original dataset (i.e., of size defined by the upper bound or reached through the progressive sampling approach) makes it possible to approximate the actual top-K frequent itemsets with accuracy much higher than what was proved analytically.
Comment: 16 pages, 2 figures, accepted for presentation at ECML PKDD 2010 and publication in the ECML PKDD 2010 special issue of the Data Mining and Knowledge Discovery journal
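The basic sampling idea in this abstract can be illustrated with a short Python sketch. It assumes transactions are tuples of hashable items, draws a uniform sample with replacement, counts every itemset of cardinality at most w, and keeps all itemsets whose estimated frequency reaches the K-th highest value. The function names and the fixed `sample_size` parameter are hypothetical; this is not the authors' progressive-sampling algorithm, it does not implement their stopping conditions or their sample-size bound, and it keeps exact counts in a dictionary rather than the Bloom-filter variant they describe.

```python
import random
from collections import Counter
from itertools import combinations


def itemset_frequencies(transactions, w):
    """Relative frequency of every itemset of cardinality 1..w (itemsets as sorted tuples)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, w + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items()}


def top_k(freqs, k):
    """Keep every itemset whose frequency reaches the K-th highest frequency (ties kept); assumes k >= 1."""
    if not freqs:
        return {}
    ranked = sorted(freqs.values(), reverse=True)
    threshold = ranked[min(k, len(ranked)) - 1]
    return {s: f for s, f in freqs.items() if f >= threshold}


def sampled_top_k(dataset, k, w, sample_size, seed=0):
    """Estimate the top-K itemsets from a uniform random sample drawn with replacement.
    sample_size is a hypothetical fixed parameter, not the paper's bound."""
    rng = random.Random(seed)
    sample = [rng.choice(dataset) for _ in range(sample_size)]
    return top_k(itemset_frequencies(sample, w), k)
```

The number of itemsets counted per transaction grows quickly with w, which is consistent with the abstract's remark that tracking every encountered itemset is practical only for small w.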

arXiv.org e-Print Archive

Crossref

Acceleration of generalized hypergeometric functions through precise remainder asymptotics

Author: A Sidi
AI Bogolubsky
C Brezinski
C Brezisnki
C Ferreira
C Schneider
CR Adams
D Levin
DA Smith
EJ Weniger
G Walz
GD Birkhoff
HHH Homeier
I Tweddle
IG Macdonald
J Wimp
Joshua L. Willis
JP Boyd
JP Delhaye
K Knopp
KE Muller
L Fousse
LJ Slater
MY Kalmykov
NE Nørlund
NJ Higham
P Wozny
R Borghi
R Wong
RC Forrey
RL Graham
S Charterjee
S Lewanowicz
S Paszkowski
SL Skorokhodov
T Fessler
T Høavie
W Becken
W Bühring
W Gautschi
WF Perger
WH Press
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/08/2011

We express the asymptotics of the remainders of the partial sums $\{s_n\}$ of the generalized hypergeometric function ${}_{q+1}F_q$ through an inverse power series $z^n n^l \sum_k c_k/n^k$, where the exponent $l$ and the asymptotic coefficients $\{c_k\}$ may be recursively computed to any desired order from the hypergeometric parameters and argument. From this we derive a new series acceleration technique that can be applied to any such function, even with complex parameters and at the branch point $z=1$. For moderate parameters (up to approximately ten), a C implementation at fixed precision is very effective at computing these functions; for larger parameters, an implementation in higher than machine precision would be needed. Even for larger parameters, however, our C implementation is able to correctly determine whether or not it has converged, and when it converges, its estimate of its error is accurate.
Comment: 36 pages, 6 figures, LaTeX2e. Fixed sign error in Eq. (2.28), added several references, added comparison to other methods, and added discussion of recursion stability
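To make the objects in this abstract concrete, the Python sketch below computes the partial sums $s_n$ of a generalized hypergeometric series from its term ratio and then applies Aitken's $\Delta^2$ process to three consecutive partial sums. Aitken's method is a classical, general-purpose accelerator swapped in here for illustration only; it is not the remainder-asymptotics technique proposed by the authors. The function names are hypothetical, and the check relies on the identity ${}_2F_1(1,1;2;z) = -\ln(1-z)/z$, whose series is $\sum_n z^n/(n+1)$.

```python
import math


def hyp_partial_sums(a, b, z, n_terms):
    """Partial sums s_0..s_{n_terms-1} of pFq(a; b; z) = sum_n [prod (a_i)_n / prod (b_j)_n] z^n / n!.
    The lower parameters b must not be zero or negative integers."""
    term = 1.0
    total = 0.0
    sums = []
    for n in range(n_terms):
        total += term
        sums.append(total)
        # Term ratio t_{n+1}/t_n = prod(a_i + n) / prod(b_j + n) * z / (n + 1)
        ratio = z / (n + 1)
        for ai in a:
            ratio *= ai + n
        for bj in b:
            ratio /= bj + n
        term *= ratio
    return sums


def aitken(s0, s1, s2):
    """Aitken delta-squared extrapolation from three consecutive partial sums."""
    denom = s2 - 2.0 * s1 + s0
    if denom == 0.0:
        return s2  # no extrapolation possible; fall back to the last partial sum
    return s0 - (s1 - s0) ** 2 / denom
```

For example, with `z = 0.5`, `aitken(s[10], s[11], s[12])` lands much closer to `-math.log(1 - z) / z` than `s[12]` does, even though both use the same thirteen terms.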

arXiv.org e-Print Archive

Crossref

Plasma high sensitivity troponin T levels in adult survivors of childhood leukaemias: determinants and associations with cardiac function

Author: Chan GCF
Cheng FW
Cheuk KLD
Cheung YF
Ho KK
Li CK
Li RC
Li VW
Ling AS
Tsang KC
Wong WK
Yang JY
Yau JP
Yu W
Yuen HL
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

published_or_final_version

HKU Scholars Hub

Evolutionary distances in the twilight zone -- a rational kernel approach

Author: A Keller
A Löytynoja
A Stamatakis
B Chor
B Schölkopf
Benjamin Merget
C Cortes
C Daskalakis
CB Do
E Rivas
F Bemm
Florian Markowetz
Frank Förster
G Talavera
HH Otu
I Ulitsky
J Felsenstein
J Friedrich
J Hein
JL Thorne
Jörg Schultz
KM Wong
LS Wang
M Höhl
M Mohri
M Wolf
MA Buchheim
MA Suchard
Matthias Wolf
MJ Bishop
MK Kuhner
MS Waterman
N Goldman
N Higham
R Durbin
RC Edgar
RF Doolittle
Roland F. Schwarz
S Roch
S Whelan
SR Eddy
T Mailund
T Müller
TH Ogden
V Levenshtein
W Fletcher
Wayne Delport
William Fletcher
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 23/11/2010

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
Comment: to appear in PLoS ONE

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

The PHF21B gene is associated with major depression and modulates the stress response

Author: A Alt
A Bhattacharya
A Chuah
A Djordjevic
A Klajn
A McQuillin
A Ramasamy
AD Lopez
B T Baune
C Dong
C Mastronardi
C Song
C Yu
CL Hyde
CM Durand
CN Johnstone
CONVERGE Consortium
DJ Liu
DM Altshuler
DM Kokare
EM Jolin
EV Davydov
F Angelucci
F Lan
FB Bertonha
G A Huttley
G Bhatia
G Hedou
GR Abecasis
H Li
IA Adzhubei
IC Weiss
J Cohen
J Ellegood
J Gauthier
J I Vélez
J Licinio
J Vandesompele
JI Vélez
JK Pritchard
JM Schwarz
K Phelan
K Wang
KS Kendler
M Arcos-Burgos
M C Jawahar
M D Lewis
M-L Wong
MA Hakimi
ML Wong
MW Pfaffl
NA Johnson
NS Fearnhead
O Zuk
PC Ng
PF Sullivan
R Fogarty
R Jansen
RC Kessler
RC Lewontin
RE Gur
RJ Shprintzen
S Caplan
S Cohen-Woods
S Cortijo
S Liu
S R Bornstein
S Ripke
SB Ng
SX Tang
T Miladinovic
TF Yuan
U Dannlowski
V Arolt
V Segura
W Bodmer
W Korenblum
Y Benjamini
Z Zhong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Major depressive disorder (MDD) affects around 350 million people worldwide; however, the underlying genetic basis remains largely unknown. In this study, we took into account that MDD is a gene-environment disorder, in which stress is a critical component, and used whole-genome screening of functional variants to investigate the 'missing heritability' in MDD. Genome-wide association studies (GWAS) using single- and multi-locus linear mixed-effect models were performed in a Los Angeles Mexican-American cohort (196 controls, 203 MDD) and in a replication European-ancestry cohort (499 controls, 473 MDD). Our analyses took into consideration the stress levels in the control populations. The Mexican-American controls, composed primarily of recent immigrants, had high levels of stress due to acculturation issues, and the European-ancestry controls with high stress levels were given higher weights in our analysis. We identified 44 common and rare functional variants associated with mild to moderate MDD in the Mexican-American cohort (genome-wide false discovery rate, FDR, <0.05), and their pathway analysis revealed that the three top overrepresented Gene Ontology (GO) processes were innate immune response, glutamate receptor signaling and detection of chemical stimulus in smell sensory perception. Rare variant analysis replicated the association of the PHF21B gene in the ethnically unrelated European-ancestry cohort. The TRPM2 gene, previously implicated in mood disorders, may also be considered replicated by our analyses. Whole-genome sequencing analyses of a subset of the cohorts revealed that European-ancestry individuals have a significantly reduced number (by 50%) of single nucleotide variants compared with Mexican-American individuals, and for this reason the role of rare variants may vary across populations. PHF21B variants contribute significantly to differences in the levels of expression of this gene in several brain areas, including the hippocampus.
Furthermore, using an animal model of stress, we found that Phf21b hippocampal gene expression is significantly decreased in animals resilient to chronic restraint stress when compared with non-chronically stressed animals. Together, our results reveal that including stress level data enables the identification of novel rare functional variants associated with MDD.
M-L Wong, M Arcos-Burgos, S Liu, J I Vélez, C Yu, B T Baune, M C Jawahar, V Arolt, U Dannlowski, A Chuah, G A Huttley, R Fogarty, M D Lewis, S R Bornstein, and J Licinio

Crossref

Adelaide Research & Scholarship

edocUR

University of Melbourne Institutional Repository

Prevalence of Cataract Surgery and Visual Outcomes in Indian Immigrants in Singapore: The Singapore Indian Eye Study

Author: AE Baranano
AT Broman
B Liu
CE Tan
Ching-Yu Cheng
D Yorston
DH Khoo
Ecosse L. Lamoureux
GV Murthy
J Lau
J Zhao
K Vaidyanathan
L Dandona
L Vijaya
NG Congdon
P Desai
Pedro Gonzalez
PJ Foster
PK Nirmalan
PP Chiang
Preeti Gupta
Q Yin
R Klein
R Lavanya
RC Khanna
S Resnikoff
Tay Wan Ting
Tien-Yin Wong
TY Wong
V Nangia
VSE Jeganathan
W Huang
YF Zheng
Yingfeng Zheng
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 07/10/2013

DOI: 10.1371/journal.pone.0075584, PLoS ONE 8(10)

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

ScholarBank@NUS

FigShare

Imaging Electronic Correlations in Twisted Bilayer Graphene near the Magic Angle

Author: A Artaud
A Luican
A Thomson
AA Zibrov
AC Neto
AL Efros
Alex Thomson
BE Feldman
BH Moon
D Wong
E Suárez Morell
F Guinea
Felix von Oppen
G Li
G Trambly de Laissardière
Gil Refael
H Yoo
Harpreet Arora
HC Po
Hechen Ren
I Brihuega
J Kang
Jason Alicea
Jeannette Kemmer
JMB Lopes dos Santos
JP Eisenstein
K Hejazi
K Kim
Kenji Watanabe
L Huder
L Zou
L-J Yin
M Koshino
M Yankowitz
MP Lilly
NNT Nam
OE Dial
R Bistritzer
RC Ashoori
Robert Polski
RW Havener
S Fang
S Huang
S Jung
Stevan Nadj-Perge
T Ohta
Takashi Taniguchi
W Yan
Y Cao
Y-H Song
Yang Peng
Yiran Zhang
YJ Song
Youngjoon Choi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/01/2019

Twisted bilayer graphene with a twist angle of around 1.1° features a pair of isolated flat electronic bands and forms a strongly correlated electronic platform. Here, we use scanning tunneling microscopy to probe local properties of highly tunable twisted bilayer graphene devices and show that the flat bands strongly deform when aligned with the Fermi level. At half filling of the bands, we observe the development of gaps originating from correlated insulating states. Near charge neutrality, we find a previously unidentified correlated regime featuring a substantially enhanced flat band splitting that we describe within a microscopic model predicting a strong tendency towards nematic ordering. Our results provide insights into symmetry breaking correlation effects and highlight the importance of electronic interactions for all filling factors in twisted bilayer graphene.
Comment: Main text 9 pages, 4 figures; Supplementary Information 25 pages

arXiv.org e-Print Archive

Crossref

Caltech Authors

Admixture Mapping Scans Identify a Locus Affecting Retinal Vascular Caliber in Hypertensive African Americans: the Atherosclerosis Risk in Communities (ARIC) Study

Retinal vascular caliber provides information about the structure and health of the microvascular system and is associated with cardiovascular and cerebrovascular diseases. Compared to European Americans, African Americans tend to have wider retinal arteriolar and venular caliber, even after controlling for cardiovascular risk factors. This has suggested the hypothesis that differences in genetic background may contribute to racial/ethnic differences in retinal vascular caliber. Using 1,365 ancestry-informative SNPs, we estimated the percentage of African ancestry (PAA) and conducted genome-wide admixture mapping scans in 1,737 African Americans from the Atherosclerosis Risk in Communities (ARIC) study. Central retinal artery equivalent (CRAE) and central retinal vein equivalent (CRVE) representing summary measures of retinal arteriolar and venular caliber, respectively, were measured from retinal photographs. PAA was significantly correlated with CRVE (ρ = 0.071, P = 0.003), but not CRAE (ρ = 0.032, P = 0.182). Using admixture mapping, we did not detect significant admixture association with either CRAE (genome-wide score = −0.73) or CRVE (genome-wide score = −0.69). An a priori subgroup analysis among hypertensive individuals detected a genome-wide significant association of CRVE with greater African ancestry at chromosome 6p21.1 (genome-wide score = 2.31, locus-specific LOD = 5.47). Each additional copy of an African ancestral allele at the 6p21.1 peak was associated with an average increase in CRVE of 6.14 µm in the hypertensives, but had no significant effects in the non-hypertensives (P for heterogeneity <0.001). Further mapping in the 6p21.1 region may uncover novel genetic variants affecting retinal vascular caliber and provide further insights into the interaction between genetic effects of the microvascular system and hypertension.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

ScholarBank@NUS

Genome-wide signatures of convergent evolution in echolocating mammals

Author: A Schneider
A Stamatakis
A Terrinoni
AF Ryan
AG Clark
AJ Drummond
AM Hancock
BP Lewis
EB Kim
EC Teeling
Elia Stupka
G Jones
G Li
G Parra
G Zhang
Georgia Tsagkogeorga
HB Zhao
J Castresana
James A. Cotton
JF Hughes
JI Fasick
Joe Parker
JP Bielawski
JZ Zhang
K Katoh
K Kriener
K Lindblad-Toh
KG Becker
KTJ Davies
M Kanehisa
M Soskine
M Vater
N Lartillot
OR Bininda-Emonds
Paolo Provero
PR Grant
R Li
R She
RC Edgar
RR Hoy
Stephen J. Rossiter
T Junier
TA Castoe
TSK Prasad
V Ranwez
W Huang da
WJ Murphy
WM Fitch
WS Wong
WWL Au
X Zhou
Y Benjamini
Y Liu
Y-B Sun
Y-Y Shen
Yuan Liu
Z Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/09/2013

Evolution is typically thought to proceed through divergence of genes, proteins, and ultimately phenotypes(1-3). However, similar traits might also evolve convergently in unrelated taxa due to similar selection pressures(4,5). Adaptive phenotypic convergence is widespread in nature, and recent results from a handful of genes have suggested that this phenomenon is powerful enough to also drive recurrent evolution at the sequence level(6-9). Where homoplasious substitutions do occur, these have long been considered the result of neutral processes. However, recent studies have demonstrated that adaptive convergent sequence evolution can be detected in vertebrates using statistical methods that model parallel evolution(9,10), although the extent to which sequence convergence between genera occurs across genomes is unknown. Here we analyse genomic sequence data in mammals that have independently evolved echolocation and show for the first time that convergence is not a rare process restricted to a handful of loci but is instead widespread, continuously distributed and commonly driven by natural selection acting on a small number of sites per locus. Systematic analyses of convergent sequence evolution in 805,053 amino acids within 2,326 orthologous coding gene sequences compared across 22 mammals (including four new bat genomes) revealed signatures consistent with convergence in nearly 200 loci. Strong and significant support for convergence among bats and the dolphin was seen in numerous genes linked to hearing or deafness, consistent with an involvement in echolocation. Surprisingly, we also found convergence in many genes linked to vision: the convergent signal of many sensory genes was robustly correlated with the strength of natural selection. This first attempt to detect genome-wide convergent sequence evolution across divergent taxa reveals the phenomenon to be much more pervasive than previously recognised.

Crossref

Southampton (e-Prints Soton)

PubMed Central

Enlighten

Queen Mary Research Online

Codominant scoring of AFLP in association panels

Author: A Wong
AP Dempster
AX Deniau
B Garel
BG Lindsay
C Fraley
D Böhning
Fred A. van Eeuwijk
G Gort
G McLachlan
Gerrit Gort
GJ McLachlan
HJ Eck van
HM Meudt
HP Piepho
J Bezdek
J Chen
J Cuesta-Albertos
JW Heath
Keygene Products BV
M Pérez-Enciso
M Vuylsteke
P Castiglioni
P Li
P McCullagh
P Vos
R Berloo van
R Ihaka
RC Jansen
SM Reamon-Büttner
W Jank
Y Lo
Z Liu
ZD Feng
Publication venue: Springer-Verlag
Publication date: 01/01/2010

A study on the codominant scoring of AFLP markers in association panels without prior knowledge of genotype probabilities is described. Bands are scored codominantly by fitting normal mixture models to band intensities, illustrating and optimizing existing methodology, which employs the EM algorithm. We study features that improve the performance of the algorithm, and the unmixing in general, such as parameter initialization, restrictions on parameters, data transformation, and outlier removal. Parameter restrictions include equal component variances, equal or nearly equal distances between component means, and mixing probabilities according to Hardy–Weinberg Equilibrium. Histogram visualization of band intensities with superimposed normal densities, and optional classification scores and other grouping information, assists further in the codominant scoring. We find empirical evidence favoring the square root transformation of the band intensity, as was found in segregating populations. Our approach provides posterior genotype probabilities for marker loci. These probabilities can form the basis for association mapping and are more useful than the standard scoring categories A, H, B, C, D. They can also be used to calculate predictors for additive and dominance effects. Diagnostics for data quality of AFLP markers are described: preference for a three-component mixture model, good separation between component means, and lack of singletons for the component with the highest mean. Software has been developed in R, containing the models for normal mixtures with facilitating features, and visualizations. The methods are applied to an association panel in tomato, comprising 1,175 polymorphic markers on 94 tomato hybrids, as part of a larger study within the Dutch Centre for BioSystems Genomics.
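The mixture-model scoring described above can be sketched in Python as EM for a univariate normal mixture in which all components share one variance (one of the restrictions the abstract lists). This is a hypothetical sketch, not the authors' R software: it omits the Hardy–Weinberg and equal-spacing restrictions, outlier removal and the visual diagnostics, and it expects intensities that have already been square-root transformed, e.g. `[math.sqrt(v) for v in intensities]`. With three components, each row of `post` plays the role of the posterior genotype probabilities for one individual.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_equal_variance(x, means, n_iter=500):
    """EM for a univariate k-component normal mixture with one shared variance.
    Returns mixing weights, component means, common sd and per-observation posteriors."""
    k = len(means)
    n = len(x)
    means = list(means)
    weights = [1.0 / k] * k
    mean_x = sum(x) / n
    sd = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)  # start from the overall sd
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation
        post = []
        for xi in x:
            dens = [w * normal_pdf(xi, m, sd) for w, m in zip(weights, means)]
            tot = sum(dens)
            post.append([d / tot for d in dens] if tot > 0.0 else [1.0 / k] * k)
        # M-step: update mixing weights, component means and the shared variance
        nk = [max(sum(p[j] for p in post), 1e-12) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(p[j] * xi for p, xi in zip(post, x)) / nk[j] for j in range(k)]
        var = sum(p[j] * (xi - means[j]) ** 2
                  for p, xi in zip(post, x) for j in range(k)) / n
        sd = math.sqrt(var)
    return weights, means, sd, post
```

Sharing one variance across components keeps a sparsely populated component from collapsing onto a few points, which is one plausible reason such a restriction helps the unmixing.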

Crossref

Springer - Publisher Connector

PubMed Central

Wageningen University & Research Publications