
    Interval Selection in the Streaming Model

    A set of intervals is independent when the intervals are pairwise disjoint. In the interval selection problem we are given a set $\mathbb{I}$ of intervals and we want to find an independent subset of intervals of largest cardinality. Let $\alpha(\mathbb{I})$ denote the cardinality of an optimal solution. We discuss the estimation of $\alpha(\mathbb{I})$ in the streaming model, where we only have one-time, sequential access to the input intervals, the endpoints of the intervals lie in $\{1,\dots,n\}$, and the amount of memory is constrained. For intervals of different sizes, we provide an algorithm in the data stream model that computes an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with probability at least $2/3$, satisfies $\tfrac{1}{2}(1-\varepsilon)\,\alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. For same-length intervals, we provide another algorithm in the data stream model that computes an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with probability at least $2/3$, satisfies $\tfrac{2}{3}(1-\varepsilon)\,\alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. The space used by our algorithms is bounded by a polynomial in $\varepsilon^{-1}$ and $\log n$. We also show that no better estimations can be achieved using $o(n)$ bits of storage. We also develop new, approximate solutions to the interval selection problem, where we want to report a feasible solution, that use $O(\alpha(\mathbb{I}))$ space. Our algorithms for the interval selection problem match the optimal results of Emek, Halldórsson and Rosén [Space-Constrained Interval Selection, ICALP 2012], but are much simpler.
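
    The streaming estimators above are the paper's contribution; the underlying offline problem, by contrast, has a classic exact greedy solution (scan intervals by increasing right endpoint, keep each one disjoint from the last kept). A minimal Python sketch of that offline baseline, with our own names and interval representation:

        def max_independent_intervals(intervals):
            """Exact offline interval selection: the classic greedy that
            scans intervals by increasing right endpoint and keeps every
            interval disjoint from the last one kept."""
            # Intervals are (left, right) pairs of closed intervals, left <= right.
            chosen = []
            last_right = float("-inf")
            for left, right in sorted(intervals, key=lambda iv: iv[1]):
                if left > last_right:  # disjoint from everything kept so far
                    chosen.append((left, right))
                    last_right = right
            return chosen

        # alpha for {[1,3], [2,5], [4,7], [6,9]} is 2: the greedy keeps (1,3), (4,7).
        print(max_independent_intervals([(1, 3), (2, 5), (4, 7), (6, 9)]))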

    Strobe sequence design for haplotype assembly

    Background: Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the two chromosomes in a pair are mostly identical to each other, linking together the alleles at heterozygous sites is sufficient to phase, or separate, the two chromosomes. In haplotype assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to obtain long haplotypes. Here, we describe the different design parameters that can be adjusted with next-generation and upcoming sequencing technologies, and study the impact of design choices on the length of the haplotype.
    Results: We show that a number of parameters influence haplotype length, the most significant being the advance length (the distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths yields a 1-2 order of magnitude improvement in median haplotype length.
    Conclusions: Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies.
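
    The abstract does not spell out the annealing details, so the following is only a generic simulated annealing skeleton of the kind such a search could use; the perturbation, scoring, and geometric cooling schedule are all illustrative assumptions, not the paper's implementation. For this application, the state would be a discrete distribution over advance lengths and score(state) the simulated median haplotype length.

        import math
        import random

        def anneal(score, init_state, perturb, steps=10_000, t0=1.0, cooling=0.999):
            """Generic simulated annealing: maximizes `score` over states
            reached via `perturb`, accepting worse states with a
            temperature-dependent probability (Metropolis rule)."""
            state = best = init_state
            s_cur = s_best = score(init_state)
            t = t0
            for _ in range(steps):
                cand = perturb(state)
                s_cand = score(cand)
                # Always accept improvements; accept worse states with
                # probability exp((s_cand - s_cur) / t), shrinking as t cools.
                if s_cand >= s_cur or random.random() < math.exp((s_cand - s_cur) / t):
                    state, s_cur = cand, s_cand
                    if s_cur > s_best:
                        best, s_best = state, s_cur
                t *= cooling  # geometric cooling schedule
            return best, s_best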

    A model-based approach to selection of tag SNPs

    BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphism found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. In Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides machinery for predicting the tagged SNPs, and thereby a way to assess the performance of tag sets through their ability to predict larger SNP sets.
    RESULTS: Here, we compute the description code-lengths of SNP data for an array of models, and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several respects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection.
    CONCLUSION: Our study provides strong evidence that the tag sets selected by our best method, based on the Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. Moreover, we show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies the selection of tag SNPs on the basis of haplotype informativeness, even though genotyping studies do not directly assess haplotypes. Software that implements our approach is available.
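
    The paper's selection is driven by the Li and Stephens HMM; as a simpler, self-contained illustration of the entropy-maximization strategy itself, here is a hedged sketch that greedily grows a tag set by empirical joint entropy over a sample of haplotypes. The empirical estimator stands in for the paper's model-based code lengths and is our assumption, not the paper's method.

        from collections import Counter
        from math import log2

        def joint_entropy(haplotypes, tags):
            """Shannon entropy (bits) of the joint distribution of the tag
            SNP columns, estimated from the observed haplotype sample."""
            n = len(haplotypes)
            patterns = Counter(tuple(h[i] for i in tags) for h in haplotypes)
            return -sum(c / n * log2(c / n) for c in patterns.values())

        def greedy_tag_snps(haplotypes, k):
            """Select k tag SNPs, each step adding the SNP column that most
            increases the joint entropy of the current tag set."""
            n_snps = len(haplotypes[0])
            tags = []
            for _ in range(k):
                best = max((i for i in range(n_snps) if i not in tags),
                           key=lambda i: joint_entropy(haplotypes, tags + [i]))
                tags.append(best)
            return sorted(tags)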

    A Class Representative Model for Pure Parsimony Haplotyping under Uncertain Data

    The Pure Parsimony Haplotyping (PPH) problem is an NP-hard combinatorial optimization problem that consists of finding the minimum number of haplotypes necessary to explain a given set of genotypes. PPH has attracted growing attention in recent years due to its importance in the analysis of fine-scale genetic data. Its applications range from mapping complex disease genes and inferring population histories to drug design, functional genomics, and pharmacogenetics. In this article we investigate, for the first time, a recent version of PPH called the Pure Parsimony Haplotyping problem under Uncertain Data (PPH-UD). This version arises mainly when the input genotypes are not accurate, i.e., when some single nucleotide polymorphisms are missing or affected by errors. We propose an exact approach to solving PPH-UD based on an extended version of the class representative model for PPH of Catanzaro et al. [1], currently the state-of-the-art integer programming model for PPH. The model is efficient, accurate, compact, polynomial-sized, easy to implement, solvable with any mixed integer programming solver, and usable in all those cases for which the parsimony criterion is well suited for haplotype estimation.
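
    The class representative ILP is beyond the scope of an abstract, but the PPH objective itself is compact enough to state in code. Below is an illustrative brute-force sketch for the basic, error-free problem under the usual 0/1/2 genotype coding (0 and 1 homozygous, 2 heterozygous); the encoding and names are ours, and the exhaustive search is only usable on toy instances with a handful of sites.

        from itertools import combinations, product

        def explains(h1, h2, g):
            """A haplotype pair explains a genotype iff, at every site,
            genotype 0 -> both haplotypes 0, 1 -> both 1, and
            2 (heterozygous) -> the two haplotypes differ."""
            return all((x == y == c) if c in (0, 1) else (x != y)
                       for x, y, c in zip(h1, h2, g))

        def pure_parsimony(genotypes):
            """Smallest set of haplotypes explaining every genotype, found
            by trying candidate haplotype sets in order of increasing size."""
            m = len(genotypes[0])
            candidates = list(product((0, 1), repeat=m))
            for k in range(1, len(candidates) + 1):
                for hs in combinations(candidates, k):
                    if all(any(explains(h1, h2, g) for h1 in hs for h2 in hs)
                           for g in genotypes):
                        return list(hs)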

    Unexpected large eruptions from buoyant magma bodies within viscoelastic crust

    Large-volume effusive eruptions with relatively minor observed precursory signals are at odds with widely used models for interpreting volcano deformation. Here we propose a new modelling framework that resolves this discrepancy by accounting for magma buoyancy, viscoelastic crustal properties, and sustained magma channels. At low magma accumulation rates, the stability of deep magma bodies is governed by the magma-host rock density contrast and the magma body thickness. During eruptions, inelastic processes, including magma mush erosion and thermal effects, can form a sustained channel that supports magma flow, driven by the pressure difference between the magma body and surface vents. At failure onset, it may be difficult to forecast the final eruption volume; pressure in a magma body may drop well below the lithostatic load, creating under-pressure and initiating a caldera collapse, despite only modest precursors.
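
    As a back-of-the-envelope illustration of why density contrast and body thickness control stability: the buoyancy overpressure at the roof of a magma body scales roughly as the density contrast times gravity times thickness. The densities and thicknesses below are generic assumed values, not figures from the paper:

        # Buoyancy overpressure at the roof of a magma body:
        # delta_p ~ (rho_rock - rho_magma) * g * thickness
        rho_rock, rho_magma = 2700.0, 2400.0  # kg/m^3, assumed generic values
        g = 9.81                              # m/s^2
        for thickness in (100.0, 500.0, 2000.0):  # magma body thickness, m
            delta_p = (rho_rock - rho_magma) * g * thickness
            print(f"h = {thickness:6.0f} m -> overpressure {delta_p / 1e6:4.1f} MPa")
        # 0.3, 1.5 and 5.9 MPa: only the thicker bodies reach typical crustal
        # tensile strengths (~1-10 MPa), favouring roof failure.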

    Genetic variants linked to education predict longevity

    Educational attainment is associated with many health outcomes, including longevity. It is also known to be substantially heritable. Here, we used data from three large genetic epidemiology cohort studies (Generation Scotland, n = ∼17,000; UK Biobank, n = ∼115,000; and the Estonian Biobank, n = ∼6,000) to test whether education-linked genetic variants can predict lifespan. We did so by using cohort members’ polygenic profile scores for education to predict their parents’ longevity. Across the three cohorts, meta-analysis showed that a 1 SD higher polygenic education score was associated with ∼2.7% lower mortality risk for mothers (total deaths = 79,702) and ∼2.4% lower risk for fathers (total deaths = 97,630). On average, the parents of offspring in the upper third of the polygenic score distribution lived 0.55 y longer than those of offspring in the lower third. Overall, these results indicate that the genetic contributions to educational attainment are useful in the prediction of human longevity.
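
    A polygenic profile score of the kind used here is a weighted allele count: for each variant, the number of effect alleles carried (0, 1, or 2) times the effect size estimated in a discovery GWAS, summed over variants. A minimal sketch; the rsIDs and weights below are hypothetical placeholders, not the study's variants:

        def polygenic_score(dosages, weights):
            """Weighted allele count: sum over variants of GWAS effect size
            times the number of effect alleles carried (0/1/2)."""
            return sum(weights[snp] * dose
                       for snp, dose in dosages.items() if snp in weights)

        # Hypothetical example with three education-associated variants.
        weights = {"rs0001": 0.02, "rs0002": -0.01, "rs0003": 0.015}  # assumed betas
        person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}              # allele dosages
        print(polygenic_score(person, weights))  # ~0.03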

    Approximation algorithms for the test cover problem

    In the test cover problem, a set of m items is given together with a collection of subsets, called tests. A smallest subcollection of tests is to be selected such that for each pair of items there is a test in the selection that contains exactly one of the two items. It is known that the problem is NP-hard and that the greedy algorithm has a performance ratio of O(log m). We observe that, unless P=NP, no polynomial-time algorithm can do essentially better. For the case that each test contains at most k items, we give an O(log k)-approximation algorithm. We pay special attention to the case that each test contains at most two items. A strong relation with a problem of packing paths in a graph is established, which implies that even this special case is NP-hard. We prove APX-hardness of both problems, derive performance guarantees for greedy algorithms, and discuss the performance of a series of local improvement heuristics.
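
    The O(log m) greedy referred to above is the natural one: repeatedly pick the test that separates the most still-unseparated item pairs. A short sketch under that reading (names and data representation are ours):

        from itertools import combinations

        def greedy_test_cover(items, tests):
            """Greedy test cover: repeatedly add the test that separates the
            most still-unseparated pairs, where a test separates a pair if it
            contains exactly one of the two items."""
            unseparated = set(combinations(sorted(items), 2))
            chosen = []
            while unseparated:
                def gain(t):
                    return sum((a in t) != (b in t) for a, b in unseparated)
                best = max(tests, key=gain)
                if gain(best) == 0:
                    raise ValueError("no test cover exists for these items")
                chosen.append(best)
                unseparated = {(a, b) for a, b in unseparated
                               if (a in best) == (b in best)}
            return chosen

        # Example: two tests suffice for 4 items, since each item then gets a
        # distinct membership signature. Greedy returns [{1, 2}, {1, 3}].
        print(greedy_test_cover({1, 2, 3, 4}, [{1, 2}, {1, 3}, {2, 3}, {1}]))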