PolyU Institutional Repository

Reversal Distances for Strings with Few Blocks or Small Alphabets

Author: A. Radcliffe
C.A.J. Hurkens
D.A. Christie
G. Watterson
J. Fischer
L. Bulteau
P. Berman
T. Jiang
V. Bafna
X. Chen
Z. Fu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We study the String Reversal Distance problem, an extension of the well-known Sorting by Reversals problem. String Reversal Distance takes two strings S and T as input and asks for the minimum number of reversals needed to obtain T from S. We consider four variants: String Reversal Distance, String Prefix Reversal Distance (in which every reversal must include the first letter of the string), and the signed variants of these problems, namely Signed String Reversal Distance and Signed String Prefix Reversal Distance. We study algorithmic properties of these four problems in connection with two parameters of the input strings: the number of blocks they contain (a block being a maximal substring in which all letters are equal), and the alphabet size Σ. For instance, we show that Signed String Reversal Distance and Signed String Prefix Reversal Distance are NP-hard even if the input strings have only one letter
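To make the unsigned problem definitions concrete, here is a minimal brute-force sketch. It is not the algorithm from the paper: it simply runs a breadth-first search over all strings reachable by reversals, so it is only usable on very short inputs. The function names `count_blocks` and `reversal_distance` and the `prefix_only` flag are illustrative assumptions, and the signed variants are not covered.

```python
from collections import deque
from itertools import groupby


def count_blocks(s: str) -> int:
    """Number of blocks, i.e. maximal runs of equal letters."""
    return sum(1 for _ in groupby(s))


def reversal_distance(source: str, target: str, prefix_only: bool = False) -> int:
    """Minimum number of reversals turning source into target, or -1 if impossible.

    Brute-force breadth-first search (exponential in general); an
    illustrative assumption, not the method of the paper.
    """
    if sorted(source) != sorted(target):
        return -1  # reversals never change the letter counts
    n = len(source)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == target:
            return dist
        # Prefix reversals must start at position 0.
        starts = [0] if prefix_only else range(n)
        for i in starts:
            for j in range(i + 2, n + 1):  # reverse current[i:j], length >= 2
                nxt = current[:i] + current[i:j][::-1] + current[j:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return -1
```

For example, `reversal_distance("abc", "acb")` needs a single reversal of the last two letters, whereas the prefix-restricted variant `reversal_distance("abc", "acb", prefix_only=True)` needs three.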

Strobe sequence design for haplotype assembly

Author: A Ritz
Ali Bashir
BV Halldórsson
Christine Lo
D Altshuler
D He
DE Reich
ER Mardis
F Aversa
J Eid
J Marchini
J Shendure
JC Roach
L Ma
MA Levenstien
P Erdos
T Shiina
V Bafna
V Bansal
Vikas Bansal
Vineet Bafna
Z Guo
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the two chromosomes of a pair are mostly identical to each other, linking together the alleles at heterozygous sites is sufficient to phase, or separate, the two chromosomes. In Haplotype Assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to get long haplotypes. Here, we describe the different design parameters that can be adjusted with next-generation and upcoming sequencing technologies, and study the impact of design choice on the length of the haplotype. RESULTS: We show that a number of parameters influence haplotype length, the most significant being the advance length (the distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results in a 1-2 order of magnitude improvement in median haplotype length. CONCLUSIONS: Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies
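The linking idea described above can be sketched as a connected-components computation: heterozygous sites covered by reads of the same clone end up in the same haplotype block. This is only an illustration under simplifying assumptions (error-free reads, half-open read intervals); the function name `haplotype_blocks` and the input layout are hypothetical, and the simulated annealing search over advance-length distributions is not shown.

```python
def haplotype_blocks(snp_positions, fragments):
    """Group heterozygous sites into blocks linked by shared clones.

    snp_positions: sorted list of heterozygous site coordinates.
    fragments: list of clones; each clone is a list of (start, end) read
               intervals (end exclusive) from the same chromosome copy.
    Returns a sorted list of blocks, each a sorted list of coordinates.
    """
    parent = {p: p for p in snp_positions}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for reads in fragments:
        covered = [p for p in snp_positions
                   if any(start <= p < end for start, end in reads)]
        # All sites seen by one clone belong to the same block.
        for p in covered[1:]:
            parent[find(p)] = find(covered[0])

    blocks = {}
    for p in snp_positions:
        blocks.setdefault(find(p), []).append(p)
    return sorted(blocks.values())
```

With sites at 100, 500, 900 and 5000, a clone whose two reads cover 100 and 500 and another whose reads cover 500 and 900 yield one block spanning 800 bases, while the site at 5000 stays unphased relative to the rest. A larger advance length lets a single clone bridge sites that are farther apart.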

arXiv.org e-Print Archive

Vertex Cover Kernelization Revisited: Upper and Lower Bounds for a Refined Parameter

Author: A. Schrijver
A. Soleimanfallah
B. Chor
B.M.P. Jansen
C.K. Yap
F.N. Abu-Khzam
G. Gutin
G. Nemhauser
H. Dell
H.L. Bodlaender
I. Razgon
J. Chen
J. Díaz
J. Guo
J. Uhlmann
J. Zito
J.F. Buss
J.R. Griggs
L. Cai
L. Fortnow
M. Chlebík
M. Cygan
M. Dom
M.R. Fellows
M.R. Garey
R. Downey
R. Niedermeier
R.G. Downey
S. Khot
S. Kratsch
S. Mishra
V. Bafna
V. Estivill-Castro
V. Raman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

An important result in the study of polynomial-time preprocessing shows that there is an algorithm that, given an instance (G,k) of Vertex Cover, outputs an equivalent instance (G',k') in polynomial time with the guarantee that G' has at most 2k' vertices (and thus O((k')^2) edges) with k' <= k. Using the terminology of parameterized complexity, we say that k-Vertex Cover has a kernel with 2k vertices. There is complexity-theoretic evidence that both 2k vertices and Theta(k^2) edges are optimal for the kernel size. In this paper we consider the Vertex Cover problem with a different parameter, the size fvs(G) of a minimum feedback vertex set for G. This refined parameter is structurally smaller than the parameter k associated with the vertex covering number vc(G), since fvs(G) <= vc(G) and the difference can be arbitrarily large. We give a kernel for Vertex Cover with a number of vertices that is cubic in fvs(G): an instance (G,X,k) of Vertex Cover, where X is a feedback vertex set for G, can be transformed in polynomial time into an equivalent instance (G',X',k') such that |V(G')| <= 2k and |V(G')| <= O(|X'|^3). A similar result holds when the feedback vertex set X is not given along with the input. In sharp contrast, we show that the Weighted Vertex Cover problem does not have a polynomial kernel when parameterized by the cardinality of a given vertex cover of the graph unless NP is in coNP/poly and the polynomial hierarchy collapses to the third level. Comment: Published in "Theory of Computing Systems" as an Open Access publication
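To illustrate what a kernelization does, here is a minimal sketch of a different and simpler preprocessing rule, Buss's high-degree rule. It is not the 2k-vertex kernel mentioned above, nor the cubic kernel in fvs(G) from the paper; it only guarantees at most k'^2 remaining edges. The function name `buss_kernel` and the return layout are illustrative assumptions.

```python
def buss_kernel(edges, k):
    """Apply Buss's high-degree rule to a Vertex Cover instance (edges, k).

    Returns (reduced_edges, reduced_k, forced_vertices), or None when the
    instance provably has no vertex cover of size at most k.
    """
    # Normalise to undirected edges and drop self-loops.
    edges = {frozenset(e) for e in edges if len(set(e)) == 2}
    forced = set()
    while True:
        if k < 0:
            return None
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        high = [v for v, d in degree.items() if d > k]
        if not high:
            break
        # A vertex with more than k neighbours must be in every cover of
        # size at most k, otherwise all its neighbours would be needed.
        v = high[0]
        forced.add(v)
        edges = {e for e in edges if v not in e}
        k -= 1
    # Every remaining vertex covers at most k edges, so k vertices cover
    # at most k * k edges.
    if len(edges) > k * k:
        return None
    return edges, k, forced
```

On a star with four leaves and k = 1, the rule forces the centre into the cover and leaves an empty instance with k' = 0; on a triangle with k = 1 it reports that no cover exists.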

Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification

Author: A. A. Klammer
Bafna
Dančík
Elias
Field
Frank
Geer
Havilio
Hoopmann
J. A. Bilmes
Kall
Klammer
M. J. MacCoss
Mann
Pavlidis
S. M. Reynolds
Tabb
Tanner
W. S. Noble
Washburn
Yates
Zhang
Zubarev
Publication venue: Oxford University Press

Motivation: Tandem mass spectrometry (MS/MS) is an indispensable technology for identification of proteins from complex mixtures. Proteins are digested to peptides that are then identified by their fragmentation patterns in the mass spectrometer. Thus, at its core, MS/MS protein identification relies on the relative predictability of peptide fragmentation. Unfortunately, peptide fragmentation is complex and not fully understood, and what is understood is not always exploited by peptide identification algorithms

CiteSeerX

Population sequencing of two endocannabinoid metabolic genes identifies rare and common regulatory variants associated with extreme obesity and metabolite level

Author: Bafna Vineet
Bansal Vikas
Bhatia Gaurav
Deleuze Jean Francois
Dib Colette
Frazer Kelly A
Harismendy Olivier
Murray Sarah S
Nakano Masakazu
Scott Michael
Sipe Jack C
Topol Eric J
Turlotte Edouard
Wang Xiaoyun
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Targeted re-sequencing of candidate genes in individuals at the extremes of a quantitative phenotype distribution is a method of choice to gain information on the contribution of rare variants to disease susceptibility. The endocannabinoid system mediates signaling in the brain and peripheral tissues involved in the regulation of energy balance, is highly active in obese patients, and represents a strong candidate pathway to examine for genetic association with body mass index (BMI). RESULTS: We sequenced two intervals (covering 188 kb) encoding the endocannabinoid metabolic enzymes fatty-acid amide hydrolase (FAAH) and monoglyceride lipase (MGLL) in 147 normal controls and 142 extremely obese cases. After applying quality filters, we called 1,393 high-quality single nucleotide variants, 55% of which are rare, and 143 indels. Using single marker tests and collapsed marker tests, we identified four intervals associated with BMI: the FAAH promoter, the MGLL promoter, MGLL intron 2, and MGLL intron 3. Two of these intervals are composed of rare variants, and the majority of the associated variants are located in promoter sequences or in predicted transcriptional enhancers, suggesting a regulatory role. The set of rare variants in the FAAH promoter associated with BMI is also associated with an increased level of the FAAH substrate anandamide, further implicating a functional role in obesity. CONCLUSIONS: Our study, which is one of the first reports of a sequence-based association study using next-generation sequencing of candidate genes, provides insights into study design and analysis approaches and demonstrates the importance of examining regulatory elements rather than exclusively focusing on exon sequences

Identifying the favored mutation in a positive selective sweep

Author: A Ferrer-Admetlla
Ali Akbari
Arya Iranmehr
BF Voight
C Heffelfinger
CD Campbell
DR Schrider
DR Zerbino
G Coop
G Ewing
H Chen
J Ohashi
JJ Vitti
Joseph J Vitti
KJ Galinsky
M DeGiorgio
M Pybus
M Wang
MC Cornelis
MD Shriver
Mehrdad Bakhtiari
MI Jensen-Seaman
MW Nachman
NR Garud
P Azad
P Pavlidis
Pardis C Sabeti
PC Sabeti
R Nielsen
R Ronen
S Beleza
S Fan
S Gravel
S Wilde
SA Tishkoff
Siavash Mirarab
SR Grossman
T Stobdan
Vineet Bafna
Y Field
Y Kim
ZA Szpiech
Publication venue: eScholarship, University of California
Publication date: 01/04/2018

Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. We present iSAFE (for "integrated selection of allele favored by evolution"), a method that enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations

Dependence of paracentric inversion rate on tract length

Author: A Brehm
AH Sturtevant
B Larget
GA Watterson
I Miklos
J Kececioglu
K Yogeeswaran
M Caceres
R Durrett
R Pinter
Rasmus Nielsen
Rick Durrett
S Hannenhalli
Thomas L York
TL York
V Bafna
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: We develop a Bayesian method based on MCMC for estimating the relative rates of pericentric and paracentric inversions from marker data from two species. The method also allows estimation of the distribution of inversion tract lengths. RESULTS: We apply the method to data from Drosophila melanogaster and D. yakuba. We find that pericentric inversions occur at a much lower rate than paracentric inversions. The average paracentric inversion tract length is approx. 4.8 Mb, with small inversions being more frequent than large inversions. If the two breakpoints defining a paracentric inversion tract are uniformly and independently distributed over chromosome arms, there will be more short tract-length inversions than long; we find an even greater preponderance of short tract lengths than this would predict. Thus there appears to be a correlation between the positions of breakpoints which favors shorter tract lengths. CONCLUSION: The method developed in this paper provides the first statistical estimator of the distribution of inversion tract lengths from marker data. Application of this method to a number of data sets may help elucidate the relationship between the length of an inversion and the chance that it will get accepted
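The uniform-breakpoint baseline mentioned above can be worked out explicitly. The following is a short derivation under the simplifying assumption of a single chromosome arm of length $L$ (a symbol introduced here, not taken from the abstract), with breakpoints $X$ and $Y$ independent and uniform on $[0, L]$ and tract length $D = |X - Y|$.

```latex
\begin{align*}
  \Pr(D > d) &= \left(1 - \frac{d}{L}\right)^{2}, \qquad 0 \le d \le L, \\
  f_D(d) &= -\frac{\mathrm{d}}{\mathrm{d}d}\Pr(D > d) = \frac{2\,(L - d)}{L^{2}}, \\
  \mathbb{E}[D] &= \int_{0}^{L} d \,\frac{2\,(L - d)}{L^{2}}\,\mathrm{d}d = \frac{L}{3}, \\
  \Pr\!\left(D \le \tfrac{L}{2}\right) &= 1 - \left(\tfrac{1}{2}\right)^{2} = \frac{3}{4}.
\end{align*}
```

The density decreases linearly in $d$, so even without any correlation between breakpoints, three quarters of tracts are shorter than half the arm. The abstract reports an excess of short tracts beyond this baseline.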

arXiv.org e-Print Archive

Routes for breaching and protecting genetic privacy

Author: A Acquisti
A Cavoukian
A Kong
A Machanavajjhala
A Narayanan
AD Johnson
AJ Pakstis
AK Manning
AL McGuire
Arvind Narayanan
B Fons
B Malin
BA Malin
BM Henn
C Dwork
C Shannon
CD Huff
D Clayton
D He
D Zubakov
DJ Solve
DR Nyholt
DW Craig
EA Zerhouni
EE Schadt
EM Ramos
F Liu
G Church
H Lango Allen
H Li
HK Im
HS Venter
J Burn
J Gitschier
J Kaiser
J Kaye
J Lee
J Marchini
JE Lunshof
JH Park
JM Oliver
JP Roberts
K Benitez
K El Emam
K Silventoinen
KA Tryka
KB Jacobs
KS Kendler
L Kamm
L Sweeney
LA Sweeney
LAP Kohn
LL Rodriguez
M Canim
M Gymrek
M Kantarcioglu
M Kayser
MD Mailman
N Chatterjee
N Homer
NN Taleb
P Bohannon
P Kwok
P Ohm
P Paillier
PM Visscher
R Braun
R Drmanac
R Khan
R Noumeir
RL Bennett
S Byers
S McClure
S Sankararaman
S Walsh
SE Brenner
SF Terry
SH Friend
T Lumley
TE King
V Bafna
W Fu
W Hartzog
WG Hill
WW Lowrance
XL Ou
Yaniv Erlich
Z Lin
Publication date: 01/12/2013

We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data. Comment: Draft for comment

Public Library of Science (PLOS)

Signal Transduction Pathways in the Pentameric Ligand-Gated Ion Channels

Author: A Auerbach
A Del Sol
A Jha
AJ Thompson
C Bouzat
C Chennubhotla
C Grosman
CD Dellisanti
D Mowrey
David Mowrey
DJ Cadugan
E Krissinel
E Tassonyi
H Nury
H-M Lu
J Pan
Jie Liang
JY Yen
K Park
M Jansen
MD Winn
Michael N. Nitabach
MS Prevost
N Andersen
N Bocquet
N Mukhtasimova
N Unwin
P Emsley
P Purohit
PA Bafna
PD Adams
Pei Tang
PS Miller
Qiang Chen
RA Laskowski
RB Sidje
RJ Hilf
S Chakrapani
T Haliloglu
TL Kash
VB Chen
W Kabsch
WY Lee
X Xiu
Y Weng
Yan Xu
Yuhe Liang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/05/2013

The mechanisms of allosteric action within pentameric ligand-gated ion channels (pLGICs) remain to be determined. Using crystallography, site-directed mutagenesis, and two-electrode voltage clamp measurements, we identified two functionally relevant sites in the extracellular (EC) domain of the bacterial pLGIC from Gloeobacter violaceus (GLIC). One site is at the C-loop region, where the NQN mutation (D91N, E177Q, and D178N) eliminated inter-subunit salt bridges in the open-channel GLIC structure and thereby shifted the channel activation to a higher agonist concentration. The other site is below the C-loop, where binding of the anesthetic ketamine inhibited GLIC currents in a concentration-dependent manner. To understand how a perturbation signal in the EC domain, resulting from either the NQN mutation or ketamine binding, is transduced to the channel gate, we used the Perturbation-based Markovian Transmission (PMT) model to determine dynamic responses of the GLIC channel and signaling pathways upon initial perturbations in the EC domain of GLIC. Despite the existence of many possible routes for the initial perturbation signal to reach the channel gate, the PMT model in combination with Yen's algorithm revealed that perturbation signals with the highest probability flow travel either via the β1-β2 loop or through pre-TM1. The β1-β2 loop occurs in either intra- or inter-subunit pathways, while pre-TM1 occurs exclusively in inter-subunit pathways. Residues involved in both types of pathways are well supported by previous experimental data on nAChR. The direct coupling between pre-TM1 and TM2 of the adjacent subunit adds new insight into the allosteric signaling mechanism in pLGICs. © 2013 Mowrey et al.
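As a rough illustration of the path-ranking step, the sketch below finds the single highest-probability path in a small weighted graph. It is neither the PMT model nor Yen's algorithm itself: Yen's algorithm lists the k best loopless paths by calling a single-best-path routine repeatedly, and only that routine is shown here. The function name `most_probable_path`, the graph layout, and the node labels in the usage note are hypothetical.

```python
import heapq
import math


def most_probable_path(transitions, source, target):
    """Return (probability, path) of the highest-probability path.

    transitions maps node -> {neighbor: probability in (0, 1]}.
    Maximising a product of probabilities is the same as minimising the
    sum of -log(probability), so Dijkstra's algorithm applies.
    Returns (0.0, []) when target is unreachable.
    """
    best = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            # Walk parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return math.exp(-cost), path[::-1]
        for nxt, p in transitions.get(node, {}).items():
            if p <= 0.0:
                continue  # zero-probability edges cannot carry signal
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return 0.0, []
```

On a toy graph with edges s→a (0.6), s→b (0.4), a→t (0.5) and b→t (0.9), the route through b wins with probability 0.36 against 0.30 through a, even though its first step is less likely.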