240 research outputs found
On Feedback Vertex Set: New Measure and New Structures
We present a new parameterized algorithm for the feedback vertex set
problem (FVS) on undirected graphs. We approach the problem by
considering a variation of it, the disjoint feedback vertex set problem
(DISJOINT-FVS), which asks for a feedback vertex set of a given size that has
no overlap with a given feedback vertex set of the graph. We develop an
improved kernelization algorithm for DISJOINT-FVS and show that
DISJOINT-FVS can be solved in polynomial time when all vertices outside the
given feedback vertex set have degree at most three. We then propose a new
branch-and-search process on DISJOINT-FVS, and introduce a new
branch-and-search measure. The process effectively reduces a given graph to a
graph on which DISJOINT-FVS becomes polynomial-time solvable, and the new
measure more accurately evaluates the efficiency of the process. These
algorithmic and combinatorial studies enable us to develop a
parameterized algorithm for the general FVS problem that improves on
all previous algorithms for the problem.
Comment: Final version, to appear in Algorithmic
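To make the disjoint variant concrete, here is a minimal Python sketch of a solution checker. It is not the paper's algorithm; it only verifies a proposed answer. The names `is_forest` and `is_disjoint_fvs`, the adjacency-set graph representation, and the reading of "of a given size" as "at most k" are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Set

# Undirected simple graph stored as symmetric adjacency sets (an assumed representation).
Graph = Dict[Hashable, Set[Hashable]]


def is_forest(graph: Graph, removed: Set[Hashable]) -> bool:
    """Return True if the graph with `removed` deleted contains no cycle."""
    parent = {v: v for v in graph if v not in removed}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    seen_edges = set()
    for u in parent:
        for w in graph[u]:
            edge = frozenset((u, w))
            if w in removed or edge in seen_edges:
                continue
            seen_edges.add(edge)
            root_u, root_w = find(u), find(w)
            if root_u == root_w:
                return False  # this edge closes a cycle
            parent[root_u] = root_w
    return True


def is_disjoint_fvs(graph: Graph, candidate: Iterable[Hashable],
                    given_fvs: Set[Hashable], k: int) -> bool:
    """Check that `candidate` has at most k vertices (assumed reading of the
    size bound), avoids `given_fvs`, and hits every cycle of the graph."""
    chosen = set(candidate)
    return (len(chosen) <= k
            and chosen.isdisjoint(given_fvs)
            and is_forest(graph, chosen))


# A triangle a-b-c with a pendant vertex d attached to c.
example = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(is_disjoint_fvs(example, {"b"}, {"a"}, 1))  # True
print(is_disjoint_fvs(example, {"d"}, {"a"}, 1))  # False: the triangle survives
```

Here `{"a"}` plays the role of the given feedback vertex set, and `{"b"}` is a disjoint feedback vertex set of size one.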
Reversal Distances for Strings with Few Blocks or Small Alphabets
We study the String Reversal Distance problem, an extension of the well-known Sorting by Reversals problem. String Reversal Distance takes two strings S and T as input, and asks for a minimum number of reversals to obtain T from S. We consider four variants: String Reversal Distance, String Prefix Reversal Distance (in which any reversal must include the first letter of the string), and the signed variants of these problems, namely Signed String Reversal Distance and Signed String Prefix Reversal Distance. We study algorithmic properties of these four problems, in connection with two parameters of the input strings: the number of blocks they contain (a block being a maximal substring such that all letters in the substring are equal), and the alphabet size Σ. For instance, we show that Signed String Reversal Distance and Signed String Prefix Reversal Distance are NP-hard even if the input strings have only one letter
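The two unsigned problems and the block parameter can be illustrated with a brute-force Python sketch. This is an exhaustive breadth-first search, not an algorithm from the paper: it is exponential and only suitable for very short strings, and it does not cover the signed variants. The function names `count_blocks` and `reversal_distance` are hypothetical.

```python
from collections import deque
from typing import Optional


def count_blocks(s: str) -> int:
    """Number of maximal runs of equal letters, e.g. 'aabba' has 3 blocks."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


def reversal_distance(s: str, t: str, prefix_only: bool = False) -> Optional[int]:
    """Minimum number of substring reversals turning s into t (unsigned case).

    With prefix_only=True every reversal must start at the first letter.
    Returns None when s and t do not contain the same letters.
    """
    if sorted(s) != sorted(t):
        return None  # reversals never change the multiset of letters
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        if cur == t:
            return dist[cur]
        n = len(cur)
        starts = [0] if prefix_only else range(n)
        for i in starts:
            for j in range(i + 2, n + 1):  # reverse cur[i:j], length at least 2
                nxt = cur[:i] + cur[i:j][::-1] + cur[j:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None


print(count_blocks("aabba"))                               # 3
print(reversal_distance("aab", "aba"))                     # 1
print(reversal_distance("aab", "aba", prefix_only=True))   # 2
```

The last two calls show how restricting to prefix reversals can increase the distance: "aab" becomes "aba" with one reversal of its last two letters, but needs two prefix reversals ("aab" to "baa" to "aba").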
Strobe sequence design for haplotype assembly
Background: Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. Because the two chromosomes of a pair are mostly identical, linking together alleles at heterozygous sites is sufficient to phase, or separate, the two chromosomes. In Haplotype Assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to get long haplotypes. Here, we describe the different design parameters that can be adjusted with next generation and upcoming sequencing technologies, and study the impact of design choice on the length of the haplotype. Results: We show that a number of parameters influence haplotype length, with the most significant one being the advance length (distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results in a 1-2 order of magnitude improvement in median haplotype length. Conclusions: Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies
Vertex Cover Kernelization Revisited: Upper and Lower Bounds for a Refined Parameter
An important result in the study of polynomial-time preprocessing shows that
there is an algorithm which given an instance (G,k) of Vertex Cover outputs an
equivalent instance (G',k') in polynomial time with the guarantee that G' has
at most 2k' vertices (and thus O((k')^2) edges) with k' <= k. Using the
terminology of parameterized complexity we say that k-Vertex Cover has a kernel
with 2k vertices. There is complexity-theoretic evidence that both 2k vertices
and Theta(k^2) edges are optimal for the kernel size. In this paper we consider
the Vertex Cover problem with a different parameter, the size fvs(G) of a
minimum feedback vertex set for G. This refined parameter is structurally
smaller than the parameter k associated to the vertex covering number vc(G)
since fvs(G) <= vc(G) and the difference can be arbitrarily large. We give a
kernel for Vertex Cover with a number of vertices that is cubic in fvs(G): an
instance (G,X,k) of Vertex Cover, where X is a feedback vertex set for G, can
be transformed in polynomial time into an equivalent instance (G',X',k') such
that |V(G')| <= 2k and |V(G')| <= O(|X'|^3). A similar result holds when the
feedback vertex set X is not given along with the input. In sharp contrast we
show that the Weighted Vertex Cover problem does not have a polynomial kernel
when parameterized by the cardinality of a given vertex cover of the graph
unless NP is in coNP/poly and the polynomial hierarchy collapses to the third
level.
Comment: Published in "Theory of Computing Systems" as an Open Access
publicatio
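The claim that fvs(G) <= vc(G), with an arbitrarily large gap, follows from a short argument that the abstract leaves implicit. The sketch below spells it out; the path graph P_n is an example chosen here for illustration, not one taken from the paper.

```latex
% Why every vertex cover is a feedback vertex set:
% if C is a vertex cover of G, then G - C has no edges,
% so G - C is acyclic and C intersects every cycle of G.
\[
  \mathrm{fvs}(G) \;\le\; \mathrm{vc}(G).
\]
% The gap can be arbitrarily large. For the path P_n on n vertices
% (an illustrative choice), the graph is already a forest, while
% covering its n - 1 edges needs every second vertex:
\[
  \mathrm{fvs}(P_n) = 0,
  \qquad
  \mathrm{vc}(P_n) = \left\lfloor \tfrac{n}{2} \right\rfloor .
\]
```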
Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification
Motivation: Tandem mass spectrometry (MS/MS) is an indispensable technology for identification of proteins from complex mixtures. Proteins are digested to peptides that are then identified by their fragmentation patterns in the mass spectrometer. Thus, at its core, MS/MS protein identification relies on the relative predictability of peptide fragmentation. Unfortunately, peptide fragmentation is complex and not fully understood, and what is understood is not always exploited by peptide identification algorithms
Population sequencing of two endocannabinoid metabolic genes identifies rare and common regulatory variants associated with extreme obesity and metabolite level
Background: Targeted re-sequencing of candidate genes in individuals at the extremes of a quantitative phenotype distribution is a method of choice to gain information on the contribution of rare variants to disease susceptibility. The endocannabinoid system mediates signaling in the brain and peripheral tissues involved in the regulation of energy balance, is highly active in obese patients, and represents a strong candidate pathway to examine for genetic association with body mass index (BMI). Results: We sequenced two intervals (covering 188 kb) encoding the endocannabinoid metabolic enzymes fatty-acid amide hydrolase (FAAH) and monoglyceride lipase (MGLL) in 147 normal controls and 142 extremely obese cases. After applying quality filters, we called 1,393 high-quality single nucleotide variants, 55% of which are rare, and 143 indels. Using single marker tests and collapsed marker tests, we identified four intervals associated with BMI: the FAAH promoter, the MGLL promoter, MGLL intron 2, and MGLL intron 3. Two of these intervals are composed of rare variants, and the majority of the associated variants are located in promoter sequences or in predicted transcriptional enhancers, suggesting a regulatory role. The set of rare variants in the FAAH promoter associated with BMI is also associated with increased levels of the FAAH substrate anandamide, further implicating a functional role in obesity. Conclusions: Our study, which is one of the first reports of a sequence-based association study using next-generation sequencing of candidate genes, provides insights into study design and analysis approaches and demonstrates the importance of examining regulatory elements rather than exclusively focusing on exon sequences
Identifying the favored mutation in a positive selective sweep.
Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. We present iSAFE (for "integrated selection of allele favored by evolution"), a method that enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations
Dependence of paracentric inversion rate on tract length
BACKGROUND: We develop a Bayesian method based on MCMC for estimating the relative rates of pericentric and paracentric inversions from marker data from two species. The method also allows estimation of the distribution of inversion tract lengths. RESULTS: We apply the method to data from Drosophila melanogaster and D. yakuba. We find that pericentric inversions occur at a much lower rate than paracentric inversions. The average paracentric inversion tract length is approximately 4.8 Mb, with small inversions being more frequent than large inversions. If the two breakpoints defining a paracentric inversion tract are uniformly and independently distributed over chromosome arms, there will be more short tract-length inversions than long; we find an even greater preponderance of short tract lengths than this would predict. Thus there appears to be a correlation between the positions of breakpoints which favors shorter tract lengths. CONCLUSION: The method developed in this paper provides the first statistical estimator of the distribution of inversion tract lengths from marker data. Applying this method to a number of data sets may help elucidate the relationship between the length of an inversion and the chance that it will get accepted
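The baseline expectation under uniform breakpoints can be worked out directly. The derivation below is a sketch under a simplifying assumption not stated in the abstract: both breakpoints fall on a single chromosome arm of length a, with positions X and Y drawn independently and uniformly from [0, a]. The symbols a, X, Y, and L are introduced here for illustration.

```latex
% Tract length L = |X - Y| for X, Y i.i.d. Uniform(0, a).
\[
  \Pr(L > \ell) = \left(1 - \frac{\ell}{a}\right)^{2},
  \qquad 0 \le \ell \le a ,
\]
% Differentiating gives the density of the tract length:
\[
  f_L(\ell) = \frac{2\,(a - \ell)}{a^{2}},
  \qquad 0 \le \ell \le a ,
\]
% which decreases linearly in \ell, so short tracts are more likely
% than long ones even without any breakpoint correlation. The mean is
\[
  \mathbb{E}[L] = \int_{0}^{a} \ell \, \frac{2\,(a - \ell)}{a^{2}} \, d\ell
               = \frac{a}{3} .
\]
```

The abstract's finding is that observed tract lengths are skewed toward short values even more strongly than this linearly decreasing density predicts.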
Routes for breaching and protecting genetic privacy
We are entering the era of ubiquitous genetic information for research,
clinical care, and personal curiosity. Sharing these datasets is vital for
rapid progress in understanding the genetic basis of human diseases. However,
one growing concern is the ability to protect the genetic privacy of the data
originators. Here, we technically map threats to genetic privacy and discuss
potential mitigation strategies for privacy-preserving dissemination of genetic
data.
Comment: Draft for comment
Signal Transduction Pathways in the Pentameric Ligand-Gated Ion Channels
The mechanisms of allosteric action within pentameric ligand-gated ion channels (pLGICs) remain to be determined. Using crystallography, site-directed mutagenesis, and two-electrode voltage clamp measurements, we identified two functionally relevant sites in the extracellular (EC) domain of the bacterial pLGIC from Gloeobacter violaceus (GLIC). One site is at the C-loop region, where the NQN mutation (D91N, E177Q, and D178N) eliminated inter-subunit salt bridges in the open-channel GLIC structure and thereby shifted the channel activation to a higher agonist concentration. The other site is below the C-loop, where binding of the anesthetic ketamine inhibited GLIC currents in a concentration-dependent manner. To understand how a perturbation signal in the EC domain, either resulting from the NQN mutation or ketamine binding, is transduced to the channel gate, we have used the Perturbation-based Markovian Transmission (PMT) model to determine dynamic responses of the GLIC channel and signaling pathways upon initial perturbations in the EC domain of GLIC. Despite the existence of many possible routes for the initial perturbation signal to reach the channel gate, the PMT model in combination with Yen's algorithm revealed that perturbation signals with the highest probability flow travel either via the β1-β2 loop or through pre-TM1. The β1-β2 loop occurs in either intra- or inter-subunit pathways, while pre-TM1 occurs exclusively in inter-subunit pathways. Residues involved in both types of pathways are well supported by previous experimental data on nAChR. The direct coupling between pre-TM1 and TM2 of the adjacent subunit adds new insight into the allosteric signaling mechanism in pLGICs. © 2013 Mowrey et al