240 research outputs found

    On Feedback Vertex Set: New Measure and New Structures

    We present a new parameterized algorithm for the feedback vertex set problem (FVS) on undirected graphs. We approach the problem by considering a variation of it, the disjoint feedback vertex set problem (DISJOINT-FVS), which finds a feedback vertex set of size k that has no overlap with a given feedback vertex set F of the graph G. We develop an improved kernelization algorithm for DISJOINT-FVS and show that DISJOINT-FVS can be solved in polynomial time when all vertices in G \ F have degrees upper bounded by three. We then propose a new branch-and-search process on DISJOINT-FVS, and introduce a new branch-and-search measure. The process effectively reduces a given graph to a graph on which DISJOINT-FVS becomes polynomial-time solvable, and the new measure more accurately evaluates the efficiency of the process. These algorithmic and combinatorial studies enable us to develop an O*(3.83^k)-time parameterized algorithm for the general FVS problem, improving all previous algorithms for the problem. Comment: Final version, to appear in Algorithmic
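
To make the DISJOINT-FVS definition above concrete, here is a minimal Python sketch of a solution checker. It is not the paper's algorithm. The adjacency-set graph representation, the function names, and the "size at most k" reading of the size constraint are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Set

# Assumed representation: undirected graph as symmetric adjacency sets.
Graph = Dict[Hashable, Set[Hashable]]


def is_forest(graph: Graph, removed: Set[Hashable]) -> bool:
    """Return True if the graph with `removed` deleted contains no cycle."""
    parent = {v: v for v in graph if v not in removed}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    seen_edges = set()
    for u in parent:
        for w in graph[u]:
            edge = frozenset((u, w))
            if w in removed or edge in seen_edges:
                continue
            seen_edges.add(edge)
            root_u, root_w = find(u), find(w)
            if root_u == root_w:
                return False  # this edge closes a cycle
            parent[root_u] = root_w
    return True


def is_disjoint_fvs(graph: Graph, candidate: Iterable[Hashable],
                    given_fvs: Iterable[Hashable], k: int) -> bool:
    """Check that `candidate` has at most k vertices, avoids `given_fvs`,
    and hits every cycle of `graph`."""
    candidate = set(candidate)
    return (len(candidate) <= k
            and candidate.isdisjoint(given_fvs)
            and is_forest(graph, candidate))


# Hypothetical example: a triangle a-b-c with a pendant vertex d attached to c.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

With F = {"a"} as the given feedback vertex set, the candidate {"b"} is accepted, while {"a"} (overlaps F) and {"d"} (leaves the triangle intact) are rejected.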

    Reversal Distances for Strings with Few Blocks or Small Alphabets

    We study the String Reversal Distance problem, an extension of the well-known Sorting by Reversals problem. String Reversal Distance takes two strings S and T as input, and asks for a minimum number of reversals to obtain T from S. We consider four variants: String Reversal Distance, String Prefix Reversal Distance (in which any reversal must include the first letter of the string), and the signed variants of these problems, namely Signed String Reversal Distance and Signed String Prefix Reversal Distance. We study algorithmic properties of these four problems, in connection with two parameters of the input strings: the number of blocks they contain (a block being a maximal substring such that all letters in the substring are equal), and the alphabet size Σ. For instance, we show that Signed String Reversal Distance and Signed String Prefix Reversal Distance are NP-hard even if the input strings have only one letter
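
The two basic notions in this abstract, blocks and (prefix) reversals, can be illustrated with a short Python sketch for the unsigned variants. The function names and the 0-indexed, end-exclusive interval convention are assumptions chosen for illustration; the signed variants are not covered.

```python
from itertools import groupby


def count_blocks(s: str) -> int:
    """Number of maximal runs of equal letters in s."""
    return sum(1 for _ in groupby(s))


def reverse(s: str, i: int, j: int) -> str:
    """Reverse the substring s[i:j] (0-indexed, j exclusive)."""
    return s[:i] + s[i:j][::-1] + s[j:]


def prefix_reverse(s: str, j: int) -> str:
    """A prefix reversal: the reversed interval must start at the first letter."""
    return reverse(s, 0, j)
```

For instance, `count_blocks("aabbbc")` counts the runs `aa`, `bbb`, and `c`, and `prefix_reverse("abcde", 3)` reverses only the leading `abc`.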

    Strobe sequence design for haplotype assembly

    Background: Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the pair of chromosomes are mostly identical to each other, linking together alleles at heterozygous sites is sufficient to phase, or separate, the two chromosomes. In Haplotype Assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to get long haplotypes. Here, we describe the different design parameters that can be adjusted with next generation and upcoming sequencing technologies, and study the impact of design choice on the length of the haplotype. Results: We show that a number of parameters influence haplotype length, with the most significant one being the advance length (distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results in a 1-2 order of magnitude improvement in median haplotype length. Conclusions: Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies

    Vertex Cover Kernelization Revisited: Upper and Lower Bounds for a Refined Parameter

    An important result in the study of polynomial-time preprocessing shows that there is an algorithm which, given an instance (G,k) of Vertex Cover, outputs an equivalent instance (G',k') in polynomial time with the guarantee that G' has at most 2k' vertices (and thus O((k')^2) edges) with k' <= k. Using the terminology of parameterized complexity, we say that k-Vertex Cover has a kernel with 2k vertices. There is complexity-theoretic evidence that both 2k vertices and Theta(k^2) edges are optimal for the kernel size. In this paper we consider the Vertex Cover problem with a different parameter, the size fvs(G) of a minimum feedback vertex set for G. This refined parameter is structurally smaller than the parameter k associated to the vertex covering number vc(G), since fvs(G) <= vc(G) and the difference can be arbitrarily large. We give a kernel for Vertex Cover with a number of vertices that is cubic in fvs(G): an instance (G,X,k) of Vertex Cover, where X is a feedback vertex set for G, can be transformed in polynomial time into an equivalent instance (G',X',k') such that |V(G')| <= 2k and |V(G')| <= O(|X'|^3). A similar result holds when the feedback vertex set X is not given along with the input. In sharp contrast, we show that the Weighted Vertex Cover problem does not have a polynomial kernel when parameterized by the cardinality of a given vertex cover of the graph unless NP is in coNP/poly and the polynomial hierarchy collapses to the third level. Comment: Published in "Theory of Computing Systems" as an Open Access publication
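
The relation fvs(G) <= vc(G) used in this abstract follows from a short standard argument, sketched below. The symbols C and P_n are introduced here for illustration and do not come from the abstract.

```latex
\begin{align*}
C \text{ is a vertex cover of } G
  &\;\Longrightarrow\; G - C \text{ has no edges} \\
  &\;\Longrightarrow\; G - C \text{ is acyclic} \\
  &\;\Longrightarrow\; C \text{ is a feedback vertex set of } G,
\end{align*}
so $\mathrm{fvs}(G) \le \mathrm{vc}(G)$. For the path $P_n$ on $n$ vertices,
$\mathrm{fvs}(P_n) = 0$ while $\mathrm{vc}(P_n) = \lfloor n/2 \rfloor$,
so the difference can be made arbitrarily large.
```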

    Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification

    Motivation: Tandem mass spectrometry (MS/MS) is an indispensable technology for identification of proteins from complex mixtures. Proteins are digested to peptides that are then identified by their fragmentation patterns in the mass spectrometer. Thus, at its core, MS/MS protein identification relies on the relative predictability of peptide fragmentation. Unfortunately, peptide fragmentation is complex and not fully understood, and what is understood is not always exploited by peptide identification algorithms

    Population sequencing of two endocannabinoid metabolic genes identifies rare and common regulatory variants associated with extreme obesity and metabolite level

    Background: Targeted re-sequencing of candidate genes in individuals at the extremes of a quantitative phenotype distribution is a method of choice to gain information on the contribution of rare variants to disease susceptibility. The endocannabinoid system mediates signaling in the brain and peripheral tissues involved in the regulation of energy balance, is highly active in obese patients, and represents a strong candidate pathway to examine for genetic association with body mass index (BMI). Results: We sequenced two intervals (covering 188 kb) encoding the endocannabinoid metabolic enzymes fatty-acid amide hydrolase (FAAH) and monoglyceride lipase (MGLL) in 147 normal controls and 142 extremely obese cases. After applying quality filters, we called 1,393 high quality single nucleotide variants, 55% of which are rare, and 143 indels. Using single marker tests and collapsed marker tests, we identified four intervals associated with BMI: the FAAH promoter, the MGLL promoter, MGLL intron 2, and MGLL intron 3. Two of these intervals are composed of rare variants, and the majority of the associated variants are located in promoter sequences or in predicted transcriptional enhancers, suggesting a regulatory role. The set of rare variants in the FAAH promoter associated with BMI is also associated with increased levels of the FAAH substrate anandamide, further implicating a functional role in obesity. Conclusions: Our study, which is one of the first reports of a sequence-based association study using next-generation sequencing of candidate genes, provides insights into study design and analysis approaches and demonstrates the importance of examining regulatory elements rather than exclusively focusing on exon sequences

    Identifying the favored mutation in a positive selective sweep.

    Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. We present iSAFE (for "integrated selection of allele favored by evolution"), a method that enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations

    Dependence of paracentric inversion rate on tract length

    BACKGROUND: We develop a Bayesian method based on MCMC for estimating the relative rates of pericentric and paracentric inversions from marker data from two species. The method also allows estimation of the distribution of inversion tract lengths. RESULTS: We apply the method to data from Drosophila melanogaster and D. yakuba. We find that pericentric inversions occur at a much lower rate than paracentric inversions. The average paracentric inversion tract length is approx. 4.8 Mb, with small inversions being more frequent than large inversions. If the two breakpoints defining a paracentric inversion tract are uniformly and independently distributed over chromosome arms, there will be more short tract-length inversions than long; we find an even greater preponderance of short tract lengths than this would predict. Thus there appears to be a correlation between the positions of breakpoints which favors shorter tract lengths. CONCLUSION: The method developed in this paper provides the first statistical estimator of the distribution of inversion tract lengths from marker data. Application of this method to a number of data sets may help elucidate the relationship between the length of an inversion and the chance that it will get accepted
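
The baseline claim in this abstract, that uniform and independent breakpoints already favor short tracts, can be checked with a short calculation. The sketch below assumes an idealized single chromosome arm of length L; the symbols U, V, X, and L are introduced here for illustration only.

```latex
Let $U, V \sim \mathrm{Uniform}[0, L]$ be independent breakpoints and
$X = |U - V|$ the tract length. For $0 \le x \le L$,
\begin{align*}
\Pr(X > x) &= \frac{(L - x)^2}{L^2}, \\
f_X(x) &= -\frac{d}{dx}\Pr(X > x) = \frac{2\,(L - x)}{L^2}, \\
\mathbb{E}[X] &= \int_0^L x \,\frac{2\,(L - x)}{L^2}\,dx = \frac{L}{3}.
\end{align*}
The density $f_X$ is strictly decreasing in $x$, so short tracts are more
likely than long ones under this null model.
```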

    Routes for breaching and protecting genetic privacy

    We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data. Comment: Draft for comment

    Signal Transduction Pathways in the Pentameric Ligand-Gated Ion Channels

    The mechanisms of allosteric action within pentameric ligand-gated ion channels (pLGICs) remain to be determined. Using crystallography, site-directed mutagenesis, and two-electrode voltage clamp measurements, we identified two functionally relevant sites in the extracellular (EC) domain of the bacterial pLGIC from Gloeobacter violaceus (GLIC). One site is at the C-loop region, where the NQN mutation (D91N, E177Q, and D178N) eliminated inter-subunit salt bridges in the open-channel GLIC structure and thereby shifted the channel activation to a higher agonist concentration. The other site is below the C-loop, where binding of the anesthetic ketamine inhibited GLIC currents in a concentration-dependent manner. To understand how a perturbation signal in the EC domain, resulting from either the NQN mutation or ketamine binding, is transduced to the channel gate, we have used the Perturbation-based Markovian Transmission (PMT) model to determine dynamic responses of the GLIC channel and signaling pathways upon initial perturbations in the EC domain of GLIC. Despite the existence of many possible routes for the initial perturbation signal to reach the channel gate, the PMT model in combination with Yen's algorithm revealed that perturbation signals with the highest probability flow travel either via the β1-β2 loop or through pre-TM1. The β1-β2 loop occurs in either intra- or inter-subunit pathways, while pre-TM1 occurs exclusively in inter-subunit pathways. Residues involved in both types of pathways are well supported by previous experimental data on nAChR. The direct coupling between pre-TM1 and TM2 of the adjacent subunit adds new insight into the allosteric signaling mechanism in pLGICs. © 2013 Mowrey et al