
    Covering Pairs in Directed Acyclic Graphs

    The Minimum Path Cover problem on directed acyclic graphs (DAGs) is a classical problem that provides a clear and simple mathematical formulation for several applications in different areas and admits an efficient algorithmic solution. In this paper, we study the computational complexity of two constrained variants of Minimum Path Cover motivated by the recent introduction of next-generation sequencing technologies in bioinformatics. The first problem (MinPCRP), given a DAG and a set of pairs of vertices, asks for a minimum-cardinality set of paths "covering" all the vertices such that both vertices of each pair belong to the same path. For this problem, we show that, while it is NP-hard to decide whether a solution consisting of at most three paths exists, it is possible to decide in polynomial time whether a solution consisting of at most two paths exists. The second problem (MaxRPSP), given a DAG and a set of pairs of vertices, asks for a path containing the maximum number of the given pairs of vertices. We show its NP-hardness and also its W[1]-hardness when parameterized by the number of covered pairs. On the positive side, we give a fixed-parameter algorithm when the parameter is the maximum overlapping degree, a natural parameter in the bioinformatics applications of the problem.
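The unconstrained problem these variants build on has a classical polynomial-time solution: a minimum vertex-disjoint path cover of a DAG has size n minus the size of a maximum matching in an associated bipartite graph. A minimal sketch of that reduction (function names and the example graph are ours, not from the paper):

```python
# Sketch of the classical unconstrained Minimum Path Cover on a DAG:
# split each vertex into an "out" copy and an "in" copy, add a bipartite
# edge for every DAG arc, and compute a maximum matching; the minimum
# number of vertex-disjoint covering paths equals n minus the matching size.

def min_path_cover(n, edges):
    """n vertices labelled 0..n-1, edges = list of (u, v) DAG arcs."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    match_to = [-1] * n  # match_to[v] = vertex chosen as predecessor of v

    def augment(u, seen):
        # Standard augmenting-path search for bipartite matching.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_to[v] == -1 or augment(match_to[v], seen):
                match_to[v] = u
                return True
        return False

    matching = sum(augment(u, set()) for u in range(n))
    return n - matching  # size of a minimum path cover

# Example: the DAG 0->1->2, 0->3 needs 2 paths (0-1-2 and the vertex 3).
print(min_path_cover(4, [(0, 1), (1, 2), (0, 3)]))  # -> 2
```

Matched edges chain together into the paths themselves, so the cover can be recovered from `match_to` if needed.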

    Parameterized Complexity of the k-anonymity Problem

    The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization that has been recently proposed is k-anonymity. This approach requires that the rows of a table be partitioned into clusters of size at least k and that all the rows in a cluster become the same tuple after the suppression of some entries. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be APX-hard even when the record values are over a binary alphabet and k=3, and when the records have length at most 8 and k=4. In this paper we study how the complexity of the problem is influenced by different parameters, first showing that the problem is W[1]-hard when parameterized by the size of the solution (and the value k). Then we exhibit a fixed-parameter algorithm when the problem is parameterized by the size of the alphabet and the number of columns. Finally, we investigate the computational (and approximation) complexity of the k-anonymity problem when the instance is restricted to records of length at most 3 and k=3. We show that such a restriction is APX-hard. Comment: 22 pages, 2 figures
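The suppression model above is easy to state in code once a clustering is fixed (finding a good clustering is exactly the NP-hard part). A small illustrative sketch; the table and the partition are invented for the example:

```python
# Illustrative sketch of the suppression cost in the k-anonymity model:
# once rows are grouped into clusters, an entry must be suppressed in a
# whole column of a cluster whenever the cluster's rows disagree there,
# so the cost of a cluster is (disagreeing columns) * (cluster size).

def suppression_cost(rows, clusters, k=3):
    cost = 0
    for cluster in clusters:
        assert len(cluster) >= k, "every cluster must have size at least k"
        group = [rows[i] for i in cluster]
        disagreeing = sum(
            1 for col in range(len(group[0]))
            if len({r[col] for r in group}) > 1
        )
        cost += disagreeing * len(group)
    return cost

rows = ["0110", "0100", "0111", "1010", "1010", "1011"]
# One 3-anonymous partition: rows {0,1,2} and rows {3,4,5}.
print(suppression_cost(rows, [[0, 1, 2], [3, 4, 5]]))  # -> 9
```

The first cluster disagrees in two columns (cost 6), the second in one (cost 3); the optimization problem asks for the partition minimizing this total.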

    The Zero Exemplar Distance Problem

    Given two genomes with duplicate genes, Zero Exemplar Distance is the problem of deciding whether the two genomes can be reduced to the same genome without duplicate genes by deleting all but one copy of each gene in each genome. Blin, Fertin, Sikora, and Vialette recently proved that Zero Exemplar Distance for monochromosomal genomes is NP-hard even if each gene appears at most twice in each genome, thereby settling an important open question on genome rearrangement in the exemplar model. In this paper, we give a very simple alternative proof of this result. We also study the problem Zero Exemplar Distance for multichromosomal genomes without gene order, and prove the analogous result that it is also NP-hard even if each gene appears at most twice in each genome. For the positive direction, we show that both variants of Zero Exemplar Distance admit polynomial-time algorithms if each gene appears exactly once in one genome and at least once in the other genome. In addition, we present a polynomial-time algorithm for the related problem Exemplar Longest Common Subsequence in the special case that each mandatory symbol appears exactly once in one input sequence and at least once in the other input sequence. This answers an open question of Bonizzoni et al. We also show that Zero Exemplar Distance for multichromosomal genomes without gene order is fixed-parameter tractable if the parameter is the maximum number of chromosomes in each genome. Comment: Strengthened and reorganized
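For tiny instances the zero-distance question can be checked by brute force, which makes the definition concrete. This enumeration is exponential and purely illustrative (the hardness results above concern the general case); the genomes in the example are invented:

```python
from itertools import product

# Brute-force sketch for tiny monochromosomal inputs: an exemplar of a
# genome keeps exactly one occurrence of each gene, preserving order;
# the zero-distance question asks whether the two genomes share a
# common exemplar.

def exemplars(genome):
    positions = {}
    for i, g in enumerate(genome):
        positions.setdefault(g, []).append(i)
    genes = list(positions)
    result = set()
    # Try every way of keeping one occurrence per gene.
    for choice in product(*(positions[g] for g in genes)):
        kept = sorted(choice)
        result.add(tuple(genome[i] for i in kept))
    return result

def zero_exemplar_distance(g1, g2):
    return not exemplars(g1).isdisjoint(exemplars(g2))

print(zero_exemplar_distance("abcab", "bca"))  # True: both reduce to "bca"
print(zero_exemplar_distance("abcab", "bac"))  # False
```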

    On the Complexity of t-Closeness Anonymization and Related Problems

    An important issue in releasing individual data is to protect the sensitive information from being leaked and maliciously utilized. Famous privacy-preserving principles that aim to ensure both data privacy and data integrity, such as k-anonymity and l-diversity, have been extensively studied both theoretically and empirically. Nonetheless, these widely adopted principles are still insufficient to prevent attribute disclosure if the attacker has partial knowledge about the overall sensitive data distribution. The t-closeness principle has been proposed to fix this, which also has the benefit of supporting numerical sensitive attributes. However, in contrast to k-anonymity and l-diversity, the theoretical aspect of t-closeness has not been well investigated. We initiate the first systematic theoretical study on the t-closeness principle under the commonly used attribute suppression model. We prove that for every constant t such that 0 <= t < 1, it is NP-hard to find an optimal t-closeness generalization of a given table. The proof consists of several reductions each of which works for different values of t, which together cover the full range. To complement this negative result, we also provide exact and fixed-parameter algorithms. Finally, we answer some open questions regarding the complexity of k-anonymity and l-diversity left in the literature. Comment: An extended abstract to appear in DASFAA 201
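The t-closeness condition itself is cheap to evaluate. A hedged sketch, assuming a categorical sensitive attribute with unit ground distances, in which case the Earth Mover's Distance used in the definition reduces to total variation distance (the example values are invented):

```python
from collections import Counter

# Sketch: a table is t-close if, for every equivalence class, the
# distribution of sensitive values inside the class is within distance t
# of the distribution over the whole table.  With unit ground distances
# between categorical values, EMD equals total variation distance.

def distribution(values):
    total = len(values)
    return {v: c / total for v, c in Counter(values).items()}

def tv_distance(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def is_t_close(classes, t):
    """classes: list of lists of sensitive values, one list per class."""
    overall = distribution([v for cls in classes for v in cls])
    return all(tv_distance(distribution(cls), overall) <= t for cls in classes)

classes = [["flu", "flu", "hiv"], ["flu", "hiv", "hiv"]]
# Overall: flu 1/2, hiv 1/2; each class is 1/6 away in total variation.
print(is_t_close(classes, 0.2))  # True
print(is_t_close(classes, 0.1))  # False
```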

    Approximating Clustering of Fingerprint Vectors with Missing Values

    The problem of clustering fingerprint vectors is an interesting problem in Computational Biology that has been proposed in (Figueroa et al. 2004). In this paper we show some improvements in closing the gaps between the known lower bounds and upper bounds on the approximability of some variants of the biological problem. Namely, we are able to prove that the problem is APX-hard even when each fingerprint contains only two unknown positions. Moreover, we study some variants of the original problem, and we give two 2-approximation algorithms for the IECMV and OECMV problems when the number of unknown entries for each vector is at most a constant. Comment: 13 pages, 4 figures
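The role of the unknown positions can be made concrete with a small sketch: in this setting a fingerprint is a vector over {0, 1, N} where N marks a missing value, and a set of fingerprints can share a cluster only if some single fully resolved 0/1 vector is consistent with all of them. The encoding and helper below are illustrative, not the paper's notation:

```python
# Sketch: find one 0/1 vector consistent with a set of fingerprints over
# the alphabet {'0', '1', 'N'}, where 'N' is a missing value, or report
# that the set is incompatible (i.e. cannot be clustered together).

def common_resolution(fingerprints):
    """Return a consistent 0/1 string, or None if the set is incompatible."""
    resolved = []
    for col in zip(*fingerprints):
        known = {c for c in col if c != 'N'}
        if len(known) > 1:
            return None            # conflicting known values in this column
        resolved.append(known.pop() if known else '0')  # free position -> '0'
    return ''.join(resolved)

print(common_resolution(["1N0", "1NN", "NN0"]))  # -> '100'
print(common_resolution(["1N0", "0N0"]))         # -> None
```

The hardness result above says that deciding how to partition many such vectors into compatible clusters at minimum cost remains hard even with only two N entries per vector.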

    Migration and Legal Precarity in the Time of Pandemic: Qualitative Research on the Italian Case

    The COVID-19 pandemic has unequally impacted the lives of subjects in Italy. The article uses evidence from forty-seven semi-structured interviews with various migrant groups to illuminate how temporalities embedded in Italy's migration governance shape migrants' precarious legal status and access to welfare. The authors show that whereas migrants with secure legal status or citizenship have not engaged significantly with Italian bureaucracies, they still lack easy access to welfare, as it is contingent on their employment and financial status. Migrants with precarious status have been the worst hit by the pandemic's secondary effects across several fronts. These findings have implications for policy and future research.

    Reconciliation Revisited: Handling Multiple Optima when Reconciling with Duplication, Transfer, and Loss

    Phylogenetic tree reconciliation is a powerful approach for inferring evolutionary events like gene duplication, horizontal gene transfer, and gene loss, which are fundamental to our understanding of molecular evolution. While duplication-loss (DL) reconciliation leads to a unique maximum-parsimony solution, duplication-transfer-loss (DTL) reconciliation yields a multitude of optimal solutions, making it difficult to infer the true evolutionary history of the gene family. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Here, we present an effective, efficient, and scalable method for dealing with these fundamental problems in DTL reconciliation. Our approach works by sampling the space of optimal reconciliations uniformly at random and aggregating the results. We show that even gene trees with only a few dozen genes often have millions of optimal reconciliations and present an algorithm to efficiently sample the space of optimal reconciliations uniformly at random in O(mn^2) time per sample, where m and n denote the number of genes and species, respectively. We use these samples to understand how different optimal reconciliations vary in their node mappings and event assignments and to investigate the impact of varying event costs. We apply our method to a biological dataset of approximately 4700 gene trees from 100 taxa and observe that 93% of event assignments and 73% of mappings remain consistent across the multiple optima. Our analysis represents the first systematic investigation of the space of optimal DTL reconciliations and has many important implications for the study of gene family evolution. National Science Foundation (U.S.) (CAREER Award 0644282); National Institutes of Health (U.S.) (Grant RC2 HG005639); National Science Foundation (U.S.), Assembling the Tree of Life Program (Grant 0936234)
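The uniform-sampling idea generalizes beyond reconciliation: whenever a dynamic program can count its optimal solutions, a backward walk through the table that picks each choice with probability proportional to its count yields a uniform sample of the optima. A toy instance of the same principle on shortest paths in a DAG (this is our illustration, not the paper's O(mn^2) reconciliation algorithm):

```python
import random

# Count shortest s-t paths in a DAG by DP, then sample one uniformly at
# random by walking backwards and choosing each predecessor with
# probability proportional to the number of optimal sub-solutions it
# carries -- the same counting-then-sampling principle as in the paper.

def sample_shortest_path(n, edges, s, t, rng=random):
    INF = float("inf")
    dist = [INF] * n
    count = [0] * n
    dist[s], count[s] = 0, 1
    preds = [[] for _ in range(n)]
    for u, v in edges:  # edges listed so that u precedes v topologically
        preds[v].append(u)
        if dist[u] + 1 < dist[v]:
            dist[v], count[v] = dist[u] + 1, count[u]
        elif dist[u] + 1 == dist[v]:
            count[v] += count[u]
    # Backward walk: pick predecessors proportionally to their counts.
    path, v = [t], t
    while v != s:
        opts = [u for u in preds[v] if dist[u] + 1 == dist[v]]
        v = rng.choices(opts, weights=[count[u] for u in opts])[0]
        path.append(v)
    return path[::-1], count[t]

# Diamond DAG 0->1->3, 0->2->3: two optimal paths, each sampled w.p. 1/2.
path, total = sample_shortest_path(4, [(0, 1), (0, 2), (1, 3), (2, 3)], 0, 3)
print(total, path)  # 2, and either [0, 1, 3] or [0, 2, 3]
```

Aggregating many such samples, as the paper does for reconciliations, estimates how often each feature appears among the optima.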

    Mariages mixtes, migration féminine et travail domestique: un regard sur la situation italienne

    The article stimulates a reflection on the theme of mixed marriages in Italy, with special reference to the so-called "caregivers' marriages". In recent years, these have been subject to increasing stigmatization in the Italian public debate, contributing to legitimize some recent reforms of the national pension system. Drawing on the narrative of a young domestic worker married to an older Italian man, the article calls for a more complex and multifaceted vision of these processes, which have so far received little attention from Italian and international research in the field.

    Beyond Perfect Phylogeny: Multisample Phylogeny Reconstruction via ILP

    Most evolutionary history reconstruction approaches are based on the infinite sites assumption underlying the Perfect Phylogeny model, one of the most widely used models in cancer genomics. Recent results give strong evidence that recurrent and back mutations are present in the evolutionary history of tumors [19], thus showing that models more general than the Perfect Phylogeny are required. To address this problem we propose a framework based on the notion of Incomplete Perfect Phylogeny. Our framework incorporates losses and gains of mutations, hence including the Dollo and the Camin-Sokal models, and is described with an Integer Linear Programming (ILP) formulation. Our approach generalizes the notion of persistent phylogeny [1] and the ILP approaches [14, 15] proposed to solve the corresponding phylogeny reconstruction problem on character data. The final goal of our paper is to integrate our approach into an ILP formulation of the problem of reconstructing trees on mixed populations, where the input data consist of the fractions of cells in a set of samples that have a certain mutation. This is a fundamental problem in cancer genomics, where the goal is to study the evolutionary history of a tumor. An experimental analysis shows that our ILP approach is able to explain data that do not fit the perfect phylogeny assumption, allowing (1) multiple losses and gains of mutations, and (2) a number of subpopulations that is smaller than the number of input mutations.
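A concrete way to see when data "do not fit the perfect phylogeny assumption" is the classical three-gamete test: a binary character matrix admits a perfect phylogeny with an all-zero ancestor iff no pair of columns contains all three row patterns (0,1), (1,0) and (1,1). A minimal sketch (the example matrices are invented):

```python
from itertools import combinations

# Three-gamete test: a binary matrix (rows = cells/samples, columns =
# mutations) admits a perfect phylogeny with an all-zero ancestor iff
# no pair of columns exhibits all three "gametes" (0,1), (1,0), (1,1).

def has_perfect_phylogeny(matrix):
    cols = list(zip(*matrix))
    for c1, c2 in combinations(cols, 2):
        gametes = {(a, b) for a, b in zip(c1, c2) if (a, b) != (0, 0)}
        if {(0, 1), (1, 0), (1, 1)} <= gametes:
            return False
    return True

good = [[1, 0], [1, 1], [0, 0]]
bad = [[1, 0], [0, 1], [1, 1]]  # all three gametes: no perfect phylogeny
print(has_perfect_phylogeny(good), has_perfect_phylogeny(bad))  # True False
```

Instances like `bad` are exactly the ones that need the more general Dollo- or Camin-Sokal-style models the paper encodes in its ILP.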

    Accepting splicing systems with permitting and forbidding words

    Abstract: In this paper we propose a generalization of the accepting splicing systems introduced in Mitrana et al. (Theor Comput Sci 411:2414-2422, 2010). More precisely, the input word is accepted as soon as a permitting word is obtained, provided that no forbidding word has been obtained so far; otherwise it is rejected. Note that in the new variant of accepting splicing system the input word is rejected if either no permitting word is ever generated (like in Mitrana et al., Theor Comput Sci 411:2414-2422, 2010) or a forbidding word has been generated and no permitting word had been generated before. We investigate the computational power of the new variants of accepting splicing systems and the interrelationships among them. We show that the new condition strictly increases the computational power of accepting splicing systems. Although there are regular languages that cannot be accepted by any of the splicing systems considered here, the new variants can accept non-regular and even non-context-free languages, a situation that is not very common in the case of (extended) finite splicing systems without additional restrictions. We also show that the smallest of the four classes of languages defined by accepting splicing systems is strictly included in the class of context-free languages. Solutions to a few decidability problems are immediately derived from the proof of this result.
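The splicing operation underlying all of these systems can be stated compactly: a rule (u1, u2; u3, u4) applied to words x = p·u1·u2·q and y = r·u3·u4·s produces p·u1·u4·s. A small illustrative implementation of this single operation (the example words and rule are ours):

```python
# One step of the basic splicing operation: for every occurrence of
# u1·u2 in x and every occurrence of u3·u4 in y, cut x after u1 and
# y after u3, and recombine the x-prefix with the y-suffix.

def splice(x, y, rule):
    u1, u2, u3, u4 = rule
    results = set()
    for i in range(len(x) + 1):
        if x[i:].startswith(u1) and x[i + len(u1):].startswith(u2):
            prefix = x[:i] + u1
            for j in range(len(y) + 1):
                if y[j:].startswith(u3) and y[j + len(u3):].startswith(u4):
                    results.add(prefix + y[j + len(u3) + len(u4):])
    return results

# Cut "aabb" inside its "ab" factor and paste the tail of "cabd":
print(splice("aabb", "cabd", ("a", "b", "a", "b")))  # -> {'aad'}
```

An accepting splicing system iterates this operation from an input word and a finite set of axioms, accepting or rejecting according to the permitting/forbidding conditions described above.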