
    Achieving k-anonymity using full domain generalization

    Preserving privacy while publishing data has emerged as a key research area in data security and has become a primary concern when publishing person-specific sensitive information. How to preserve an individual's privacy efficiently is therefore a critical issue in data publishing. K-anonymity is a key technique for de-identifying sensitive datasets. In our work, we describe a framework in which most k-anonymity algorithms can be implemented, and we propose a novel scheme that produces better results on real-world datasets. Additionally, we suggest a new approach that achieves better results by exploiting various characteristics of the proposed framework. The proposed approach uses breadth-first search to generalize the lattice in a bottom-up manner, and it generates paths by predictive tagging of the lattice nodes in the vertical direction. The proposed algorithm has a lower execution time than other full-domain generalization algorithms for k-anonymization.
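
The abstract does not spell out the predictive-tagging rule, so the following is only a minimal sketch of the general idea it builds on: a bottom-up breadth-first search over a full-domain generalization lattice. The two quasi-identifiers (age and ZIP code), their generalization hierarchies, and all function names are hypothetical. The sketch relies on the standard monotonicity property (any generalization of a k-anonymous table is also k-anonymous) to tag nodes instead of re-checking them; it is not the authors' algorithm.

```python
from collections import Counter, deque

# Hypothetical generalization hierarchies (not from the paper): each quasi-identifier
# maps to (generalize_function, maximum_level). Level 0 is the raw value.
def generalize_age(age, level):
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10          # e.g. 23 -> "20-29"
        return f"{low}-{low + 9}"
    return "*"                          # level 2: fully suppressed

def generalize_zip(zipcode, level):
    # Mask the last `level` digits, e.g. level 2: "47677" -> "476**".
    return zipcode if level == 0 else zipcode[:-level] + "*" * level

HIERARCHIES = {"age": (generalize_age, 2), "zip": (generalize_zip, 3)}
QIS = list(HIERARCHIES)

def is_k_anonymous(records, node, k):
    """True if every equivalence class of the table generalized to `node` has >= k rows."""
    classes = Counter(
        tuple(HIERARCHIES[qi][0](row[qi], level) for qi, level in zip(QIS, node))
        for row in records
    )
    return min(classes.values()) >= k

def successors(node):
    """Direct generalizations: raise exactly one attribute by one level."""
    for i, qi in enumerate(QIS):
        if node[i] < HIERARCHIES[qi][1]:
            yield node[:i] + (node[i] + 1,) + node[i + 1:]

def minimal_k_anonymous_nodes(records, k):
    """Bottom-up BFS over the lattice; returns k-anonymous nodes with no k-anonymous predecessor."""
    bottom = (0,) * len(QIS)
    queue, seen = deque([bottom]), {bottom}
    tagged = set()      # nodes known to be k-anonymous because a predecessor is
    minimal = []
    while queue:
        node = queue.popleft()          # BFS from the bottom visits nodes in order of height
        if node in tagged:
            satisfied = True            # no need to re-check the table
        else:
            satisfied = is_k_anonymous(records, node, k)
            if satisfied:
                minimal.append(node)
        for succ in successors(node):
            if satisfied:
                tagged.add(succ)        # generalizing a k-anonymous table keeps it k-anonymous
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return minimal

records = [
    {"age": 23, "zip": "47677"}, {"age": 27, "zip": "47602"},
    {"age": 35, "zip": "47678"}, {"age": 38, "zip": "47605"},
]
print(minimal_k_anonymous_nodes(records, k=2))  # [(2, 1), (1, 2)]
```

Because BFS from the bottom node reaches every node at its lattice height, all direct predecessors of a node are processed before it, which is what makes the tag-and-skip step safe.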

    PIntron: a Fast Method for Gene Structure Prediction via Maximal Pairings of a Pattern and a Text

    Current computational methods for exon-intron structure prediction from a cluster of transcript (EST, mRNA) data lack the time and space efficiency needed to process large clusters of more than 20,000 ESTs and genes longer than 1 Mb. Guaranteeing both accuracy and efficiency still seems a distant computational goal, since accuracy depends strictly on exploiting the inherent redundancy of information present in a large cluster. We propose a fast method for the problem that combines two ideas: a novel algorithm with provably small time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of EST sequences, those that allow inferring splice site junctions highly confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, which are sequences obtained from paths of a graph structure called the Embedding Graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the size of P, T, and the output. PIntron, the software tool implementing our methodology, processes in a few seconds some critical genes that are not manageable by other gene structure prediction tools. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when compared with ENCODE data. Detailed experimental data, additional results, and the PIntron software are available at http://www.algolab.eu/PIntron
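
To make "maximal pairings" concrete, here is a naive sketch that assumes a maximal pairing is a maximal exact match: a triple (i, j, length) with P[i:i+length] == T[j:j+length] that cannot be extended to the left or to the right. It uses an O(|P|·|T|) dynamic-programming table, so it only illustrates the definition; it is not PIntron's linear-time procedure, and the function name and `min_len` parameter are hypothetical.

```python
def maximal_pairings(p, t, min_len=1):
    """Return (i, j, length) triples where p[i:i+length] == t[j:j+length] and the
    match cannot be extended left or right.

    Naive O(|p|*|t|) illustration; assumes "maximal pairing" == maximal exact match.
    """
    n, m = len(p), len(t)
    # run[i][j] = length of the longest common suffix of p[:i] and t[:j],
    # so every run is already extended as far left as possible.
    run = [[0] * (m + 1) for _ in range(n + 1)]
    result = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if p[i - 1] != t[j - 1]:
                continue
            run[i][j] = run[i - 1][j - 1] + 1
            # Right-maximal: one string ends here or the next characters differ.
            if i == n or j == m or p[i] != t[j]:
                length = run[i][j]
                if length >= min_len:
                    result.append((i - length, j - length, length))
    return result

print(maximal_pairings("ACG", "TACGA"))  # [(0, 4, 1), (0, 1, 3)]
```

With `min_len=2`, only the full match of "ACG" at position 1 of the text survives, which is the kind of filtering a spliced-alignment step would apply before chaining pairings into embeddings.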

    Adaptive Probabilistic Flooding for Multipath Routing

    In this work, we develop a distributed source routing algorithm for topology discovery that is suited to ISP transport networks, yet is inspired by opportunistic algorithms used in ad hoc wireless networks. We propose a plug-and-play control plane able to find multiple paths toward the same destination, and introduce a novel algorithm, called adaptive probabilistic flooding, to achieve this goal. By keeping a small amount of state in the routers taking part in the discovery process, our technique significantly limits the number of control messages exchanged during flooding, while only minimally affecting the quality of the discovered multiple paths with respect to the optimal solution. Simple analytical bounds, confirmed by extensive simulations on four realistic topologies, show our approach to be of high practical interest. Comment: 6 pages, 6 figures
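
The abstract does not describe how the flooding probability is adapted, so this sketch only illustrates plain probabilistic flooding for multipath discovery, under our own assumptions: each node keeps one piece of state (whether it has already relayed the discovery message), relays only the first copy it receives, and forwards it to each neighbor with a fixed probability `p`; the destination records every source route that reaches it. The function name, graph format, and parameters are hypothetical.

```python
import random
from collections import deque

def probabilistic_flood(graph, source, dest, p, seed=0):
    """Flood a route-discovery message from `source` over `graph` ({node: [neighbors]}).

    Each node relays only the first copy it receives, forwarding it to each neighbor
    with fixed probability p (the source always forwards). Adaptation of p is NOT
    modeled here. Returns (source routes that reached dest, messages sent).
    """
    rng = random.Random(seed)
    relayed = {source}                  # per-node state: already relayed once?
    queue = deque([(source, [source])])
    paths, messages = [], 0
    while queue:
        node, path = queue.popleft()
        for nbr in graph[node]:
            if nbr in path:             # keep the recorded source route loop-free
                continue
            if node != source and rng.random() >= p:
                continue                # copy probabilistically suppressed
            messages += 1
            new_path = path + [nbr]
            if nbr == dest:
                paths.append(new_path)  # destination collects every arriving route
            elif nbr not in relayed:
                relayed.add(nbr)
                queue.append((nbr, new_path))
            # otherwise the copy is received but dropped (duplicate)
    return paths, messages

diamond = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(probabilistic_flood(diamond, "A", "D", p=1.0))  # ([['A', 'B', 'D'], ['A', 'C', 'D']], 4)
```

Setting `p=1.0` reduces this to ordinary flooding with duplicate suppression; lowering `p` trades fewer control messages for a risk of missing some of the disjoint paths, which is the trade-off an adaptive rule would aim to manage.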

    Mining Marked Nodes in Large Graphs

    With the rise of the Big Data era, an exponentially growing amount of network data is being generated at an unprecedented rate across a wide range of high-impact micro and macro research areas, from protein interaction to social networks. The critical challenge is translating this large-scale network data into actionable information. A key task in this translation is the analysis of network connectivity via marked nodes, which is the primary focus of our research. We have developed a framework for analyzing network connectivity via marked nodes in large-scale graphs, using novel algorithms in three interrelated areas: (1) analysis of a single seed node via its egocentric network (AttriPart algorithm); (2) pathway identification between two seed nodes (K-Simple Shortest Paths Multithreaded and Search Reduced (KSSPR) algorithm); and (3) tree detection, defining the interaction between three or more seed nodes (Shortest Path MST algorithm). To address both fundamental and applied research issues, we have developed the LocalForecasting algorithm to explore how network connectivity analysis can be applied to local community evolution and recommender systems. The goal is to apply the LocalForecasting algorithm to various domains, e.g., friend suggestions in social networks or future collaborations in co-authorship networks. This algorithm uses link prediction in combination with the AttriPart algorithm to predict future connections in local graph partitions. Results show that the proposed AttriPart algorithm finds local partitions up to 1.6x denser, while running approximately 43x faster than traditional local partitioning techniques (PageRank-Nibble). In addition, the LocalForecasting algorithm significantly improves the number of nodes and edges correctly predicted over baseline methods.
Furthermore, results for the KSSPR algorithm demonstrate a speed-up of up to 2.5x over the standard k-simple shortest paths algorithm. Dissertation/Thesis: Masters Thesis, Computer Science, 201
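
KSSPR is described as a multithreaded, search-reduced variant of the standard k-simple shortest paths algorithm. For reference, here is a minimal single-threaded sketch of Yen's algorithm, a well-known way to compute the k shortest simple (loopless) paths; treating Yen's algorithm as that "standard" baseline is our assumption, since the abstract does not name it. The graph format (a directed dict of dicts of non-negative edge weights), the function names, and the example graph are hypothetical.

```python
import heapq

# Sketch of Yen's algorithm for k shortest simple paths (assumed baseline, not KSSPR).
# Graph format (hypothetical): {node: {neighbor: non-negative weight}}, directed.

def dijkstra(graph, source, target, banned_nodes=frozenset(), banned_edges=frozenset()):
    """Return (cost, path) of a shortest source->target path avoiding banned nodes/edges, or None."""
    heap = [(0, source, [source])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, weight in graph.get(node, {}).items():
            if nbr in visited or nbr in banned_nodes or (node, nbr) in banned_edges:
                continue
            heapq.heappush(heap, (cost + weight, nbr, path + [nbr]))
    return None

def k_shortest_simple_paths(graph, source, target, k):
    first = dijkstra(graph, source, target)
    if first is None:
        return []
    accepted = [first]                  # (cost, path) pairs in nondecreasing cost
    candidates = []                     # min-heap of (cost, path)
    seen = {tuple(first[1])}
    while len(accepted) < k:
        last_path = accepted[-1][1]
        for i in range(len(last_path) - 1):
            spur_node, root = last_path[i], last_path[: i + 1]
            root_cost = sum(graph[root[j]][root[j + 1]] for j in range(i))
            # Forbid the next edge of every accepted path sharing this root...
            banned_edges = {(p[i], p[i + 1]) for _, p in accepted if p[: i + 1] == root}
            # ...and the root's nodes (except the spur node) so the path stays simple.
            spur = dijkstra(graph, spur_node, target, set(root[:-1]), banned_edges)
            if spur is None:
                continue
            total_path = root[:-1] + spur[1]
            if tuple(total_path) not in seen:
                seen.add(tuple(total_path))
                heapq.heappush(candidates, (root_cost + spur[0], total_path))
        if not candidates:
            break                       # fewer than k simple paths exist
        accepted.append(heapq.heappop(candidates))
    return accepted

graph = {
    "C": {"D": 3, "E": 2},
    "D": {"F": 4},
    "E": {"D": 1, "F": 2, "G": 3},
    "F": {"G": 2, "H": 1},
    "G": {"H": 2},
}
for cost, path in k_shortest_simple_paths(graph, "C", "H", 3):
    print(cost, "-".join(path))
# 5 C-E-F-H
# 7 C-E-G-H
# 8 C-D-F-H
```

Each outer iteration runs one shortest-path search per spur node of the last accepted path; these searches are independent of each other, which is the kind of work a multithreaded variant could plausibly parallelize.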