
    PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

    SimRank is a classic measure of the similarity between nodes in a graph. Given a node u in a graph G = (V, E), a single-source SimRank query returns the SimRank similarity s(u, v) between u and each node v ∈ V. This type of query has numerous applications in web search and social network analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur a query cost at least linear in the number of nodes n, which renders them inapplicable for real-time and interactive analysis. This paper proposes PRSim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. PRSim uses an index of size O(m), where m is the number of edges in the graph, and guarantees a query time that depends on the reverse PageRank distribution of the input graph. In particular, we prove that PRSim runs in sub-linear time if the degree distribution of the input graph follows a power law, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph. Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that PRSim outperforms the state of the art in terms of query time, accuracy, index size, and scalability. Comment: ACM SIGMOD 201
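
    To make the notion of a single-source SimRank query concrete, here is a minimal sketch of what such a query returns. It uses the standard SimRank recurrence computed naively over all node pairs, which is not the PRSim algorithm and has none of its sub-linear guarantees. The function names, the decay factor c = 0.6, and the iteration count are assumptions chosen for illustration; the abstract specifies none of them.

```python
from itertools import product


def simrank(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank (illustration only, not PRSim).

    in_neighbors maps each node to the list of nodes that point to it.
    c is the decay factor (assumed value; not taken from the abstract).
    """
    nodes = list(in_neighbors)
    # Start from s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar only to itself.
                new[(a, b)] = 0.0
                continue
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new
    return sim


def single_source(in_neighbors, u, **kwargs):
    """Return s(u, v) for every node v, as a single-source query would."""
    sim = simrank(in_neighbors, **kwargs)
    return {v: sim[(u, v)] for v in in_neighbors}


if __name__ == "__main__":
    # Tiny hypothetical graph: a -> b and a -> c.
    graph = {"a": [], "b": ["a"], "c": ["a"]}
    print(single_source(graph, "b"))
```

    This naive version touches every pair of nodes in every iteration, which illustrates why query costs that grow with n rule out interactive use on large graphs.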

    An Extended Stable Marriage Problem Algorithm for Clone Detection

    Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending the well-known stable marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is evaluated on a suitable scenario, which shows a noticeable improvement in several aspects of clone detection, such as scalability and accuracy. Comment: 20 pages, 10 figures, 6 tables
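
    For readers unfamiliar with the stable marriage problem that this approach extends, the sketch below shows the classic Gale-Shapley procedure for the unextended problem. It is not the paper's extended algorithm. The fragment names and preference lists are hypothetical; one could imagine each list ordering the other file's fragments by decreasing similarity, but the abstract does not say how preferences are derived. The sketch assumes both sides have the same size and complete preference lists.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Classic Gale-Shapley stable matching (not the paper's extension).

    Both arguments map each item to a complete list of items on the
    other side, ordered from most to least preferred.
    Returns a dict mapping each proposer to its matched receiver.
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver to try
    engaged = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p
        elif rank[r][p] < rank[r][current]:
            # The receiver prefers the new proposer; release the old one.
            engaged[r] = p
            free.append(current)
        else:
            free.append(p)
    return {p: r for r, p in engaged.items()}


if __name__ == "__main__":
    # Hypothetical fragments of two files, f1/f2 and g1/g2.
    file_a = {"f1": ["g1", "g2"], "f2": ["g1", "g2"]}
    file_b = {"g1": ["f2", "f1"], "g2": ["f1", "f2"]}
    print(gale_shapley(file_a, file_b))
```

    The result is a one-to-one (bijective) pairing in which no two unmatched items prefer each other over their assigned partners.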

    Structured Review of Code Clone Literature

    This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts that helps us to reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other.

    A new approach to understanding T cell development: the isolation and characterization of immature CD4-, CD8-, CD3- T cell cDNAs by subtraction cloning

    During T cell development in the mammalian thymus, immature T cells are observed that lack the cell surface markers CD4, CD8, and CD3. A subtracted cDNA library was constructed to isolate cDNAs that are specific for these immature T cells. Tissue-specific expression of 97 individual cDNAs was examined in different cell types by Northern blot analysis, and six cDNAs were analyzed by reverse transcriptase (RT) polymerase chain reaction (PCR) detection of RNA. Approximately 50% of the clones could not be detected on Northern blots, and 40% of the clones were expressed by at least one other cell type, including monocytes, mature T cells, and B cells. Eight cDNA clones appear to be specific for the CD4-, CD8-, CD3- T cell line used to construct the library, as determined by Northern blot analysis. In addition, 330 cDNA clones were subjected to partial automated DNA sequence determination. Database searches, with both nucleotide and protein translations, revealed cDNAs that exhibit interesting similarities to human cell-cycle gene 1, platelet-derived growth factor receptor, c-fms oncogene (CSF-1) receptor, and members of the immunoglobulin gene superfamily. This approach of combining subtraction with large-scale partial cDNA sequence determination can be useful for identifying genes that may be involved in early T cell growth, cellular recognition, or differentiation.