PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs
SimRank is a classic measure of the similarities of nodes in a graph.
Given a node $u$ in graph $G=(V,E)$, a single-source SimRank query
returns the SimRank similarities between node $u$ and each node $v \in V$. This type of query has numerous applications in web search and social
network analysis, such as link prediction, web mining, and spam detection.
Existing methods for single-source SimRank queries, however, incur query cost
at least linear in the number of nodes $n$, which renders them inapplicable to
real-time and interactive analysis.
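The recursive definition behind these queries is the standard one (Jeh and Widom), which the abstract does not spell out: $s(a,a)=1$, and for $a \neq b$, $s(a,b) = \frac{c}{|I(a)||I(b)|}\sum_{i \in I(a)}\sum_{j \in I(b)} s(i,j)$, where $I(\cdot)$ is the set of in-neighbours and $s(a,b)=0$ if either set is empty. The sketch below is a naive fixed-iteration evaluation of that definition, assuming a hypothetical `in_nbrs` dict that maps every node to its list of in-neighbours; the decay `c = 0.6`, the iteration count, and the function name are illustrative choices. It recomputes all node pairs every round, which is exactly the superlinear cost the abstract criticises; it is not PRSim.

```python
from typing import Dict, List


def simrank_single_source(in_nbrs: Dict[int, List[int]], u: int,
                          c: float = 0.6, iters: int = 10) -> Dict[int, float]:
    """Naive SimRank: iterate the all-pairs recurrence, then return row u.

    Assumes every node appears as a key of in_nbrs (possibly with an empty list).
    """
    nodes = list(in_nbrs)
    # Initial estimate: s(a, a) = 1, s(a, b) = 0 otherwise.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iters):
        new: Dict[int, Dict[int, float]] = {}
        for a in nodes:
            row: Dict[int, float] = {}
            for b in nodes:
                if a == b:
                    row[b] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    row[b] = 0.0  # a node with no in-neighbours is similar only to itself
                else:
                    total = sum(sim[i][j] for i in in_nbrs[a] for j in in_nbrs[b])
                    row[b] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
            new[a] = row
        sim = new
    # Single-source query: similarities between u and every node.
    return dict(sim[u])


# Edges 0 -> 1 and 0 -> 2: nodes 1 and 2 share their only in-neighbour.
g = {0: [], 1: [0], 2: [0]}
print(simrank_single_source(g, 1))  # {0: 0.0, 1: 1.0, 2: 0.6}
```

Each round touches all $n^2$ pairs, so even a single-source answer costs far more than linear time here; avoiding this full computation is the point of index-based methods such as the one described next.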
This paper proposes PRSim, an algorithm that exploits the structure of
graphs to efficiently answer single-source SimRank queries. PRSim uses an
index of size $O(m)$, where $m$ is the number of edges in the graph, and
guarantees a query time that depends on the reverse PageRank distribution
of the input graph. In particular, we prove that PRSim runs in sublinear time
if the degree distribution of the input graph follows the power-law
distribution, a property possessed by many real-world graphs. Based on the
theoretical analysis, we show that the empirical query time of all existing
SimRank algorithms also depends on the reverse PageRank distribution of the
graph. Finally, we present the first experimental study that evaluates the
absolute errors of various SimRank algorithms on large graphs, and we show that
PRSim outperforms the state of the art in terms of query time, accuracy, index
size, and scalability.
Comment: ACM SIGMOD 201
An Extended Stable Marriage Problem Algorithm for Clone Detection
Code cloning negatively affects industrial software and threatens
intellectual property. This paper presents a novel approach to detecting cloned
software by using a bijective matching technique. The proposed approach focuses
on increasing the range of similarity measures and thus enhancing the precision
of the detection. This is achieved by extending the well-known stable marriage
problem (SMP) and demonstrating how matches between code fragments of different
files can be expressed. A prototype of the proposed approach is demonstrated on
a suitable scenario, which shows a noticeable improvement in several aspects of
clone detection, such as scalability and accuracy.
Comment: 20 pages, 10 figures, 6 tables
Structured Review of Code Clone Literature
This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts that helps us reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other.
A new approach to understanding T cell development: the isolation and characterization of immature CD4-, CD8-, CD3- T cell cDNAs by subtraction cloning
During T cell development in the mammalian thymus, immature T cells are observed that lack the cell surface markers CD4, CD8, and CD3. A subtracted cDNA library was constructed to isolate cDNAs that are specific for these immature T cells. Tissue-specific expression of 97 individual cDNAs was examined in different cell types by Northern blot analysis, and six cDNAs were analyzed by reverse transcriptase (RT) polymerase chain reaction (PCR) detection of RNA. Approximately 50% of the clones could not be detected on Northern blots, and 40% of the clones were expressed by at least one other cell type, including monocytes, mature T cells, and B cells. Eight cDNA clones appear to be specific for the CD4-, CD8-, CD3- T cell line used to construct the library, as determined by Northern blot analysis. In addition, 330 cDNA clones were subjected to partial automated DNA sequence determination. Database searches, with both nucleotide and protein translations, revealed cDNAs that exhibit interesting similarities to human cell-cycle gene 1, platelet-derived growth factor receptor, c-fms oncogene (CSF-1) receptor, and members of the immunoglobulin gene superfamily. This approach of employing subtraction coupled with large-scale partial cDNA sequence determination can be useful for identifying genes that may be involved in early T cell growth, cellular recognition, or differentiation.