17 research outputs found

    ClueNet: Clustering a temporal network based on topological similarity rather than denseness

    <div><p>Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of “topologically related” nodes, where the resulting topology-based clusters are expected to “correlate” well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, the problem is even more complex, due to an additional dimension of the data: their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in over 83% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance.</p></div>

    Summary of ClueNet.


    Pairwise edge overlaps between the snapshots of (a) social Enron and (b) biological aging-related dynamic networks.

    <p>The darker the color, the higher the edge overlap between the given snapshots. For the Enron data, the following network construction parameter values are used: <i>t</i><sub><i>w</i></sub> = 2 months and <i>w</i> = 2, but the results are similar for the other tested parameter values. Equivalent results for the other two social networks (hospital and high school), which are similar to the Enron results, are shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0195993#pone.0195993.s010" target="_blank">S1 Fig</a>.</p>
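A pairwise edge overlap matrix of this kind can be sketched as follows. The caption does not say how overlap is measured, so the Jaccard index used here is an assumption, and the function names and toy snapshots are hypothetical illustrations rather than the authors' code or data.

```python
from itertools import combinations


def edge_overlap(edges_a, edges_b):
    """Jaccard overlap of two edge sets (assumed measure); 0.0 if both are empty."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


def pairwise_overlaps(snapshots):
    """Map each pair of snapshot indices (i, j), i < j, to its edge overlap."""
    return {
        (i, j): edge_overlap(snapshots[i], snapshots[j])
        for i, j in combinations(range(len(snapshots)), 2)
    }


# Toy snapshots (hypothetical): each snapshot is a set of undirected edges,
# stored as sorted node pairs.
snapshots = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "D")},
    {("C", "D")},
]
overlaps = pairwise_overlaps(snapshots)
for pair, value in overlaps.items():
    print(pair, round(value, 3))
```

In a heat map of these values, a darker cell would correspond to a larger overlap between the two snapshots.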

    Results when ClueNet’s dynamic graphlet-based topological similarities are used on top of the existing denseness-based simulated annealing method.


    The ranking of all DNC methods used in this study.

    <p>The ranking of the methods (ClueNet (its three versions: C-ST, C-D, and C-C), Louvain (L), Infomap (I), Hierarchical Infomap (HI), label propagation (LP), simulated annealing (SA), and Multistep (M)) over all considered <b>(a)</b> social datasets (i.e., the three ground truth partitions corresponding to the three social dynamic networks; the first column) and <b>(b)</b> biological datasets (i.e., the four ground truth partitions corresponding to the biological aging-related dynamic network; the second column) with respect to all of precision, recall, and AMI (F-score is excluded here because it is redundant to precision and recall). Each row corresponds to one of the three versions of ClueNet that is compared to the existing methods: C-ST (top), C-D (middle), and C-C (bottom). The ranking is expressed as a percentage of all cases (across all ground truth partitions and all three partition quality measures) in which the given method yields the <i>k</i><sup><i>th</i></sup> best score across all methods. We rank the methods based on their <i>p</i>-values (i.e., the smaller the <i>p</i>-value, the better the method); in case of ties, we compare the methods based on their raw partition quality scores. The ‘N/A’ rank signifies that the given method did not produce a statistically significant partition under the given partition quality score.</p>
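The ranking rule described in this caption (order by p-value, break ties by raw score, assign 'N/A' when a partition is not statistically significant) can be sketched as below. The significance threshold of 0.05, the function name, and all numbers are assumptions made for illustration; they are not values reported in the paper.

```python
def rank_methods(results, alpha=0.05):
    """Rank methods for one ground truth partition and one quality measure.

    results: dict mapping method name -> (p_value, raw_score).
    alpha:   assumed significance threshold (not stated in the caption).
    Returns a dict mapping method name -> rank (1 = best) or 'N/A'.
    """
    significant = {m: r for m, r in results.items() if r[0] < alpha}
    # Smaller p-value is better; ties are broken by the larger raw score.
    ordered = sorted(significant, key=lambda m: (significant[m][0], -significant[m][1]))
    ranks = {m: k for k, m in enumerate(ordered, start=1)}
    ranks.update({m: "N/A" for m in results if m not in significant})
    return ranks


# Made-up (p-value, precision) pairs, for illustration only.
results = {
    "C-ST": (0.001, 0.62),
    "L": (0.001, 0.55),
    "I": (0.030, 0.70),
    "LP": (0.400, 0.20),
}
ranks = rank_methods(results)
print(ranks)
```

Repeating this over every ground truth partition and quality measure, then counting how often each method lands at rank k, would give percentages of the kind the figure reports.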

    Illustration of how a raw temporal dataset (left) is modeled as a dynamic network (right).

    <p>One parameter is the length of the temporal window during which interactions are aggregated. In our illustration, this parameter value is one week (note that weeks begin on Monday and end on Sunday). For example, the network snapshot for week 1 (January 1<sup><i>st</i></sup> through January 7<sup><i>th</i></sup>) will aggregate interactions between nodes A and B, B and C, and C and D. Another parameter is the minimum number of events that must occur between the same nodes within the given time window in order to link these nodes in the corresponding snapshot of the dynamic network. This parameter is set to one in this example.</p>
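The two construction parameters described above (window length and minimum number of events per node pair) can be illustrated with a minimal sketch. The function name `build_snapshots`, the event format, and the year 2018 (chosen only because January 1 falls on a Monday) are assumptions for this example and do not come from the paper.

```python
from collections import Counter
from datetime import datetime, timedelta


def build_snapshots(events, start, window, min_events=1):
    """Aggregate timestamped events (u, v, time) into per-window edge sets.

    An edge (u, v) appears in a snapshot only if at least `min_events`
    events between u and v fall inside that snapshot's time window.
    """
    counts = {}  # snapshot index -> Counter of undirected node pairs
    for u, v, t in events:
        if t < start or u == v:
            continue  # skip events before the first window and self-loops
        idx = (t - start) // window  # timedelta // timedelta yields an int
        pair = tuple(sorted((u, v)))
        counts.setdefault(idx, Counter())[pair] += 1
    return {
        idx: {pair for pair, n in counter.items() if n >= min_events}
        for idx, counter in sorted(counts.items())
    }


# Hypothetical raw data: week 1 runs from Monday, January 1 to Sunday, January 7.
events = [
    ("A", "B", datetime(2018, 1, 1)),
    ("B", "C", datetime(2018, 1, 3)),
    ("C", "D", datetime(2018, 1, 7)),
    ("A", "D", datetime(2018, 1, 9)),  # falls into week 2
]
snapshots = build_snapshots(
    events, start=datetime(2018, 1, 1), window=timedelta(weeks=1), min_events=1
)
print(snapshots)
```

Raising `min_events` to 2 would drop every edge in this toy dataset, because each node pair interacts only once per week.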

    Detailed method comparison.

    <p>Detailed method comparison results for the social Enron (left) and biological aging-related (right) dynamic networks, quantifying the fit of each method (ClueNet (C-ST, C-D, C-C), Louvain (L), Infomap (I), Hierarchical Infomap (HI), label propagation (LP), simulated annealing (SA), and Multistep (M)) to the corresponding ground truth partition in terms of precision. There is one ground truth partition for the Enron network (results shown in the figure). There are four ground truth partitions for the aging-related networks, depending on which aging-related ground truth data is considered (BE2004, BE2008, AD, or SequenceAge; Section Data). Results are shown in this figure for the SequenceAge-based ground truth partition. For each dataset, for each method, we compare the precision score of the partition produced by the given method (red) to the average precision score of its random counterparts (blue) and show the resulting <i>p</i>-value (see Section Measuring partition quality for details). These are representative results for one network/ground truth partition from each of the social and biological domains and one measure of partition quality. Equivalent results for the other three biological aging-related ground truth partitions, for the other two social dynamic networks (hospital and high school), and for the other three partition quality measures (recall, F-score, and AMI) are shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0195993#pone.0195993.s012" target="_blank">S3</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0195993#pone.0195993.s013" target="_blank">S4</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0195993#pone.0195993.s014" target="_blank">S5</a> and <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0195993#pone.0195993.s015" target="_blank">S6</a> Figs.</p>

    Revealing Missing Parts of the Interactome via Link Prediction

    <div><p>Protein interaction networks (PINs) are often used to “learn” new biological function from their topology. Since current PINs are noisy, their computational de-noising via link prediction (LP) could improve the learning accuracy. LP uses the existing PIN topology to predict missing and spurious links. Many existing LP methods rely on shared <i>immediate</i> neighborhoods of the nodes to be linked. As such, they have limitations. Thus, in order to <i>comprehensively</i> study which topological properties of nodes in PINs dictate whether the nodes should be linked, we introduce novel <i>sensitive</i> LP measures that are expected to overcome the limitations of the existing methods.</p><p>We systematically evaluate the new and existing LP measures by introducing “synthetic” noise into PINs and measuring how accurate the measures are in reconstructing the original PINs. Also, we use the LP measures to de-noise the original PINs, and we measure biological correctness of the de-noised PINs with respect to functional enrichment of the predicted interactions. Our main findings are: 1) LP measures that favor nodes which are <i>both</i> “topologically similar” <i>and</i> have large shared <i>extended</i> neighborhoods are superior; 2) using more network topology often, though not always, improves LP accuracy; and 3) LP improves biological correctness of the PINs, and we validate a significant portion of the predicted interactions in independent, external PIN data sources.</p><p>Ultimately, we focus less on identifying a superior method than on showing that LP improves biological correctness of PINs, which is its ultimate goal in computational biology. We note, however, that our new methods outperform each of the existing ones with respect to at least one evaluation criterion.
Alarmingly, we find that the different criteria often disagree in identifying the best method(s), which has important implications for LP communities in any domain, including social networks.</p></div>
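As a rough illustration of what "relying on shared immediate neighborhoods" means, the sketch below scores unlinked node pairs with two standard neighborhood-based measures, the Jaccard coefficient and the Adamic-Adar index. These are textbook baselines and not the new measures introduced in the paper; the toy graph and the helper names are hypothetical.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbors divided by all neighbors of u or v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum over shared neighbors z of 1 / log(degree(z)).

    A shared neighbor of two distinct nodes has degree >= 2, so log > 0.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


# Toy network (hypothetical), not a real protein interaction network.
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
adj = build_adjacency(edges)

# Score every currently unlinked pair and rank candidates for a missing link.
non_edges = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
ranked = sorted(non_edges, key=lambda p: adamic_adar(adj, *p), reverse=True)
for u, v in ranked:
    print(u, v, round(jaccard(adj, u, v), 3), round(adamic_adar(adj, u, v), 3))
```

Both scores only look one step away from each node, which is the limitation the abstract points to: a pair with no common neighbor, such as c and e here, scores zero regardless of the wider topology around it.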

    Graphlet positions of a node, an edge, a non-edge, and a node pair.

    <p>All topological positions (“orbits”) in up to 4-node graphlets of a node (top; node shade), an edge (upper middle; solid line), a non-edge (lower middle; broken line), and any node pair, an edge or a non-edge (bottom; wavy line) are shown. For example: 1) in graphlet [image], the two end nodes are in node orbit 4, while the two middle nodes are in node orbit 5; 2) in [image], the two “outer” edges are in edge orbit 3, while the “middle” edge is in edge orbit 4; 3) in [image], the non-edge touching the end nodes is in non-edge orbit 2, while the two non-edges that touch the end nodes and the middle nodes are in non-edge orbit 3; 4) a node pair at node pair orbit 1 touches a [image] at edge orbit 2, if it is an edge, or a [image] at non-edge orbit 1, if it is a non-edge (hence, mutually exclusive edge orbit 2 and non-edge orbit 1 are reconciled into a common node pair orbit 1). There are 15 node, 12 edge, 7 non-edge, and 7 node pair orbits for up to 4-node graphlets. In a graphlet, different orbits are colored differently. All up to 5-node graphlets are used, but only up to 4-node graphlets are illustrated. There are 73 node, 68 edge, 49 non-edge, and 49 node pair orbits for up to 5-node graphlets.</p>

    Comparison of different methods in the context of evaluation test 1.

    <p>Our best method (“ours”) is compared against existing methods (DP, SN, JC, AA, Katz, LPI, RAI, and RWS) in terms of AUROCs (panel <b>A</b>) and AUPRs (panel <b>B</b>) for synthetically noised AP/MS, HC, and Y2H networks at 5% noise level. Here, “ours” corresponds to using 3–5-node weighted graphlets at  = 0.8. Results for all other noise levels are shown throughout <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0090073#pone.0090073.s001" target="_blank">File S1</a>.</p>