    FS^3: A Sampling-Based Method for Top-k Frequent Subgraph Mining

    Mining labeled subgraphs is a popular research task in data mining because of its potential applications in many scientific domains. All existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we propose FS^3, a sampling-based method that mines a small collection of subgraphs that are most frequent in a probabilistic sense. FS^3 performs Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that potentially frequent subgraphs are sampled more often. In addition, FS^3 is equipped with an innovative queue manager, which stores the sampled subgraphs in a finite queue over the course of mining such that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on a database of large graphs show that FS^3 is efficient and obtains subgraphs that are the most frequent among the subgraphs of a given size.
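    The sampling idea can be illustrated with a minimal Python sketch under stated assumptions. It walks over connected k-vertex subsets of a single small labeled graph using a Metropolis-Hastings swap move (drop one vertex, add a neighboring one) with a uniform target distribution, so each pattern is visited roughly in proportion to its number of connected induced occurrences. FS^3 itself samples over a database of graphs with a target biased toward potentially frequent subgraphs and uses its own queue manager; the bounded counter here and all function names (mcmc_top_patterns, moves, canonical) are hypothetical stand-ins, not the paper's implementation.

```python
import itertools
import random
from collections import Counter

# Hypothetical sketch (not FS^3's actual algorithm): Metropolis-Hastings walk
# over connected k-vertex subsets of ONE labeled graph with a uniform target,
# plus a capacity-bounded counter standing in for FS^3's queue manager.

def is_connected(adj, nodes):
    """True if the induced subgraph on `nodes` is connected."""
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for v in adj[stack.pop()] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def moves(adj, state):
    """Connected k-subsets reachable by swapping one vertex out and one in."""
    result = []
    for out_v in state:
        rest = state - {out_v}
        frontier = set().union(*(adj[u] for u in rest)) - state
        for in_v in frontier:
            cand = rest | {in_v}
            if is_connected(adj, cand):
                result.append(frozenset(cand))
    return result

def canonical(adj, labels, nodes):
    """Brute-force canonical key of the induced labeled subgraph (small k only)."""
    best = None
    for perm in itertools.permutations(sorted(nodes)):
        key = (tuple(labels[v] for v in perm),
               tuple(perm[j] in adj[perm[i]]
                     for i in range(len(perm)) for j in range(i + 1, len(perm))))
        if best is None or key < best:
            best = key
    return best

def mcmc_top_patterns(adj, labels, k, steps, top, capacity=50, seed=0):
    """Assumes a connected graph with at least k (>= 2) vertices."""
    rng = random.Random(seed)
    # Initial state: grow a connected k-subset greedily from a random vertex.
    state = {rng.choice(sorted(adj))}
    while len(state) < k:
        frontier = set().union(*(adj[u] for u in state)) - state
        state.add(rng.choice(sorted(frontier)))
    state = frozenset(state)
    counts = Counter()
    for _ in range(steps):
        cur_moves = moves(adj, state)
        if cur_moves:
            proposal = rng.choice(cur_moves)
            # Hastings ratio for a uniform target and proposal 1/|moves(S)|.
            if rng.random() < len(cur_moves) / len(moves(adj, proposal)):
                state = proposal
        key = canonical(adj, labels, state)
        if key not in counts and len(counts) >= capacity:
            # Bounded "queue": evict the currently least-sampled pattern.
            del counts[min(counts, key=counts.get)]
        counts[key] += 1
    return counts.most_common(top)
```

    The brute-force canonical form costs k! permutations per step, so it is only practical for very small k; a real miner would rely on a proper canonical labeling scheme.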

    Latent Representation and Sampling in Network: Application in Text Mining and Biology.

    In classical machine learning, hand-designed features are used to learn a mapping from raw data. However, human involvement in feature design makes the process expensive. Representation learning aims to learn abstract features directly from data without direct human involvement. Raw data can take various forms; a network is one form that encodes relational structure in many real-world domains, so learning abstract features for network units is an important task. In this dissertation, we propose models that incorporate temporal information, given as a collection of networks from successive time stamps. The primary objective of our models is to learn better abstract feature representations of nodes and edges in an evolving network. We show that temporal information in the abstract features substantially improves performance on the link prediction task. Besides network data, we also apply our models to incorporate extra-sentential information in the text domain for learning better sentence representations. We build a context network of sentences to capture extra-sentential information. Incorporating this information into the abstract sentence representations substantially improves various text-mining tasks over a set of baseline methods. A problem with the abstract features we learn is that they lack interpretability. In real-life applications on network data, for some tasks it is crucial to learn interpretable features in the form of graphical structures. This requires mining important graphical structures, along with their frequency statistics, from the input dataset. However, exact algorithms for these tasks are computationally expensive, so scalable algorithms are urgently needed. To overcome this challenge, we provide efficient sampling algorithms for mining higher-order structures from networks. We show that our sampling-based algorithms are scalable. They also outperform a set of baseline algorithms in retrieving important graphical substructures and collecting their frequency statistics. Finally, we show that these frequent subgraph statistics and structures can be used as features in various real-life applications, one in biology and another in security. In both cases, the structures and their statistics significantly improve the performance of knowledge discovery tasks in these domains.
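    As a minimal illustration of using subgraph statistics as features, the hypothetical Python helpers below (edge_patterns, feature_matrix) map each graph to a vector of pattern counts over a shared vocabulary. For brevity the "patterns" are single labeled edges, an assumption standing in for the larger frequent subgraphs the dissertation mines; the resulting rows could be fed to any standard classifier.

```python
from collections import Counter

# Hypothetical sketch: per-graph pattern counts -> fixed-length feature vectors.
# Patterns are only single labeled edges here, a stand-in for mined subgraphs.

def edge_patterns(edges, labels):
    """Count undirected labeled-edge patterns such as ('A', 'B')."""
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)

def feature_matrix(graphs):
    """graphs: list of (edges, labels) pairs. Returns (vocabulary, rows)."""
    counts = [edge_patterns(edges, labels) for edges, labels in graphs]
    vocab = sorted(set().union(*counts))  # shared, ordered pattern vocabulary
    rows = [[c[p] for p in vocab] for c in counts]  # missing patterns count 0
    return vocab, rows

g1 = ([(0, 1), (1, 2)], {0: "A", 1: "B", 2: "A"})
g2 = ([(0, 1)], {0: "A", 1: "A"})
vocab, rows = feature_matrix([g1, g2])
# vocab == [('A', 'A'), ('A', 'B')]; rows == [[0, 2], [1, 0]]
```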

    Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

    We present a novel approach to learning distributed representations of sentences from unlabeled data by modeling both the content and the context of a sentence. The content model learns a sentence representation by predicting its words. The context model, on the other hand, comprises a neighbor prediction component and a regularizer that model the distributional and proximity hypotheses, respectively. We propose an online algorithm to train the model components jointly. We evaluate the models in a setup where contextual information is available. Experimental results on sentence classification, clustering, and ranking tasks show that our model outperforms the best existing models by a wide margin across multiple datasets.
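    One way to read the content and context components is as a sum of three loss terms. The Python sketch below is an assumption-laden illustration, not the paper's exact formulation: it uses full softmax normalization, hypothetical weights alpha and beta, and a squared Euclidean distance as the proximity regularizer, and it only evaluates the objective (gradients and the online training loop are omitted).

```python
import math

# Hypothetical sketch of a content + context objective in the spirit of
# Con-S2V. The softmax form, the weights alpha/beta, and the squared-distance
# regularizer are assumptions, not the paper's exact model.

def log_softmax(scores, i):
    """Numerically stable log of softmax(scores)[i]."""
    m = max(scores)
    return scores[i] - m - math.log(sum(math.exp(s - m) for s in scores))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def con_s2v_loss(sent_vecs, word_vecs, sentences, neighbors, alpha=1.0, beta=0.1):
    """sentences[i]: word ids of sentence i; neighbors[i]: context sentence ids."""
    loss = 0.0
    for i, v in enumerate(sent_vecs):
        word_scores = [dot(v, w) for w in word_vecs]
        # Content model: negative log-likelihood of the sentence's own words.
        loss -= sum(log_softmax(word_scores, w) for w in sentences[i])
        # Scores over all sentences (self included, for simplicity).
        sent_scores = [dot(v, u) for u in sent_vecs]
        for j in neighbors[i]:
            # Context, distributional part: predict neighbor sentence j.
            loss -= alpha * log_softmax(sent_scores, j)
            # Context, proximity part: penalize ||v_i - v_j||^2.
            loss += beta * sum((x - y) ** 2 for x, y in zip(v, sent_vecs[j]))
    return loss
```

    With all-zero vectors, every softmax is uniform, so two sentences with three words in total over a three-word vocabulary, each the other's neighbor, give a loss of 3 ln 3 + 2 ln 2 under the default alpha = 1.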

    Name Disambiguation from link data in a collaboration graph using temporal and topological features

    In a social community, multiple persons may share the same name, phone number, or other identifying attributes. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes degrade the performance of document retrieval, web search, and database integration and, more importantly, cause improper attribution of credit (or blame). The task of entity disambiguation partitions the records belonging to multiple persons with the objective that each resulting partition contains records of a unique person. Existing solutions to this task use either biographical attributes or auxiliary features collected from external sources, such as Wikipedia. However, in many scenarios such auxiliary features are unavailable or costly to obtain. Moreover, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task from link information obtained from a collaboration network. Our method does not intrude on privacy, as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method performs satisfactorily.
    Comment: The short version of this paper has been accepted to ASONAM 201
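    To make disambiguation from topology alone concrete, here is a simple baseline heuristic in Python; it is not the method proposed in the paper, which also uses temporal features. The hypothetical function split_ambiguous removes the ambiguous node and groups its collaborators by connected components of the remaining collaboration graph, assuming that collaborators of the same real person tend to be connected to each other.

```python
from collections import defaultdict

# Hypothetical topology-only baseline (not the paper's method): collaborators
# of an ambiguous node that stay connected once the node is removed are
# assumed to belong to the same real person.

def split_ambiguous(edges, target):
    """edges: iterable of (u, v) pairs. Returns a list of collaborator sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    collaborators = set(adj[target])
    groups, seen = [], set()
    for start in sorted(collaborators):
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            if u in collaborators:
                comp.add(u)
            for w in adj[u]:
                # Never traverse through the ambiguous node itself.
                if w != target and w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(comp)
    return groups

edges = [("X", "a"), ("X", "b"), ("X", "c"), ("X", "d"),
         ("a", "b"), ("c", "e"), ("e", "d")]
print(split_ambiguous(edges, "X"))  # two groups: {a, b} and {c, d}
```

    On densely connected networks these components may merge into a single group, so a practical method needs richer signals, such as the time stamps the paper exploits.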

    Generation of higher-order topological insulators using periodic driving

    Topological insulators (TIs) are a new class of materials that resemble ordinary band insulators in having a bulk band gap but exhibit protected metallic states on their boundaries. Within this field, higher-order TIs (HOTIs) are a newer class of TIs in dimensions $d>1$. Unlike those of conventional TIs, the $(d-1)$-dimensional boundaries of HOTIs do not conduct via gapless states but are themselves TIs. Precisely, an $n^{\rm th}$-order $d$-dimensional higher-order topological insulator is characterized by boundary modes that reside on its $d_c = (d-n)$-dimensional boundary. For instance, a three-dimensional second- (third-) order TI hosts gapless (localized) modes on its hinges (corners), characterized by $d_c = 1\,(0)$. Similarly, a second-order TI in two dimensions has only localized corner states ($d_c = 0$). These higher-order phases are protected by various crystalline as well as discrete symmetries. Tuning the topological phase out of equilibrium has been a major academic challenge, and periodic Floquet driving provides a golden opportunity to overcome that barrier. Here, we discuss different periodic driving protocols that generate Floquet higher-order TIs starting from a non-topological or first-order topological phase. Furthermore, we emphasize that one can generate dynamical anomalous $\pi$-modes along with the concomitant $0$-modes; the former can be realized only in a dynamical setup. We present Floquet higher-order topological modes in two and three dimensions systematically. In particular, in two dimensions, we demonstrate a Floquet second-order TI hosting $0$- and $\pi$-corner modes, whereas a three-dimensional Floquet second-order TI and a Floquet third-order TI manifest one-dimensional hinge modes and zero-dimensional corner modes, respectively.
    Comment: Invited Topical Review Article for JPCM. This is the published version
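    As background for the $0$- and $\pi$-modes, the standard Floquet setup is summarized below. The specific two-step protocol, with $H(t) = H_1$ for $0 \le t < T/2$ and $H(t) = H_2$ for $T/2 \le t < T$ (and $\hbar = 1$), is an assumed example rather than one of the review's protocols; $\mathcal{T}$ denotes time ordering.

```latex
U(T) = \mathcal{T}\exp\!\left(-i\int_0^T H(t)\,dt\right)
     = e^{-i H_2 T/2}\, e^{-i H_1 T/2},
\qquad
U(T)\,\lvert\psi_\alpha\rangle = e^{-i\varepsilon_\alpha T}\,\lvert\psi_\alpha\rangle,
\qquad
\varepsilon_\alpha \in \left(-\tfrac{\pi}{T},\, \tfrac{\pi}{T}\right].
```

    Because quasienergy is defined only modulo $2\pi/T$, boundary states can sit at $\varepsilon = 0$ ($0$-modes) or at the zone edge $\varepsilon = \pi/T$ ($\pi$-modes); the latter have no static counterpart, which is why they require a driven setup.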

    Name Disambiguation from link data in a collaboration graph

    Poster abstract. The entity disambiguation task partitions the records belonging to multiple persons with the objective that each resulting partition contains records of a unique person. Existing solutions to this task use either biographical attributes or auxiliary features collected from external sources, such as Wikipedia. However, in many scenarios such auxiliary features are unavailable or costly to obtain. Moreover, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task from link information obtained from a collaboration network. Our method does not intrude on privacy, as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method performs satisfactorily.