24 research outputs found

Latent Representation and Sampling in Network: Application in Text Mining and Biology.

Author: Saha Tanay Kumar
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2018

In classical machine learning, hand-designed features are used for learning a mapping from raw data. However, human involvement in feature design makes the process expensive. Representation learning aims to learn abstract features directly from data without direct human involvement. Raw data can take various forms. A network is one form of data that encodes relational structure in many real-world domains. Therefore, learning abstract features for network units is an important task. In this dissertation, we propose models for incorporating temporal information given as a collection of networks from subsequent time-stamps. The primary objective of our models is to learn a better abstract feature representation of nodes and edges in an evolving network. We show that the temporal information in the abstract feature improves the performance of the link prediction task substantially. Besides applying our models to network data, we also employ them to incorporate extra-sentential information in the text domain for learning better representations of sentences. We build a context network of sentences to capture extra-sentential information. This information in the abstract feature representation of sentences improves various text-mining tasks substantially over a set of baseline methods. A problem with the abstract features that we learn is that they lack interpretability. In real-life applications on network data, for some tasks, it is crucial to learn interpretable features in the form of graphical structures. For this we need to mine important graphical structures along with their frequency statistics from the input dataset. However, exact algorithms for these tasks are computationally expensive, so scalable algorithms are urgently needed. To overcome this challenge, we provide efficient sampling algorithms for mining higher-order structures from network(s). We show that our sampling-based algorithms are scalable. They are also superior to a set of baseline algorithms in terms of retrieving important graphical sub-structures and collecting their frequency statistics. Finally, we show that we can use these frequent subgraph statistics and structures as features in various real-life applications. We show one application in biology and another in security. In both cases, we show that the structures and their statistics significantly improve the performance of knowledge discovery tasks in these domains.

Purdue E-Pubs

FS^3: A Sampling based method for top-k Frequent Subgraph Mining

Author: Hasan Mohammad Al; Saha Tanay Kumar
Publication date: 02/09/2014

Mining labeled subgraphs is a popular research task in data mining because of its potential application in many different scientific domains. All existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we propose FS^3, a sampling-based method. It mines a small collection of subgraphs that are most frequent in the probabilistic sense. FS^3 performs Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that the potentially frequent subgraphs are sampled more often. In addition, FS^3 is equipped with an innovative queue manager. It stores the sampled subgraphs in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on databases of large graphs show that FS^3 is efficient, and it obtains subgraphs that are the most frequent amongst the subgraphs of a given size.
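
The sampling idea in this abstract can be illustrated with a small sketch. It is not the FS^3 algorithm itself: it runs a Metropolis-Hastings walk over connected fixed-size node sets of a single toy graph, uses a made-up target weight (`score`, one plus the number of induced edges) in place of a support-based one, identifies patterns by a coarse label signature instead of a true isomorphism test, and replaces the paper's queue manager with a plain counter. All function names, the toy graph, and the labels are assumptions made for illustration.

```python
import random
from collections import Counter

def is_connected(adj, nodes):
    """Depth-first check that `nodes` induces a connected subgraph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes

def neighbors(adj, state):
    """All connected node sets reachable by swapping one node of `state`."""
    out = set()
    for v in state:
        rest = state - {v}
        frontier = {u for x in rest for u in adj[x]} - state
        for u in frontier:
            cand = frozenset(rest | {u})
            if is_connected(adj, cand):
                out.add(cand)
    return out

def score(adj, state):
    """Hypothetical target weight: 1 + number of induced edges."""
    return 1 + sum(1 for x in state for y in adj[x] if y in state) // 2

def signature(adj, labels, state):
    """Coarse pattern key (node labels + edge label pairs); an invariant, not a canonical form."""
    node_part = tuple(sorted(labels[x] for x in state))
    edge_part = tuple(sorted(
        tuple(sorted((labels[x], labels[y])))
        for x in state for y in adj[x] if y in state and x < y
    ))
    return node_part, edge_part

def sample_patterns(adj, labels, start, steps, top_k, seed=0):
    """Metropolis-Hastings walk; returns the top_k most visited signatures."""
    rng = random.Random(seed)
    state = frozenset(start)
    counts = Counter()
    for _ in range(steps):
        nbrs = sorted(neighbors(adj, state), key=sorted)
        if nbrs:
            cand = rng.choice(nbrs)
            cand_deg = len(neighbors(adj, cand))
            # Hastings correction for the uniform-over-neighbors proposal.
            accept = (score(adj, cand) * len(nbrs)) / (score(adj, state) * cand_deg)
            if rng.random() < accept:
                state = cand
        counts[signature(adj, labels, state)] += 1
    return counts.most_common(top_k)

# Toy labeled graph (assumed data): a triangle 0-1-2 with a tail 2-3-4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
labels = {0: "A", 1: "A", 2: "B", 3: "A", 4: "B"}
```

Calling `sample_patterns(adj, labels, start={0, 1, 2}, steps=200, top_k=2)` returns up to two signatures with their visit counts. Because the walk targets `score`, denser node sets are visited more often, which mirrors the abstract's goal of sampling potentially frequent subgraphs more often.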

arXiv.org e-Print Archive

CiteSeerX

IUPUIScholarWorks

Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

Author: Al Hasan Mohammad; Joty Shafiq; Saha Tanay Kumar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

We present a novel approach to learn distributed representations of sentences from unlabeled data by modeling both the content and the context of a sentence. The content model learns a sentence representation by predicting its words. The context model, on the other hand, comprises a neighbor prediction component and a regularizer to model the distributional and proximity hypotheses, respectively. We propose an online algorithm to train the model components jointly. We evaluate the models in a setup where contextual information is available. The experimental results on tasks involving classification, clustering, and ranking of sentences show that our model outperforms the best existing models by a wide margin across multiple datasets.
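
To make the proximity idea concrete, here is a minimal sketch of one common way such a regularizer can be written: a squared-distance penalty between each sentence vector and the vectors of its context neighbors, with one gradient step that pulls neighbors together. The abstract does not give the exact form used in Con-S2V, so the penalty, the function names, and the toy vectors below are assumptions for illustration only.

```python
def proximity_penalty(vectors, neighbors):
    """Sum of squared distances between each sentence and its context neighbors."""
    total = 0.0
    for s, nbrs in neighbors.items():
        for n in nbrs:
            total += sum((a - b) ** 2 for a, b in zip(vectors[s], vectors[n]))
    return total

def smooth_step(vectors, neighbors, lr=0.1):
    """One gradient-descent step on each sentence's own vector, holding its neighbors fixed."""
    updated = {}
    for s, vec in vectors.items():
        grad = [0.0] * len(vec)
        for n in neighbors.get(s, []):
            for i, (a, b) in enumerate(zip(vec, vectors[n])):
                grad[i] += 2.0 * (a - b)
        updated[s] = [a - lr * g for a, g in zip(vec, grad)]
    return updated

# Toy data (assumed): three sentences in a chain s1 - s2 - s3.
vectors = {"s1": [0.0, 0.0], "s2": [1.0, 1.0], "s3": [4.0, 0.0]}
neighbors = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"]}
```

On this toy chain, `smooth_step(vectors, neighbors)` moves each vector toward its neighbors and lowers `proximity_penalty`. In a joint model this term would be added to the content and neighbor-prediction losses rather than optimized alone.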

IUPUIScholarWorks

Name Disambiguation from link data in a collaboration graph using temporal and topological features

Author: Hasan Mohammad Al; Saha Tanay Kumar; Zhang Baichuan
Publication date: 01/12/2015

In a social community, multiple persons may share the same name, phone number, or some other identifying attributes. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes affect the performance of document retrieval, web search, and database integration, and, more importantly, lead to improper attribution of credit (or blame). The task of entity disambiguation partitions the records belonging to multiple persons with the objective that each decomposed partition is composed of records of a unique person. Existing solutions to this task use either biographical attributes or auxiliary features that are collected from external sources, such as Wikipedia. However, for many scenarios, such auxiliary features are not available, or they are costly to obtain. Besides, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task from link information obtained from a collaboration network. Our method does not intrude on privacy, as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance. Comment: The short version of this paper has been accepted to ASONAM 201

arXiv.org e-Print Archive

IUPUIScholarWorks

Name Disambiguation from link data in a collaboration graph

Author: Al Hasan Mohammad; Saha Tanay Kumar; Zhang Baichuan
Publication venue: Office of the Vice Chancellor for Research
Publication date: 17/04/2015

Poster abstract. The entity disambiguation task partitions the records belonging to multiple persons with the objective that each decomposed partition is composed of records of a unique person. Existing solutions to this task use either biographical attributes or auxiliary features that are collected from external sources, such as Wikipedia. However, for many scenarios, such auxiliary features are not available, or they are costly to obtain. Besides, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task from link information obtained from a collaboration network. Our method does not intrude on privacy, as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance.

IUPUIScholarWorks

Engineering anomalous Floquet Majorana modes and their time evolution in helical Shiba chain

Author: Ghosh Arnob Kumar; Mondal Debashish; Nag Tanay; Saha Arijit
Publication date: 05/04/2023

We theoretically explore the Floquet generation of Majorana end modes (MEMs) (both regular $0$- and anomalous $\pi$-modes) implementing a periodic sinusoidal modulation in chemical potential in an experimentally feasible setup based on one-dimensional chain of magnetic impurity atoms having spin spiral configuration fabricated on the surface of most common bulk $s$-wave superconductor. We obtain a rich phase diagram in the parameter space, highlighting the possibility of generating multiple $0$-/$\pi$-MEMs localized at the end of the chain. We also study the real-time evolution of these emergent MEMs, especially when they start to appear in the time domain. These MEMs are topologically characterized by employing the dynamical winding number. We also discuss the possible experimental parameters in connection to our model. Our work paves the way to realize the Floquet MEMs in a magnet-superconductor heterostructure. Comment: 7.5 Pages + 5 PDF figures (Main Text), 4 Pages + 3 PDF figures (Supplementary Material), Comments are welcome

arXiv.org e-Print Archive

Discovery of Functional Motifs from the Interface Region of Oligomeric Proteins using Frequent Subgraph Mining

Author: Al Hasan Mohammad; Dhifli Wajdi; Katebi Ataur; Saha Tanay Kumar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Modeling the interface region of a protein complex paves the way for understanding its dynamics and functionalities. Existing works model the interface region of a complex using different approaches, such as the residue composition at the interface region, the geometry of the interface residues, or the structural alignment of interface regions. These approaches are useful for ranking a set of docked conformations or for building scoring functions for protein-protein docking, but they do not provide a generic and scalable technique for the extraction of interface patterns leading to functional motif discovery. In this work, we model the interface region of a protein complex by graphs and extract interface patterns of the given complex in the form of frequent subgraphs. To achieve this, we develop a scalable algorithm for frequent subgraph mining. We show that a systematic review of the mined subgraphs provides an effective method for the discovery of functional motifs that exist along the interface region of a given protein complex.
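
As a rough illustration of what "frequent" means here, the sketch below counts support for the smallest possible patterns, single labeled edges, across a small database of graphs and keeps those that appear in at least `min_support` graphs. This is only the first level of a frequent subgraph miner, not the scalable algorithm the abstract refers to. The function name, the graph encoding, and the residue labels are assumptions made for the example.

```python
from collections import Counter

def frequent_edge_patterns(graphs, min_support):
    """Return labeled edge patterns that occur in at least `min_support` graphs.

    Each graph is a (labels, edges) pair: `labels` maps node id to label,
    `edges` is a list of undirected (u, v) pairs.
    """
    support = Counter()
    for labels, edges in graphs:
        # A set, so each pattern counts at most once per graph.
        patterns = {tuple(sorted((labels[u], labels[v]))) for u, v in edges}
        support.update(patterns)
    return {p: c for p, c in support.items() if c >= min_support}

# Toy interface graphs (assumed data): nodes are residues, edges are contacts.
graphs = [
    ({0: "ARG", 1: "ASP", 2: "LEU"}, [(0, 1), (1, 2)]),
    ({0: "ARG", 1: "ASP"}, [(0, 1)]),
    ({0: "LEU", 1: "LEU", 2: "ASP"}, [(0, 1), (1, 2)]),
]
```

With `min_support=2`, the ARG-ASP and ASP-LEU contacts are kept and the LEU-LEU contact, which occurs in only one graph, is dropped. Larger patterns would be grown from these frequent edges, which is where subgraph isomorphism checks and the scalability concerns come in.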

IUPUIScholarWorks