Search CORE

8 research outputs found

Incremental eigenpair computation for graph Laplacian matrices: theory and applications

Author: Al Hasan Mohammad
Chen Pin-Yu
Zhang Baichuan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/12/2017
Field of study

The smallest eigenvalues and the associated eigenvectors (i.e., eigenpairs) of a graph Laplacian matrix have been widely used for spectral clustering and community detection. However, in real-life applications, the number of clusters or communities (say, K) is generally unknown a priori. Consequently, the majority of the existing methods either choose K heuristically or they repeat the clustering method with different choices of K and accept the best clustering result. The first option, more often, yields suboptimal result, while the second option is computationally expensive. In this work, we propose an incremental method for constructing the eigenspectrum of the graph Laplacian matrix. This method leverages the eigenstructure of graph Laplacian matrix to obtain the Kth smallest eigenpair of the Laplacian matrix given a collection of all previously compute

arXiv.org e-Print Archive

Crossref

IUPUIScholarWorks

Bias-Variance Tradeoff of Graph Laplacian Regularizer

Author: Pin-Yu Chen
Sijia Liu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Towards Name Disambiguation: Relational, Streaming, and Privacy-Preserving Text Data

Author: Zhang Baichuan
Publication venue: 'Purdue University (bepress)'
Publication date: 01/12/2017
Field of study

In the real world, our DNA is unique but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple persons who are namesakes of one another. Such mistakes deteriorate the performance of document retrieval, web search, and more seriously, cause improper attribution of credit or blame in digital forensics. To resolve this issue, the name disambiguation task 1 is designed to partition the documents associated with a name reference such that each partition contains documents pertaining to a unique real-life person. Existing algorithms for this task mainly suffer from the following drawbacks. First, the majority of existing solutions substantially rely on feature engineering, such as biographical feature extraction, or construction of auxiliary features from Wikipedia. However, for many scenarios, such features may be costly to obtain or unavailable in privacy sensitive domains. Instead we solve the name disambiguation task in restricted setting by leveraging only the relational data in the form of anonymized graphs. Second, most of the existing works for this task operate in a batch mode, where all records to be disambiguated are initially available to the algorithm. However, more realistic settings require that the name disambiguation task should be performed in an online streaming fashion in order to identify records of new ambiguous entities having no preexisting records. Finally, we investigate the potential disclosure risk of textual features used in name disambiguation and propose several algorithms to tackle the task in a privacy-aware scenario. In summary, in this dissertation, we present a number of novel approaches to address name disambiguation tasks from the above three aspects independently, namely relational, streaming, and privacy preserving textual data

Purdue E-Pubs