An individual can be referred to by multiple aliases on the web. Extracting the aliases of a name is important in information retrieval, sentiment analysis, and name disambiguation. We propose a novel approach that finds aliases of a given name using automatically extracted lexical patterns. We use a set of known names and their aliases as training data to extract lexical patterns that convey information about aliases, and then apply these patterns to extract a large set of candidate aliases from text snippets returned by a web search engine. We define numerous ranking scores to evaluate candidate aliases using three approaches: lexical pattern frequency, word co-occurrences in anchor text, and page counts on the web. We introduce the notion of a word co-occurrence graph to represent mutual relations between words that appear in anchor text: words in anchor text are represented as nodes, and an edge is formed between nodes whose anchor texts link to the same URL. A drawback of the existing method is that an extracted alias may be the original name of some other person. We therefore introduce email ID extraction to overcome this problem. To construct a robust alias detection system, we integrate the ranking scores into a single ranking function using support vector machines. Moreover, the aliases extracted using the proposed method are successfully utilized in an information retrieval task, improving recall by 20 percent in a relation detection task.
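
The word co-occurrence graph described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The function name `build_cooccurrence_graph`, the `(anchor_text, url)` input format, the whitespace tokenization, and the choice of edge weight (the number of distinct URLs shared by two words) are all assumptions made for illustration, and the sample data is made up.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(anchors):
    """Build a word co-occurrence graph from (anchor_text, url) pairs.

    Nodes are words seen in anchor text. Two words are connected when
    they appear in anchor texts that link to the same URL. The edge
    weight (an assumption) counts the distinct URLs the two words share.
    """
    # Collect the set of anchor-text words pointing at each URL.
    words_by_url = defaultdict(set)
    for text, url in anchors:
        words_by_url[url].update(text.lower().split())

    # Connect every pair of words that share a URL.
    graph = defaultdict(lambda: defaultdict(int))
    for words in words_by_url.values():
        for a, b in combinations(sorted(words), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


# Illustrative data only (not taken from the paper).
anchors = [
    ("Hideki Matsui", "http://example.com/a"),
    ("Godzilla", "http://example.com/a"),
    ("Godzilla movie", "http://example.com/b"),
]
graph = build_cooccurrence_graph(anchors)
print(dict(graph["godzilla"]))
```

In this sketch, "godzilla" becomes a neighbor of "matsui" because both words occur in anchor texts pointing to the same URL, which is the kind of signal a co-occurrence based ranking score could draw on.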