164 research outputs found

    Efficient Node Proximity and Node Significance Computations in Graphs

    Get PDF
    abstract: Node proximity measures are commonly used for quantifying how nearby or otherwise related to two or more nodes in a graph are. Node significance measures are mainly used to find how much nodes are important in a graph. The measures of node proximity/significance have been highly effective in many predictions and applications. Despite their effectiveness, however, there are various shortcomings. One such shortcoming is a scalability problem due to their high computation costs on large size graphs and another problem on the measures is low accuracy when the significance of node and its degree in the graph are not related. The other problem is that their effectiveness is less when information for a graph is uncertain. For an uncertain graph, they require exponential computation costs to calculate ranking scores with considering all possible worlds. In this thesis, I first introduce Locality-sensitive, Re-use promoting, approximate Personalized PageRank (LR-PPR) which is an approximate personalized PageRank calculating node rankings for the locality information for seeds without calculating the entire graph and reusing the precomputed locality information for different locality combinations. For the identification of locality information, I present Impact Neighborhood Indexing (INI) to find impact neighborhoods with nodes' fingerprints propagation on the network. For the accuracy challenge, I introduce Degree Decoupled PageRank (D2PR) technique to improve the effectiveness of PageRank based knowledge discovery, especially considering the significance of neighbors and degree of a given node. To tackle the uncertain challenge, I introduce Uncertain Personalized PageRank (UPPR) to approximately compute personalized PageRank values on uncertainties of edge existence and Interval Personalized PageRank with Integration (IPPR-I) and Interval Personalized PageRank with Mean (IPPR-M) to compute ranking scores for the case when uncertainty exists on edge weights as interval values.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Mining and Analyzing the Academic Network

    Get PDF
    Social Network research has attracted the interests of many researchers, not only in analyzing the online social networking applications, such as Facebook and Twitter, but also in providing comprehensive services in scientific research domain. We define an Academic Network as a social network which integrates scientific factors, such as authors, papers, affiliations, publishing venues, and their relationships, such as co-authorship among authors and citations among papers. By mining and analyzing the academic network, we can provide users comprehensive services as searching for research experts, published papers, conferences, as well as detecting research communities or the evolutions hot research topics. We can also provide recommendations to users on with whom to collaborate, whom to cite and where to submit.In this dissertation, we investigate two main tasks that have fundamental applications in the academic network research. In the first, we address the problem of expertise retrieval, also known as expert finding or ranking, in which we identify and return a ranked list of researchers, based upon their estimated expertise or reputation, to user-specified queries. In the second, we address the problem of research action recommendation (prediction), specifically, the tasks of publishing venue recommendation, citation recommendation and coauthor recommendation. For both tasks, to effectively mine and integrate heterogeneous information and therefore develop well-functioning ranking or recommender systems is our principal goal. For the task of expertise retrieval, we first proposed or applied three modified versions of PageRank-like algorithms into citation network analysis; we then proposed an enhanced author-topic model by simultaneously modeling citation and publishing venue information; we finally incorporated the pair-wise learning-to-rank algorithm into traditional topic modeling process, and further improved the model by integrating groups of author-specific features. For the task of research action recommendation, we first proposed an improved neighborhood-based collaborative filtering approach for publishing venue recommendation; we then applied our proposed enhanced author-topic model and demonstrated its effectiveness in both cited author prediction and publishing venue prediction; finally we proposed an extended latent factor model that can jointly model several relations in an academic environment in a unified way and verified its performance in four recommendation tasks: the recommendation on author-co-authorship, author-paper citation, paper-paper citation and paper-venue submission. Extensive experiments conducted on large-scale real-world data sets demonstrated the superiority of our proposed models over other existing state-of-the-art methods

    Online Social Networks: Measurements, Analysis and Solutions for Mining Challenges

    Get PDF
    In the last decade, online social networks showed enormous growth. With the rise of these networks and the consequent availability of wealth social network data, Social Network Analysis (SNA) led researchers to get the opportunity to access, analyse and mine the social behaviour of millions of people, explore the way they communicate and exchange information. Despite the growing interest in analysing social networks, there are some challenges and implications accompanying the analysis and mining of these networks. For example, dealing with large-scale and evolving networks is not yet an easy task and still requires a new mining solution. In addition, finding communities within these networks is a challenging task and could open opportunities to see how people behave in groups on a large scale. Also, the challenge of validating and optimizing communities without knowing in advance the structure of the network due to the lack of ground truth is yet another challenging barrier for validating the meaningfulness of the resulting communities. In this thesis, we started by providing an overview of the necessary background and key concepts required in the area of social networks analysis. Our main focus is to provide solutions to tackle the key challenges in this area. For doing so, first, we introduce a predictive technique to help in the prediction of the execution time of the analysis tasks for evolving networks through employing predictive modeling techniques to the problem of evolving and large-scale networks. Second, we study the performance of existing community detection approaches to derive high quality community structure using a real email network through analysing the exchange of emails and exploring community dynamics. The aim is to study the community behavioral patterns and evaluate their quality within an actual network. Finally, we propose an ensemble technique for deriving communities using a rich internal enterprise real network in IBM that reflects real collaborations and communications between employees. The technique aims to improve the community detection process through the fusion of different algorithms

    Link analysis algorithms to handle hanging and spam pages

    Get PDF
    Hanging pages are one of the hidden problems in link structure ranking algorithms and they can be compromised by spammers to induce Link Spam. In this thesis, efficient algorithms have been developed to produce fair and relevant results from Web search engines, by handling hanging pages and detecting Link spam caused by them. The effect of hanging pages in Website Optimisation is also investigated and methodologies were proposed to develop optimised websites

    Learning Relatedness Measures for Entity Linking

    Get PDF
    Entity Linking is the task of detecting, in text documents, relevant mentions to entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowl- edge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant enti- ties selected for annotation, since this minimizes errors in disambiguating entity-linking. The definition of an e↵ective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of dif- ferent state-of-the-art entity-linking algorithms
    corecore