
    Internet multimedia information retrieval based on link analysis.

    Chan Ka Yan. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves i-iv, 3rd gp.). Abstracts in English and Chinese. Table of contents (front matter: acknowledgement, abstracts, lists of figures and tables):
    Chapter 1. Introduction: background; importance of hyperlink analysis.
    Chapter 2. Related work: crawling (crawling methods for the HITS and PageRank algorithms); ranking (the PageRank, HITS, PageRank-HITS, and SALSA algorithms; Average and Sim; the Netscape approach; the co-citation approach); multimedia information retrieval (Octopus).
    Chapter 3. Research methodology: research objective; proposed crawling methodology (collecting media objects; filtering the collection of links); proposed ranking methodology (identifying the factors that affect ranking; modified ranking algorithms).
    Chapter 4. Experimental results and discussions: experimental setup and assumptions; observations (dangling links; "good hub = bad authority, good authority = bad hub?"; setting of weights); discussion of results (relevance; precision and recall; significance testing; ranking); limitations and difficulties (small base set; parameter settings; inability to remove all meaningless links from the base set; resource and time costs; the TKC effect; continuously updated HTML formats and file types; authors' object-citation habits).
    Chapter 5. Conclusion: contribution of our methodology; possible improvements; conclusion.
    Bibliography. Appendix: one-tailed paired t-test results; ANOVA results.
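    The ranking chapters above build on Kleinberg's HITS algorithm. As context, here is a minimal sketch of the standard, unweighted HITS iteration surveyed in Chapter 2; the thesis' modified, weighted ranking algorithms are not reproduced, and the toy adjacency matrix is illustrative only.

```python
# A minimal sketch of the standard (unweighted) HITS iteration; the
# thesis' modified ranking algorithms are not reproduced here.
import numpy as np

def hits(adj, iters=50):
    """adj[i, j] = 1 if page i links to page j."""
    n = adj.shape[0]
    hubs, auths = np.ones(n), np.ones(n)
    for _ in range(iters):
        auths = adj.T @ hubs              # good authorities are cited by good hubs
        hubs = adj @ auths                # good hubs cite good authorities
        auths /= np.linalg.norm(auths)
        hubs /= np.linalg.norm(hubs)
    return hubs, auths

# Toy web graph: pages 0 and 3 both link to page 1; page 0 also links to 2.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 1, 0, 0]])
print(hits(A))
```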

    Graph similarity and matching

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 85-88). Measures of graph similarity have a broad array of applications, including comparing chemical structures, navigating complex networks like the World Wide Web, and more recently, analyzing different kinds of biological data. This thesis surveys several different notions of similarity, then focuses on an interesting class of iterative algorithms that use the structural similarity of local neighborhoods to derive pairwise similarity scores between graph elements. We have developed a new similarity measure that uses a linear update to generate both node and edge similarity scores and has desirable convergence properties. This thesis also explores the application of our similarity measure to graph matching. We attempt to correctly position a subgraph G_B within a graph G_A using a maximum weight matching algorithm applied to the similarity scores between G_A and G_B. Significant performance improvements are observed when the topological information provided by the similarity measure is combined with additional information about the attributes of the graph elements and their local neighborhoods. Matching results are presented for subgraph matching within randomly generated graphs; an appendix briefly discusses matching applications in the yeast interactome, a graph representing protein-protein interactions within yeast. By Laura Zager. S.M.
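    As an illustration of the general approach, here is a sketch of an iterative neighborhood-similarity update of this kind (after Blondel et al.), followed by maximum-weight assignment on the resulting scores to position G_B within G_A. The thesis' own linear node-and-edge update is not reproduced, and the toy graphs are illustrative assumptions.

```python
# Sketch: iterative neighborhood similarity + max-weight matching,
# in the spirit of (but not identical to) the thesis' measure.
import networkx as nx
import numpy as np
from scipy.optimize import linear_sum_assignment

def node_similarity(A, B, iters=20):
    """A, B: adjacency matrices of GA and GB.
    S[i, j] = similarity of node i of GB to node j of GA."""
    S = np.ones((B.shape[0], A.shape[0]))
    for _ in range(iters):
        S = B @ S @ A.T + B.T @ S @ A     # similar neighborhoods -> similar nodes
        S /= np.linalg.norm(S)            # keep the iteration bounded
    return S

GA = nx.path_graph(6)                     # host graph
GB = GA.subgraph([0, 1, 2])               # pattern to position within GA
S = node_similarity(nx.to_numpy_array(GA), nx.to_numpy_array(GB))
rows, cols = linear_sum_assignment(S, maximize=True)  # max-weight matching
print(dict(zip(rows, cols)))              # GB node index -> matched GA node
```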

    Cancer driver gene detection in transcriptional regulatory networks using the structure analysis of weighted regulatory interactions

    Identification of the genes that initiate cell anomalies and cause cancer in humans is an important field in oncology research. Mutations and anomalies arising in these genes are transferred to other genes in the cell, disrupting its normal functionality. These genes are known as cancer driver genes (CDGs). Various methods have been proposed for predicting CDGs, most of them computational methods based on genomic data, and some researchers have developed novel bioinformatics approaches. In this study, we propose an algorithm that calculates the effectiveness and strength of each gene and ranks them, using gene regulatory networks and stochastic analysis of the regulatory link structure between genes. First, we constructed the regulatory network using gene expression data and a list of regulatory interactions. Then, using biological and topological features of the network, we weighted the regulatory interactions. The resulting interaction weights were used in the link-structure analysis, which runs two separate Markov chains on the bipartite graph derived from the main graph of the gene network, implementing a stochastic approach to link-structure analysis. The proposed algorithm reports the higher-ranked genes as driver genes. Its efficiency, in terms of F-measure and the number of identified driver genes, was compared with 23 other computational and network-based methods.
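    The two-Markov-chain analysis on a bipartite graph described above is the structure of SALSA-style link analysis. Here is a rough sketch of a SALSA "authority" chain run over weighted interactions; the toy weight matrix and normalization details are assumptions, not the paper's weighting scheme or data.

```python
# Rough sketch of a SALSA-style "authority" Markov chain on a weighted
# interaction matrix; weighting scheme and data are illustrative only.
import numpy as np

def salsa_authority(W, iters=100):
    """W[i, j] = weight of the regulatory interaction gene i -> gene j.
    One chain step: from a target gene, walk backward along a weighted
    incoming interaction, then forward along an outgoing one."""
    col = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)  # backward step
    row = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # forward step
    P = col.T @ row                       # two-step transition between targets
    score = np.full(W.shape[0], 1.0 / W.shape[0])
    for _ in range(iters):                # power iteration toward the
        score = score @ P                 # stationary distribution = ranking
    return score

W = np.array([[0.0, 2.0, 1.0],
              [0.5, 0.0, 1.5],
              [1.0, 0.0, 0.0]])
print(salsa_authority(W))                 # higher score = stronger candidate CDG
```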

    Extraction and Analysis of Facebook Friendship Relations

    Online Social Networks (OSNs) are a unique Web and social phenomenon, affecting the tastes and behaviors of their users and helping them maintain and create friendships. It is interesting to analyze the growth and evolution of Online Social Networks both from the point of view of marketing and new services and from a scientific viewpoint, since their structure and evolution may share similarities with real-life social networks. In the social sciences, several techniques for analyzing (online) social networks have been developed to evaluate quantitative properties (e.g., defining metrics and measures of structural characteristics of the networks) or qualitative aspects (e.g., studying the attachment model for network evolution, binary trust relationships, and the link prediction problem). However, OSN analysis poses novel challenges to both computer and social scientists. We present our long-term research effort in analyzing Facebook, the largest and arguably most successful OSN today: it gathers more than 500 million users. Access to data about Facebook users and their friendship relations is restricted; thus, we acquired the necessary information directly from the front-end of the Web site, in order to reconstruct a sub-graph representing anonymous interconnections among a significant subset of users. We describe our ad-hoc, privacy-compliant crawler for Facebook data extraction. To minimize bias, we adopt two different graph mining techniques: breadth-first search (BFS) and rejection sampling. To analyze the structural properties of samples consisting of millions of nodes, we developed a specific tool for analyzing quantitative and qualitative properties of social networks, adopting and improving existing Social Network Analysis (SNA) techniques and algorithms.
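    A minimal sketch of the BFS sampling strategy named above, as one might implement it; get_friends, standing in for the front-end extraction step, is a hypothetical placeholder, not the authors' crawler code.

```python
# Minimal sketch of BFS graph sampling from a seed user; get_friends is
# a hypothetical stand-in for fetching a friend list from the site.
from collections import deque

def bfs_sample(seed, get_friends, max_nodes=10_000):
    """Breadth-first sample of the friendship graph from a seed user."""
    visited = {seed}
    edges = []
    queue = deque([seed])
    while queue and len(visited) < max_nodes:
        user = queue.popleft()
        for friend in get_friends(user):
            edges.append((user, friend))  # record the friendship edge
            if friend not in visited:
                visited.add(friend)
                queue.append(friend)      # expand the frontier
    return visited, edges
```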

    Standards as interdependent artifacts : the case of the Internet

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Engineering Systems Division, 2008. Includes bibliographical references. This thesis has explored a new idea: viewing standards as interdependent artifacts and studying them with network analysis tools. Using the set of Internet standards as an example, the research in this thesis covers the citation network, the author affiliation network, and the co-author network of the Internet standards over the period 1989 to 2004. The major network analysis tools used include cohesive subgroup decomposition (via the algorithm by Newman and Girvan), regular equivalence class decomposition (via the REGE algorithm and a method developed in this thesis), nodal prestige and acquaintance (both calculated with Kleinberg's technique), and other social network analysis tools. Qualitative analyses of the historical and technical context of the standards, as well as statistical analyses of various kinds, are also used in this research. A major finding of this thesis is that, for understanding the Internet, it is beneficial to consider its standards as interdependent artifacts. Because the basic mission of the Internet (i.e., to be an interoperable system that enables various services and applications) is enabled not by one or a few standards but by a great number of standards developed upon each other, studying the standards only as stand-alone specifications cannot really produce meaningful understanding of a workable system. Therefore, the general approach and methodologies introduced in this thesis, which we label a systems approach, are a necessary addition to existing approaches. A key finding of this thesis is that the citation network of the Internet standards can be decomposed into functionally coherent subgroups by using the Newman-Girvan algorithm. This result shows that the (normative) citations among the standards can meaningfully be used to help us better manage and monitor the standards system. The results in this thesis indicate that organizing the development efforts of the Internet standards into (now) 121 Working Groups was done in a manner reasonably consistent with achieving a modular (and thus more evolvable) standards system. A second decomposition of the standards network was achieved by employing the REGE algorithm together with a new method, developed in this thesis (see the Appendix), for identifying regular equivalence classes. Five meaningful subgroups of the Internet standards were identified, each occupying a specific position and playing a specific role in the network. The five positions are reflected in the names we have assigned to them: the Foundations, the Established, the Transients, the Newcomers, and the Stand-alones. The life cycle among these positions was uncovered and is one of the insights that the systems approach gives into the evolution of the overall standards system. Another insight concerning the evolution of the standards system is the development of a predictive model for the promotion of standards to a new status (i.e., Proposed, Draft, and Internet Standard as the three ascending statuses). This model also has practical potential for managers of standards-setting organizations and for firms (and individuals) interested in efficiently participating in standards-setting processes. The model's predictions are based on assessing the implicit social influence of the standards (based upon betweenness centrality, a social network metric, of the standards' authors) and the apparent importance of the standard to the network (based upon calculating the standard's prestige from the citation network). A deeper understanding of the factors that go into this model was also developed through analysis of the factors that can predict increased prestige over time for a standard. The overall systems approach and the tools developed and demonstrated in this thesis for the study of the Internet standards can be applied to other standards systems. Application (and extension) to the World Wide Web, the electric power system, mobile communication, and others would, we believe, lead to important improvements in our practical and scholarly understanding of these systems. By Mo-Han Hsieh. Ph.D.
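    As an illustration of the decomposition and prestige steps named above, here is a sketch applying the Newman-Girvan algorithm and Kleinberg's HITS (the technique cited for nodal prestige) to a toy citation graph via networkx; the real RFC citation network is not reproduced.

```python
# Sketch: Newman-Girvan subgroup decomposition and HITS-based prestige
# on a toy citation graph; the RFC citation data itself is not shown.
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Toy directed citation network: an edge u -> v means standard u cites v.
cites = nx.DiGraph([(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (3, 4)])

# Cohesive subgroup decomposition (first split of Newman-Girvan).
groups = next(girvan_newman(cites.to_undirected()))
print([sorted(g) for g in groups])        # e.g. [[1, 2, 3], [4, 5, 6]]

# Nodal prestige from the citation structure, via Kleinberg's technique.
hubs, prestige = nx.hits(cites)
print(max(prestige, key=prestige.get))    # the most authoritative standard
```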

    Ontology-based knowledge management for technology intensive industries

    EThOS - Electronic Theses Online Service, United Kingdom.

    Link prediction and link detection in sequences of large social networks using temporal and local metrics

    This dissertation builds upon the ideas introduced by Liben-Nowell and Kleinberg in The Link Prediction Problem for Social Networks [42]. Link prediction is the problem of predicting between which unconnected nodes in a graph a link will form next, based on the current structure of the graph. The following research contributions are made:
    • Highlighting the difference between the link prediction and link detection problems, which have been implicitly regarded as identical in current research. Although hidden links and forming links have very significantly different metric values, an initial experiment showed that a machine learning system using traditional metrics could not distinguish them. However, they could be distinguished from each other in a "simple" network (one where traditional metrics can be used for prediction successfully) using a combination of new graph analysis approaches.
    • Defining temporal metric statistics by combining traditional statistical measures with measures commonly employed in financial analysis and traditional social network analysis. These metrics are calculated over time for a sequence of sociograms. It is shown that some of the temporal extensions of traditional metrics increase the accuracy of link prediction.
    • Defining traditional metrics using different radii to those at which they are normally calculated. It is shown that this approach can increase the individual prediction accuracy of certain metrics, marginally increase the accuracy of a group of metrics, and greatly increase metric computation speed without sacrificing information content by computing metrics at smaller radii. It also solves the "distance-three task" (the fact that common-neighbour metrics cannot predict links between nodes at a distance greater than three).
    • Showing that the combination of local and temporal approaches to link prediction can lead to very high prediction accuracies. Furthermore, in "complex" networks (ones where traditional metrics cannot be used for prediction successfully), local and temporal metrics become even more useful.
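    For context, here is a sketch of the baseline, common-neighbour style of link prediction that the contributions above extend, scoring all unconnected pairs of a standard toy graph with the Adamic-Adar index; it shows the basic setting, not the dissertation's temporal or variable-radius metrics.

```python
# Sketch of baseline common-neighbour link prediction (Adamic-Adar);
# the dissertation's temporal/radius extensions are not reproduced.
import networkx as nx

G = nx.karate_club_graph()
unconnected = [(u, v) for u in G for v in G
               if u < v and not G.has_edge(u, v)]
# Rank candidate pairs: higher score = more likely to form a link next.
scores = sorted(nx.adamic_adar_index(G, unconnected),
                key=lambda t: t[2], reverse=True)
for u, v, s in scores[:5]:
    print(u, v, round(s, 3))
```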

    A semantic approach for scalable and self-organized context-aware systems

    Ph.D. (Doctor of Philosophy).

    Network analysis of shared interests represented by social bookmarking behaviors

    Social bookmarking is a new phenomenon characterized by a number of features, including active user participation, open and collective discovery of resources, and user-generated metadata. Among these, this study pays particular attention to its position at the intersection of personal information space and social information space. While users of a social bookmarking site create and maintain their own bookmark collections, the users' personal information spaces, in aggregate, build up the information space of the site as a whole. The overall goal of this study is to understand how a social information space may emerge when the personal information spaces of users intersect and overlap with shared interests. The purpose of the study is two-fold: first, to see whether and how we can identify shared interest space(s) within the general information space of a social bookmarking site; and second, to evaluate the applicability of social network analysis to this end. Delicious.com, one of the most successful instances of social bookmarking, was chosen as the case. The study was carried out in three phases asking separate yet interrelated questions concerning the overall level of interest overlap, the structural patterns in the network of users connected by shared interests, and the communities of interest within the network. The results indicate that, while individual users of delicious.com have a broad range of diverse interests, there is a considerable level of overlap and commonality, providing grounds for creating implicit networks of users with shared interests. The networks constructed from common bookmarks revealed intriguing structural patterns commonly found in well-established social systems, including a core-periphery structure with a high level of connectivity, which forms a basis for efficient information sharing and knowledge transfer. Furthermore, an exploratory analysis of the network communities showed that each community has a distinct theme defining the shared interests of its members, at a high level of coherence. Overall, the results suggest that networks of people with shared interests can be induced from their social bookmarking behaviors, and that such networks can provide a venue for investigating the social mechanisms of information sharing in this new information environment. Future research can build upon the methods and findings of this study to further explore the implications of the emergent and implicit network of shared interests.
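    A sketch of how such a user-user network, weighted by shared bookmarks, can be induced from bookmark collections; the three-user dataset here is an illustrative stand-in for the delicious.com sample, not the study's data.

```python
# Sketch: induce a shared-interest network from overlapping bookmark
# collections; the bookmark data is illustrative only.
from itertools import combinations
import networkx as nx

bookmarks = {
    "alice": {"url1", "url2", "url3"},
    "bob":   {"url2", "url3", "url4"},
    "carol": {"url5"},
}
G = nx.Graph()
G.add_nodes_from(bookmarks)
for u, v in combinations(bookmarks, 2):
    shared = bookmarks[u] & bookmarks[v]
    if shared:                                # link users with common bookmarks,
        G.add_edge(u, v, weight=len(shared))  # weighted by overlap size
print(list(G.edges(data=True)))               # [('alice', 'bob', {'weight': 2})]
```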

    Evaluation of linkage-based web discovery systems

    In recent years, the widespread use of the WWW has brought information retrieval systems into the homes of many millions of people. Today, we have access to many billions of documents (web pages) and have free-of-charge access to powerful, fast and highly efficient search facilities over these documents, provided by search engines such as Google. The "first generation" of web search engines addressed the engineering problems of web spidering and efficient searching for large numbers of both users and documents, but they did not innovate much in the approaches taken to searching. Recently, however, linkage analysis has been incorporated into search engine ranking strategies. Anecdotally, linkage analysis appears to have improved the retrieval effectiveness of web search, yet there is surprisingly little scientific evidence in support of the claims for better quality retrieval. Participants in the three most recent TREC conferences (1999, 2000 and 2001) were invited to perform benchmarking of information retrieval systems on web data and had the option of using linkage information as part of their retrieval strategies. The general consensus from the experiments of these participants is that linkage information has not yet been successfully incorporated into conventional retrieval strategies. In this thesis, we present our research into the field of linkage-based retrieval of web documents. We illustrate that (moderate) improvements in retrieval performance are possible if the underlying test collection contains a higher link density than the test collections used in the three most recent TREC conferences. We examine the linkage structure of live data from the WWW and, coupled with our findings from crawling sections of the WWW, we present a list of five requirements for a test collection that is to faithfully support experiments into linkage-based retrieval of documents from the WWW. We also present some of our own, new, variants on linkage-based web retrieval and evaluate their performance in comparison to the approaches of others.
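    A sketch of the basic idea being evaluated: interpolating a content-only retrieval score with a query-independent linkage score such as PageRank. The content scores and the weight alpha are assumptions for illustration, not the thesis' settings or its specific variants.

```python
# Sketch: combining content evidence with linkage evidence; the content
# scores and interpolation weight are hypothetical.
import networkx as nx

web = nx.DiGraph([("a", "b"), ("c", "b"), ("b", "d")])
link_score = nx.pagerank(web)             # linkage evidence
content_score = {"a": 0.9, "b": 0.7, "c": 0.2, "d": 0.4}  # hypothetical tf-idf

alpha = 0.7                               # assumed weight on content evidence
combined = {p: alpha * content_score[p] + (1 - alpha) * link_score[p]
            for p in web}
print(sorted(combined, key=combined.get, reverse=True))
```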