686 research outputs found

    Influence, originality and similarity in directed acyclic graphs

    Get PDF
    We introduce a framework for network analysis based on random walks on directed acyclic graphs where the probability of passing through a given node is the key ingredient. We illustrate its use in evaluating the mutual influence of nodes and discovering seminal papers in a citation network. We further introduce a new similarity metric and test it in a simple personalized recommendation process. This metric's performance is comparable to that of classical similarity metrics, thus further supporting the validity of our framework.Comment: 6 pages, 4 figure

    We Could, but Should We? Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections

    Get PDF
    We live in an era in which the ways that we can make sense of our past are evolving as more artifacts from that past become digital. At the same time, the responsibilities of traditional gatekeepers who have negotiated the ethics of historical data collection and use, such as librarians and archivists, are increasingly being sidelined by the system builders who decide whether and how to provide access to historical digital collections, often without sufficient reflection on the ethical issues at hand. It is our aim to better prepare system builders to grapple with these issues. This paper focuses discussions around one such digital collection from the dawn of the web, asking what sorts of analyses can and should be conducted on archival copies of the GeoCities web hosting platform that dates to 1994.This research was supported by the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the US National Science Foundation (grants 1618695 and 1704369), the Andrew W. Mellon Foundation, Start Smart Labs, and Compute Canada

    Multiscale mixing patterns in networks

    Full text link
    Assortative mixing in networks is the tendency for nodes with the same attributes, or metadata, to link to each other. It is a property often found in social networks manifesting as a higher tendency of links occurring between people with the same age, race, or political belief. Quantifying the level of assortativity or disassortativity (the preference of linking to nodes with different attributes) can shed light on the factors involved in the formation of links and contagion processes in complex networks. It is common practice to measure the level of assortativity according to the assortativity coefficient, or modularity in the case of discrete-valued metadata. This global value is the average level of assortativity across the network and may not be a representative statistic when mixing patterns are heterogeneous. For example, a social network spanning the globe may exhibit local differences in mixing patterns as a consequence of differences in cultural norms. Here, we introduce an approach to localise this global measure so that we can describe the assortativity, across multiple scales, at the node level. Consequently we are able to capture and qualitatively evaluate the distribution of mixing patterns in the network. We find that for many real-world networks the distribution of assortativity is skewed, overdispersed and multimodal. Our method provides a clearer lens through which we can more closely examine mixing patterns in networks.Comment: 11 pages, 7 figure

    Using Incomplete Information for Complete Weight Annotation of Road Networks -- Extended Version

    Full text link
    We are witnessing increasing interests in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel cost based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account to assign weights. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective.Comment: This is an extended version of "Using Incomplete Information for Complete Weight Annotation of Road Networks," which is accepted for publication in IEEE TKD

    Mining of Textual Data from the Web for Speech Recognition

    Get PDF
    Prvotním cílem tohoto projektu bylo prostudovat problematiku jazykového modelování pro rozpoznávání řeči a techniky pro získávání textových dat z Webu. Text představuje základní techniky rozpoznávání řeči a detailněji popisuje jazykové modely založené na statistických metodách. Zvláště se práce zabývá kriterii pro vyhodnocení kvality jazykových modelů a systémů pro rozpoznávání řeči. Text dále popisuje modely a techniky dolování dat, zvláště vyhledávání informací. Dále jsou představeny problémy spojené se získávání dat z webu, a v kontrastu s tím je představen vyhledávač Google. Součástí projektu byl návrh a implementace systému pro získávání textu z webu, jehož detailnímu popisu je věnována náležitá pozornost. Nicméně, hlavním cílem práce bylo ověřit, zda data získaná z Webu mohou mít nějaký přínos pro rozpoznávání řeči. Popsané techniky se tak snaží najít optimální způsob, jak data získaná z Webu použít pro zlepšení ukázkových jazykových modelů, ale i modelů nasazených v reálných rozpoznávacích systémech.The preliminary goals of this project were to get familiar with language modeling for speech recognition and techniques for acquisition of text data from the Web. Speech recognition techniques are introduced and statistical language modeling is described in detail. The text also covers mining models and techniques, information retrieval especially. Specific problems of Web mining are discussed and Google search is introduced. Special attention was paid to detailed description of implementation of the text mining system. However, the main goal of this work was to determine, whether the data acquired from the Web can provide some improvement into the recognition systems. The text is describing experiments, which use the retrieved Web data to update sample language models.

    Improving Search Engine Results by Query Extension and Categorization

    Get PDF
    Since its emergence, the Internet has changed the way in which information is distributed and it has strongly influenced how people communicate. Nowadays, Web search engines are widely used to locate information on the Web, and online social networks have become pervasive platforms of communication. Retrieving relevant Web pages in response to a query is not an easy task for Web search engines due to the enormous corpus of data that the Web stores and the inherent ambiguity of search queries. We present two approaches to improve the effectiveness of Web search engines. The first approach allows us to retrieve more Web pages relevant to a user\u27s query by extending the query to include synonyms and other variations. The second, gives us the ability to retrieve Web pages that more precisely reflect the user\u27s intentions by filtering out those pages which are not related to the user-specified interests. Discovering communities in online social networks (OSNs) has attracted much attention in recent years. We introduce the concept of subject-driven communities and propose to discover such communities by modeling a community using a posting/commenting interaction graph which is relevant to a given subject of interest, and then applying link analysis on the interaction graph to locate the core members of a community
    • …
    corecore