1,310 research outputs found

    Subjective interestingness of subgraph patterns

    The utility of a dense subgraph in gaining a better understanding of a graph has been formalised in numerous ways, each striking a different balance between approximating actual interestingness and computational efficiency. A difficulty in making this trade-off is that, while computational cost of an algorithm is relatively well-defined, a pattern's interestingness is fundamentally subjective. This means that this latter aspect is often treated only informally or neglected, and instead some form of density is used as a proxy. We resolve this difficulty by formalising what makes a dense subgraph pattern interesting to a given user. Unsurprisingly, the resulting measure is dependent on the prior beliefs of the user about the graph. For concreteness, in this paper we consider two cases: one case where the user only has a belief about the overall density of the graph, and another case where the user has prior beliefs about the degrees of the vertices. Furthermore, we illustrate how the resulting interestingness measure is different from previous proposals. We also propose effective exact and approximate algorithms for mining the most interesting dense subgraph according to the proposed measure. Usefully, the proposed interestingness measure and approach lend themselves well to iterative dense subgraph discovery. Contrary to most existing approaches, our method naturally allows subsequently found patterns to be overlapping. The empirical evaluation highlights the properties of the new interestingness measure given different prior belief sets, and our approach's ability to find interesting subgraphs that other methods are unable to find

    Mining subjectively interesting patterns in rich data


    On Solving Selected Nonlinear Integer Programming Problems in Data Mining, Computational Biology, and Sustainability

    This thesis consists of three essays concerning the use of optimization techniques to solve four problems in the fields of data mining, computational biology, and sustainable energy devices. To the best of our knowledge, the particular problems we discuss have not been previously addressed using optimization, which is a specific contribution of this dissertation. In particular, we analyze each of the problems to capture their underlying essence, subsequently demonstrating that each problem can be modeled as a nonlinear (mixed) integer program. We then discuss the design and implementation of solution techniques to locate optimal solutions to the aforementioned problems. Running throughout this dissertation is the theme of using mixed-integer programming techniques in conjunction with context-dependent algorithms to identify optimal and previously undiscovered underlying structure

    Introduction revisiting the Argentine crisis a decade on: changes and continuities

    This introductory chapter to the book "Argentina Since the 2001 Crisis Recovering the Past, Reclaiming the Future" analyses crisis and its associated responses and subsequent recovery in the context of Argentina’s multiple implosion of 2001-02 whilst also assessing its legacies for the country’s social, cultural, economic and political realms during the last decade. It recognises that "crisis" is a term that is much used in the post-Lehman Brothers world and that the subsequent responses and associated recoveries (or lack of) have been the subject of a cascade of academic, government, media, and think-tank investigation ever since. The chapter instead seeks to understand the nature of how crisis and its impacts should be investigated and interrogated, by rejecting false dichotomies of ‘old’ and ‘new’ and synthesising understanding to form an analysis that draws on both elements of continuity and elements of change. Secondly, it argues that crisis manifests itself in a number of realms, and that heuristic devices employed to investigate them must subsequently also be drawn from across a range of disciplinary perspectives. Thirdly, it examines how the 2001-02 crisis in Argentina led to a series of responses that both rejected the neoliberal model yet also recovered elements of it. Finally, it outlines the structure of the rest of the book, briefly summarising the chapters in turn

    Shaping Online Dialogue: Examining How Community Rules Affect Discussion Structures on Reddit

    Community rules play a key part in enabling or constraining the behaviors of members in online communities. However, little is known regarding whether and to what degree changing rules actually affects community dynamics. In this paper, we seek to understand how these behavior-governing rules shape the interactions between users, as well as the structure of their discussion. Using the top communities on Reddit (i.e., subreddits), we first contribute a taxonomy of behavior-based rule categories across Reddit. Then, we use a network analysis perspective to discover how changing implementation of different rule categories affects subreddits' user interaction and discussion networks over a 1.5-year period. Our study finds several significant effects, including greater clustering among users when subreddits increase rules focused on structural regulation and how restricting allowable content surprisingly leads to more interactions between users. Our findings contribute to research in proactive moderation through rule setting, as well as lend valuable insights for online community designers and moderators to achieve desired community dynamics

    Untangling Neoliberalism’s Gordian Knot: Cancer Prevention and Control Services for Rural Appalachian Populations

    In eastern Kentucky, as in much of central Appalachia, current local storylines narrate the frictions and contradictions involved in the structural transition from a post-WWII Fordist industrial economy and a Keynesian welfare state to a Post-Fordist service economy and Neoliberal hollow state, starving for energy to sustain consumer indulgence (Jessop, 1993; Harvey, 2003; 2005). Neoliberalism is the ideological force redefining the “societal infrastructure of language” that legitimates this transition, in part by redefining the key terms of democracy and citizenship, as well as valorizing the market, the individual, and technocratic innovation (Chouliaraki & Fairclough, 1999; Harvey, 2005). This project develops a perspective that understands cancer prevention and control in Appalachia as part of the structural transition that is realigning community social ties in relation to ideological forces deployed as “commonsense” storylines that “lubricate” frictions that complicate the transition

    Inferring hidden features in the Internet (PhD thesis)

    The Internet is a large-scale decentralized system that is composed of thousands of independent networks. In this system, there are two main components, interdomain routing and traffic, that are vital inputs for many tasks such as traffic engineering, security, and business intelligence. However, due to the decentralized structure of the Internet, global knowledge of both interdomain routing and traffic is hard to come by. In this dissertation, we address a set of statistical inference problems with the goal of extending the knowledge of the interdomain-level Internet. In the first part of this dissertation, we investigate the relationship between the interdomain topology and an individual network’s inference ability. We first frame the questions through abstract analysis of idealized topologies, and then use actual routing measurements and topologies to study the ability of real networks to infer traffic flows. In the second part, we study the ability of networks to identify which paths flow through their network. We first show that answering this question is surprisingly hard due to the design of interdomain routing systems, where each network can learn only a limited set of routes. Therefore, network operators have to rely on observed traffic. However, observed traffic can only identify that a particular route passes through its network but not that a route does not pass through its network. In order to solve the routing inference problem, we propose a nonparametric inference technique that works quite accurately. The key idea behind our technique is measuring the distances between destinations. In order to accomplish that, we define a metric called Routing State Distance (RSD) to measure distances in terms of routing similarity. Finally, in the third part, we study our new metric, RSD, in detail. Using RSD, we address an important and difficult problem of characterizing the set of paths between networks. The collection of the paths across networks is a great source for understanding important phenomena in the Internet, as path selections are driven by the economic and performance considerations of the networks. We show that RSD has a number of appealing properties that can discover these hidden phenomena
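
The abstract above does not spell out how RSD is computed. As a minimal sketch, assume that the distance between two destinations is the number of nodes whose next hop toward one destination differs from their next hop toward the other. The `next_hop` table layout, the function name, and the toy topology are hypothetical and only illustrate the idea of "distance in terms of routing similarity".

```python
from typing import Dict, Hashable

# Hypothetical layout: next_hop[node][destination] is the neighbour that
# `node` forwards to when sending traffic toward `destination`.
NextHopTable = Dict[Hashable, Dict[Hashable, Hashable]]


def routing_state_distance(next_hop: NextHopTable, d1: Hashable, d2: Hashable) -> int:
    """Count the nodes that route toward d1 and d2 via different next hops.

    Assumed definition, not taken from the abstract: two destinations are
    close when most nodes forward traffic to them in the same way.
    """
    return sum(
        1
        for hops in next_hop.values()
        if hops.get(d1) != hops.get(d2)
    )


# Toy routing state for three nodes and three destinations (made-up data).
next_hop = {
    "A": {"p1": "B", "p2": "B", "p3": "C"},
    "B": {"p1": "D", "p2": "D", "p3": "D"},
    "C": {"p1": "A", "p2": "D", "p3": "D"},
}

print(routing_state_distance(next_hop, "p1", "p2"))  # only C differs
print(routing_state_distance(next_hop, "p1", "p3"))  # A and C differ
```

Under this assumed definition the distance is zero from a destination to itself and symmetric in its two arguments, which is what makes it usable for grouping destinations by routing similarity.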