
    Eight Degrees of Separation

    The paper presents a model of network formation in which every connected pair of agents contributes to the aggregate payoff, possibly discounted by their distance, and the resources are split among agents through the Myerson value. As the equilibrium concept, we adopt a refinement of pairwise stability. The only parameters are the number N of agents and a constant cost k that every agent pays to maintain any single link. This setup yields a wide multiplicity of equilibria, all of them connected, as k ranges over nontrivial cases. We show that, for any N, when the equilibrium is a tree (an acyclic connected graph), which happens for high k, and there is no decay, the diameter of such a network never exceeds 8 (i.e. no two nodes are at a distance greater than 8). By adopting no decay and studying only trees, we simplify the analysis but impose worst-case scenarios: we conjecture that the limit of 8 should apply to any possible non-empty equilibrium with any decay function. Keywords: Network Formation, Myerson Value

    A survey of statistical network models

    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities, has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references

    Precision Medicine: Viable Pathways to Address Existing Research Gaps

    Precision Medicine (PM) seeks to customize medical treatments for patients based on measurable and identifiable characteristics. Unlike personalized medicine, this effort is not intended to result in tailored care for each patient. Instead, it seeks to improve overall care within the medical domain by shifting the focus from one-size-fits-all care to optimized care for specified subgroups. In order for the benefits of PM to be expeditiously realized, the diverse skill sets of the scientific community must be brought to bear on the problem. This research effort explores the intersection of quality engineering (QE) and healthcare to outline how existing methodologies within the QE field could support existing PM research goals. Specifically, this work examines how to determine the value of patient characteristics for use in disease prediction models with select machine learning algorithms, proposes a method to incorporate patient risk into treatment decisions through the development of performance functions, and investigates the potential impact of incorrect assumptions on estimation methods used in optimization models.

    Similarity search and data mining techniques for advanced database systems.

    Modern automated methods for measurement, collection, and analysis of data in industry and science are providing more and more data with drastically increasing structure complexity. On the one hand, this growing complexity is justified by the need for a richer and more precise description of real-world objects, on the other hand it is justified by the rapid progress in measurement and analysis techniques that allow the user a versatile exploration of objects. In order to manage the huge volume of such complex data, advanced database systems are employed. In contrast to conventional database systems that support exact match queries, the user of these advanced database systems focuses on applying similarity search and data mining techniques. Based on an analysis of typical advanced database systems — such as biometrical, biological, multimedia, moving, and CAD-object database systems — the following three challenging characteristics of complexity are detected: uncertainty (probabilistic feature vectors), multiple instances (a set of homogeneous feature vectors), and multiple representations (a set of heterogeneous feature vectors). Therefore, the goal of this thesis is to develop similarity search and data mining techniques that are capable of handling uncertain, multi-instance, and multi-represented objects. The first part of this thesis deals with similarity search techniques. Object identification is a similarity search technique that is typically used for the recognition of objects from image, video, or audio data. Thus, we develop a novel probabilistic model for object identification. Based on it, two novel types of identification queries are defined. In order to process the novel query types efficiently, we introduce an index structure called Gauss-tree. In addition, we specify further probabilistic models and query types for uncertain multi-instance objects and uncertain spatial objects. 
Based on the index structure, we develop algorithms for an efficient processing of these query types. Practical benefits of using probabilistic feature vectors are demonstrated on a real-world application for video similarity search. Furthermore, a similarity search technique is presented that is based on aggregated multi-instance objects, and that is suitable for video similarity search. This technique takes multiple representations into account in order to achieve better effectiveness. The second part of this thesis deals with two major data mining techniques: clustering and classification. Since privacy preservation is a very important demand of distributed advanced applications, we propose using uncertainty for data obfuscation in order to provide privacy preservation during clustering. Furthermore, a model-based and a density-based clustering method for multi-instance objects are developed. Afterwards, original extensions and enhancements of the density-based clustering algorithms DBSCAN and OPTICS for handling multi-represented objects are introduced. Since several advanced database systems like biological or multimedia database systems handle predefined, very large class systems, two novel classification techniques for large class sets that benefit from using multiple representations are defined. The first classification method is based on the idea of a k-nearest-neighbor classifier. It employs a novel density-based technique to reduce training instances and exploits the entropy impurity of the local neighborhood in order to weight a given representation. The second technique addresses hierarchically-organized class systems. It uses a novel hierarchical, supervised method for the reduction of large multi-instance objects, e.g. audio or video, and applies support vector machines for efficient hierarchical classification of multi-represented objects. 
User benefits of this technique are demonstrated by a prototype that performs a classification of large music collections. The effectiveness and efficiency of all proposed techniques are discussed and verified by comparison with conventional approaches in versatile experimental evaluations on real-world datasets.

    Querying and Efficiently Searching Large, Temporal Text Corpora
