
    Low-Diameter Clusters in Network Analysis

    In this dissertation, we introduce several novel tools for cluster-based analysis of complex systems and design solution approaches for the corresponding optimization problems. Cluster-based analysis is a subfield of network analysis that uses a graph representation of a system to yield meaningful insight into the system's structure and functions. Clusters with low diameter are commonly used to characterize cohesive groups in applications where easy reachability between group members is of high importance. Low-diameter clusters can be formalized mathematically using two concepts from graph theory: the clique and the s-club (with relatively small values of s). A clique is a subset of vertices that are pairwise adjacent, and an s-club is a subset of vertices inducing a subgraph with a diameter of at most s. A clique is a special case of an s-club with s = 1 and hence has the smallest possible diameter. Two topics of this dissertation focus on graphs prone to uncertainty and disruptions and introduce several extensions of low-diameter models. First, we introduce a robust clique model for graphs whose edges may fail with a certain probability, with robustness enforced using appropriate risk measures. Because it captures underlying system uncertainties, finding the largest robust clique is a better alternative to finding the largest clique. It is also a hard combinatorial optimization problem requiring effective solution techniques, so we design several heuristic approaches for detecting large robust cliques and compare their performance. Next, we consider graphs for which uncertainty is not explicitly defined and study the connectivity properties of 2-clubs. Observing that a 2-club can be very vulnerable to disruptions, we strengthen it with additional connectivity requirements and introduce the concept of a biconnected 2-club. We also consider its weak counterpart, which we call a fragile 2-club (a 2-club that is not biconnected). The size of the largest biconnected 2-club in a graph can help measure overall system reachability and connectivity, whereas the largest fragile 2-club can identify vulnerable parts of the graph. We show that finding the largest fragile 2-club is polynomially solvable, whereas finding the largest biconnected 2-club is NP-hard. We design a polynomial-time algorithm for the former and combinatorial branch-and-bound and branch-and-cut algorithms for the latter. Lastly, we return to the s-club concept but shift our focus from finding the largest s-club in a graph to partitioning the graph into the smallest number of non-overlapping s-clubs. This problem can be applied not only to derive communities in the graph but also to reduce the size of the graph and derive its hierarchical structure. The minimum s-club partitioning problem is a hard combinatorial optimization problem with proven complexity results and is also very hard to solve in practice. We design a combinatorial branch-and-bound algorithm and test it on the minimum 2-club partitioning problem.
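    As a concrete illustration of the definitions above (a minimal sketch in Python using networkx, not code from the dissertation), the three structures can be checked directly:

        import itertools
        import networkx as nx

        def is_clique(G, S):
            # Every pair of vertices in S must be adjacent.
            return all(G.has_edge(u, v) for u, v in itertools.combinations(S, 2))

        def is_s_club(G, S, s):
            # S (assumed nonempty) must induce a connected subgraph of
            # diameter at most s; distances are measured inside G[S].
            H = G.subgraph(S)
            return nx.is_connected(H) and nx.diameter(H) <= s

        def is_biconnected_2club(G, S):
            # The reinforced variant introduced above: a 2-club whose
            # induced subgraph has no cut vertex.
            H = G.subgraph(S)
            return is_s_club(G, S, 2) and nx.is_biconnected(H)

    A clique is exactly a 1-club under these definitions, and a 2-club for which the biconnectivity test fails is a fragile 2-club in the terminology above.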

    Decomposition algorithms for detecting low-diameter clusters in graphs

    Detecting low-diameter clusters in graphs is an effective graph-based data mining technique that has been used to find cohesive subgraphs in a variety of graph models of data. Low pairwise distances within a cluster facilitate fast communication or good reachability between vertices in the cluster. A k-club is a subset of vertices that induces a subgraph of diameter at most k. For low values of the parameter k, this model offers a graph-theoretic relaxation of the clique model that formalizes the notion of a low-diameter cluster. The maximum k-club problem is to find a k-club of maximum cardinality in a given graph. This study focuses on developing decomposition and cutting plane methods for the maximum k-club problem for arbitrary k. Two compact integer programming formulations for the maximum k-club problem were presented by other researchers; these are among the most effective integer programming approaches presently available for solving the problem for any given value of k. Using model decomposition techniques, we demonstrate how the fundamental optimization problem of finding a maximum-size k-club can be solved optimally on large-scale benchmark instances. Our approach circumvents the use of complicated formulations in favor of a simple relaxation based on necessary conditions, combined with canonical hypercube cuts introduced by Balas and Jeroslow. We then demonstrate that a delayed constraint generation approach within a branch-and-cut algorithm significantly speeds up an integer programming solver compared with directly solving either formulation. Finally, we study the problem of detecting large risk-averse 2-clubs in graphs subject to probabilistic edge failures. To achieve risk aversion, we first model the loss in the 2-club property due to probabilistic edge failures as a function of the decision (the chosen 2-club cluster) and the randomness (the graph structure). We then use the conditional value-at-risk of the loss for a given decision as a quantitative measure of risk, which is bounded in the stochastic optimization model. A sequential cutting plane method that solves a series of mixed integer linear programs is developed for this problem.
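    The first contribution lends itself to a compact sketch. The following is my illustration of the relaxation-plus-cuts idea (assuming the open-source PuLP/CBC toolchain, which the study does not necessarily use): solve a relaxation built from the necessary condition that vertices farther than k apart in G cannot share a k-club, and cut off any incumbent whose induced diameter exceeds k with a canonical hypercube cut.

        import itertools
        import networkx as nx
        import pulp

        def max_k_club(G, k):
            # Necessary condition: dist_G(u, v) <= k, since distances in an
            # induced subgraph can only grow.
            dist = dict(nx.all_pairs_shortest_path_length(G, cutoff=k))
            prob = pulp.LpProblem("max_k_club", pulp.LpMaximize)
            x = pulp.LpVariable.dicts("x", G.nodes(), cat="Binary")
            prob += pulp.lpSum(x.values())
            for u, v in itertools.combinations(G.nodes(), 2):
                if v not in dist[u]:
                    prob += x[u] + x[v] <= 1
            while True:
                prob.solve(pulp.PULP_CBC_CMD(msg=False))
                S = [v for v in G.nodes() if x[v].value() > 0.5]
                H = G.subgraph(S)
                if len(S) <= 1 or (nx.is_connected(H) and nx.diameter(H) <= k):
                    return S
                # Canonical hypercube cut (Balas and Jeroslow): exclude exactly
                # this incumbent. A plain "no-good" cut on the support alone
                # would be invalid because the k-club property is nonhereditary.
                prob += (pulp.lpSum(1 - x[v] for v in S)
                         + pulp.lpSum(x[v] for v in G.nodes() if v not in S)) >= 1

    Each cut removes exactly one binary point, so the loop terminates after finitely many iterations; the delayed constraint generation variant adds such cuts inside a single branch-and-cut tree instead.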

    Detecting resilient structures in stochastic networks: A two-stage stochastic optimization approach

    We propose a two-stage stochastic programming framework for designing or identifying "resilient" or "reparable" structures in graphs whose topology may undergo a stochastic transformation. The reparability of a subgraph satisfying a given property is defined in terms of a budget constraint, which allows a prescribed number of vertices to be added to or removed from the subgraph so as to restore its structural properties after random changes to the graph's edge set are observed. A two-stage stochastic programming model is formulated and shown to be NP-complete for a broad range of graph-theoretic properties that the resilient subgraph is required to satisfy. A general combinatorial branch-and-bound algorithm is developed, and its computational performance is illustrated on the example of a two-stage stochastic maximum clique problem.
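    To make the two-stage structure concrete, here is a brute-force toy of my own (feasible only for very small graphs, and not the paper's branch-and-bound algorithm; the paper's exact objective and budget semantics may also differ): the first stage fixes a vertex set, and the second stage repairs it in each scenario with at most r vertex additions or removals so that the result is a clique in the realized graph.

        import itertools

        def is_clique(edges, S):
            # edges: set of frozenset({u, v}) pairs of the realized graph.
            return all(frozenset((u, v)) in edges
                       for u, v in itertools.combinations(S, 2))

        def best_repair(nodes, edges, S, r):
            # Second stage: largest clique reachable from S with at most r
            # vertex additions/removals under the realized edge set.
            best = 0
            for m in range(len(nodes) + 1):
                for T in itertools.combinations(nodes, m):
                    if len(set(S) ^ set(T)) <= r and is_clique(edges, T):
                        best = max(best, len(T))
            return best

        def two_stage_value(nodes, scenarios, S, r):
            # Expected second-stage clique size over equiprobable scenarios.
            return sum(best_repair(nodes, e, S, r) for e in scenarios) / len(scenarios)

        def solve(nodes, scenarios, r):
            # First stage: enumerate candidate sets, keep the best expectation.
            subsets = (T for m in range(len(nodes) + 1)
                       for T in itertools.combinations(nodes, m))
            return max(subsets, key=lambda S: two_stage_value(nodes, scenarios, S, r))

        nodes = (0, 1, 2, 3)
        g0 = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3)]}
        g1 = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}   # edge {0, 2} failed
        print(solve(nodes, [g0, g1], r=1))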

    USING PROBABILISTIC GRAPHICAL MODELS TO DRAW INFERENCES IN SENSOR NETWORKS WITH TRACKING APPLICATIONS

    Sensor networks have been an active research area in the past decade due to the variety of their applications. Many studies have addressed the problems underlying the middleware services of sensor networks, such as self-deployment, self-localization, and synchronization. With these middleware services in place, sensor networks have grown into a mature technology used as a detection and surveillance paradigm in many real-world applications. Individual sensors are small, so they can be deployed in areas with limited space and make unobstructed measurements in locations that traditional centralized systems would have trouble reaching. However, sensor networks have a few physical limitations that can prevent sensors from performing at their maximum potential: individual sensors have a limited power supply, the wireless band can become cluttered when multiple sensors try to transmit at the same time, and limited communication range means the network may not have a 1-hop communication topology, making routing a problem in many cases. Carefully designed algorithms can alleviate these physical limitations and allow sensor networks to be utilized to their full potential. Graphical models are an intuitive choice for designing sensor network algorithms. This thesis focuses on a classic application of sensor networks, detecting and tracking targets. It develops feasible inference techniques for sensor networks using statistical graphical model inference, binary sensor detection, event isolation, and dynamic clustering. The main strategy is to use only binary data for rough global inferences and then dynamically form small-scale clusters around the target for detailed computations. This framework is then extended to network topology manipulation, so that it can be applied to tracking in different network topology settings. Finally, the system was tested in both simulated and real-world environments. The simulations were performed on various network topologies, from regularly distributed networks to randomly distributed networks. The results show that the algorithm performs well in randomly distributed networks and hence requires minimal deployment effort. The experiments were carried out in both corridor and open-space settings. An in-home fall detection system was simulated under real-world settings: it was set up with 30 Bumblebee radars and 30 ultrasonic sensors, driven by TI EZ430-RF2500 boards, scanning a typical 800 sq ft apartment. The Bumblebee radars are calibrated to detect a falling human body, and the two-tier tracking algorithm is used on the ultrasonic sensors to track the location of the elderly occupants.
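    The two-tier strategy described above can be sketched in a few lines (a hypothetical data layout of my own, not the thesis code): binary detections yield a coarse position estimate, and a small cluster of the nearest sensors is then activated for the detailed computations.

        import numpy as np

        def coarse_estimate(positions, fired):
            # positions: (n, 2) sensor coordinates; fired: boolean detections.
            # Assumes at least one sensor fired.
            return positions[fired].mean(axis=0)

        def dynamic_cluster(positions, estimate, k=5):
            # Indices of the k sensors nearest the coarse target estimate.
            d = np.linalg.norm(positions - estimate, axis=1)
            return np.argsort(d)[:k]

        rng = np.random.default_rng(0)
        positions = rng.uniform(0, 20, size=(30, 2))   # 30 sensors, 20 x 20 area
        fired = np.linalg.norm(positions - [8.0, 8.0], axis=1) < 6.0
        cluster = dynamic_cluster(positions, coarse_estimate(positions, fired))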

    Polyhedral Combinatorics, Complexity & Algorithms for k-Clubs in Graphs

    A k-club is a distance-based graph-theoretic generalization of the clique, originally introduced to model cohesive subgroups in social network analysis. k-clubs represent low-diameter clusters in graphs and are suitable for various graph-based data mining applications. Unlike the clique, the k-club model is nonhereditary, meaning that a subset of a k-club is not necessarily a k-club itself. This imposes significant challenges in developing theory and algorithms for the optimization problems associated with k-clubs. We settle an open problem by establishing the intractability of testing inclusion-wise maximality of k-clubs for fixed k >= 2. This result contrasts with the polynomial-time verifiability of maximal cliques and is a direct consequence of the nonhereditary nature of k-clubs. A class of graphs for which this problem is polynomial-time solvable is also identified. We propose a distance-coloring-based upper-bounding scheme and a bounded-enumeration-based lower-bounding routine and employ them in a combinatorial branch-and-bound algorithm for finding a maximum k-club. Computational results on graphs with up to 200 vertices are provided. The 2-club polytope of a graph is studied, and a new family of facet-inducing inequalities for this polytope is discovered. This family strictly contains all known nontrivial facets of the 2-club polytope as special cases and identifies previously unknown facets. The separation problem for these newly discovered facets is proved to be NP-complete, and it is shown that the 2-club polytope of trees is completely described by these facets together with the nonnegativity constraints. We also study the maximum 2-club problem under uncertainty. Given a random graph subject to probabilistic edge failures, we are interested in finding a large "risk-averse" 2-club. Here, risk aversion is achieved by modeling the loss in the 2-club property due to edge failures as a random loss that is a function of the decision variables and the uncertain parameters. Conditional value-at-risk (CVaR) is used as a quantitative measure of risk that is constrained in the model. A Benders decomposition scheme is utilized to develop a new decomposition algorithm for solving the CVaR-constrained maximum 2-club problem. A preliminary experiment compares the computational performance of the developed algorithm with our extension of an existing algorithm from the literature.
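    The risk measure used in the final part, conditional value-at-risk, has a compact closed form for a finite set of equiprobable scenarios; the sketch below is the generic Rockafellar-Uryasev computation (my illustration, not the dissertation's Benders code):

        import numpy as np

        def cvar(losses, alpha=0.9):
            # CVaR_a = VaR_a + E[(loss - VaR_a)+] / (1 - a), which is exact
            # for equiprobable scenarios when VaR_a is the ceil(a*n)-th
            # smallest loss.
            losses = np.sort(np.asarray(losses, dtype=float))
            var = losses[int(np.ceil(alpha * len(losses))) - 1]
            excess = np.maximum(losses - var, 0.0).mean()
            return var + excess / (1.0 - alpha)

        # Losses of a candidate 2-club across sampled edge-failure scenarios;
        # the optimization model bounds this quantity from above.
        print(cvar([0, 0, 1, 1, 2, 5, 9], alpha=0.9))   # -> 9.0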

    Theoretical Tools for Network Analysis: Game Theory, Graph Centrality, and Statistical Inference.

    A computer-driven data explosion has made the difficulty of interpreting large data sets of interconnected entities ever more salient. My work focuses on theoretical tools for summarizing, analyzing, and understanding network data sets, or data sets of things and their pairwise connections. I address four network science issues, improving our ability to analyze networks from a variety of domains. I first show that the sophistication of game-theoretic agent decision making can crucially affect network cascades: differing decision-making assumptions can lead to dramatically different cascade outcomes. This highlights the importance of diligence when making assumptions about agent behavior, on networks and in general. I next analytically demonstrate a significant irregularity in the popular eigenvector centrality and propose a new spectral centrality measure, nonbacktracking centrality, showing that it avoids this irregularity. This tool contributes a more robust way of ranking nodes, as well as additional mathematical understanding of the effects of network localization. I next give a new model for uncertain networks, networks in which one has no access to true network data but instead observes only probabilistic information about edge existence. I give a fast maximum-likelihood algorithm for recovering edges and communities in this model and show that it outperforms the typical approach of thresholding to an unweighted network. This model gives a better tool for understanding and analyzing real-world uncertain networks such as those arising in the experimental sciences. Lastly, I give a new lens for understanding scientific literature, specifically as a hybrid coauthorship and citation network. I use this for exploratory analysis of the Physical Review journals over a hundred-year period, and I make new observations about the interplay between these two networks and how this relationship has changed over time.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/133463/1/travisbm_1.pd
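    For the centrality contribution, the nonbacktracking centrality can be computed without forming the full 2m x 2m Hashimoto matrix, via the Ihara-Bass reduction to a 2n x 2n companion matrix; this is my compact sketch, and the paper's normalization details may differ.

        import numpy as np
        import networkx as nx

        def nonbacktracking_centrality(G):
            # Assumes a connected graph with at least one cycle. Eigenvalues
            # of the Hashimoto matrix other than +-1 coincide with those of
            # the 2n x 2n block matrix below; the two n-blocks of its leading
            # eigenvector are proportional to the vertex scores.
            A = nx.to_numpy_array(G)
            n = A.shape[0]
            D = np.diag(A.sum(axis=1))
            I = np.eye(n)
            M = np.block([[A, I - D], [I, np.zeros((n, n))]])
            vals, vecs = np.linalg.eig(M)
            lead = np.argmax(vals.real)        # leading eigenvalue is real here
            v = np.abs(vecs[:n, lead].real)
            return v / v.sum()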

    Scalable Statistical Modeling and Query Processing over Large Scale Uncertain Databases

    The past decade has witnessed a large number of novel applications that generate imprecise, uncertain, and incomplete data. Examples include monitoring infrastructures such as RFID and sensor networks, and web-based applications such as information extraction, data integration, and social networking. In my dissertation, I addressed several challenges in managing such data and developed algorithms for efficiently executing queries over large volumes of it. Specifically, I focused on the following challenges. First, for meaningful analysis of such data, we need the ability to remove noise and infer useful information from uncertain data. To address this challenge, I first developed a declarative system for applying dynamic probabilistic models to databases and data streams. The output of such probabilistic modeling is probabilistic data, i.e., data annotated with probabilities of correctness or existence. Often, the data also exhibits strong correlations. Although there is prior work on managing and querying such probabilistic data using probabilistic databases, those approaches largely assume independence and cannot handle probabilistic data with rich correlation structures. Hence, I built a probabilistic database system that can manage large-scale correlations and developed algorithms for efficient query evaluation. Our system allows users to provide uncertain data as input and to specify arbitrary correlations among the entries in the database. In the back end, we represent correlations as a forest of junction trees, an alternative representation for probabilistic graphical models (PGMs). We execute queries over the probabilistic database by transforming them into message passing algorithms (inference) over the junction trees. However, traditional algorithms over junction trees typically require accessing the entire tree, even for small queries. Hence, I developed an index data structure over the junction tree, called INDSEP, that allows us to circumvent this process and thereby scalably evaluate inference queries, aggregation queries, and SQL queries over the probabilistic database. Finally, query evaluation in probabilistic databases typically returns output tuples along with their probability values. However, the existing query evaluation model provides very little intuition to users: for instance, a user might want to know "Why is this tuple in my result?", "Why does this output tuple have such a high probability?", or "Which are the most influential input tuples for my query?" Hence, I designed a query evaluation model, and a suite of algorithms, that provide users with explanations for query results and enable users to perform sensitivity analysis to better understand them.
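    The inference primitive that INDSEP accelerates, message passing over a junction tree, reduces on a chain-shaped tree to the familiar forward-backward sum-product recursion. The toy below (hypothetical factors, far simpler than the system described) computes a single-variable marginal:

        import numpy as np

        def chain_marginal(pairwise, q, prior0):
            # pairwise[i]: 2 x 2 factor over binary (x_i, x_{i+1});
            # prior0: prior/evidence on x_0. Returns the marginal of x_q.
            fwd = np.asarray(prior0, dtype=float)
            for i in range(q):                              # right-going messages into x_q
                fwd = pairwise[i].T @ fwd
            bwd = np.ones(2)
            for i in range(len(pairwise) - 1, q - 1, -1):   # left-going messages into x_q
                bwd = pairwise[i] @ bwd
            marg = fwd * bwd
            return marg / marg.sum()

        # Two "agreement" factors correlate three binary tuples; evidence on
        # the first tuple propagates to the last.
        F = np.array([[0.9, 0.1], [0.1, 0.9]])
        print(chain_marginal([F, F], q=2, prior0=[0.8, 0.2]))   # -> [0.692 0.308]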

    Holistic Temporal Situation Interpretation for Traffic Participant Prediction

    For a profound understanding of traffic situations, including a prediction of traffic participants' future motion, behaviors, and routes, it is crucial to incorporate all available environmental observations. The presence of sensor noise and dependency uncertainties, the variety of available sensor data, the complexity of large traffic scenes, and the large number of different estimation tasks with diverging requirements call for a general method that gives a robust foundation for the development of estimation applications. In this work, a general description language, called the Object-Oriented Factor Graph Modeling Language (OOFGML), is proposed that unifies the formulation of estimation tasks, from the application-oriented problem description, via the choice of variable and probability distribution representation, through to the definition of the inference method in the implementation. The language's properties are discussed theoretically using abstract examples. The derivation of explicit application examples is shown for the automated driving domain. A domain-specific ontology is defined, which forms the basis for four exemplary applications covering the broad spectrum of estimation tasks in this domain: basic temporal filtering, ego-vehicle localization using advanced interpretations of perceived objects, road layout perception utilizing inter-object dependencies, and highly integrated route, behavior, and motion estimation to predict traffic participants' future actions. All applications are evaluated as proofs of concept and provide an example of how their class of estimation tasks can be represented using the proposed language. The language serves as a common basis and opens a new field for further research towards holistic solutions for automated driving.
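    Of the four example applications, the first, basic temporal filtering, is the easiest to make concrete; a one-dimensional Kalman filter (hypothetical noise parameters, not taken from the thesis) is the canonical instance of the estimation tasks the language is meant to describe.

        def kalman_1d(measurements, q=1e-3, r=0.1):
            # Scalar Kalman filter: q is the process-noise variance,
            # r the measurement-noise variance.
            x, p = 0.0, 1.0                  # state estimate and its variance
            estimates = []
            for z in measurements:
                p += q                       # predict: uncertainty grows
                k = p / (p + r)              # Kalman gain
                x += k * (z - x)             # correct with the residual
                p *= 1.0 - k
                estimates.append(x)
            return estimates

        print(kalman_1d([1.2, 0.9, 1.1, 1.0]))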

    Error handling in multimodal voice-enabled interfaces of tour-guide robots using graphical models

    Mobile service robots are going to play an increasing role in human society. Voice-enabled interaction with service robots becomes very important if such robots are to be deployed in real-world environments and accepted by the vast majority of potential human users. The research presented in this thesis addresses the problem of integrating speech recognition into the interactive voice-enabled interface of a service robot, in particular a tour-guide robot. The task of a tour-guide robot is to engage visitors to mass exhibitions (users) in dialogue, providing the services it is designed for (e.g., exhibit presentations) within a limited time. In managing tour-guide dialogues, extracting the user goal (intention) behind a request for a particular service at each dialogue state is the key issue. Under mass-exhibition conditions, speech recognition errors are inevitable because of noisy speech and uncooperative users with no prior experience in robotics, and these errors can jeopardize user goal identification. Wrongly identified user goals can lead to communication failures. Therefore, to reduce the risk of such failures, methods for detecting and compensating for communication failures in human-robot dialogue are needed. During short-term interaction with visitors, the interpretation of the user goal at each dialogue state can be improved by combining speech recognition in the speech modality with information from the robot's other available modalities. The methods presented in this thesis exploit probabilistic models for fusing information from speech and auxiliary modalities of the robot for user goal identification and communication failure detection. To compensate for detected communication failures, we investigate multimodal methods for recovery from communication failures. To model the process of modality fusion, taking into account the uncertainties in the information extracted from each input modality during human-robot interaction, we use the probabilistic framework of Bayesian networks. Bayesian networks are graphical models that represent a joint probability function over a set of random variables. They are used to model the dependencies among variables associated with the user goals, modality-related events (e.g., the event of user presence, inferred from the robot's laser scanner modality), and observed modality features providing evidence in favor of these modality events. Bayesian networks are used to calculate posterior probabilities over the possible user goals at each dialogue state. These probabilities serve as a basis for deciding whether the user goal is valid, i.e., whether it can be mapped into a tour-guide service (e.g., an exhibit presentation), or is undefined, signaling a possible communication failure. The Bayesian network can also be used to elicit probabilities over the modality events, revealing information about the possible cause of a communication failure. Introducing new user goal aspects (e.g., new modality events and related features) that provide auxiliary information for detecting communication failures makes the design process cumbersome, calling for a systematic approach to Bayesian network modelling.
    When people communicate, they resolve understanding problems in a collaborative joint effort of providing evidence of common shared knowledge (grounding). We use Bayesian network topologies, tailored to limited computational resources, to model a state-based grounding model that fuses information from three different input modalities (laser, video, and speech) to infer possible grounding states. These grounding states are associated with modality events showing whether the user is present in range for communication, whether the user is attending to the interaction, whether the speech modality is reliable, and whether the user goal is valid. The state-based grounding model is used to compute probabilities that intermediary grounding states have been reached. This serves as a basis for detecting whether the user has reached the final grounding state or whether a repair dialogue sequence is needed. In a repair dialogue sequence, the tour-guide robot can exploit the multiple available modalities along with speech. For example, if the user has failed to reach the grounding state related to her/his presence in range for communication, the robot can use its move modality to search for and attract the attention of visitors. When speech recognition is detected to be unreliable, the robot can offer the alternative use of the buttons modality in the repair sequence. Given the probability of each grounding state and the dialogue sequence that can be executed in the next dialogue state, a tour-guide robot has different preferences over the possible dialogue continuations. If the possible dialogue sequences at each dialogue state are defined as actions, the principle of maximum expected utility (MEU) provides an explicit way of selecting an action based on its utility, given the evidence about the user goal at each dialogue state. Decision networks, constructed as graphical models based on Bayesian networks, are proposed to perform MEU-based decisions, incorporating the utility of the actions to be chosen at each dialogue state by the tour-guide robot. These action utilities are defined taking into account the tour-guide task requirements. The proposed graphical models for user goal identification and dialogue error handling in human-robot dialogue are evaluated in experiments with multimodal data. These data were collected during the operation of the tour-guide robot RoboX at the Autonomous System Lab of EPFL and at the Swiss National Exhibition in 2002 (Expo.02). The evaluation experiments use component- and system-level metrics for technical (objective) and user-based (subjective) evaluation. On the component level, the technical evaluation is done by calculating accuracies as objective measures of the performance of the grounding model and the resulting performance of user goal identification in dialogue. The benefit of the proposed error handling framework is demonstrated by comparing the accuracy of a baseline interactive system, employing only speech recognition for user goal identification, with that of a system equipped with multimodal grounding models for error handling.
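    The maximum-expected-utility rule at the heart of these decision networks is compact enough to state directly; the probabilities and utilities below are invented for illustration and are not taken from the RoboX system.

        def meu_action(p_state, utility):
            # p_state: {state: P(state | evidence)}; utility: {(action, state): u}.
            actions = {a for a, _ in utility}
            def expected(a):
                return sum(p * utility[(a, s)] for s, p in p_state.items())
            return max(actions, key=expected)

        # Should the robot present the exhibit or start a repair dialogue,
        # given the inferred probability that the user goal is valid?
        p = {"goal_valid": 0.35, "goal_undefined": 0.65}
        u = {("present", "goal_valid"): 10, ("present", "goal_undefined"): -5,
             ("repair", "goal_valid"): 2, ("repair", "goal_undefined"): 6}
        print(meu_action(p, u))              # -> repair (expected utility 4.6 vs 0.25)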