
    Finding similar stocks by detecting cliques in market graphs

    The stock market provides an abundant source of data. However, when the amount of raw data becomes overwhelming, it grows increasingly difficult to understand how stocks interact with each other. Visualizing stock data as a market graph is one of the most popular ways of summarizing the important information. When modelling the data as a graph, vertices correspond to stocks and edges correspond to strong correlation in their prices over a given period of time. This project presents a technique for finding stocks that behave very similarly. Such information helps investors decide which stocks to purchase next: they can use it to select a valuable portfolio of stocks showing an increasing price trend. On the other hand, it can also help stock owners decide whether or not they should sell their stocks.
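    A minimal sketch of the market-graph construction and clique detection described above, assuming daily closing prices arrive as a pandas DataFrame (one column per stock); the 0.7 correlation threshold and the helper names are illustrative and not taken from the project.

```python
# Sketch: build a market graph from price data and list maximal cliques.
# Assumptions: `prices` is a pandas DataFrame of daily closing prices,
# one column per stock; the correlation threshold 0.7 is illustrative.
import pandas as pd
import networkx as nx

def build_market_graph(prices: pd.DataFrame, threshold: float = 0.7) -> nx.Graph:
    """Vertices are stocks; edges connect pairs whose return correlation exceeds the threshold."""
    returns = prices.pct_change().dropna()
    corr = returns.corr()
    graph = nx.Graph()
    graph.add_nodes_from(corr.columns)
    for i, a in enumerate(corr.columns):
        for b in corr.columns[i + 1:]:
            if corr.loc[a, b] >= threshold:
                graph.add_edge(a, b, weight=corr.loc[a, b])
    return graph

def similar_stock_groups(graph: nx.Graph, min_size: int = 3):
    """Maximal cliques of the market graph, i.e. groups of mutually similar stocks."""
    return [c for c in nx.find_cliques(graph) if len(c) >= min_size]
```

    Maximal cliques of this graph correspond to groups of stocks whose prices moved together over the chosen window.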

    Market Graph Clustering Via QUBO and Digital Annealing

    Our goal is to find representative nodes of a market graph that best replicate the returns of a broader market graph (index), a common task in the financial industry. We model our reference index as a market graph and express the index tracking problem in a quadratic K-medoids form. We take advantage of a purpose-built hardware architecture, the Fujitsu Digital Annealer, to circumvent the NP-hard nature of the problem and solve our formulation efficiently. In this article, we combine three separate areas of the literature (market graph models, K-medoid clustering, and quadratic binary optimization modeling) to formulate the index-tracking problem as a quadratic K-medoid graph-clustering problem. Our initial results show we accurately replicate the returns of a broad market index, using only a small subset of its constituent assets. Moreover, our quadratic formulation allows us to take advantage of recent hardware advances to overcome the NP-hard nature of the problem. Comment: 9 pages, 1 figure, 4 subfigures
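    A hedged sketch of one way a k-medoids selection can be encoded as a QUBO and solved by brute force on tiny instances (standing in for the Digital Annealer). The dissimilarity matrix D (e.g. one minus return correlation, zero diagonal) and the weights alpha, beta, gamma are illustrative assumptions; this generic encoding is not necessarily the exact formulation used in the article.

```python
# Sketch: pick k "medoid" assets from a dissimilarity matrix D via a QUBO.
# Generic k-medoids encoding with illustrative weights, not the article's exact model.
import itertools
import numpy as np

def kmedoids_qubo(D: np.ndarray, k: int, alpha=1.0, beta=1.0, gamma=10.0) -> np.ndarray:
    """Build a matrix Q so that z^T Q z is minimized over binary selection vectors z."""
    n = D.shape[0]
    Q = np.zeros((n, n))
    # Reward medoids that are far apart (negative off-diagonal contribution).
    Q -= 0.5 * alpha * D
    # Encourage medoids that are centrally located (small average distance to all assets).
    Q += np.diag(beta * D.sum(axis=1) / n)
    # Penalty enforcing exactly k selected medoids: gamma * (sum(z) - k)^2, up to a constant.
    Q += gamma * (np.ones((n, n)) - 2 * k * np.eye(n))
    return Q

def solve_qubo_brute_force(Q: np.ndarray):
    """Exhaustive search over {0,1}^n; only feasible for very small n."""
    n = Q.shape[0]
    best_z, best_val = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        z = np.array(bits)
        val = z @ Q @ z
        if val < best_val:
            best_z, best_val = z, val
    return best_z, best_val
```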

    Novel approaches for solving large-scale optimization problems on graphs

    This dissertation considers a class of closely related NP-hard optimization problems on graphs that arise in many important applications, including network-based data mining, analysis of the stock market, social networks, coding theory, fault diagnosis, molecular biology, biochemistry and genomics. In particular, the problems of interest include the classical maximum independent set problem (MISP) and maximum clique problem (MCP), their vertex-weighted versions, as well as novel optimization models that can be viewed as practical relaxations of their classical counterparts. The concept of clique has been a popular instrument in analysis of networks, and is, essentially, an idealized model of a “closely connected group”, or a cluster. But, at the same time, the restrictive nature of the definition of clique makes the clique model impractical in many applications. This motivated the development of clique relaxation models that relax different properties of a clique. On the one hand, while still possessing some clique-like properties, clique relaxations are not as “perfect” as cliques; on the other hand, they do not exhibit the disadvantages associated with a clique. Using clique relaxations allows one to compromise between perfectness and flexibility, between ideality and reality, a usual issue that an engineer faces when applying theoretical knowledge to practical problems in industry. The clique relaxation models studied in this dissertation were first proposed in the literature on social network analysis; however, they have not been well investigated from a mathematical programming perspective. This dissertation considers new techniques for solving the maximum weight independent set problem (MWISP) and clique relaxation problems, and investigates their effectiveness from theoretical and computational perspectives. The main results obtained in this work include (i) developing a scale-reduction approach for MWISP based on the concept of critical set and comparing it theoretically with other approaches; (ii) obtaining theoretical complexity results for clique relaxation problems; (iii) developing algorithms for solving the clique relaxation problems exactly; (iv) carrying out computational experiments to demonstrate the performance of the proposed approaches; and, finally, (v) applying the obtained theoretical results to several real-life problems.
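    For context, a small baseline sketch of the maximum weight independent set problem (MWISP) mentioned above, using the textbook reduction to a maximum weight clique in the complement graph. This exact approach only scales to small instances and is not the scale-reduction technique developed in the dissertation.

```python
# Baseline sketch: an independent set in G is a clique in the complement of G,
# so MWIS can be solved exactly on small graphs with networkx's max_weight_clique.
import networkx as nx

def max_weight_independent_set(G: nx.Graph, weight: str = "weight"):
    """Exact MWIS via maximum weight clique in the complement graph (small instances only)."""
    H = nx.complement(G)                      # complement() drops node data,
    for v in H.nodes:                         # so copy integer vertex weights over explicitly
        H.nodes[v][weight] = G.nodes[v].get(weight, 1)
    nodes, total = nx.algorithms.clique.max_weight_clique(H, weight=weight)
    return nodes, total

# Tiny usage example on a path graph with unit weights.
if __name__ == "__main__":
    G = nx.path_graph(5)
    print(max_weight_independent_set(G))      # e.g. ([0, 2, 4], 3)
```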

    Risk-Averse Matchings over Uncertain Graph Databases

    A large number of applications, such as querying sensor networks and analyzing protein-protein interaction (PPI) networks, rely on mining uncertain graph and hypergraph databases. In this work we study the following problem: given an uncertain, weighted (hyper)graph, how can we efficiently find a (hyper)matching with high expected reward and low risk? This problem naturally arises in the context of several important applications, such as online dating, kidney exchanges, and team formation. We introduce a novel formulation for finding matchings with maximum expected reward and bounded risk under a general model of uncertain weighted (hyper)graphs that we introduce in this work. Our model generalizes probabilistic models used in prior work, and captures both continuous and discrete probability distributions, thus allowing us to handle privacy-related applications that inject appropriately distributed noise into (hyper)edge weights. Given that our optimization problem is NP-hard, we turn our attention to designing efficient approximation algorithms. For the case of uncertain weighted graphs, we provide a 1/3-approximation algorithm, and a 1/5-approximation algorithm with near-optimal run time. For the case of uncertain weighted hypergraphs, we provide an Ω(1/k)-approximation algorithm, where k is the rank of the hypergraph (i.e., any hyperedge includes at most k nodes), that runs in almost linear time (modulo log factors). We complement our theoretical results by testing our approximation algorithms on a wide variety of synthetic experiments, where we observe, in a controlled setting, interesting findings on the trade-off between reward and risk. We also provide an application of our formulation to recommending teams that are likely to collaborate and have high impact. Comment: 25 pages
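    A simplified sketch of the reward/risk trade-off for matchings on an uncertain weighted graph: each edge is assumed to carry a given expected reward ("mean") and variance ("var"), and edges are added greedily by expected reward while the summed variance stays under a budget. This greedy heuristic is only illustrative and is not the 1/3-, 1/5-, or Ω(1/k)-approximation algorithms of the paper.

```python
# Sketch: greedy matching by expected reward subject to a cap on total edge variance.
# Edge attributes "mean" and "var" are assumed given; names are illustrative.
import networkx as nx

def greedy_risk_bounded_matching(G: nx.Graph, risk_budget: float):
    """Greedy matching by expected reward, subject to a cap on summed edge variance."""
    matched, total_risk, matching = set(), 0.0, []
    edges = sorted(G.edges(data=True), key=lambda e: e[2]["mean"], reverse=True)
    for u, v, data in edges:
        if u in matched or v in matched:
            continue                       # keep the edge set a matching
        if total_risk + data["var"] > risk_budget:
            continue                       # respect the risk budget
        matching.append((u, v))
        matched.update((u, v))
        total_risk += data["var"]
    return matching, total_risk

# Tiny usage example with hypothetical edge statistics.
if __name__ == "__main__":
    G = nx.Graph()
    G.add_edge("a", "b", mean=5.0, var=2.0)
    G.add_edge("b", "c", mean=4.0, var=0.5)
    G.add_edge("c", "d", mean=3.0, var=0.2)
    print(greedy_risk_bounded_matching(G, risk_budget=1.0))
```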

    Decomposition algorithms for detecting low-diameter clusters in graphs

    Detecting low-diameter clusters in graphs is an effective graph-based data mining technique, which has been used to find cohesive subgraphs in a variety of graph models of data. Low pairwise distances within a cluster can facilitate fast communication or good reachability between vertices in the cluster. A k-club is a subset of vertices which induces a subgraph of diameter at most k. For low values of the parameter k, this model offers a graph-theoretic relaxation of the clique model that formalizes the notion of a low-diameter cluster. The maximum k-club problem is to find a k-club of maximum cardinality in a given graph. The goals of this study are focused on developing decomposition and cutting plane methods for the maximum k-club problem for arbitrary k. Two compact integer programming formulations for the maximum k-club problem were presented by other researchers; they are among the most effective integer programming approaches presently available for solving the maximum k-club problem for any given value of k. Using model decomposition techniques, we demonstrate how the fundamental optimization problem of finding a maximum-size k-club can be solved optimally on large-scale benchmark instances. Our approach circumvents the use of complicated formulations in favor of a simple relaxation based on necessary conditions, combined with canonical hypercube cuts introduced by Balas and Jeroslow. Next, we demonstrate that by using a delayed constraint generation approach in a branch-and-cut algorithm, we can significantly speed up an integer programming solver compared with directly solving either formulation. Then, we study the problem of detecting large risk-averse 2-clubs in graphs subject to probabilistic edge failures. To achieve risk aversion, we first model the loss in the 2-club property due to probabilistic edge failures as a function of the decision (the chosen 2-club cluster) and the randomness (the graph structure). Then, we utilize the conditional value-at-risk of the loss for a given decision as a quantitative measure of risk, which is bounded in the stochastic optimization model. A sequential cutting plane method that solves a series of mixed integer linear programs is developed for solving this problem.
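    To make the k-club definition concrete, a brute-force baseline sketch: it checks that a vertex subset induces a subgraph of diameter at most k and enumerates subsets from largest to smallest. This is only a correctness check for tiny graphs, not the decomposition, cutting plane, or branch-and-cut methods described above.

```python
# Baseline sketch for the maximum k-club problem: a k-club is a vertex set whose
# induced subgraph has diameter at most k. Exhaustive search; tiny graphs only.
import itertools
import networkx as nx

def is_k_club(G: nx.Graph, nodes, k: int) -> bool:
    """True if the subgraph induced by `nodes` is connected with diameter <= k."""
    if len(nodes) <= 1:
        return True
    H = G.subgraph(nodes)
    return nx.is_connected(H) and nx.diameter(H) <= k

def max_k_club_brute_force(G: nx.Graph, k: int):
    """Largest k-club found by enumerating vertex subsets from largest to smallest."""
    nodes = list(G.nodes)
    for size in range(len(nodes), 0, -1):
        for subset in itertools.combinations(nodes, size):
            if is_k_club(G, subset, k):
                return list(subset)
    return []
```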

    Low-Diameter Clusters in Network Analysis

    In this dissertation, we introduce several novel tools for cluster-based analysis of complex systems and design solution approaches for the corresponding optimization problems. Cluster-based analysis is a subfield of network analysis which utilizes a graph representation of a system to yield meaningful insight into the system's structure and functions. Clusters with low diameter are commonly used to characterize cohesive groups in applications for which easy reachability between group members is of high importance. Low-diameter clusters can be mathematically formalized using a clique and an s-club (with relatively small values of s), two concepts from graph theory. A clique is a subset of vertices adjacent to each other, and an s-club is a subset of vertices inducing a subgraph with a diameter of at most s. A clique is in fact a special case of an s-club with s = 1, hence having the shortest possible diameter. Two topics of this dissertation focus on graphs prone to uncertainty and disruptions, and introduce several extensions of low-diameter models. First, we introduce a robust clique model in graphs where edges may fail with a certain probability and robustness is enforced using appropriate risk measures. With regard to its ability to capture underlying system uncertainties, finding the largest robust clique is a better alternative to finding the largest clique. Moreover, it is also a hard combinatorial optimization problem, requiring effective solution techniques. To this aim, we design several heuristic approaches for the detection of large robust cliques and compare their performance. Next, we consider graphs for which uncertainty is not explicitly defined, studying connectivity properties of 2-clubs. We notice that a 2-club can be very vulnerable to disruptions, so we enhance it by imposing additional requirements on connectivity and introduce the biconnected 2-club concept. Additionally, we look at the weak 2-club counterpart, which we call a fragile 2-club (defined as a 2-club that is not biconnected). The size of the largest biconnected 2-club in a graph can help measure overall system reachability and connectivity, whereas the largest fragile 2-club can identify vulnerable parts of the graph. We show that the problem of finding the largest fragile 2-club is polynomially solvable, whereas the problem of finding the largest biconnected 2-club is NP-hard. Furthermore, for the former we design a polynomial-time algorithm, and for the latter, combinatorial branch-and-bound and branch-and-cut algorithms. Lastly, we once again consider the s-club concept but shift our focus from finding the largest s-club in a graph to partitioning the graph into the smallest number of non-overlapping s-clubs. This problem can be applied not only to derive communities in the graph, but also to reduce the size of the graph and derive its hierarchical structure. The minimum s-club partitioning problem is a hard combinatorial optimization problem with proven complexity results, and it is also very hard to solve in practice. We design a combinatorial branch-and-bound algorithm and test it on the problem of minimum 2-club partitioning.
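    A small sketch of the biconnected versus fragile 2-club distinction defined above, assuming the networkx library; the graphs in the usage example are illustrative, and the classification routine is a direct reading of the definitions rather than the dissertation's algorithms.

```python
# Sketch: a 2-club is a vertex set whose induced subgraph has diameter at most 2;
# it is biconnected if that subgraph has no cut vertex, and fragile otherwise.
import networkx as nx

def classify_2_club(G: nx.Graph, nodes) -> str:
    """Return 'not a 2-club', 'biconnected 2-club', or 'fragile 2-club'."""
    H = G.subgraph(nodes)
    if not nx.is_connected(H) or nx.diameter(H) > 2:
        return "not a 2-club"
    return "biconnected 2-club" if nx.is_biconnected(H) else "fragile 2-club"

# Tiny usage example: a star is a 2-club whose center is a cut vertex,
# while a 4-cycle is a 2-club with no cut vertex.
if __name__ == "__main__":
    star = nx.star_graph(4)                      # center 0, leaves 1..4
    cycle = nx.cycle_graph(4)                    # diameter 2, no cut vertex
    print(classify_2_club(star, star.nodes))     # fragile 2-club
    print(classify_2_club(cycle, cycle.nodes))   # biconnected 2-club
```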