9 research outputs found

    Optimization Methods for Cluster Analysis in Network-based Data Mining

    This dissertation focuses on two optimization problems that arise in network-based data mining, concerning identification of basic community structures (clusters) in graphs: the maximum edge weight clique and maximum induced cluster subgraph problems. We propose a continuous quadratic formulation for the maximum edge weight clique problem, and establish the correspondence between its local optima and maximal cliques in the graph. Subsequently, we present a combinatorial branch-and-bound algorithm for this problem that takes advantage of a polynomial-time solvable nonconvex relaxation of the proposed formulation. We also introduce a linear-time-computable analytic upper bound on the clique number of a graph, as well as a new method of upper-bounding the maximum edge weight clique problem, which leads to another exact algorithm for this problem. For the maximum induced cluster subgraph problem, we present the results of a comprehensive polyhedral analysis. We derive several families of facet-defining valid inequalities for the IUC polytope associated with a graph. We also provide a complete description of this polytope for some special classes of graphs. We establish the computational complexity of the separation problems for most of the considered families of valid inequalities, and explore the effectiveness of employing the corresponding cutting planes in an integer (linear) programming framework for the maximum induced cluster subgraph problem.

    Graph theoretic generalizations of clique: optimization and extensions

    This dissertation considers graph theoretic generalizations of the maximum clique problem. Models that were originally proposed in the social network analysis literature are investigated from a mathematical programming perspective for the first time. A social network is usually represented by a graph, and cliques were the first models of "tightly knit groups" in social networks, referred to as cohesive subgroups. Cliques are idealized models, and their overly restrictive nature motivated the development of clique relaxations that relax different aspects of a clique. Identifying large cohesive subgroups in social networks has traditionally been used in criminal network analysis to study organized crime such as terrorism, narcotics and money laundering. More recent applications are in clustering and data mining wireless networks, biological networks, as well as graph models of databases and the internet. This research has the potential to impact homeland security, bioinformatics, internet research and the telecommunication industry, among others. The focus of this dissertation is a degree-based relaxation called k-plex. A distance-based relaxation called k-clique and a diameter-based relaxation called k-club are also investigated in this dissertation. We present the first systematic study of the complexity aspects of these problems and the application of mathematical programming techniques in solving them. Graph theoretic properties of the models are identified and used in the development of theory and algorithms. Optimization problems associated with the three models are formulated as binary integer programs and the properties of the associated polytopes are investigated. Facets and valid inequalities are identified based on combinatorial arguments. A branch-and-cut framework is designed and implemented to solve the optimization problems exactly. Specialized preprocessing techniques are developed that, in conjunction with the branch-and-cut algorithm, optimally solve the problems on real-life power law graphs, a general class of graphs that includes social and biological networks. Computational experiments are performed to study the effectiveness of the proposed solution procedures on benchmark instances and real-life instances. The relationship of these models to the classical maximum clique problem is studied, leading to several interesting observations including a new compact integer programming formulation. We also prove new continuous non-linear formulations for the classical maximum independent set problem, which maximize continuous functions over the unit hypercube, and characterize their local and global maxima. Finally, clustering and network design extensions of the clique relaxation models are explored.

    EUROCOMB 21 Book of extended abstracts


    Three Risky Decades: A Time for Econophysics?

    We publish this Special Issue at a turning point unlike any we have faced since World War II. Interconnected long-term global shocks such as the coronavirus pandemic, the war in Ukraine, and catastrophic climate change have imposed significant humanitarian, socio-economic, political, and environmental restrictions on the globalization process and on all aspects of economic and social life, including the existence of individual people. The planet is trapped: the current situation seems to be the prelude to an apocalypse whose long-term effects we will feel for decades. A concept for the planet's survival therefore urgently needs to be built; only on this basis can the conditions for its development be created. The Special Issue documents the state of econophysics before the current situation. It can therefore provide an excellent econophysical, or inter- and cross-disciplinary, starting point for a rational approach to a new era.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking is established for these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiencies in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are offered for each hotel studied.

    Inférence et réseaux complexes

    Honor roll of the Faculté des études supérieures et postdoctorales, 2018-2019. Modern science is often concerned with complex objects of inquiry: intricate webs of social interactions, pandemics, power grids, ecological niches under climatological pressure, etc. When the goal is to gain insights into the function and mechanism of these complex systems, a possible approach is to map their structure using a collection of nodes (the parts of the systems) connected by edges (their interactions). The resulting complex networks capture the structural essence of these systems. Years of successes show that the network abstraction often suffices to understand a plethora of complex phenomena. It can be argued that a principled and rigorous approach to data analysis is chief among the challenges faced by network science today. With this in mind, the goal of this thesis is to tackle a number of important problems at the intersection of network science and statistical inference, of two types: estimation problems and hypothesis testing. Most of the thesis is devoted to estimation problems. We begin with a thorough analysis of a well-known generative model (the stochastic block model), introduced 40 years ago to identify patterns and regularities in the structure of real networks. The main original contributions of this part are (a) the unification of the majority of known regularity detection methods under the stochastic block model, and (b) a thorough characterization of its consistency in the finite-size regime. Together, these two contributions put regularity detection methods on firmer statistical foundations. We then turn to a completely different estimation problem: the reconstruction of the past of complex networks from a single snapshot. The unifying theme is our statistical treatment of this problem, again based on generative modeling. Our major results are: the inference framework itself; an efficient history reconstruction method; and the discovery of a phase transition in the recoverability of history, driven by inequalities (the more unequal, the harder the reconstruction problem). We conclude with a short section, where we investigate hypothesis testing in complex systems. This epilogue is framed in the broader mathematical context of simplicial complexes, a natural generalization of complex networks. We obtain a random model for these objects, and the associated efficient sampling algorithm. We finish by showing how these tools can be used to test hypotheses about the structure of real systems, using their homology groups, a property that is inaccessible in the network representation.