Low-Diameter Clusters in Network Analysis
In this dissertation, we introduce several novel tools for cluster-based analysis of complex systems and design solution approaches for the corresponding optimization problems. Cluster-based analysis is a subfield of network analysis that uses a graph representation of a system to yield meaningful insight into the system's structure and functions. Clusters with low diameter are commonly used to characterize cohesive groups in applications where easy reachability between group members is of high importance. Low-diameter clusters can be mathematically formalized using a clique and an s-club (with relatively small values of s), two concepts from graph theory. A clique is a subset of vertices that are pairwise adjacent, and an s-club is a subset of vertices inducing a subgraph with a diameter of at most s. A clique is a special case of an s-club with s = 1 and hence has the smallest possible diameter.
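The definitions above are easy to make concrete: a subset is an s-club exactly when every pair of its vertices is within distance s in the induced subgraph. The sketch below (plain Python; the adjacency-dict representation is an assumption, not from the dissertation) checks this by running a BFS restricted to the subset from each member.

```python
from collections import deque

def is_s_club(adj, subset, s):
    """Check whether `subset` induces a subgraph of diameter <= s.

    adj: dict mapping each vertex to its set of neighbours in the whole graph.
    A clique is exactly the s = 1 case.
    """
    subset = set(subset)
    for source in subset:
        # BFS restricted to the induced subgraph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in subset and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(subset) or max(dist.values()) > s:
            return False  # induced subgraph disconnected, or some pair too far apart
    return True

# A path a-b-c: {a, b, c} is a 2-club but not a clique (1-club).
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(is_s_club(adj, {"a", "b", "c"}, 2))  # True
print(is_s_club(adj, {"a", "b", "c"}, 1))  # False
```

Note that the BFS must stay inside the subset: distances are measured in the induced subgraph, not in the whole graph, which is what distinguishes an s-club from the weaker s-clique concept.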
Two topics of this dissertation focus on graphs prone to uncertainty and disruptions, and introduce several extensions of low-diameter models. First, we introduce a robust clique model in graphs where edges may fail with a certain probability and robustness is enforced using appropriate risk measures. Because it captures underlying system uncertainties, finding the largest robust clique is a better alternative to finding the largest clique. However, it is also a hard combinatorial optimization problem, requiring effective solution techniques. To this end, we design several heuristic approaches for detecting large robust cliques and compare their performance.
Next, we consider graphs for which uncertainty is not explicitly defined, studying connectivity properties of 2-clubs. We observe that a 2-club can be very vulnerable to disruptions, so we strengthen it by imposing additional connectivity requirements and introduce the biconnected 2-club concept. Additionally, we study the weak counterpart, which we call a fragile 2-club (defined as a 2-club that is not biconnected). The size of the largest biconnected 2-club in a graph can help measure overall system reachability and connectivity, whereas the largest fragile 2-club can identify vulnerable parts of the graph. We show that the problem of finding the largest fragile 2-club is polynomially solvable, whereas the problem of finding the largest biconnected 2-club is NP-hard. Accordingly, for the former we design a polynomial-time algorithm, and for the latter, combinatorial branch-and-bound and branch-and-cut algorithms.
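The biconnected/fragile distinction reduces to testing the induced subgraph for cut vertices. The sketch below uses Tarjan's articulation-point algorithm for that test; it is an illustrative check, not one of the dissertation's algorithms, and the adjacency-dict representation is an assumption.

```python
def is_biconnected(adj, subset):
    """True iff the subgraph induced by `subset` is connected and has no
    cut vertex. A 2-club failing this test is, in the terminology above,
    a fragile 2-club. Uses Tarjan's articulation-point algorithm."""
    subset = set(subset)
    if len(subset) < 3:
        return len(subset) > 0  # convention: tiny sets treated as trivially biconnected
    disc, low, cut = {}, {}, set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if v not in subset:
                continue  # stay inside the induced subgraph
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)  # removing u disconnects v's subtree
            elif v != parent:
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:
            cut.add(u)  # root with >1 DFS child is a cut vertex

    dfs(next(iter(subset)), None)
    return len(disc) == len(subset) and not cut

# A 4-cycle is a biconnected 2-club; a star K_{1,3} is a fragile 2-club.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(is_biconnected(cycle, {0, 1, 2, 3}))  # True
print(is_biconnected(star, {0, 1, 2, 3}))   # False
```

The star example shows why the distinction matters: it satisfies the 2-club diameter condition, yet the failure of its single centre vertex disconnects it entirely.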
Lastly, we once again consider the s-club concept but shift our focus from finding the largest s-club in a graph to partitioning the graph into the smallest number of non-overlapping s-clubs. This problem can be applied not only to derive communities in the graph, but also to reduce the size of the graph and derive its hierarchical structure. The minimum s-club partitioning problem is a hard combinatorial optimization problem with proven complexity results, and it is also very hard to solve in practice. We design a combinatorial branch-and-bound algorithm and test it on the minimum 2-club partitioning problem.
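To give a feel for the partitioning problem, here is a simple greedy upper-bound heuristic (an illustrative sketch only, not the branch-and-bound algorithm described above): repeatedly peel off the closed neighbourhood N[v] of a highest-degree remaining vertex v. Any two members of such a part are joined through v, so each part induces a subgraph of diameter at most 2, i.e. a valid 2-club.

```python
def greedy_2club_partition(adj):
    """Greedy heuristic for 2-club partitioning (upper bound on the
    optimum): each part is the closed neighbourhood of a high-degree
    vertex, restricted to the vertices not yet assigned."""
    remaining = set(adj)
    parts = []
    while remaining:
        # Pick the vertex with the most unassigned neighbours.
        v = max(remaining, key=lambda u: len(adj[u] & remaining))
        part = ({v} | adj[v]) & remaining  # v plus its free neighbours
        parts.append(part)
        remaining -= part
    return parts

# Partition a 5-path 0-1-2-3-4 into 2-clubs.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(greedy_2club_partition(path))
```

The number of parts this produces is only an upper bound; closing the gap to the true minimum is what makes the exact problem hard.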
Mapping unstructured mesh codes onto local memory parallel architectures
Initial work on mapping CFD codes onto parallel systems focused on software employing structured meshes. Increasingly, many large-scale CFD codes are based upon unstructured meshes. One of the key problems when implementing such large-scale unstructured problems on a distributed-memory machine is how to partition the underlying computational domain efficiently. It is important that all processors be kept busy for as large a proportion of the time as possible, and that the amount, level and frequency of communication be kept to a minimum.
Proposed techniques for solving the mapping problem have separated the solution into two distinct phases. The first phase partitions the computational domain into cohesive sub-regions. The second phase embeds these sub-regions onto the processors. However, it has been shown that performing these two operations in isolation can lead to poor mappings and far-from-optimal communication times.
In this thesis we develop a technique which simultaneously takes account of the processor topology whilst identifying the cohesive sub-regions. Our approach is based on an unstructured mesh decomposition method originally developed by Sadayappan et al [SER90] for the hypercube. This technique forms the basis of a method that enables decomposition onto an arbitrary number of processors in a specified processor network topology. Whilst partitioning the mesh, the optimisation method takes the processor topology into account by minimising the total interprocessor communication.
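The objective being minimised can be sketched concretely: a toy cost function (an assumption for illustration, not the thesis's actual formulation) charges each mesh edge whose endpoints are assigned to different processors the hop distance between those processors in the network topology.

```python
from collections import deque

def mapping_cost(mesh, proc_net, assign):
    """Total interprocessor communication for a mapping: each mesh edge
    costs the hop distance between the processors its endpoints are
    assigned to (0 if they share a processor)."""
    # All-pairs hop distances in the (small) processor network, by BFS.
    hops = {}
    for src in proc_net:
        dist = {src: 0}
        q = deque([src])
        while q:
            p = q.popleft()
            for r in proc_net[p]:
                if r not in dist:
                    dist[r] = dist[p] + 1
                    q.append(r)
        hops[src] = dist
    cost = 0
    for u in mesh:
        for v in mesh[u]:
            if u < v:  # count each undirected mesh edge once
                cost += hops[assign[u]][assign[v]]
    return cost

# A 4-node mesh on a 2-processor line: only the two cut edges cost anything.
mesh = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
line = {"p0": {"p1"}, "p1": {"p0"}}
print(mapping_cost(mesh, line, {0: "p0", 1: "p0", 2: "p1", 3: "p1"}))  # 2
```

Weighting cut edges by processor distance, rather than counting them uniformly, is what couples the partitioning phase to the embedding phase.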
The problem with this technique is that it is not suitable for very large meshes, since the calculations often require prodigious amounts of processing power.
This problem can be overcome by creating clusters of the original elements and using them to create a reduced network which is homomorphic to the original mesh. The technique can then be applied to this image network with comparative ease. The clusters are created using an efficient graph bisection method. The coarseness of the reduced mesh inevitably degrades the solution; however, it is possible to refine the resultant partition to recapture some of the richness of the original mesh and hence achieve reasonable partitions.
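The coarsening step can be illustrated with a minimal sketch (matching-based clustering, an assumption for illustration rather than the bisection method the thesis actually uses): greedily pair each unmatched vertex with a free neighbour, then build the quotient "image" graph on the clusters.

```python
def coarsen(adj):
    """One level of matching-based coarsening: pair each unmatched
    vertex with an unmatched neighbour, then build the reduced graph
    whose vertices are the clusters and whose edges connect clusters
    joined by at least one original edge."""
    cluster_of, clusters = {}, []
    for v in adj:
        if v in cluster_of:
            continue
        cid = len(clusters)
        members = [v]
        for u in adj[v]:          # match with one free neighbour, if any
            if u not in cluster_of:
                members.append(u)
                break
        for m in members:
            cluster_of[m] = cid
        clusters.append(members)
    # Quotient graph: homomorphic image of the original mesh.
    coarse = {i: set() for i in range(len(clusters))}
    for v, nbrs in adj.items():
        for u in nbrs:
            if cluster_of[u] != cluster_of[v]:
                coarse[cluster_of[v]].add(cluster_of[u])
    return clusters, coarse

# A 4-path 0-1-2-3 coarsens to two clusters joined by one edge.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(coarsen(path))
```

Partitioning the small quotient graph and then projecting and refining the result on the original mesh is exactly the trade-off discussed above: cheaper computation at the cost of some solution quality.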
One of the issues to be addressed is the level of granularity that gives the best balance between computational efficiency and optimality of the solution. Some progress has been made towards answering this important question.
In this thesis, we show how the above technique can be effectively utilised in large-scale computations. Results include testing the technique on large-scale meshes for complex flow domains.
Mesoscopic descriptions of complex networks
The aim of this thesis is the study of the substructures that appear at a mesoscopic level of resolution in complex networks. These substructures, known in the field of complex networks as communities, group the nodes of a network so that nodes belonging to the same community are more densely connected to one another than to the rest of the network. The importance of analysing these structures lies in the fact that they allow us to understand complex networks better, giving us information about the functionality of the communities that compose them. We have carried out the study of these mesoscopic structures using the topological information of the networks; the methods employed can be grouped into two large families, commonly known as hierarchical clustering and modularity-based clustering.
Within the first family of methods, we identified a non-uniqueness problem in agglomerative hierarchical clustering and proposed a solution based on a new classification tool that we call the multidendrogram. We then applied the result of a hierarchical classification to solve a problem in financial complex networks: more specifically, we exploited a partition into clusters to solve the portfolio optimization problem more efficiently.
The second family of clustering methods studied is based on the optimization of an objective function called modularity. The drawback of modularity optimization is its high computational cost, which led us to devise an analytical size reduction of complex networks that preserves all the information in the original network needed to find the community structure that optimizes modularity. We were then able to use this simplification of the calculations in the analysis of the entire topological mesoscale of complex networks. We studied this mesoscale by adding to every node of the network a common value measuring its resistance to becoming part of communities; optimizing modularity for these new instances of the original network, obtained from analytically bounded resistance values, allows us to analyse the topological mesoscale of the networks. Finally, we proposed a generalization of the modularity function in which the building blocks are no longer only edges but may be different kinds of motifs. This allows us to obtain more general descriptions of groups of nodes that include communities as a particular case.
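The modularity objective at the heart of the second family of methods is the Newman-Girvan quantity Q = (1/2m) * sum_ij (A_ij - k_i*k_j/(2m)) * delta(c_i, c_j). A direct (unoptimised) computation for an unweighted, undirected graph can be sketched as follows; this only evaluates Q, it does not perform the optimisation or size reduction described above.

```python
def modularity(adj, communities):
    """Newman-Girvan modularity Q of a partition of an undirected,
    unweighted graph given as an adjacency dict (vertex -> neighbour set).
    communities: iterable of disjoint vertex sets covering the graph."""
    m2 = sum(len(nbrs) for nbrs in adj.values())  # 2m = sum of degrees
    comm = {v: i for i, c in enumerate(communities) for v in c}
    q = 0.0
    for i in adj:
        for j in adj:
            if comm[i] != comm[j]:
                continue  # delta term: only same-community pairs count
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - len(adj[i]) * len(adj[j]) / m2
    return q / m2

# Two triangles joined by one edge: the natural split scores Q = 5/14.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(round(modularity(adj, [{0, 1, 2}, {3, 4, 5}]), 3))  # 0.357
```

The O(n^2) double loop is exactly the computational cost the analytical size reduction above is designed to tame on large networks.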
A Multiobjective Evolutionary Conceptual Clustering Methodology for Gene Annotation Within Structural Databases: A Case of Study on the Gene Ontology Database
Current tools and techniques devoted to examining the content of large databases are often hampered by their inability to support searches based on criteria that are meaningful to their users. These shortcomings are particularly evident in data banks storing representations of structural data such as biological networks. Conceptual clustering techniques have been shown to be appropriate for uncovering relationships between the features that characterize objects in structural data. However, typical conceptual clustering approaches normally recover the most obvious relations but fail to discover the less frequent yet more informative underlying data associations. The combination of evolutionary algorithms with multiobjective and multimodal optimization techniques constitutes a suitable tool for solving this problem. We propose a novel conceptual clustering methodology termed evolutionary multiobjective conceptual clustering (EMO-CC), relying on the NSGA-II multiobjective (MO) genetic algorithm. We apply this methodology to identify conceptual models in structural databases generated from gene ontologies. These models can explain and predict phenotypes in the immunoinflammatory response problem, similar to those provided by gene expression or other genetic markers. The analysis of these results reveals that our approach uncovers cohesive clusters, even those comprising a small number of observations explained by several features, which allows describing objects and their interactions from different perspectives and at different levels of detail.
Funding: Ministerio de Ciencia y Tecnología TIC-2003-00877, BIO2004-0270E, TIN2006-1287
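The core ranking step of NSGA-II, on which EMO-CC relies, is the extraction of non-dominated solutions. The sketch below shows that step for maximised objective vectors; it is an illustrative sketch of Pareto dominance, not the EMO-CC implementation, and the example objective names are hypothetical.

```python
def pareto_front(solutions):
    """Return the non-dominated (Pareto) front of a list of objective
    vectors, all objectives maximised. A solution is kept unless some
    other solution is at least as good in every objective and strictly
    better in at least one."""
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# Hypothetical (cluster cohesion, rule support) scores for candidate clusters.
points = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]
print(pareto_front(points))  # (0.4, 0.4) is dominated by (0.5, 0.5)
```

Keeping the whole front, rather than a single best score, is what lets a multiobjective approach retain the small, less frequent but informative clusters mentioned above instead of discarding them in favour of the most obvious ones.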
Solutions to decision-making problems in management engineering using molecular computational algorithms and experimentations
System: new; report number: Ko No. 3368; degree type: Doctor of Engineering; date conferred: 2011/5/23; Waseda University diploma number: Shin 568
Learning-Based Approaches for Graph Problems: A Survey
Over the years, many graph problems, especially NP-complete ones, have been studied by a wide range of researchers. Famous examples include graph colouring, the travelling salesman problem, and subgraph isomorphism. Most of these problems are typically addressed by exact algorithms, approximation algorithms, and heuristics; each of these methods, however, has its drawbacks. Recent studies have employed learning-based frameworks, such as machine learning techniques, in solving these problems, given that they are useful in discovering new patterns in structured data that can be represented using graphs. This research direction has attracted a considerable amount of attention. In this survey, we provide a systematic review, mainly of classic graph problems for which learning-based approaches have been proposed. We discuss an overview of each framework and provide analyses based on its design and performance. Some potential research questions are also suggested. Ultimately, this survey gives a clearer insight and can serve as a stepping stone for the research community studying problems in this field.
A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.
The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, those entities comprising a class, and the intra and inter-class relationships existing between them, are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and the application of graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph-theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code. 
A classification so generated additionally provides a framework for class-based matching over an existing code base, both from the perspective of newly introduced classes and through search "templates" provided by the incomplete, iteratively constructed and refined classes associated with current and ongoing development. The tools and techniques developed here support enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development. These tools and techniques can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic.
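The idea of a two-phase, graph-based similarity between classes can be sketched in miniature. The toy measure below (a hypothetical illustration; the graph encoding, attribute names, and 50/50 weighting are assumptions, not the thesis's actual model) represents a class as an attributed relational graph, compares member-attribute multisets in phase one, and typed inter-member relationships in phase two.

```python
from collections import Counter

def class_graph_similarity(g1, g2):
    """Toy two-phase similarity between classes encoded as attributed
    relational graphs. Each graph is (nodes, edges) with
    nodes = {member_name: kind} and edges = {(u, v, relation), ...}."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0

    n1, e1 = g1
    n2, e2 = g2
    # Phase 1: compare the multisets of member kinds (node attributes).
    c1, c2 = Counter(n1.values()), Counter(n2.values())
    node_sim = sum((c1 & c2).values()) / max(sum((c1 | c2).values()), 1)
    # Phase 2: compare typed relationships between member kinds (edge attributes).
    r1 = {(n1[u], rel, n1[v]) for u, v, rel in e1}
    r2 = {(n2[u], rel, n2[v]) for u, v, rel in e2}
    return 0.5 * node_sim + 0.5 * jaccard(r1, r2)

# Two structurally identical container classes score 1.0 despite different names.
stack = ({"items": "field", "push": "method", "pop": "method"},
         {("push", "items", "uses"), ("pop", "items", "uses")})
queue = ({"buf": "field", "enqueue": "method", "dequeue": "method"},
         {("enqueue", "buf", "uses"), ("dequeue", "buf", "uses")})
print(class_graph_similarity(stack, queue))  # 1.0
```

Comparing attribute summaries rather than raw identifiers is what lets structural similarity surface reuse candidates across differently named but equivalently organised classes.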