    Nagyméretű véletlen gráfok statisztikai vizsgálata = Statistical inference on large random graphs

    We applied and developed parametric and nonparametric statistical methods to recover the structure of large graphs. Parametric inference: in the so-called generalized random graph model and in the alpha-beta models, we used the EM algorithm for maximum likelihood estimation of the parameters. We extended the model to the multicluster case by applying the Rasch model to the bipartite graphs formed by pairs of clusters. Nonparametric inference: minimal, maximal, and regular cuts. We inferred the number of clusters from the eigenvalues of the normalized Laplacian and modularity matrices, and we found the clusters themselves by applying the k-means algorithm to the vertex representatives. We proved theorems relating the multiway cuts, the volume-regularity constant, the spectral gap, and the k-variance of the clusters as the number of vertices tends to infinity in such a way that there are no dominant vertices. We generalized the so-called Newman-Girvan modularity and used the eigenvalues of the normalized modularity matrix that are large in absolute value, together with their signs, to determine the character of the clusters. We gave a spectral characterization of generalized random graphs in terms of the structural eigenvalues and eigenspaces. We extended our findings to weighted and directed graphs and to contingency tables. We also investigated the testability of minimum balanced multiway cut densities, in the sense used by L. Lovász and coauthors for convergent graph sequences.
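
    As a small illustration of the Newman-Girvan modularity mentioned above, the following sketch computes the classical (ungeneralized) modularity Q of a given clustering of an unweighted, undirected simple graph. The function name `modularity`, the edge-list input format, and the example graph are assumptions made for this illustration; they are not taken from the project, and the generalized and normalized variants studied there are not reproduced here.

```python
def modularity(edges, community):
    """Classical Newman-Girvan modularity of an undirected simple graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every vertex to its cluster label
    Uses Q = sum over clusters c of  e_c / m - (vol_c / (2 m))^2,
    where e_c is the number of edges inside c, vol_c is the sum of the
    degrees in c, and m is the total number of edges.
    """
    m = len(edges)
    degree = {}
    internal = {}  # number of edges with both endpoints in the same cluster
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1

    volume = {}  # sum of degrees per cluster
    for node, d in degree.items():
        c = community[node]
        volume[c] = volume.get(c, 0) + d

    return sum(internal.get(c, 0) / m - (vol / (2 * m)) ** 2
               for c, vol in volume.items())


# Hypothetical example: two triangles joined by a single bridge edge.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(two_triangles, by_triangle)
```

    Splitting the example graph along the bridge gives a positive Q, while putting all vertices into one cluster gives Q = 0, which is the sense in which modularity rewards clusters that are denser than expected from the degrees alone.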

    Lower threshold ground state energy and testability of minimal balanced cut density

    Lovász and his coauthors defined the notion of microcanonical ground state energy $\hat{\mathcal{E}}_\mathbb{a}(G,J)$ -- borrowed from statistical physics -- for weighted graphs $G$, where $\mathbb{a}$ is a probability distribution on $\{1,\dots,q\}$ and $J$ is a symmetric $q \times q$ matrix with real entries. We define a new version of the ground state energy, $\hat{\mathcal{E}}^c(G,J)=\inf_{\mathbb{a}\in A_c}\hat{\mathcal{E}}_\mathbb{a}(G,J)$, called lower threshold ground state energy, where $A_c = \{\mathbb{a} :\, a_i\ge c,\,i=1,\dots,q\}$. Both types of energies can be extended to graphons $W$, the limit objects of convergent sequences of simple graphs. The main result of the paper states that if $0\leq c_1<c_2\leq 1$, then the convergence of the sequences $(\hat{\mathcal{E}}^{c_2/q}(G_n,J))$ for each $J$ implies the convergence of the sequences $(\hat{\mathcal{E}}^{c_1/q}(G_n,J))$ for each $J$. As a byproduct, one can derive in a natural way the testability of minimum balanced multiway cut densities, which is one of the fundamental problems of cluster analysis. Comment: 14 pages
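
    To make the definition concrete, here is a brute-force sketch for very small graphs. It assumes the usual sign and normalization convention from the dense graph limit literature, in which the energy of a coloring phi is minus the sum of J[phi(u)][phi(v)] times the edge weight over all ordered vertex pairs, divided by n^2; the abstract itself does not spell this out. The infimum over A_c is approximated by requiring every color class to contain at least c*n vertices. The function names and the example are hypothetical.

```python
from itertools import product


def energy(adj, J, phi):
    """Energy of the coloring phi (assumed convention, see the lead-in)."""
    n = len(adj)
    total = sum(adj[u][v] * J[phi[u]][phi[v]]
                for u in range(n) for v in range(n))
    return -total / n ** 2


def lower_threshold_gse(adj, J, c):
    """Minimum energy over all colorings whose classes each hold >= c*n vertices.

    Exhaustive search over q^n colorings, so only usable for tiny graphs.
    Returns None when no coloring satisfies the size constraint.
    """
    n, q = len(adj), len(J)
    best = None
    for phi in product(range(q), repeat=n):
        sizes = [phi.count(i) for i in range(q)]
        if all(s + 1e-9 >= c * n for s in sizes):
            e = energy(adj, J, phi)
            if best is None or e < best:
                best = e
    return best


# Hypothetical example: the 4-cycle with q = 2 and J chosen so that the
# energy equals 2 * (number of cut edges) / n^2, i.e. a cut density.
C4 = [[0, 1, 0, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [1, 0, 1, 0]]
J_cut = [[0, -1],
         [-1, 0]]
balanced = lower_threshold_gse(C4, J_cut, 0.5)   # both classes of size >= 2
unconstrained = lower_threshold_gse(C4, J_cut, 0.0)
```

    With this choice of J, minimizing the energy under the threshold c = 1/2 amounts to finding a minimum bisection of the 4-cycle, which is how minimum balanced cut densities arise as a special case. Lowering c enlarges the feasible set, so the energy can only decrease.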

    On Approximability, Convergence, and Limits of CSP Problems

    This thesis studies dense constraint satisfaction problems (CSPs) and other related optimization and decision problems that can be phrased as questions regarding parameters or properties of combinatorial objects such as uniform hypergraphs. We concentrate on the information that can be derived from a very small substructure that is selected uniformly at random. We present a unified framework for the limits of CSPs in the sense of the convergence notion of Lovász-Szegedy; the framework depends only on the remarkable connection between graph sequences and exchangeable arrays established by Diaconis-Janson. In particular, we formulate and prove a representation theorem for compact colored r-uniform directed hypergraphs and apply it to rCSPs. We investigate the sample complexity of testable r-graph parameters, discuss a generalized version of ground state energies (GSE), and demonstrate that they are efficiently testable. GSE is a term borrowed from statistical physics that stands for a generalized version of maximal multiway cut problems from complexity theory; it was studied in the dense graph setting by Borgs et al. A notion related to testing CSPs that are defined on graphs, nondeterministic property testing, was introduced by Lovász-Vesztergombi and extends the graph property testing framework of Goldreich-Goldwasser-Ron in the dense graph model. We study the sample complexity of nondeterministically testable graph parameters and properties and improve existing bounds by several orders of magnitude. Further, we prove the equivalence of the notions of nondeterministic and deterministic parameter and property testing for uniform dense hypergraphs of arbitrary rank, and we provide the first effective upper bound on the sample complexity in this general case.
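
    The idea of estimating a parameter from a small random substructure can be sketched as follows: draw k vertices uniformly at random, take the induced subgraph, and evaluate the parameter on it exactly. The example uses the max-cut density (maximum cut size divided by k^2) as a stand-in parameter; the function names, the normalization, and the brute-force evaluation are assumptions made for illustration and are not the thesis's algorithms or bounds.

```python
import random
from itertools import product


def maxcut_density(adj):
    """Exact max-cut size divided by k^2, by exhaustive search over 2^k sides."""
    k = len(adj)
    best = 0
    for side in product((0, 1), repeat=k):
        cut = sum(adj[u][v]
                  for u in range(k) for v in range(u + 1, k)
                  if side[u] != side[v])
        best = max(best, cut)
    return best / k ** 2


def sampled_maxcut_density(adj, k, rng):
    """Estimate the max-cut density from a uniformly random induced k-subgraph."""
    nodes = rng.sample(range(len(adj)), k)
    sub = [[adj[u][v] for v in nodes] for u in nodes]
    return maxcut_density(sub)


# Hypothetical example: the complete graph on 10 vertices. Every induced
# 4-vertex subgraph is K4, so the estimate does not depend on the sample.
n = 10
K10 = [[1 if u != v else 0 for v in range(n)] for u in range(n)]
estimate = sampled_maxcut_density(K10, 4, random.Random(0))
```

    The cost of the exhaustive step depends only on the sample size k and not on the size of the input graph, which is the point of sample-based testing; how large k must be for a given accuracy is the sample complexity question the abstract refers to.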

    Logic perturbation based circuit partitioning and optimum FPGA switch-box designs.

    Cheung Chak Chung. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 101-114). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgments
    Vita
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
        1.1 Motivation
        1.2 Aims and Contribution
        1.3 Thesis Overview
    Chapter 2. VLSI Design Cycle
        2.1 Logic Synthesis
            2.1.1 Logic Minimization
            2.1.2 Technology Mapping
            2.1.3 Testability
        2.2 Physical Design Synthesis
            2.2.1 Partitioning
            2.2.2 Floorplanning & Placement
            2.2.3 Routing
            2.2.4 Compaction, Extraction & Verification
            2.2.5 Physical Design of FPGAs
    Chapter 3. Alternative Wiring
        3.1 Introduction
        3.2 Notation and Definitions
        3.3 Application of Rewiring
            3.3.1 Logic Optimization
            3.3.2 Timing Optimization
            3.3.3 Circuit Partitioning and Routing
        3.4 Logic Optimization Analysis
            3.4.1 Global Flow Optimization
            3.4.2 OBDD Representation
            3.4.3 Automatic Test Pattern Generation (ATPG)
            3.4.4 Graph Based Alternative Wiring (GBAW)
        3.5 Augmented GBAW
        3.6 Logic Optimization by using GBAW
        3.7 Conclusions
    Chapter 4. Multi-way Partitioning using Rewiring Techniques
        4.1 Introduction
        4.2 Circuit Partitioning Algorithm Analysis
            4.2.1 The Kernighan-Lin (KL) Algorithm
            4.2.2 The Fiduccia-Mattheyses (FM) Algorithm
            4.2.3 Geometric Representation Algorithm
            4.2.4 The Multi-level Partitioning Algorithm
            4.2.5 Hypergraph METIS - hMETIS
        4.3 The GBAW Partitioning Algorithm
        4.4 Experimental Results
        4.5 Conclusions
    Chapter 5. Optimum FPGA Switch-Box Designs - HUSB
        5.1 Introduction
        5.2 Background and Definitions
            5.2.1 Routing Architectures
            5.2.2 Global Routing
            5.2.3 Detailed Routing
        5.3 FPGA Router Comparison
            5.3.1 CGE
            5.3.2 SEGA
            5.3.3 TRACER
            5.3.4 VPR
        5.4 Switch Box Design
            5.4.1 Disjoint type switch box (XC4000-type)
            5.4.2 Anti-symmetric switch box
            5.4.3 Universal Switch box
            5.4.4 Switch box Analysis
        5.5 Terminology
        5.6 Hyper-universal (4, W)-design analysis
            5.6.1 H3 is an optimum (4, 3)-design
            5.6.2 H4 is an optimum (4, 4)-design
            5.6.3 Hi is a hyper-universal (4, i)-design for i = 5, 6, 7
        5.7 Experimental Results
        5.8 Conclusions
    Chapter 6. Conclusions
        6.1 Thesis Summary
        6.2 Future work
            6.2.1 Alternative Wiring
            6.2.2 Partitioning Quality
            6.2.3 Routing Devices Studies
    Bibliography
    Appendix A. 5xpl - Berkeley Logic Interchange Format (BLIF)
    Appendix B. Proof of some 2-local patterns
    Appendix C. Illustrations of FM algorithm
    Appendix D. HUSB Structures
    Appendix E. Primitive minimal 4-way global routing Structures

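    Einheitliche Gütemaße für Clusterings, Layouts und Orderings von Graphen, und deren Anwendung als Software-Entwurfskriterien = Unified quality measures for clusterings, layouts, and orderings of graphs, and their application as software design criteria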

    How good is a given graph clustering, graph layout, or graph ordering -- specifically, how well does it group densely connected vertices and separate sparsely connected vertices? How good is a given software design -- specifically, how well does it minimize the interdependence of the subsystems? This work introduces and validates simple and uniform measures for these two properties. Together with existing optimization algorithms, the introduced measures enable the automatic computation of, for example, communities in social networks and design flaws in software systems.

    The first part derives, validates, and unifies quality measures for graph clusterings, graph layouts, and graph orderings, with the following results:
    - Identical quality measures can be applied to clusterings, layouts, and orderings; this enables the computation of consistent clusterings, layouts, and orderings.
    - Diverse existing and new measures can be unified into a few general measures; this facilitates their comparison and validation.
    - Many existing measures are biased towards certain clusterings, layouts, or orderings, even for graphs without particularly dense or sparse subgraphs, and thus do not (only) measure quality in the above sense.
    - For example graphs, the minimization of new, unbiased (or weakly biased) measures reveals nonobvious groups, e.g. communities in social networks, subject areas in hypertexts, or closely interlocked countries in international trade.

    The second part derives, validates, and unifies dependency-based indicators of software design quality. It applies two quality measures for graph clusterings as measures for the coupling of software subsystems -- specifically for the coupling indicated by common changes and for the coupling indicated by references -- and shows:
    - The measures quantify the dependency-caused development costs, under well-defined simplifying assumptions.
    - The minimization of the measures conforms to existing dependency-related design principles (like locality of change, acyclicity of references, and stability of references), design rules, and design patterns.
    - For example software systems, the incremental minimization of the measures reveals nonobvious design flaws, like the distribution of coherent responsibilities over several subsystems, or references from low-level to high-level subsystems.

    In summary, this work shows that
    - simple measures can suffice to capture important aspects of graph clustering quality, graph layout quality, graph ordering quality, and software design quality, and
    - the optimization of simple measures can suffice to detect nonobvious and often useful structure in various real-world systems.

    A software testing estimation and process control model

    Controlling the testing process and estimating the resources required to perform testing are key to delivering a software product of target quality on budget. This thesis explores the use of testing to remove errors and the part that metrics and models play in this process, and it considers an original method for improving the quality of a software product. The thesis investigates the possibility of using software metrics to estimate the testing resource required to deliver a product of target quality into deployment, and to determine during the testing phases the correct point in time to proceed to the next testing phase in the life-cycle. Along with the metrics Clear ratio, Churn, Error rate halving, Severity shift, and faults per week, a new metric, 'Earliest Visibility' (EV), is defined and used to control the testing process. EV is built on the link between the point at which an error is made within development and the point at which it is subsequently found during testing. To increase the effectiveness of testing and reduce costs while maintaining quality, the model targets each test phase at the errors linked to that phase and lets each test phase build upon the previous one. EV also provides a measure of testing effectiveness and of the fault introduction rate by development phase. The resource estimation model is based on the gradual refinement of an estimate, which is updated following each development phase as more reliable data becomes available. Used in conjunction with the process control model, which ensures that the correct testing phase is in operation, the estimation model has accurate data for each testing phase as input. The proposed model and metrics have been developed and tested on a large-scale (4 million LOC) industrial telecommunications product written in C and C++ running within a Unix environment. It should be possible to extend this work to suit other environments and other development life-cycles.