1,807 research outputs found

    Logic learning and optimized drawing: two hard combinatorial problems

    Get PDF
    Nowadays, information extraction from large datasets is a recurring operation in countless fields of applications. The purpose leading this thesis is to ideally follow the data flow along its journey, describing some hard combinatorial problems that arise from two key processes, one consecutive to the other: information extraction and representation. The approaches here considered will focus mainly on metaheuristic algorithms, to address the need for fast and effective optimization methods. The problems studied include data extraction instances, as Supervised Learning in Logic Domains and the Max Cut-Clique Problem, as well as two different Graph Drawing Problems. Moreover, stemming from these main topics, other additional themes will be discussed, namely two different approaches to handle Information Variability in Combinatorial Optimization Problems (COPs), and Topology Optimization of lightweight concrete structures

    A nonmonotone GRASP

    Get PDF
    A greedy randomized adaptive search procedure (GRASP) is an itera- tive multistart metaheuristic for difficult combinatorial optimization problems. Each GRASP iteration consists of two phases: a construction phase, in which a feasible solution is produced, and a local search phase, in which a local optimum in the neighborhood of the constructed solution is sought. Repeated applications of the con- struction procedure yields different starting solutions for the local search and the best overall solution is kept as the result. The GRASP local search applies iterative improvement until a locally optimal solution is found. During this phase, starting from the current solution an improving neighbor solution is accepted and considered as the new current solution. In this paper, we propose a variant of the GRASP framework that uses a new “nonmonotone” strategy to explore the neighborhood of the current solu- tion. We formally state the convergence of the nonmonotone local search to a locally optimal solution and illustrate the effectiveness of the resulting Nonmonotone GRASP on three classical hard combinatorial optimization problems: the maximum cut prob- lem (MAX-CUT), the weighted maximum satisfiability problem (MAX-SAT), and the quadratic assignment problem (QAP)

    Design of Heuristic Algorithms for Hard Optimization

    Get PDF
    This open access book demonstrates all the steps required to design heuristic algorithms for difficult optimization. The classic problem of the travelling salesman is used as a common thread to illustrate all the techniques discussed. This problem is ideal for introducing readers to the subject because it is very intuitive and its solutions can be graphically represented. The book features a wealth of illustrations that allow the concepts to be understood at a glance. The book approaches the main metaheuristics from a new angle, deconstructing them into a few key concepts presented in separate chapters: construction, improvement, decomposition, randomization and learning methods. Each metaheuristic can then be presented in simplified form as a combination of these concepts. This approach avoids giving the impression that metaheuristics is a non-formal discipline, a kind of cloud sculpture. Moreover, it provides concrete applications of the travelling salesman problem, which illustrate in just a few lines of code how to design a new heuristic and remove all ambiguities left by a general framework. Two chapters reviewing the basics of combinatorial optimization and complexity theory make the book self-contained. As such, even readers with a very limited background in the field will be able to follow all the content

    Searching for patterns in Conway's Game of Life

    Get PDF
    Conway’s Game of Life (Life) is a simple cellular automaton, discovered by John Conway in 1970, that exhibits complex emergent behavior. Life-enthusiasts have been looking for building blocks with specific properties (patterns) to answer unsolved problems in Life for the past five decades. Finding patterns in Life is difficult due to the large search space. Current search algorithms use an explorative approach based on the rules of the game, but this can only sample a small fraction of the search space. More recently, people have used Sat solvers to search for patterns. These solvers are not specifically tuned to this problem and thus waste a lot of time processing Life’s rules in an engine that does not understand them. We propose a novel Sat-based approach that replaces the binary tree used by traditional Sat solvers with a grid-based approach, complemented by an injection of Game of Life specific knowledge. This leads to a significant speedup in searching. As a fortunate side effect, our solver can be generalized to solve general Sat problems. Because it is grid-based, all manipulations are embarrassingly parallel, allowing implementation on massively parallel hardware

    Clustering Approaches for Multi-source Entity Resolution

    Get PDF
    Entity Resolution (ER) or deduplication aims at identifying entities, such as specific customer or product descriptions, in one or several data sources that refer to the same real-world entity. ER is of key importance for improving data quality and has a crucial role in data integration and querying. The previous generation of ER approaches focus on integrating records from two relational databases or performing deduplication within a single database. Nevertheless, in the era of Big Data the number of available data sources is increasing rapidly. Therefore, large-scale data mining or querying systems need to integrate data obtained from numerous sources. For example, in online digital libraries or E-Shops, publications or products are incorporated from a large number of archives or suppliers across the world or within a specified region or country to provide a unified view for the user. This process requires data consolidation from numerous heterogeneous data sources, which are mostly evolving. By raising the number of sources, data heterogeneity and velocity as well as the variance in data quality is increased. Therefore, multi-source ER, i.e. finding matching entities in an arbitrary number of sources, is a challenging task. Previous efforts for matching and clustering entities between multiple sources (> 2) mostly treated all sources as a single source. This approach excludes utilizing metadata or provenance information for enhancing the integration quality and leads up to poor results due to ignorance of the discrepancy between quality of sources. The conventional ER pipeline consists of blocking, pair-wise matching of entities, and classification. In order to meet the new needs and requirements, holistic clustering approaches that are capable of scaling to many data sources are needed. The holistic clustering-based ER should further overcome the restriction of pairwise linking of entities by making the process capable of grouping entities from multiple sources into clusters. The clustering step aims at removing false links while adding missing true links across sources. Additionally, incremental clustering and repairing approaches need to be developed to cope with the ever-increasing number of sources and new incoming entities. To this end, we developed novel clustering and repairing schemes for multi-source entity resolution. The approaches are capable of grouping entities from multiple clean (duplicate-free) sources, as well as handling data from an arbitrary combination of clean and dirty sources. The multi-source clustering schemes exclusively developed for multi-source ER can obtain superior results compared to general purpose clustering algorithms. Additionally, we developed incremental clustering and repairing methods in order to handle the evolving sources. The proposed incremental approaches are capable of incorporating new sources as well as new entities from existing sources. The more sophisticated approach is able to repair previously determined clusters, and consequently yields improved quality and a reduced dependency on the insert order of the new entities. To ensure scalability, the parallel variation of all approaches are implemented on top of the Apache Flink framework which is a distributed processing engine. The proposed methods have been integrated in a new end-to-end ER tool named FAMER (FAst Multi-source Entity Resolution system). The FAMER framework is comprised of Linking and Clustering components encompassing both batch and incremental ER functionalities. The output of Linking part is recorded as a similarity graph where each vertex represents an entity and each edge maintains the similarity relationship between two entities. Such a similarity graph is the input of the Clustering component. The comprehensive comparative evaluations overall show that the proposed clustering and repairing approaches for both batch and incremental ER achieve high quality while maintaining the scalability

    On Some Optimization Problems on Dynamic Networks

    Get PDF
    The basic assumption of re-optimization consists in the need of eiciently managing huge quantities of data in order to reduce the waste of resources, both in terms of space and time. Re-optimization refers to a series of computational strategies through which new problem instances are tackled analyzing similar, previously solved, problems, reusing existing useful information stored in memory from past computations. Its natural collocation is in the context of dynamic problems, with these latter accounting for a large share of the themes of interest in the multifaceted scenario of combinatorial optimization, with notable regard to recent applications. Dynamic frameworks are topic of research in classical and new problems spanning from routing, scheduling, shortest paths, graph drawing and many others. Concerning our speciic theme of investigation, we focused on the dynamical characteristics of two problems deined on networks: re-optimization of shortest paths and incremental graph drawing. For the former, we proposed a novel exact algorithm based on an auction approach, while for the latter, we introduced a new constrained formulation, Constrained Incremental Graph Drawing, and several meta-heuristics based prevalently on Tabu Search and GRASP frameworks. Moreover, a parallel branch of our research focused on the design of new GRASP algorithms as eicient solution strategies to address further optimization problems. Speciically, in this research thread, will be presented several GRASP approaches devised to tackle intractable problems such as: the Maximum-Cut Clique, p-Center, and Minimum Cost Satisiability

    Contributions to the cornerstones of interaction in visualization: strengthening the interaction of visualization

    Get PDF
    Visualization has become an accepted means for data exploration and analysis. Although interaction is an important component of visualization approaches, current visualization research pays less attention to interaction than to aspects of the graphical representation. Therefore, the goal of this work is to strengthen the interaction side of visualization. To this end, we establish a unified view on interaction in visualization. This unified view covers four cornerstones: the data, the tasks, the technology, and the human.Visualisierung hat sich zu einem unverzichtbaren Werkzeug für die Exploration und Analyse von Daten entwickelt. Obwohl Interaktion ein wichtiger Bestandteil solcher Werkzeuge ist, wird der Interaktion in der aktuellen Visualisierungsforschung weniger Aufmerksamkeit gewidmet als Aspekten der graphischen Repräsentation. Daher ist es das Ziel dieser Arbeit, die Interaktion im Bereich der Visualisierung zu stärken. Hierzu wird eine einheitliche Sicht auf Interaktion in der Visualisierung entwickelt

    Towards On-line Domain-Independent Big Data Learning: Novel Theories and Applications

    Get PDF
    Feature extraction is an extremely important pre-processing step to pattern recognition, and machine learning problems. This thesis highlights how one can best extract features from the data in an exhaustively online and purely adaptive manner. The solution to this problem is given for both labeled and unlabeled datasets, by presenting a number of novel on-line learning approaches. Specifically, the differential equation method for solving the generalized eigenvalue problem is used to derive a number of novel machine learning and feature extraction algorithms. The incremental eigen-solution method is used to derive a novel incremental extension of linear discriminant analysis (LDA). Further the proposed incremental version is combined with extreme learning machine (ELM) in which the ELM is used as a preprocessor before learning. In this first key contribution, the dynamic random expansion characteristic of ELM is combined with the proposed incremental LDA technique, and shown to offer a significant improvement in maximizing the discrimination between points in two different classes, while minimizing the distance within each class, in comparison with other standard state-of-the-art incremental and batch techniques. In the second contribution, the differential equation method for solving the generalized eigenvalue problem is used to derive a novel state-of-the-art purely incremental version of slow feature analysis (SLA) algorithm, termed the generalized eigenvalue based slow feature analysis (GENEIGSFA) technique. Further the time series expansion of echo state network (ESN) and radial basis functions (EBF) are used as a pre-processor before learning. In addition, the higher order derivatives are used as a smoothing constraint in the output signal. Finally, an online extension of the generalized eigenvalue problem, derived from James Stone’s criterion, is tested, evaluated and compared with the standard batch version of the slow feature analysis technique, to demonstrate its comparative effectiveness. In the third contribution, light-weight extensions of the statistical technique known as canonical correlation analysis (CCA) for both twinned and multiple data streams, are derived by using the same existing method of solving the generalized eigenvalue problem. Further the proposed method is enhanced by maximizing the covariance between data streams while simultaneously maximizing the rate of change of variances within each data stream. A recurrent set of connections used by ESN are used as a pre-processor between the inputs and the canonical projections in order to capture shared temporal information in two or more data streams. A solution to the problem of identifying a low dimensional manifold on a high dimensional dataspace is then presented in an incremental and adaptive manner. Finally, an online locally optimized extension of Laplacian Eigenmaps is derived termed the generalized incremental laplacian eigenmaps technique (GENILE). Apart from exploiting the benefit of the incremental nature of the proposed manifold based dimensionality reduction technique, most of the time the projections produced by this method are shown to produce a better classification accuracy in comparison with standard batch versions of these techniques - on both artificial and real datasets

    Scalability considerations for multivariate graph visualization

    Get PDF
    Real-world, multivariate datasets are frequently too large to show in their entirety on a visual display. Still, there are many techniques we can employ to show useful partial views-sufficient to support incremental exploration of large graph datasets. In this chapter, we first explore the cognitive and architectural limitations which restrict the amount of visual bandwidth available to multivariate graph visualization approaches. These limitations afford several design approaches, which we systematically explore. Finally, we survey systems and studies that exhibit these design strategies to mitigate these perceptual and architectural limitations
    • …
    corecore