
    Applications of #SAT Solvers for Product Lines: Master's Thesis

    Product lines are widely used for managing families of similar products. Typically, product lines are complex and infeasible to analyze manually. In the last two decades, product-line analyses have been reduced to satisfiability problems, which are well understood. However, there are analyses for which satisfiability is not sufficient. Recently, researchers have begun to reduce other problems to #SAT. Yet, only a few applications have been considered, and those are fairly limited in scope. Furthermore, the authors mainly propose ad-hoc solutions that are only applicable under certain restrictions or do not scale to large product lines. In this thesis, we aim to show the benefits of applying #SAT to the analysis of product lines. To this end, we make the following contributions: First, we summarize applications dependent on #SAT considered in the literature and propose new applications to motivate the use of #SAT technology. Second, we present a variety of algorithms and optimizations for these applications, including new proposals. Third, we empirically evaluate 10 proposed algorithms with 14 off-the-shelf #SAT solvers on 131 industrial feature models to identify the fastest algorithms and solvers. Our results show that for each analysis at least one algorithm and solver scale to the vast majority of the feature models, whereas the Linux and an automotive model could not be analyzed at all. In addition, our results reveal the benefits of knowledge compilation to deterministic decomposable negation normal form (d-DNNF) for performing counting-based analyses. Overall, our work shows that #SAT-dependent analyses for feature models enable a variety of new applications and scale to a large number of industrial feature models.
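
    The thesis itself does not include code in this abstract, but the core idea of a counting-based feature-model analysis can be shown with a minimal sketch. The tiny feature model below (a root feature with two optional children, Gui and Text, and a cross-tree exclusion) is a hypothetical example, and the exhaustive counter is only a baseline: real #SAT solvers and d-DNNF compilers avoid this exponential enumeration.

    from itertools import product

    # Tiny hypothetical feature model as CNF over features:
    # variable 1 = root, 2 = Gui, 3 = Text.
    clauses = [
        [1],          # the root feature is always selected
        [-2, 1],      # Gui  implies root
        [-3, 1],      # Text implies root
        [-2, -3],     # cross-tree constraint: Gui excludes Text
    ]
    num_vars = 3

    def count_models(clauses, num_vars):
        """Exhaustive #SAT: count assignments satisfying every clause.
        Exponential in num_vars -- a baseline, not a scalable solver."""
        count = 0
        for bits in product([False, True], repeat=num_vars):
            assign = {v + 1: bits[v] for v in range(num_vars)}
            if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
                count += 1
        return count

    print(count_models(clauses, num_vars))  # 3 valid products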

    View Selection in Semantic Web Databases

    We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples and implicit data implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs. Starting from an existing relational view selection method, we devise new algorithms for recommending view sets, and show that they scale significantly beyond the existing relational ones when adapted to the RDF context. To account for implicit triples in query answers, we propose a novel RDF query reformulation algorithm and an innovative way of incorporating it into view selection in order to avoid a combinatorial explosion in the complexity of the selection process. The effectiveness of our techniques is demonstrated through a set of experiments. Comment: VLDB201
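
    As a rough illustration of the objective described above, here is a hedged sketch of the weighted cost combination over candidate view sets. All names (total_cost, select_views), cost functions, and weights are hypothetical placeholders, not the paper's algorithms, which search the candidate space far more efficiently than this brute-force minimum.

    def total_cost(view_set, workload,
                   query_cost, storage_cost, maintenance_cost,
                   alpha=1.0, beta=0.1, gamma=0.1):
        """Weighted sum of query processing, storage, and maintenance
        costs for one candidate view set (illustrative weights)."""
        processing = sum(query_cost(q, view_set) for q in workload)
        storage = sum(storage_cost(v) for v in view_set)
        maintenance = sum(maintenance_cost(v) for v in view_set)
        return alpha * processing + beta * storage + gamma * maintenance

    def select_views(candidates, workload, **costs):
        """Brute-force baseline: keep the cheapest candidate view set."""
        return min(candidates, key=lambda vs: total_cost(vs, workload, **costs))

    # Example with trivial placeholder costs:
    best = select_views(
        candidates=[set(), {"v1"}, {"v1", "v2"}],
        workload=["q1", "q2"],
        query_cost=lambda q, vs: 10 - 4 * len(vs),  # views speed up queries
        storage_cost=lambda v: 5,
        maintenance_cost=lambda v: 2,
    )
    print(best)  # {'v1', 'v2'} under these toy costs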

    Latent Representation and Sampling in Network: Application in Text Mining and Biology.

    In classical machine learning, hand-designed features are used for learning a mapping from raw data. However, human involvement in feature design makes the process expensive. Representation learning aims to learn abstract features directly from data without direct human involvement. Raw data can take various forms; a network is one form that encodes relational structure in many real-world domains. Therefore, learning abstract features for network units is an important task. In this dissertation, we propose models for incorporating temporal information given as a collection of networks from subsequent time-stamps. The primary objective of our models is to learn a better abstract feature representation of nodes and edges in an evolving network. We show that the temporal information in the abstract features improves the performance of the link prediction task substantially. Besides applying our models to network data, we also employ them to incorporate extra-sentential information in the text domain for learning better representations of sentences. We build a context network of sentences to capture extra-sentential information. Encoding this information in the abstract feature representation of sentences improves various text-mining tasks substantially over a set of baseline methods. A problem with the abstract features that we learn is that they lack interpretability. In real-life applications on network data, for some tasks it is crucial to learn interpretable features in the form of graphical structures. For this we need to mine important graphical structures, along with their frequency statistics, from the input dataset. However, exact algorithms for these tasks are computationally expensive, so scalable algorithms are urgently needed. To overcome this challenge, we provide efficient sampling algorithms for mining higher-order structures from networks. We show that our sampling-based algorithms are scalable. They are also superior to a set of baseline algorithms in retrieving important graphical sub-structures and collecting their frequency statistics. Finally, we show that these frequent subgraph statistics and structures can be used as features in various real-life applications. We show one application in biology and another in security; in both cases, the structures and their statistics significantly improve the performance of knowledge discovery tasks in these domains.
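
    The dissertation's sampling algorithms are not spelled out in this abstract, so the sketch below shows one standard member of that family: a wedge-sampling estimator for triangle frequency, the simplest higher-order structure. It illustrates the scalability idea (sample instead of enumerate) but is not claimed to be the dissertation's exact algorithm.

    import random
    from collections import defaultdict

    def estimate_triangles(edges, samples=20000, seed=0):
        """Wedge sampling: pick wedges (paths of length 2) in proportion
        to how many each vertex centers, check whether each sampled wedge
        is closed, and scale: #triangles ~= closed_fraction * wedges / 3."""
        rng = random.Random(seed)
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        centers = [v for v in adj if len(adj[v]) >= 2]
        weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centers]
        total_wedges = sum(weights)
        if total_wedges == 0:
            return 0.0
        closed = 0
        for _ in range(samples):
            v = rng.choices(centers, weights=weights)[0]
            a, b = rng.sample(sorted(adj[v]), 2)
            if b in adj[a]:  # wedge a-v-b is closed by edge a-b
                closed += 1
        return closed / samples * total_wedges / 3

    # A 4-clique contains exactly 4 triangles.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(estimate_triangles(k4))  # ~4.0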

    Generating the Ground Truth: Synthetic Data for Label Noise Research

    Most real-world classification tasks suffer from label noise to some extent. Such noise adversely affects the generalization error of learned models and complicates the evaluation of noise-handling methods, as their performance cannot be accurately measured without clean labels. In label noise research, typically either noisy or overly simple simulated data are accepted as a baseline, into which additional noise with known properties is injected. In this paper, we propose SYNLABEL, a framework that aims to improve upon these methodologies. It allows for creating a noiseless dataset informed by real data, by either pre-specifying or learning a function and defining it as the ground-truth function from which labels are generated. Furthermore, by resampling a number of values for selected features in the function domain, evaluating the function, and aggregating the resulting labels, each data point can be assigned a soft label or label distribution. Such distributions allow for direct injection and quantification of label noise. The generated datasets serve as a clean baseline of adjustable complexity into which different types of noise may be introduced. We illustrate how the framework can be applied, how it enables quantification of label noise, and how it improves over existing methodologies.
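
    A minimal sketch of the soft-label construction described above, assuming a toy ground-truth function and a single resampled feature; all function names and parameters here are hypothetical illustrations, not the SYNLABEL API.

    import numpy as np

    rng = np.random.default_rng(0)

    def ground_truth(x):
        """Hypothetical stand-in ground-truth function: a simple
        threshold on two features."""
        return int(x[0] + 0.5 * x[1] > 1.0)

    def soft_label(x, resample_idx, sampler, n=200):
        """Resample one selected feature, evaluate the ground-truth
        function on each variant, and aggregate the resulting labels
        into a distribution -- the soft-label construction above."""
        hits = 0
        for _ in range(n):
            x2 = x.copy()
            x2[resample_idx] = sampler()
            hits += ground_truth(x2)
        p1 = hits / n
        return np.array([1.0 - p1, p1])

    x = np.array([0.8, 0.6])
    dist = soft_label(x, resample_idx=1, sampler=lambda: rng.normal(0.6, 0.5))
    # Inject symmetric label noise directly into the distribution:
    eta = 0.1
    noisy = (1.0 - eta) * dist + eta * np.array([0.5, 0.5])
    print(dist, noisy)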

    Software Product Line

    The Software Product Line (SPL) is an emerging methodology for developing software products. Currently, there are two central issues in SPL research: modelling and analysis. Variability modelling techniques have been developed to assist engineers in dealing with the complications of variability management. The principal goal of variability modelling techniques is to configure a successful software product by managing variability in domain engineering. In other words, a good method for modelling variability is a prerequisite for a successful SPL. Analysis of an SPL, in turn, aids the extraction of useful information from the product line and provides a control and planning mechanism for engineers and experts. In addition, SPL analysis gives users a clear view of the product line and helps ensure its accuracy. This book presents new techniques for modelling and new methods for SPL analysis.

    A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.

    The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, the entities comprising a class, and the intra- and inter-class relationships existing between them are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision; both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and for applying graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code. A classification so generated additionally provides a framework for class-based matching over an existing code base, both from the perspective of newly introduced classes and for search "templates" provided by the incomplete, iteratively constructed and refined classes associated with current and ongoing development. The tools and techniques developed here support a shared awareness of reuse opportunity by analysing structural similarity in past and ongoing development; they can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic.
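
    As a toy illustration of structural class similarity, the sketch below compares two classes by the overlap of their member and relationship sets. This is a deliberately crude stand-in: the thesis's actual measure is a two-phase, graph-theoretic comparison over attributed relational graphs, which this example does not reproduce, and all class descriptions here are invented.

    def jaccard(a, b):
        """Set overlap in [0, 1]; defined as 1.0 when both sets are empty."""
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    def class_similarity(c1, c2, w_members=0.5, w_relations=0.5):
        """Weighted overlap of member and relationship sets, where each
        class is described as {'members': set of (kind, name) pairs,
        'relations': set of (relation, target) pairs}."""
        return (w_members * jaccard(c1["members"], c2["members"])
                + w_relations * jaccard(c1["relations"], c2["relations"]))

    stack = {"members": {("field", "items"), ("method", "push"), ("method", "pop")},
             "relations": {("inherits", "Collection")}}
    queue = {"members": {("field", "items"), ("method", "push")},
             "relations": {("inherits", "Collection")}}
    print(class_similarity(stack, queue))  # 0.5 * 2/3 + 0.5 * 1.0 ~= 0.83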

    Commonality in complex product families: implications of divergence and lifecycle offsets

    Thesis (Ph.D.), Massachusetts Institute of Technology, Engineering Systems Division, 2008. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 220-224). By Ryan C. Boas.

    Commonality, or the sharing of components, processes, technologies, interfaces and/or infrastructure across a product family, represents one of many potential tools for increasing corporate profitability. Industrial interest in commonality is strong, but results appear to be mixed. A rich stream of academic research has examined commonality (typically under terms such as "product platforms" and "platform-based development") but has not emphasized the benefits and penalties of commonality, a topic that is critical to effective product family planning and lifecycle management, and ultimately to improving corporate profitability. This dissertation leverages field research and a simple cost model to examine commonality in the context of complex product families. The core research effort was focused on conducting seven case studies of complex product families (aircraft, automobiles, satellites, and capital equipment). While the case studies provided a wealth of general insights, they were focused on examining divergence and lifecycle offsets, two critical topics that influence the benefits and penalties of commonality yet appear to be inadequately addressed by the literature. Divergence refers to the tendency for commonality to reduce with time, for both beneficial and non-beneficial reasons. Lifecycle offsets refer to temporal differences between the lifecycle phases of product family members; they alter the potential benefits and penalties of commonality and their apportionment to individual products. Additionally, key factors identified during the literature review and case studies were translated into a simple two-product cost model of development and production in order to demonstrate key research insights in a more analytical manner. The case studies provide a refined view of commonality that reflects the realities of industrial practice. The cases indicate that complex product families are developed in a mostly sequential manner; that commonality is highest during the product family planning phase and then declines significantly throughout the lifecycle; and that development focuses more on reusing prior product baselines than on enabling future, potential commonality. The case studies also identified challenges in the evaluation of commonality and its lifecycle management. The case findings and simple cost model contribute to an improved understanding of commonality, while the recommendations offer potential paths to improved corporate profitability.
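
    The dissertation's "simple two-product cost model" is not reproduced in this abstract, so the sketch below shows one plausible shape for such a model, in which shared development is paid once (with a design premium) and commonality lowers production cost. The functional form and all parameter values are illustrative assumptions, not the dissertation's actual model.

    def family_cost(dev_cost, unit_cost, volumes, commonality,
                    premium=0.15, savings=0.3):
        """Toy two-product cost model: shared development is paid once
        but carries a design premium; production of the common fraction
        is cheaper through shared learning and scale."""
        shared_dev = commonality * dev_cost * (1 + premium)   # paid once
        unique_dev = 2 * (1 - commonality) * dev_cost         # per product
        production = sum(v * unit_cost * (1 - savings * commonality)
                         for v in volumes)
        return shared_dev + unique_dev + production

    # Divergence can be explored by re-evaluating with a lower effective
    # commonality for the later, lifecycle-offset product.
    for c in (0.0, 0.4, 0.8):
        print(c, round(family_cost(100.0, 1.0, [500, 500], c), 1))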