131 research outputs found

    Learning the Language of Biological Sequences

    Get PDF
    International audienceLearning the language of biological sequences is an appealing challenge for the grammatical inference research field.While some first successes have already been recorded, such as the inference of profile hidden Markov models or stochastic context-free grammars which are now part of the classical bioinformatics toolbox, it is still a source of open and nice inspirational problems for grammatical inference, enabling us to confront our ideas to real fundamental applications. As an introduction to this field, we survey here the main ideas and concepts behind the approaches developed in pattern/motif discovery and grammatical inference to characterize successfully the biological sequences with their specificities

    Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: Case of grammatical inference

    Get PDF
    In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge of identifying a language of infinite cardinality from a finite set of examples should know when to generalize and specialize the training data. The minimum description length principle that has been incorporated addresses this issue is discussed in this paper. Previously, the e-GRIDS learning model was proposed, which enjoyed the merits of the minimum description length principle, but it is limited to positive examples only. The proposed GAWMDL, which incorporates a traditional genetic algorithm and has a powerful global exploration capability that can exploit an optimum offspring. This is an effective approach to handle a problem which has a large search space such the grammatical inference problem. The computational capability, the genetic algorithm poses is not questionable, but it still suffers from premature convergence mainly arising due to lack of population diversity. The proposed GAWMDL incorporates a bit mask oriented data structure that performs the reproduction operations, creating the mask, then Boolean based procedure is applied to create an offspring in a generative manner. The Boolean based procedure is capable of introducing diversity into the population, hence alleviating premature convergence. The proposed GAWMDL is applied in the context free as well as regular languages of varying complexities. The computational experiments show that the GAWMDL finds an optimal or close-to-optimal grammar. Two fold performance analysis have been performed. First, the GAWMDL has been evaluated against the elite mating pool genetic algorithm which was proposed to introduce diversity and to address premature convergence. GAWMDL is also tested against the improved tabular representation algorithm. In addition, the authors evaluate the performance of the GAWMDL against a genetic algorithm not using the minimum description length principle. Statistical tests demonstrate the superiority of the proposed algorithm. Overall, the proposed GAWMDL algorithm greatly improves the performance in three main aspects: maintains regularity of the data, alleviates premature convergence and is capable in grammatical inference from both positive and negative corpora

    Addressing Resource Variability Through Resource-Driven Adaptation

    Get PDF
    Software systems execute tasks that depend on different types of resources. However, the variability of resources may interfere with the ability of software systems to execute important tasks. Resource variability can occur due to several reasons including unexpected hardware failures, excess workloads, or lack of materials. For example, in automated warehouses, malfunctioning robots could delay product deliveries causing customer dissatisfaction and, therefore, reducing an enterprise’s sales. Moreover, the unavailability of medical materials hinders the ability of hospitals to perform medically-critical operations causing loss of life. In this thesis, we propose to address the problem of resource variability through resource-driven adaptation, using task models as input for adaptation decisions. The thesis presents the following contributions: • SPARK: a framework for performing proactive and reactive resource-driven adaptation based on multiple task-related criteria. The framework supports different types of depletable and reusable resources that could face variability. SPARK assists with four types of adaptation, namely: (i) execution of a similar task that requires fewer resources, (ii) substitution of resources by alternative ones, (iii) execution of tasks in a different order, and (iv) cancellation of the execution of tasks. • SERIES: a task modelling notation and editor tool that enables software practitioners to create task models that serve as input for SPARK. SERIES supports the representation of task priorities, task variants, task execution types, resource types, and properties representing users’ feedback. SPARK was evaluated in terms of the percentage of executed critical task requests, the average criticality of the executed task requests in comparison to the non-executed ones, overhead, and scalability through two case studies concerned with a medicine consumption system and a manufacturing system. The results of the evaluation showed that SPARK increased the number of executed critical task requests during resource variability. Additionally, the results showed that the time it takes to prepare and apply adaptation plans does not add significant overhead that hinders the ability of software systems to execute tasks in a tolerable waiting time. Furthermore, SPARK was shown to be scalable since the abovementioned time increases polynomially relative to the input size (number of tasks and task variants). SERIES was evaluated through a user study with twenty software practitioners. The results showed that software practitioners performed very well when explaining and creating task models using SERIES. These results were reflected in the task modelling activities that the participants performed as well as in their positive feedback regarding the usability of SERIES and the clarity of its semantic constructs. Overall, we conclude that the research presented in the thesis contributes to addressing resource variability through resource-driven adaptation. We also provide suggestions for future work that can extend this research
    • …
    corecore