2 research outputs found

    Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: case of grammatical inference

    In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge in identifying a language of infinite cardinality from a finite set of examples is knowing when to generalize and when to specialize from the training data. The paper discusses how the incorporated minimum description length principle addresses this issue. Previously, the e-GRIDS learning model was proposed, which enjoyed the merits of the minimum description length principle but was limited to positive examples only. The proposed GAWMDL incorporates a traditional genetic algorithm and has a powerful global exploration capability that can exploit an optimum offspring. This is an effective approach for problems with a large search space, such as the grammatical inference problem. The computational capability of the genetic algorithm is not in question, but it still suffers from premature convergence, mainly due to a lack of population diversity. The proposed GAWMDL incorporates a bit-mask-oriented data structure that performs the reproduction operations: a mask is created, then a Boolean-based procedure is applied to create an offspring in a generative manner. The Boolean-based procedure is capable of introducing diversity into the population, hence alleviating premature convergence. The proposed GAWMDL is applied to context free as well as regular languages of varying complexities. The computational experiments show that the GAWMDL finds an optimal or close-to-optimal grammar. A twofold performance analysis has been performed. First, the GAWMDL has been evaluated against the elite mating pool genetic algorithm, which was proposed to introduce diversity and to address premature convergence. The GAWMDL is also tested against the improved tabular representation algorithm. In addition, the authors evaluate the performance of the GAWMDL against a genetic algorithm that does not use the minimum description length principle. Statistical tests demonstrate the superiority of the proposed algorithm. Overall, the proposed GAWMDL algorithm greatly improves performance in three main respects: it maintains regularity of the data, alleviates premature convergence, and is capable of grammatical inference from both positive and negative corpora.
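The abstract does not spell out the mask-based reproduction step, so the following is only a minimal sketch of the general idea, assuming chromosomes are fixed-length bit strings stored as Python integers. The function names (`random_mask`, `mask_crossover`, `mask_mutation`) and the particular Boolean expressions are illustrative assumptions, not the authors' operators.

```python
import random
from typing import Tuple


def random_mask(n_bits: int, rng: random.Random, p: float = 0.5) -> int:
    """Build a random bit mask; each bit is set with probability p (assumed scheme)."""
    return sum(1 << i for i in range(n_bits) if rng.random() < p)


def mask_crossover(parent_a: int, parent_b: int, mask: int, n_bits: int) -> Tuple[int, int]:
    """Where the mask bit is 1, child_1 copies parent_a; elsewhere it copies parent_b.
    child_2 is the complementary combination."""
    full = (1 << n_bits) - 1  # restricts the inverted mask to n_bits
    inv = ~mask & full
    child_1 = (parent_a & mask) | (parent_b & inv)
    child_2 = (parent_b & mask) | (parent_a & inv)
    return child_1, child_2


def mask_mutation(chromosome: int, mask: int) -> int:
    """Flip exactly the bits selected by the mutation mask (XOR)."""
    return chromosome ^ mask


if __name__ == "__main__":
    rng = random.Random(0)
    a, b = 0b11110000, 0b00001111
    m = random_mask(8, rng)
    c1, c2 = mask_crossover(a, b, m, 8)
    print(format(m, "08b"), format(c1, "08b"), format(c2, "08b"))
```

Because every offspring bit is chosen by pure Boolean operations on a freshly drawn mask, two parents can yield many different children, which is one plausible way such a scheme keeps the population diverse.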

    Grammar induction using bit masking oriented genetic algorithm and comparative analysis

    This paper presents a bit masking oriented genetic algorithm (BMOGA) for context free grammar induction. It takes advantage of crossover and mutation mask-fill operators together with a Boolean-based procedure in two phases to guide the search process from the ith generation to the (i + 1)th generation. Crossover and mutation mask-fill operations are performed to generate a proportionate amount of the population in each generation. A parser has been implemented that checks the validity of the grammar rules based on the acceptance or rejection of the positive and negative training strings of the language. Experiments are conducted on a collection of context free and regular languages. The minimum description length principle has been used to generate a corpus of positive and negative samples as appropriate for the experiment. It was observed that the BMOGA produces successive generations of individuals, computes their fitness at each step, and chooses the best when the threshold (termination) condition is reached. Because the presented approach was found effective in handling premature convergence, results are compared with approaches used to alleviate premature convergence. The analysis showed that the BMOGA performs better than other algorithms, namely the random offspring generation approach, dynamic allocation of reproduction operators, the elite mating pool approach, and the simple genetic algorithm. The success ratio is used as a quality measure, and its value shows the effectiveness of the BMOGA. Statistical tests indicate the superiority of the BMOGA over the other implemented approaches.
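The parser-based validity check described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes candidate grammars are in Chomsky normal form, uses the standard CYK algorithm as the parser, and scores a grammar by the fraction of training strings it classifies correctly. The names `cyk_accepts` and `fitness` and the rule encoding are hypothetical.

```python
def cyk_accepts(word, unary, binary, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.

    unary:  set of (A, t) pairs for rules A -> t (t a terminal symbol)
    binary: set of (A, B, C) triples for rules A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][l - 1] holds the nonterminals deriving word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {a for (a, t) in unary if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for (a, b, c) in binary:
                    if b in left and c in right:
                        table[i][length - 1].add(a)
    return start in table[0][n - 1]


def fitness(unary, binary, positives, negatives):
    """Fraction of samples handled correctly: positives accepted, negatives rejected."""
    correct = sum(cyk_accepts(w, unary, binary) for w in positives)
    correct += sum(not cyk_accepts(w, unary, binary) for w in negatives)
    return correct / (len(positives) + len(negatives))


if __name__ == "__main__":
    # Candidate grammar for a^n b^n (n >= 1): S -> A B | A X, X -> S B, A -> a, B -> b
    unary = {("A", "a"), ("B", "b")}
    binary = {("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")}
    print(fitness(unary, binary, ["ab", "aabb"], ["aab", "ba", "a"]))
```

A genetic algorithm could use a score of this kind to rank individuals in each generation; how the BMOGA actually weights acceptance and rejection is not stated in the abstract.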