thesis

ON THE USE OF NATURAL LANGUAGE PROCESSING FOR AUTOMATED CONCEPTUAL DATA MODELING

Abstract

This research involved the development of a natural language processing (NLP) architecture for the extraction of entity relation diagrams (ERDs) from natural language requirements specifications. Conceptual data modeling plays an important role in database and software design and many approaches to automating and developing software tools for this process have been attempted. NLP approaches to this problem appear to be plausible because compared to general free texts, natural language requirements documents are relatively formal and exhibit some special regularities which reduce the complexity of the problem. The approach taken here involves a loose integration of several linguistic components. Outputs from syntactic parsing are used by a set of hueristic rules developed for this particular domain to produce tuples representing the underlying meanings of the propositions in the documents and semantic resources are used to distinguish between correct and incorrect tuples. Finally the tuples are integrated into full ERD representations. The major challenge addressed in this research is how to bring the various resources to bear on the translation of the natural language documents into the formal language. This system is taken to be representative of a potential class of similar systems designed to translate documents in other restricted domains into corresponding formalisms. The system is incorporated into a tool that presents the final ERDs to users who can modify them in the attempt to produce an accurate ERD for the requirements document. An experiment demonstrated that users with limited experience in ERD specifications could produce better representations of requirements documents than they could without the system, and could do so in less time

    Similar works