    Autonomous Consolidation of Heterogeneous Record-Structured HTML Data in Chameleon

    While progress has been made in querying digital information contained in XML and HTML documents, success in retrieving information from the so-called hidden Web (data behind Web forms) has been modest. There has been a nascent trend toward developing autonomous tools for extracting information from the hidden Web. Automatic tools for ontology generation, wrapper generation, Web form querying, response gathering, etc., have been reported in recent research. This thesis presents a system called Chameleon for automatically querying, and gathering responses from, the hidden Web. The approach to response gathering is based on automatic table structure identification: because most information repositories of the hidden Web are structured databases, the information returned in response to a query will exhibit regularities. Information is extracted from the identified record structures using domain knowledge that corresponds to the domain specified in the query. So-called domain plug-ins make the dynamically generated wrappers domain-specific, rather than document-specific as is conventional.
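    The abstract does not spell out Chameleon's actual algorithms. As a rough illustration only, the Python sketch below assumes that records in a query response appear as HTML table rows sharing the same cell-tag signature (the "regularities" mentioned above), and models a domain plug-in as a hypothetical mapping from field names to regular expressions. The names `RowCollector`, `find_records`, `extract`, and `CAR_PLUGIN`, and the used-car domain, are illustrative assumptions and not part of Chameleon. Nested tables and records laid out without `<tr>` rows are not handled.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect table rows as (signature, cell_texts) pairs.

    The signature is the tuple of cell tag names (td/th) in a row; rows that
    share a signature are treated as candidate records. Nested tables are
    not handled (sketch only).
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # list of (signature tuple, [cell text, ...])
        self._tags = None     # cell tag names of the row being read
        self._cells = None    # cell texts of the row being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._tags, self._cells = [], []
        elif tag in ("td", "th") and self._cells is not None:
            self._tags.append(tag)
            self._cells.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            self.rows.append((tuple(self._tags), [c.strip() for c in self._cells]))
            self._tags = self._cells = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell and self._cells:
            self._cells[-1] += data


def find_records(html):
    """Return the cells of rows sharing the most common multi-cell signature."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    multi = [(sig, cells) for sig, cells in parser.rows if len(sig) > 1]
    if not multi:
        return []
    common_sig, _ = Counter(sig for sig, _ in multi).most_common(1)[0]
    return [cells for sig, cells in multi if sig == common_sig]


# Hypothetical "domain plug-in" for a used-car domain: field name -> pattern.
CAR_PLUGIN = {
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "price": re.compile(r"\$\s?[\d,]+"),
    "mileage": re.compile(r"[\d,]+\s?(?:miles|mi)\b", re.IGNORECASE),
}


def first_match(pattern, cells):
    """Return the first substring in cells matching pattern, or None."""
    for cell in cells:
        m = pattern.search(cell)
        if m:
            return m.group(0)
    return None


def extract(records, plugin):
    """Apply a domain plug-in to each record; unmatched fields map to None."""
    return [
        {field: first_match(pattern, cells) for field, pattern in plugin.items()}
        for cells in records
    ]


SAMPLE = """
<table>
  <tr><th>Vehicle</th><th>Price</th><th>Mileage</th></tr>
  <tr><td>2015 Honda Civic</td><td>$12,500</td><td>42,000 miles</td></tr>
  <tr><td>2012 Ford Focus</td><td>$7,900</td><td>88,500 mi</td></tr>
  <tr><td colspan="3">Showing 2 results</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract(find_records(SAMPLE), CAR_PLUGIN):
        print(row)
```

    In this sketch the header row (`th` cells) and the single-cell footer are discarded because their signatures differ from the majority `td td td` pattern, and swapping `CAR_PLUGIN` for another mapping changes what is extracted without changing the structure-finding step. This mirrors, loosely, the separation the abstract describes between structure identification and domain-specific extraction.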