2 research outputs found

    C3S2E-2008-2016-FinalPrograms

    This document records the final programs for each of the 9 meetings of the C* Conference on Computer Science & Software Engineering (C3S2E), which were organized in various locations on three continents. The papers published during these years are accessible from the ACM Digital Library (2008-2016).

    Semantics-Assisted Deep Web Query Interface Classification

    Huge amounts of structured data are hidden in the databases behind web forms. The volume of deep web content has been estimated at around 500 times that of the surface web. However, many web forms are not deep web query interfaces. To retrieve content from web databases, an important task is therefore to identify the web forms that are deep web query interfaces. Deep web content is normally associated with a specific domain, and many domain semantics are embedded in the web forms. Additionally, the HTML pages returned by deep web queries contain particular patterns, which can help identify query interfaces. Thus, we collect the following semantics to assist the classification: (1) feature words, for non-query forms and for keyword fields in deep web query interfaces; (2) common fields in a particular domain, with their valid values, relationships, and synonyms. We design and implement a heuristics-based Semantics-Assisted deep Web Query Interface Classifier (SAWQIC) system. In the pre-query analysis of SAWQIC, feature words of non-query form attributes are combined with heuristics to filter out non-query forms. For web forms that pass the filter, we use the collected semantics to fill their components with valid input data and submit the form. In the post-query analysis of SAWQIC, we then apply heuristics to the returned HTML pages to identify the deep web query interfaces. The SAWQIC system is evaluated against web forms for the "Book" and "Job" domains. The experimental results indicate that SAWQIC classifies query interfaces highly effectively.
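    The pre-query filtering step described in the abstract could be sketched roughly as follows. This is only an illustration, not the SAWQIC implementation: the feature-word list `NON_QUERY_WORDS`, the helper `is_candidate_query_form`, and the specific heuristics (reject forms with a password field or a feature word in a field name) are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser

# Hypothetical feature words that suggest a non-query form (login, signup, ...).
# The abstract does not list the actual words used by SAWQIC.
NON_QUERY_WORDS = {"login", "password", "register", "subscribe", "email", "signup"}


class FormFieldCollector(HTMLParser):
    """Collects (tag, type, name) for every input-like element in a form."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            a = dict(attrs)
            field_type = (a.get("type") or "text").lower()
            field_name = (a.get("name") or "").lower()
            self.fields.append((tag, field_type, field_name))


def is_candidate_query_form(form_html):
    """Return True if the form passes the (assumed) pre-query filter."""
    collector = FormFieldCollector()
    collector.feed(form_html)

    # Heuristic 1 (assumed): a password field marks a login/registration form.
    has_password = any(ftype == "password" for _, ftype, _ in collector.fields)

    # Heuristic 2 (assumed): a feature word in any field name marks a non-query form.
    names = " ".join(name for _, _, name in collector.fields)
    tokens = set(re.findall(r"[a-z]+", names))
    has_feature_word = bool(tokens & NON_QUERY_WORDS)

    # Heuristic 3 (assumed): a query form needs at least one fillable field.
    has_fillable = any(
        tag in ("select", "textarea") or ftype in ("text", "search")
        for tag, ftype, _ in collector.fields
    )
    return has_fillable and not has_password and not has_feature_word


login_form = (
    '<form><input type="text" name="username">'
    '<input type="password" name="password"></form>'
)
book_form = (
    '<form><input type="text" name="title">'
    '<select name="author"></select>'
    '<input type="submit" value="Search"></form>'
)
print(is_candidate_query_form(login_form))
print(is_candidate_query_form(book_form))
```

    Forms that pass such a filter would then be filled with valid domain values and submitted, and the returned pages examined in the post-query analysis; that stage is not sketched here.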