    A Granular-based Approach for Semisupervised Web Information Labeling

    A key issue when mining web information is the labeling problem: data is abundant on the web but is unlabelled. In this thesis, we address this problem by proposing (i) a novel theoretical granular model that uses a variant of the Tolerance Rough Sets Model (TRSM) to structure categorical noun phrase instances, as well as semantically related noun phrase pairs, from a given corpus representing unstructured web pages, and (ii) a semi-supervised learning algorithm called Tolerant Pattern Learner (TPL) that labels categorical instances as well as relations. TRSM has so far been successfully employed for document retrieval and classification, but not for learning categorical and relational phrases. We use the ontological information from the Never Ending Language Learner (NELL) system. We compared the performance of our algorithm with the Coupled Bayesian Sets (CBS) and Coupled Pattern Learner (CPL) algorithms for categorical and relational labeling, respectively. Experimental results suggest that TPL can achieve performance comparable to CBS and CPL in terms of precision.

    Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grant 194376.

    Master of Science in Applied Computer Science