27 research outputs found

    Querying large treebanks: Benchmarking GrETEL indexing

    The amount of data available for research grows rapidly, yet the technology to interpret and mine these data efficiently lags behind. For instance, when large treebanks are used for linguistic research, query speed leaves much to be desired. GrETEL Indexing, or GrInding, tackles this issue. The idea behind GrInding is to make the search space as small as possible before the treebank search actually starts, by pre-processing the treebank at hand. We recursively divide the treebank into smaller parts, called subtree-banks, which are then converted into database files. All subtree-banks are organized according to their linguistic dependency pattern and labeled as such. Additionally, general patterns are linked to more specific ones. In this way we create millions of databases, and given a linguistic structure we know in which databases that structure can occur, which yields a significant efficiency boost. We present the results of a benchmark experiment testing the effect of the GrInding procedure on the SoNaR-500 treebank.

    Discovery of association rules between syntactic variables. Data mining the Syntactic Atlas of the Dutch dialects.

    This research applies an association rule mining technique to purely syntactic dialect data. The paper addresses the question of how relevant associations between syntactic variables can be discovered. The method calculates the proportional overlap between the geographical distributions of syntactic microvariables and incorporates rule quality factors such as accuracy, coverage and completeness to measure the interestingness of the variable associations. The exploratory review of the results discusses several highly ranked association rules and also examines an implicational chain of syntactic variables.

    Lexico-Semantic Multiword Expression Extraction

    A memory-based classification approach to marker-based EBMT

    Conditional entropy measures intelligibility among related languages

    A pilot study for automatic semantic role labeling in a Dutch corpus
