3 research outputs found

    Analysing features of lecture slides and past exam paper materials towards automatic associating E-materials for self-revision

    Digital materials not only provide opportunities as enablers of e-learning development, but also create a new challenge. The e-materials currently provided on a course website are individually designed for learning in classrooms rather than for revision. To enable e-materials to support students' revision, we need an efficient system that associates related pieces of different e-materials. To this end, the features of each item of e-material, including its structure and the technical terms it contains, need to be studied and applied in order to calculate the similarity between relevant e-materials. Even though the difficulties of technical term extraction and of measuring the similarity between two text documents have been widely discussed, empirical experiments on particular types of e-learning materials (for instance, lecture slides and past exam papers) are still rare. In this paper, we propose a framework and relatedness model, based on Natural Language Processing (NLP) techniques, for associating lecture slides and past exam paper materials to support revision. We compare and evaluate the efficiency of different combinations of three weighting schemes, term frequency (TF), inverse document frequency (IDF), and term location (TL), for calculating the relatedness score. The experiments were conducted on 30 lectures (~900 slides) and 3 past exam papers (12 pages) of a data structures course at the authors’ institution. The findings indicate the appropriate features for calculating the relatedness score between lecture slides and past exam papers.
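As a rough illustration of how TF, IDF and TL weights might be combined into a relatedness score, the following Python sketch ranks slides against an exam question. It is not the paper's model: the cosine measure, the `title_boost` value of 2.0 used as a stand-in for term location, and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def inverse_document_frequency(token_lists):
    """Smoothed IDF computed over a collection of tokenized documents."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log(n_docs / df) + 1.0 for t, df in doc_freq.items()}


def weighted_vector(title, body, idf, title_boost=2.0):
    """TF x TL x IDF weights; title_boost is a hypothetical location weight."""
    weights = Counter()
    for t in tokenize(title):
        weights[t] += title_boost  # terms in the title count more (assumed TL)
    for t in tokenize(body):
        weights[t] += 1.0          # terms in the body count once per occurrence
    return {t: w * idf.get(t, 0.0) for t, w in weights.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_slides(question, slides):
    """Return (score, slide_index) pairs, most related slide first."""
    idf = inverse_document_frequency([tokenize(t + " " + b) for t, b in slides])
    q_vec = weighted_vector("", question, idf)
    scored = [
        (cosine(q_vec, weighted_vector(t, b, idf)), i)
        for i, (t, b) in enumerate(slides)
    ]
    return sorted(scored, reverse=True)


# Made-up example data, not taken from the paper's corpus.
slides = [
    ("Binary search trees",
     "Insertion and deletion in a binary search tree keep keys ordered"),
    ("Hash tables",
     "Collisions are resolved by chaining or open addressing"),
]
question = "Explain insertion into a binary search tree"
ranked = rank_slides(question, slides)
```

Dropping `title_boost` to 1.0 removes the location component, and replacing the IDF dictionary with constant weights leaves plain TF, which gives a simple way to compare combinations of the three schemes on a small sample.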

    DEXTER: A workbench for automatic term extraction with specialized corpora

    Automatic term extraction has become a priority area of research within corpus processing. Despite the extensive literature in this field, there are still some outstanding issues that should be dealt with during the construction of term extractors, particularly those oriented to support research in terminology and terminography. In this regard, this article describes the design and development of DEXTER, an online workbench for the extraction of simple and complex terms from domain-specific corpora in English, French, Italian and Spanish. In this framework, three issues contribute to placing the most important terms in the foreground. First, unlike the elaborate morphosyntactic patterns proposed by most previous research, shallow lexical filters have been constructed to discard term candidates. Second, a large number of common stopwords are automatically detected by means of a method that relies on the IATE database together with the frequency distribution of the domain-specific corpus and a general corpus. Third, the term-ranking metric, which is grounded on the notions of salience, relevance and cohesion, is guided by the IATE database to display an adequate distribution of terms.

    Financial support for this research has been provided by the DGI, Spanish Ministry of Education and Science, grant FFI2014-53788-C3-1-P.

    Periñán-Pascual, C. (2018). DEXTER: A workbench for automatic term extraction with specialized corpora. Natural Language Engineering, 24(2), 163-198. https://doi.org/10.1017/S1351324917000365
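The idea of contrasting the frequency distribution of a domain-specific corpus with that of a general corpus can be sketched in Python as below. This is a generic relative-frequency ratio, not DEXTER's stopword detection or ranking metric, and it does not use the IATE database; the function names and the `smoothing` constant are assumptions for illustration.

```python
from collections import Counter


def relative_freq(tokens):
    """Relative frequency of each token within one corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def domain_ratio(domain_tokens, general_tokens, smoothing=1e-6):
    """Ratio of domain to general relative frequency for each domain token.

    High ratios suggest domain-specific term candidates; ratios near or
    below 1 suggest common words. smoothing (an assumed constant) avoids
    division by zero for tokens absent from the general corpus.
    """
    dom = relative_freq(domain_tokens)
    gen = relative_freq(general_tokens)
    return {t: f / (gen.get(t, 0.0) + smoothing) for t, f in dom.items()}


# Toy corpora invented for the example.
domain_tokens = "the heap is a tree the heap".split()
general_tokens = "the cat sat on the mat and the dog is a pet".split()
ratios = domain_ratio(domain_tokens, general_tokens)
```

In this toy example `heap` and `tree` receive much higher ratios than `the`, which is the kind of contrast a stopword filter of this sort would exploit.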