
    Exploiting coarse grained parallelism in conceptual data mining: finding a needle in a haystack as a distributed effort

    A parallel implementation of Ganter’s algorithm for calculating concept lattices in Formal Concept Analysis is presented. A benchmark was run to determine the algorithm’s performance experimentally on an AMD Athlon64, an Intel dual Xeon, and an UltraSPARC T1, running 1, 4, and 24 threads in parallel, respectively. Two subsets of Cranfield’s collection served as the document set. In addition, the theoretical maximum performance was determined. Due to scheduling problems, the performance of the UltraSPARC was disappointing; two alternative schedulers are proposed to address this. It is shown that, given a good scheduler, the algorithm can massively exploit multi-threading architectures and thus substantially reduce the computational burden of Formal Concept Analysis.
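    The core computation here is Ganter’s NextClosure procedure, which enumerates all concept intents in lectic order; the parallel version described above distributes that enumeration across threads. As a point of reference, the following is a minimal sequential sketch, not the paper’s implementation: the toy `CONTEXT`, `N_ATTRS`, and function names are purely illustrative.

    ```python
    # Minimal sequential sketch of Ganter's NextClosure algorithm.
    # CONTEXT is a hypothetical toy formal context: object -> set of attribute ids.
    CONTEXT = {
        "doc1": {0, 1},
        "doc2": {0, 2},
        "doc3": {1, 2},
    }
    N_ATTRS = 3

    def closure(attrs):
        """A'': attributes shared by every object that has all attributes in `attrs`."""
        rows = [a for a in CONTEXT.values() if attrs <= a]
        if not rows:                              # empty extent -> full attribute set
            return frozenset(range(N_ATTRS))
        return frozenset.intersection(*map(frozenset, rows))

    def next_closure(A):
        """Return the lectically next intent after A, or None if A is the last one."""
        A = set(A)
        for i in reversed(range(N_ATTRS)):
            if i in A:
                A.discard(i)
            else:
                B = closure(A | {i})
                if all(j >= i for j in B - A):    # lectic test: no new attribute below i
                    return B
        return None

    def all_intents():
        """Enumerate every concept intent in lectic order."""
        intents, A = [], closure(frozenset())
        while A is not None:
            intents.append(A)
            A = next_closure(A)
        return intents
    ```

    Because each call to `next_closure` depends only on its argument, independent starting points can be handed to separate threads, which is what makes the coarse-grained parallelization discussed above possible.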

    Spelling errors of 24 cohorts of children across primary school 2012-2015: A corpus study

    In this paper we present a study of some spelling error types that Dutch primary school children made in the dictations and in the free or themed texts they contributed to the BasiScript corpus, i.e. a corpus of child-written output produced between 2012 and 2015. The article first briefly describes the corpus. It then presents an analysis of the spelling errors that occurred in a selected set of words in the dictations, concerning diphthongs (in grades 2 and 3) and verb forms (in grades 4 and 5), which are notoriously difficult to spell for these age groups. In our analysis we investigate whether the frequencies of the words in the BasiLex corpus (a corpus of child-written input) predict the spelling errors, and whether there is a correlation between the number of incorrect spellings of the words in the dictations and in the free and themed texts of the respective grades.

    Distributions of cognates in Europe as based on Levenshtein distance

    Researchers on bilingual processing can benefit from computational tools developed in artificial intelligence. We show that a normalized Levenshtein distance function can efficiently and reliably simulate bilingual orthographic similarity ratings. Orthographic similarity distributions of cognates and non-cognates were identified across pairs of six European languages: English, German, French, Spanish, Italian, and Dutch. Semantic equivalence was determined using the conceptual structure of a translation database. By using a similarity threshold, large numbers of cognates could be selected that nearly completely included the stimulus materials of experimental studies. The identified numbers of form-similar and identical cognates correlated highly with branch lengths of phylogenetic language family trees, supporting the usefulness of the new measure for cross-language comparison. The normalized Levenshtein distance function can be considered a new formal model of cross-language orthographic similarity.
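    As an illustration, a normalized Levenshtein similarity along these lines can be sketched as follows. Normalizing by the longer word’s length is an assumption for this sketch; the paper’s exact normalization may differ.

    ```python
    def levenshtein(a, b):
        """Classic dynamic-programming edit distance (insert/delete/substitute)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,            # deletion
                               cur[j - 1] + 1,         # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    def norm_similarity(a, b):
        """Similarity in [0, 1]: 1.0 for identical words, 0.0 for fully dissimilar.
        Normalized by the longer word's length (an assumed convention)."""
        if not a and not b:
            return 1.0
        return 1.0 - levenshtein(a, b) / max(len(a), len(b))
    ```

    A classic cognate pair such as English "night" and German "Nacht" (lower-cased) then scores 0.6, while identical cognates score 1.0, so a single threshold on this value can select candidate cognates across a whole translation database.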