    Analyzing collaborative learning processes automatically

    In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis, both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time-consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. This endeavor also holds the potential for enabling substantially improved on-line instruction, both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context-sensitive collaborative learning support on an as-needed basis. In this article, we report on an interdisciplinary research project that has been investigating the effectiveness of applying text classification technology to a large CSCL corpus that has been analyzed by human coders using a theory-based multidimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important part of making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions that the CSCL community is interested in.

    Coalescent Assimilation Across Wordboundaries in American English and in Polish English

    Coalescent assimilation (CA), where alveolar obstruents /t, d, s, z/ in word-final position merge with word-initial /j/ to produce postalveolar /tʃ, dʒ, ʃ, ʒ/, is one of the most well-known connected speech processes in English. Because it is so common, CA has been discussed in numerous textbook descriptions of English pronunciation, and yet, upon comparing them, it is difficult to get a clear picture of what factors make its application likely. This paper investigates the application of CA in American English to see a) what factors increase the likelihood of its application for each of the four alveolar obstruents, and b) how the plosives /t, d/ are realized allophonically when CA does not apply. To do so, the Buckeye Corpus (Pitt et al. 2007) of spoken American English is analyzed quantitatively. As a second step, these results are compared with Polish English; statistics analogous to the ones listed above for American English are gathered for Polish English based on the PLEC corpus (Pęzik 2012). The last section focuses on the consequences of these findings for teaching based on a native-speaker model. It is argued that a description of the phenomenon that reflects the behavior of speakers of American English more accurately than extant textbook accounts could be beneficial to the acquisition of these patterns.

    Conditional Complexity of Compression for Authorship Attribution

    We introduce new stylometry tools based on the sliced conditional compression complexity of literary texts, which are inspired by the nearly optimal application of the incomputable Kolmogorov conditional complexity (and presumably approximate it). Whereas other stylometry tools can occasionally be very close for different authors, our statistic is apparently strictly minimal for the true author, provided that the query and training texts are sufficiently large, the compressor is sufficiently good, and sampling bias is avoided (as in poll sampling). We tune it and test its performance on attributing the Federalist Papers (Madison vs. Hamilton). Our results confirm the previous attribution of the Federalist Papers to Madison by Mosteller and Wallace (1964) using the Naive Bayes classifier, and the same attribution based on alternative classifiers such as SVM and the second-order Markov model of language. We then apply our method to study the attribution of the early poems from the Shakespeare Canon and the continuation of Marlowe’s poem ‘Hero and Leander’ ascribed to G. Chapman.
    Keywords: compression complexity, authorship attribution.
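
The idea of conditional compression complexity in the abstract above can be illustrated with a minimal sketch. It is not the authors' tool: it assumes the common approximation C(query | training) ≈ |compress(training + query)| − |compress(training)|, uses Python's standard `zlib` as a stand-in compressor, and omits the slicing and tuning the abstract mentions. The function names `conditional_complexity` and `attribute` are hypothetical.

```python
import zlib
from typing import Dict


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def conditional_complexity(query: str, training: str) -> int:
    """Approximate C(query | training): the extra compressed bytes the query
    costs once the compressor has already seen the training text.
    This is an illustrative assumption, not the statistic from the paper."""
    t = training.encode("utf-8")
    q = query.encode("utf-8")
    return compressed_size(t + q) - compressed_size(t)


def attribute(query: str, corpora: Dict[str, str]) -> str:
    """Return the candidate author whose training text gives the query
    the smallest approximate conditional complexity."""
    return min(corpora, key=lambda author: conditional_complexity(query, corpora[author]))
```

A toy usage would be `attribute(query_text, {"Madison": madison_text, "Hamilton": hamilton_text})`, where the three texts are placeholders supplied by the reader. Note that zlib only looks back about 32 KB, so a compressor with a longer context would be needed for training texts of realistic size.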

    On Descriptive Complexity, Language Complexity, and GB

    We introduce $L^2_{K,P}$, a monadic second-order language for reasoning about trees which characterizes the strongly Context-Free Languages in the sense that a set of finite trees is definable in $L^2_{K,P}$ iff it is (modulo a projection) a Local Set, that is, the set of derivation trees generated by a CFG. This provides a flexible approach to establishing language-theoretic complexity results for formalisms that are based on systems of well-formedness constraints on trees. We demonstrate this technique by sketching two such results for Government and Binding Theory. First, we show that free-indexation, the mechanism assumed to mediate a variety of agreement and binding relationships in GB, is not definable in $L^2_{K,P}$ and therefore not enforcible by CFGs. Second, we show how, in spite of this limitation, a reasonably complete GB account of English can be defined in $L^2_{K,P}$. Consequently, the language licensed by that account is strongly context-free. We illustrate some of the issues involved in establishing this result by looking at the definition, in $L^2_{K,P}$, of chains. The limitations of this definition provide some insight into the types of natural linguistic principles that correspond to higher levels of language complexity. We close with some speculation on the possible significance of these results for generative linguistics.
    Comment: To appear in Specifying Syntactic Structures, papers from the Logic, Structures, and Syntax workshop, Amsterdam, Sept. 1994. LaTeX source with nine included postscript figures.