
    Individual Differences and the Ergodicity Problem

    Traditional research into individual differences (ID) in second language (L2) learning is based on group studies, with the implicit assumption that findings can be generalized to the individual. In this article, we challenge this view. We argue that L2 learners do not form ergodic ensembles and that language learning data lack stability. The data from our experiment show that even learners who are highly similar in terms of ID follow clearly different learning trajectories over time; however, we did find that those who showed the greatest degree of variability gained the most in proficiency. Such findings lead to the view that group studies and individual case studies are complementary. Group studies give us valuable information about the relative weight of individual factors that may play a role in L2 development, but longitudinal case studies are needed to understand the process of individual learners’ development.
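
The abstract does not say how variability was quantified. As a minimal sketch, assuming a simple index (the mean absolute difference between consecutive measurements) and a gain defined as last score minus first score, per-learner variability and gain could be computed and compared as below. The function names and the toy scores are hypothetical, not the authors' data or method.

```python
from statistics import mean


def mean_successive_difference(scores):
    """Mean absolute change between consecutive measurements (assumed variability index)."""
    if len(scores) < 2:
        raise ValueError("need at least two measurements")
    return mean(abs(b - a) for a, b in zip(scores, scores[1:]))


def proficiency_gain(scores):
    """Change from the first to the last measurement."""
    return scores[-1] - scores[0]


def rank_by_variability(learners):
    """Return (name, variability, gain) tuples, most variable learner first."""
    rows = [
        (name, mean_successive_difference(s), proficiency_gain(s))
        for name, s in learners.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical learners with the same starting score but different trajectories.
learners = {
    "A": [10, 11, 12, 13, 14],  # smooth, steady growth
    "B": [10, 15, 11, 18, 20],  # jumpy trajectory
}
for name, variability, gain in rank_by_variability(learners):
    print(name, variability, gain)
```

Averaging A and B into one group curve would hide that they take different paths, which is the per-learner view the abstract argues group studies cannot replace.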

    Multichronic complexity in second language development

    Taking a dynamic systems perspective on second language development, this paper argues that development is change over time, which is never stable and has no end state. Moreover, time can be defined at different scales: from the millisecond, minute, week and year to the lifespan. At all scales we can see change over time in language development at different levels of granularity; however, the time scale and level of granularity we use determine to a great extent what we find. What seems a change at one level may be nothing more than natural variation at another.
    Keywords: Multichronic complexity, language development, variation, time scale
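
The point that the chosen time scale shapes what counts as change can be illustrated with block averaging. In this minimal sketch, assuming equally spaced measurements and a hypothetical `aggregate` helper, the series oscillates between 1 and 3 at the finest scale; averaged over windows of two, that oscillation reads as a flat 2.0 followed by a step up to 6.0.

```python
def aggregate(series, window):
    """Average consecutive, non-overlapping blocks of `window` measurements.

    A trailing block shorter than `window` is dropped.
    """
    if window < 1:
        raise ValueError("window must be positive")
    n_blocks = len(series) // window
    return [
        sum(series[i * window:(i + 1) * window]) / window
        for i in range(n_blocks)
    ]


scores = [1, 3, 1, 3, 5, 7, 5, 7]  # hypothetical fine-grained measurements
for window in (1, 2, 4):
    print(window, aggregate(scores, window))
```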

    Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

    We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (IAS), discovery of protein pairs (IPS), and identification of text passages characterizing protein interaction (ISS) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam-detection techniques, as well as an uncertainty-based integration scheme. For comparison, we also applied a Support Vector Machine and Singular Value Decomposition to the same features. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word-proximity networks. Our approach to the abstract classification task (IAS) was among the top submissions in terms of the performance measures used in the challenge evaluation (accuracy, F-score and AUC). We also report on a web tool built on our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks achieved one of the highest recall rates, as well as one of the highest mean reciprocal ranks of correct passages. Our results on abstract classification show that a simple linear model using relatively few features is capable of generalizing and uncovering the conceptual nature of protein-protein interaction from the bibliome. Because the approach is based on a very lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, expanding word features with word-proximity networks is shown to be useful, though the need for some improvements is discussed.
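
The abstract does not spell out the model. As an illustration only, the sketch below shows one generic form a lightweight, spam-filter-style linear classifier can take: each word gets a weight equal to the fraction of positive training abstracts containing it minus the fraction of negative ones, and an abstract is labelled relevant when its summed weights exceed a threshold. The functions, the threshold and the toy documents are assumptions, not the authors' model.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each word, how many documents contain it."""
    return Counter(w for doc in docs for w in set(doc.lower().split()))


def train_word_weights(positive_docs, negative_docs):
    """Weight = share of positive docs containing the word minus share of negative docs."""
    pos = document_frequencies(positive_docs)
    neg = document_frequencies(negative_docs)
    return {
        w: pos[w] / len(positive_docs) - neg[w] / len(negative_docs)
        for w in set(pos) | set(neg)
    }


def score(doc, weights):
    """Linear score: sum of the weights of the distinct words in the document."""
    return sum(weights.get(w, 0.0) for w in set(doc.lower().split()))


def is_relevant(doc, weights, threshold=0.0):
    """Hypothetical decision rule: relevant when the score exceeds the threshold."""
    return score(doc, weights) > threshold


# Hypothetical toy training data.
positive = ["protein A binds protein B", "the kinase binds its receptor"]
negative = ["patients were recruited from clinics", "the survey was repeated"]
weights = train_word_weights(positive, negative)
print(is_relevant("this receptor binds a ligand", weights))
```

Keeping the model to one weight per word is what makes this kind of classifier cheap to train and easy to port, which matches the "lightweight" motivation in the abstract.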

    Contextually-Dependent Lexical Semantics

    Institute for Communicating and Collaborative Systems
    This thesis is an investigation of phenomena at the interface between syntax, semantics, and pragmatics, with the aim of arguing for a view of semantic interpretation as lexically driven yet contextually dependent. I examine regular, generative processes which operate over the lexicon to induce verbal sense shifts, and discuss the interaction of these processes with the linguistic or discourse context. I concentrate on phenomena where only an interaction between all three linguistic knowledge sources can explain the constraints on verb use: conventionalised lexical semantic knowledge constrains productive syntactic processes, while pragmatic reasoning is both constrained by and constrains the potential interpretations given to certain verbs. The phenomena closely examined are the behaviour of PP sentential modifiers (specifically dative and directional PPs) with respect to the lexical semantic representation of the verb phrases they modify, resultative constructions, and logical metonymy. The analysis is couched in terms of a lexical semantic representation drawing on Davis (1995), Jackendoff (1983, 1990), and Pustejovsky (1991, 1995), which aims to capture “linguistically relevant” components of meaning. The representation is shown to be useful for modeling the interaction between the syntactic form of an utterance and its meaning. I introduce a formalisation of the representation within the framework of Head Driven Phrase Structure Grammar (Pollard and Sag 1994), and rely on the model of discourse coherence proposed by Lascarides and Asher (1992), Discourse in Commonsense Entailment. I furthermore discuss the implications of the contextual dependency of semantic interpretation for lexicon design and computational processing in Natural Language Understanding systems.

    Protein annotation as term categorization in the gene ontology using word proximity networks

    We addressed BioCreAtIvE Task 2, the problem of annotating a protein with a node in the Gene Ontology (GO). We approached the task as a problem of categorizing terms, derived from the document neighborhood of the given protein in the given document, into GO nodes, based on lexical overlap with terms on GO nodes and with terms identified as related to those nodes. The system incorporates NLP components such as a morphological normalizer, a named entity recognizer, a statistical term frequency analyzer, and an unsupervised method for expanding words associated with GO ids based on a probability measure that captures word proximity (Rocha, 2002). The categorization step uses our novel Gene Ontology Categorizer (GOC) methodology (Joslyn et al. 2004) to select GO nodes as cluster heads for the terms in the input set based on the structure of the GO. Pre-processing: Swiss-Prot and TrEMBL IDs were provided as input identifiers for the protein, so we needed to establish a set of names by which that protein could be referenced in the text. We made use of both the gene name and the protein names in Swiss-Prot itself, when available, and a collection of synonyms constructed by Procter & Gamble Company. The fallback case was to us
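
The word-proximity measure is only cited here (Rocha, 2002), not defined. As an assumed stand-in, the sketch below uses a Jaccard-style co-occurrence ratio (documents containing both words divided by documents containing either) to build a proximity network, then expands a set of seed words, such as the words on a GO node, with strongly related words. The threshold, helper names and toy documents are hypothetical.

```python
from itertools import combinations


def proximity_network(docs):
    """Edge weight = docs containing both words / docs containing either (Jaccard-style)."""
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_sets)
    occurs = {w: {i for i, s in enumerate(doc_sets) if w in s} for w in vocab}
    network = {}
    for a, b in combinations(sorted(vocab), 2):
        both = len(occurs[a] & occurs[b])
        if both:  # only keep pairs that co-occur at least once
            network[(a, b)] = both / len(occurs[a] | occurs[b])
    return network


def expand(seed_words, network, threshold=0.5):
    """Add every word whose proximity to some seed word reaches the threshold."""
    expanded = set(seed_words)
    for (a, b), weight in network.items():
        if weight >= threshold:
            if a in seed_words:
                expanded.add(b)
            if b in seed_words:
                expanded.add(a)
    return expanded


# Hypothetical toy document neighbourhoods.
docs = [
    "kinase phosphorylation",
    "kinase phosphorylation signaling",
    "membrane transport",
]
network = proximity_network(docs)
print(sorted(expand({"kinase"}, network, threshold=0.6)))
```

Raising the threshold keeps only the tightest associations; lowering it trades precision for recall in the expanded term set.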

    The textual characteristics of traditional and Open Access scientific journals are similar

    Background: Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for analysis of Open Access full text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources, but also flawed science. This paper examines that assumption.
    Results: We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log likelihood analysis showed that they were primarily in the area of formatting and in specific named entities.
    Conclusion: We did not find structural or semantic differences between the Open Access and traditional journal collections.
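
The lexical comparison rests on Kullback-Leibler divergence between word distributions, KL(P || Q) = sum over words w of P(w) log2(P(w) / Q(w)), and on log-likelihood analysis of individual words. A minimal sketch of both follows, assuming whitespace tokenization, add-one smoothing over a shared vocabulary (an assumed choice: KL divergence is undefined when Q gives zero probability to a word P uses), and the common two-corpus form of the G² statistic. The toy sentences are hypothetical stand-ins for the collections.

```python
import math
from collections import Counter


def unigram_distribution(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed word probabilities over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(P || Q) in bits; q must be positive wherever p is."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def log_likelihood(a, b, size1, size2):
    """G2 for a word seen a times in corpus 1 (size1 tokens) and b times in corpus 2."""
    expected1 = size1 * (a + b) / (size1 + size2)
    expected2 = size2 * (a + b) / (size1 + size2)
    g2 = 0.0
    if a:
        g2 += a * math.log(a / expected1)
    if b:
        g2 += b * math.log(b / expected2)
    return 2 * g2


open_access = "the protein binds the receptor".split()
traditional = "the protein activates the receptor".split()
reference = "the weather was mild today".split()
vocab = set(open_access) | set(traditional) | set(reference)
p_oa = unigram_distribution(open_access, vocab)
p_tr = unigram_distribution(traditional, vocab)
p_ref = unigram_distribution(reference, vocab)
print(kl_divergence(p_oa, p_tr), kl_divergence(p_oa, p_ref))
```

In this toy setup the two domain-matched samples diverge less from each other than from the off-topic reference, mirroring the kind of comparison the abstract reports.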

    Mineral analysis reveals extreme manganese concentrations in wild harvested and commercially available edible termites

    Open Access Journal; Published online: 09 April 2020
    Termites are widely used as a food resource, particularly in Africa and Asia. Markets for insects as food are also expanding worldwide. To inform the development of insect-based foods, we analysed selected minerals (Fe, Mn, Zn, Cu, Mg) in wild-harvested and commercially available termites. Mineral values were compared with those of selected commercially available insects. Alate termites of the genera Macrotermes and Odontotermes showed remarkably high manganese (Mn) content (292–515 mg/100 g dry weight), roughly 50–100 times the concentrations detected in other insects. Other mineral elements occur at moderate concentrations in all insects examined. On further examination, the Mn is located primarily in the abdomens of Macrotermes subhyalinus, with scanning electron microscopy revealing small spherical structures highly enriched in Mn. We identify the fungus comb of Macrotermes subhyalinus as a potential biological source of the high Mn concentrations. Consuming even small quantities of termite alates could exceed current upper recommended intakes for Mn in both adults and children. Given the widespread use of termites as food, a better understanding of the sources, distribution and bioavailability of these high Mn concentrations in termite alates is needed.
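
To make the "small quantities" point concrete: 292–515 mg per 100 g dry weight is 2.92–5.15 mg of Mn per gram of dry termite. The sketch below converts a concentration into the dry mass that would supply a given amount of Mn. The intake limit is deliberately left as a parameter, and the value in the usage lines is a hypothetical placeholder rather than a guideline figure, since the abstract does not state which upper intake level it applies.

```python
def mn_mg_per_gram(mg_per_100g):
    """Convert a concentration in mg/100 g dry weight to mg per gram."""
    return mg_per_100g / 100.0


def grams_to_reach(limit_mg, mg_per_100g):
    """Dry mass (g) of termites containing `limit_mg` of Mn at the given concentration."""
    return limit_mg / mn_mg_per_gram(mg_per_100g)


HYPOTHETICAL_LIMIT_MG = 10.0  # placeholder; substitute the applicable upper intake level
for conc in (292, 515):  # range reported for alates in the abstract
    print(conc, round(grams_to_reach(HYPOTHETICAL_LIMIT_MG, conc), 2))
```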

    Gender equality and girls’ education: Investigating frameworks, disjunctures and meanings of quality education

    The article draws on qualitative educational research across a diversity of low-income countries to examine gendered inequalities in education as complex, multi-faceted and situated, rather than as a series of barriers to be overcome through linear input–output processes focused on isolated dimensions of quality. It argues that frameworks for thinking about educational quality often result in analyses of gender inequalities that are fragmented and incomplete. However, by considering education quality more broadly, as a terrain of quality, it investigates questions of educational transitions, teacher supply and community participation, and develops understandings of how education is experienced by learners and teachers in their gendered lives and their teaching practices. Taking an approach based on theories of human development, the article identifies dynamics of power that underpin gender inequalities in the literature and that play out in diverse settings shaped by social, cultural and historical contexts. The review and discussion indicate that attaining gender-equitable quality education requires recognizing and understanding the ways in which inequalities intersect and interrelate, in order to seek out multi-faceted strategies that not only address different dimensions of girls’ and women’s lives but also understand gendered relationships and structurally entrenched inequalities between women and men, girls and boys.

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free-text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistent punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free-text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
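
The abstract does not describe the chunker's internals. One common way to train a statistical chunker is to recast chunk annotations as per-token B/I/O tags (begin, inside, outside a chunk) and to score predictions by exact chunk matches. The sketch below assumes that representation; the helper names, the `NP` label and the example note are hypothetical.

```python
def spans_to_bio(n_tokens, spans):
    """Convert (start, end_exclusive, label) chunks into one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags):
    """Recover chunks from BIO tags; a stray or mislabelled I- tag starts a new chunk."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final chunk
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if starts_new:
                start, label = i, tag[2:]
    return spans


def chunk_f1(gold, predicted):
    """F1 over exactly matching (start, end, label) chunks."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical telegraphic note: "pt c/o chest pain 3 days"
tokens = ["pt", "c/o", "chest", "pain", "3", "days"]
gold = [(0, 1, "NP"), (2, 4, "NP"), (4, 6, "NP")]
tags = spans_to_bio(len(tokens), gold)
print(list(zip(tokens, tags)))
```

Scoring whole chunks rather than individual tags is stricter: a chunk with one wrong boundary counts as both a false positive and a false negative.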