Individual Differences and the Ergodicity Problem
Traditional research into individual differences (ID) in second language (L2) learning is based on group studies, with the implicit assumption that findings can be generalized to the individual. In this article, we challenge this view. We argue that L2 learners do not form ergodic ensembles and that language learning data lack stability. The data from our experiment show that even learners who are highly similar in terms of ID follow clearly different learning trajectories over time; however, we did find that those who showed the greatest degree of variability gained the most in proficiency. Such findings lead to the view that group studies and individual case studies are complementary: group studies give us valuable information about the relative weight of the individual factors that may play a role in L2 development, but longitudinal case studies are needed to understand the process of individual learners' development.
Multichronic complexity in second language development
Taking a dynamic systems perspective on second language development, this paper argues that development is change over time, which is never stable and has no end state. Moreover, time can be defined at different scales: from the millisecond, minute, week, and year to the lifespan. At all of these scales we can see change over time in language development at different levels of granularity; however, the time scale and level of granularity we use determine to a great extent what we find. What seems to be a change at one level may be nothing more than natural variation at another.
Keywords: multichronic complexity, language development, variation, time scale
Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks
We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (IAS), discovery of protein pairs (IPS), and text passages characterizing protein interaction (ISS) in full text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam-detection techniques, as well as an uncertainty-based integration scheme. We also used a Support Vector Machine and the Singular Value Decomposition on the same features for comparison purposes. Our approach to the full text subtasks (protein pair and passage identification) includes a feature expansion method based on word-proximity networks. Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of the measures of performance used in the challenge evaluation (accuracy, F-score, and AUC). We also report on a web tool we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. Our approach to abstract classification shows that a simple linear model, using relatively few features, is capable of generalizing and uncovering the conceptual nature of protein-protein interaction from the bibliome. Since the novel approach is based on a very lightweight linear model, it can be easily ported and applied to similar problems. In full text problems, the expansion of word features with word-proximity networks is shown to be useful, though the need for some improvements is discussed.
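The abstract does not give the model's exact features or weighting, but the general spam-filter-style idea it invokes — a linear score over per-word weights learned from relevant versus irrelevant abstracts — can be sketched as follows. The function names, tokenization, and add-one smoothing here are our own illustrative assumptions, not the paper's implementation:

```python
from collections import Counter
import math

def train_log_odds(pos_docs, neg_docs, smoothing=1.0):
    """Learn per-word log-odds weights from relevant (pos) and
    irrelevant (neg) training documents, spam-filter style.
    Illustrative sketch only; the paper's features and weights differ."""
    pos = Counter(w for d in pos_docs for w in d.split())
    neg = Counter(w for d in neg_docs for w in d.split())
    vocab = set(pos) | set(neg)
    n_pos = sum(pos.values()) + smoothing * len(vocab)
    n_neg = sum(neg.values()) + smoothing * len(vocab)
    # Positive weight: the word is evidence for relevance; negative: against.
    return {w: math.log((pos[w] + smoothing) / n_pos)
             - math.log((neg[w] + smoothing) / n_neg)
            for w in vocab}

def score(weights, doc):
    """Linear score: sum of learned word weights; > 0 suggests relevant."""
    return sum(weights.get(w, 0.0) for w in doc.split())
```

A model of this shape is attractive for the portability the authors emphasize: training reduces to counting words, and classification to a single weighted sum.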
Contextually-Dependent Lexical Semantics
Institute for Communicating and Collaborative Systems

This thesis is an investigation of phenomena at the interface between syntax, semantics, and pragmatics, with the aim of arguing for a view of semantic interpretation as lexically driven yet contextually dependent. I examine regular, generative processes which operate over the lexicon to induce verbal sense shifts, and discuss the interaction of these processes with the linguistic or discourse context. I concentrate on phenomena where only an interaction between all three linguistic knowledge sources can explain the constraints on verb use: conventionalised lexical semantic knowledge constrains productive syntactic processes, while pragmatic reasoning is both constrained by and constrains the potential interpretations given to certain verbs. The phenomena which are closely examined are the behaviour of PP sentential modifiers (specifically dative and directional PPs) with respect to the lexical semantic representation of the verb phrases they modify, resultative constructions, and logical metonymy.

The analysis is couched in terms of a lexical semantic representation drawing on Davis (1995), Jackendoff (1983, 1990), and Pustejovsky (1991, 1995), which aims to capture “linguistically relevant” components of meaning. The representation is shown to have utility for modelling the interaction between the syntactic form of an utterance and its meaning. I introduce a formalisation of the representation within the framework of Head-Driven Phrase Structure Grammar (Pollard and Sag 1994), and rely on the model of discourse coherence proposed by Lascarides and Asher (1992), Discourse in Commonsense Entailment.

I furthermore discuss the implications of the contextual dependency of semantic interpretation for lexicon design and computational processing in Natural Language Understanding systems.
Protein annotation as term categorization in the gene ontology using word proximity networks
We addressed BioCreAtIvE Task 2, the problem of annotating a protein with a node in the Gene Ontology (GO). We approached the task as a problem of categorizing terms, derived from the document neighborhood of the given protein in the given document, into nodes in the GO, based on lexical overlaps with terms on GO nodes and terms identified as related to those nodes. The system incorporates NLP components such as a morphological normalizer, a named entity recognizer, a statistical term frequency analyzer, and an unsupervised method for expanding words associated with GO ids based on a probability measure that captures word proximity (Rocha, 2002). The categorization methodology uses our novel Gene Ontology Categorizer (GOC) methodology (Joslyn et al. 2004) to select GO nodes as cluster heads for the terms in the input set, based on the structure of the GO. Pre-processing: Swiss-Prot and TrEMBL IDs were provided as input identifiers for the protein, so we needed to establish a set of names by which that protein could be referenced in the text. We made use of both the gene name and protein names that are in Swiss-Prot itself, when available, and a collection of synonyms constructed by Procter & Gamble Company. The fallback case was to us
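The GOC scoring itself is not reproducible from the abstract, but the lexical-overlap step it builds on — matching extracted terms against GO node labels — can be illustrated with a toy token-overlap score. Both functions below are our own hypothetical stand-ins, not the paper's method:

```python
def lexical_overlap_score(term, node_label):
    """Fraction of the GO node label's tokens that appear in the term.
    A toy stand-in for the lexical-overlap matching described above."""
    term_tokens = set(term.lower().split())
    label_tokens = set(node_label.lower().split())
    return len(term_tokens & label_tokens) / len(label_tokens) if label_tokens else 0.0

def best_node(term, go_node_labels):
    """Pick the GO node label with the highest overlap with the term.
    The real GOC method additionally exploits the GO graph structure."""
    return max(go_node_labels, key=lambda label: lexical_overlap_score(term, label))
```

For example, the term "dna binding activity" overlaps fully with a node labelled "DNA binding" but only partially with "kinase activity", so the former would be selected.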
The textual characteristics of traditional and Open Access scientific journals are similar
Background: Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for analysis of Open Access full text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources, but also flawed science. This paper examines that assumption. Results: We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing, and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log likelihood analysis showed that they were primarily in the area of formatting and in specific named entities. Conclusion: We did not find structural or semantic differences between the Open Access and traditional journal collections.
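The Kullback-Leibler comparison used here can be sketched for unigram distributions over two corpora; the smoothing scheme below (add-one over the joint vocabulary, to avoid zero probabilities) is our own assumption, as the paper does not specify its estimator:

```python
from collections import Counter
import math

def kl_divergence(corpus_a, corpus_b, smoothing=1.0):
    """D_KL(P_A || P_B) between smoothed unigram word distributions.
    corpus_a and corpus_b are token lists. Sketch only; the paper's
    exact probability estimates may differ."""
    ca, cb = Counter(corpus_a), Counter(corpus_b)
    vocab = set(ca) | set(cb)
    n_a = sum(ca.values()) + smoothing * len(vocab)
    n_b = sum(cb.values()) + smoothing * len(vocab)
    # Sum p_a(w) * log(p_a(w) / p_b(w)) over the joint vocabulary.
    return sum(((ca[w] + smoothing) / n_a)
               * math.log(((ca[w] + smoothing) / n_a)
                          / ((cb[w] + smoothing) / n_b))
               for w in vocab)
```

A divergence near zero, as reported between the two collections, means the word distributions are nearly interchangeable; comparing each collection against an out-of-domain reference corpus gives a baseline for what a "large" divergence looks like.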
Mineral analysis reveals extreme manganese concentrations in wild harvested and commercially available edible termites
Open Access Journal; Published online: 09 April 2020.
Termites are widely used as a food resource, particularly in Africa and Asia. Markets for insects as food are also expanding worldwide. To inform the development of insect-based foods, we analysed selected minerals (Fe, Mn, Zn, Cu, Mg) in wild-harvested and commercially available termites. Mineral values were compared to those of selected commercially available insects. Alate termites of the genera Macrotermes and Odontotermes showed remarkably high manganese (Mn) content (292–515 mg/100 g dry weight), roughly 50–100 times the concentrations detected in other insects. Other mineral elements occur at moderate concentrations in all insects examined. On further examination, the Mn is located primarily in the abdomens of Macrotermes subhyalinus, with scanning electron microscopy revealing small spherical structures highly enriched for Mn. We identify the fungus comb of Macrotermes subhyalinus as a potential biological source of the high Mn concentrations. Consuming even small quantities of termite alates could exceed current upper recommended intakes for Mn in both adults and children. Given the widespread use of termites as food, a better understanding of the sources, distribution, and bioavailability of these high Mn concentrations in termite alates is needed.
Gender equality and girls' education: Investigating frameworks, disjunctures and meanings of quality education
The article draws on qualitative educational research across a diversity of low-income countries to examine gendered inequalities in education as complex, multi-faceted, and situated, rather than as a series of barriers to be overcome through linear input–output processes focused on isolated dimensions of quality. It argues that frameworks for thinking about educational quality often result in analyses of gender inequalities that are fragmented and incomplete. However, by considering education quality more broadly as a terrain of quality, it investigates questions of educational transitions, teacher supply, and community participation, and develops understandings of how education is experienced by learners and teachers in their gendered lives and teaching practices. Taking an approach based on theories of human development, the article identifies dynamics of power that underpin gender inequalities in the literature and that play out in diverse settings, influenced by social, cultural, and historical contexts. The review and discussion indicate that attaining gender-equitable quality education requires recognition and understanding of the ways in which inequalities intersect and interrelate, in order to seek out multi-faceted strategies that address not only different dimensions of girls' and women's lives, but also gendered relationships and structurally entrenched inequalities between women and men, girls and boys.
Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
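Shallow chunking annotations of the kind described are conventionally encoded as per-token BIO tags (B- begins a chunk, I- continues it, O is outside any chunk). The decoder below, which recovers typed chunks from such a tag sequence, is a standard illustrative sketch, not the Harvey corpus tooling; the tag names are hypothetical:

```python
def bio_to_chunks(tokens, tags):
    """Recover (chunk_type, chunk_text) pairs from parallel token/BIO-tag
    lists. An I- tag only extends a chunk of the same type; anything
    else closes the current chunk."""
    chunks, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                chunks.append((ctype, " ".join(current)))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(tok)
        else:  # O tag, or an I- tag that does not continue the open chunk
            if current:
                chunks.append((ctype, " ".join(current)))
            current, ctype = [], None
    if current:
        chunks.append((ctype, " ".join(current)))
    return chunks
```

For a telegraphic note such as "pt reports severe chest pain and nausea", a chunker emitting symptom tags would yield the spans "severe chest pain" and "nausea".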