Emergent inabilities? Inverse scaling over the course of pretraining
Does inverse scaling only occur as a function of model size, or can it also
occur over the course of training? We carry out an exploratory study
investigating whether the performance of language models on specific tasks can
decrease (while general performance remains high) during training on the
language modeling task. We find 8 tasks on which Pythia 12B (Biderman et al.,
2023) shows decreased performance over the course of training. Five of these
tasks (TruthfulQA-MC1, TruthfulQA-MC2, Hindsight Neglect, Memo Trap, and
Pattern Match Suppression) additionally show a consistent relationship whereby
larger language models show a greater decrease in performance the more they are
trained, despite showing standard (positive) scaling overall. This highlights
the importance of testing performance at all relevant benchmarks any time
models are trained on additional data, even if their overall performance
improves. Comment: Accepted to Findings of EMNLP 202
Do language models make human-like predictions about the coreferents of Italian anaphoric zero pronouns?
Some languages allow arguments to be omitted in certain contexts. Yet human
language comprehenders reliably infer the intended referents of these zero
pronouns, in part because they construct expectations about which referents are
more likely. We ask whether Neural Language Models also extract the same
expectations. We test whether 12 contemporary language models display
expectations that reflect human behavior when exposed to sentences with zero
pronouns from five behavioral experiments conducted in Italian by Carminati
(2005). We find that three models - XGLM 2.9B, 4.5B, and 7.5B - capture the
human behavior from all the experiments, with others successfully modeling some
of the results. This result suggests that human expectations about coreference
can be derived from exposure to language, and also indicates features of
language models that allow them to better reflect human behavior. Comment: Accepted at COLING 202
Can Peanuts Fall in Love with Distributional Semantics?
The context in which a sentence appears can drastically alter our
expectations about upcoming words - for example, following a short story
involving an anthropomorphic peanut, experimental participants are more likely
to expect the sentence 'the peanut was in love' than 'the peanut was salted',
as indexed by N400 amplitude (Nieuwland & van Berkum, 2006). This rapid and
dynamic updating of comprehenders' expectations about the kind of events that a
peanut may take part in based on context has been explained using the construct
of Situation Models - updated mental representations of key elements of an
event under discussion, in this case, the peanut protagonist. However, recent
work showing that N400 amplitude can be predicted based on distributional
information alone raises the question whether situation models are in fact
necessary for the kinds of contextual effects observed in previous work. To
investigate this question, we attempt to model the results of Nieuwland and van
Berkum (2006) using six computational language models and three sets of word
vectors, none of which have explicit situation models or semantic grounding. We
find that the effect found by Nieuwland and van Berkum (2006) can be fully
modeled by two language models and two sets of word vectors, with others
showing a reduced effect. Thus, at least some processing effects normally
explained through situation models may not in fact require explicit situation
models.
Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models
Abstract grammatical knowledge - of parts of speech and grammatical patterns
- is key to the capacity for linguistic generalization in humans. But how
abstract is grammatical knowledge in large language models? In the human
literature, compelling evidence for grammatical abstraction comes from
structural priming. A sentence that shares the same grammatical structure as a
preceding sentence is processed and produced more readily. Because confounds
exist when using stimuli in a single language, evidence of abstraction is even
more compelling from crosslingual structural priming, where use of a syntactic
structure in one language primes an analogous structure in another language. We
measure crosslingual structural priming in large language models, comparing
model behavior to human experimental results from eight crosslingual
experiments covering six languages, and four monolingual structural priming
experiments in three non-English languages. We find evidence for abstract
monolingual and crosslingual grammatical representations in the models that
function similarly to those found in humans. These results demonstrate that
grammatical representations in multilingual language models are not only
similar across languages, but they can causally influence text produced in
different languages. Comment: Accepted at EMNLP 202
Price Discovery and the Accuracy of Consolidated Data Feeds in the U.S. Equity Markets
Both the scientific community and the popular press have paid much attention
to the speed of the Securities Information Processor, the data feed
consolidating all trades and quotes across the US stock market. Rather than the
speed of the Securities Information Processor, or SIP, we focus here on its
accuracy. Relying on Trade and Quote data, we provide various measures of SIP
latency relative to high-speed data feeds between exchanges, known as direct
feeds. We use first differences to highlight not only the divergence between
the direct feeds and the SIP, but also the fundamental inaccuracy of the SIP.
We find that as many as 60 percent or more of trades are reported out of
sequence for stocks with high trade volume, therefore skewing simple measures
such as returns. While not yet definitive, this analysis supports our
preliminary conclusion that the underlying infrastructure of the SIP is
currently unable to keep pace with the trading activity in today's stock
market. Comment: 18 pages, 20 figures, 2 tables
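As a rough illustration of what "reported out of sequence" could mean in the abstract above, here is a minimal sketch. It assumes a trade counts as out of sequence when its event timestamp is earlier than the latest timestamp already seen in the feed's reporting order. That definition, the function name, and the sample numbers are all hypothetical and are not taken from the paper.

```python
def out_of_sequence_fraction(event_times):
    """Share of trades whose event time precedes an earlier-reported trade.

    event_times: event timestamps listed in the order the feed reported them.
    Assumed definition (not the paper's): a trade is out of sequence when its
    timestamp is below the running maximum of the timestamps reported so far.
    """
    if not event_times:
        return 0.0
    latest = event_times[0]
    out_of_order = 0
    for t in event_times[1:]:
        if t < latest:
            out_of_order += 1  # reported after a trade that happened later
        else:
            latest = t
    return out_of_order / len(event_times)


# Made-up timestamps in reporting order; the third trade arrives late.
fraction = out_of_sequence_fraction([1, 3, 2, 4])
```

A running maximum is used here so that one late trade does not cause the in-order trades after it to be counted as well.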
Accelerating slip rates on the Puente Hills blind thrust fault system beneath metropolitan Los Angeles, California, USA
Slip rates represent the average displacement across a fault over time and are essential to estimating earthquake recurrence for probabilistic seismic hazard assessments. We demonstrate that the slip rate on the western segment of the Puente Hills blind thrust fault system, which is beneath downtown Los Angeles, California (USA), has accelerated from ∼0.22 mm/yr in the late Pleistocene to ∼1.33 mm/yr in the Holocene. Our analysis is based on syntectonic strata derived from the Los Angeles River, which has continuously buried a fold scarp above the blind thrust. Slip on the fault beneath our field site began during the late-middle Pleistocene and progressively increased into the Holocene. This increase in rate implies that the magnitudes and/or the frequency of earthquakes on this fault segment have increased over time. This challenges the characteristic earthquake model and presents an evolving and potentially increasing seismic hazard to metropolitan Los Angeles.
Are genetic risk factors for psychosis also associated with dimension-specific psychotic experiences in adolescence?
Psychosis has been hypothesised to be a continuously distributed quantitative phenotype and disorders such as schizophrenia and bipolar disorder represent its extreme manifestations. Evidence suggests that common genetic variants play an important role in liability to both schizophrenia and bipolar disorder. Here we tested the hypothesis that these common variants would also influence psychotic experiences measured dimensionally in adolescents in the general population. Our aim was to test whether schizophrenia and bipolar disorder polygenic risk scores (PRS), as well as specific single nucleotide polymorphisms (SNPs) previously identified as risk variants for schizophrenia, were associated with adolescent dimension-specific psychotic experiences. Self-reported Paranoia, Hallucinations, Cognitive Disorganisation, Grandiosity, Anhedonia, and Parent-rated Negative Symptoms, as measured by the Specific Psychotic Experiences Questionnaire (SPEQ), were assessed in a community sample of 2,152 16-year-olds. Polygenic risk scores were calculated using estimates of the log of odds ratios from the Psychiatric Genomics Consortium GWAS stage-1 mega-analysis of schizophrenia and bipolar disorder. The polygenic risk analyses yielded no significant associations between schizophrenia and bipolar disorder PRS and the SPEQ measures. The analyses on the 28 individual SNPs previously associated with schizophrenia found that two SNPs in TCF4 returned a significant association with the SPEQ Paranoia dimension, rs17512836 (p-value=2.57×10⁻⁴) and rs9960767 (p-value=6.23×10⁻⁴). Replication in an independent sample of 16-year-olds (N=3,427) assessed using the Psychotic-Like Symptoms Questionnaire (PLIKS-Q), a composite measure of multiple positive psychotic experiences, failed to yield significant results. Future research with PRS derived from larger samples, as well as larger adolescent validation samples, would improve the predictive power to test these hypotheses further.
The challenges of relating adult clinical diagnostic constructs such as schizophrenia to adolescent psychotic experiences at a genetic level are discussed.
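For readers unfamiliar with polygenic risk scores, the abstract above says they were calculated from the log of odds ratios. A minimal sketch of the commonly used form is a sum of risk-allele counts weighted by ln(OR). The function name, the three-SNP example, and the odds ratios below are hypothetical illustrations. They are not values or code from the study, which may also have applied SNP selection thresholds or scaling not shown here.

```python
import math


def polygenic_risk_score(allele_counts, odds_ratios):
    """Weighted sum of risk-allele counts, using ln(odds ratio) as the weight.

    allele_counts: risk alleles carried at each SNP (0, 1, or 2).
    odds_ratios:   per-SNP odds ratios from a discovery GWAS (hypothetical here).
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(allele_counts, odds_ratios)
    )


# Hypothetical individual genotyped at three SNPs.
score = polygenic_risk_score([2, 0, 1], [1.10, 1.25, 0.95])
```

Working on the log scale makes the per-SNP contributions additive, and an odds ratio of 1 contributes nothing to the score.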
A Comparison of Rectal Diazepam Gel and Placebo for Acute Repetitive Seizures
ABSTRACT
Background Acute repetitive seizures are readily recognizable episodes involving increased seizure frequency. Urgent treatment is often required. Rectal diazepam gel is a promising therapy.
Methods We conducted a randomized, double-blind, parallel-group, placebo-controlled study of home-based treatment for acute repetitive seizures. Patients were randomly assigned to receive either rectal diazepam gel, at doses ranging from 0.2 to 0.5 mg per kilogram of body weight on the basis of age, or placebo. Children received one dose at the onset of acute repetitive seizures and a second dose four hours later. Adults received three doses: one dose at onset, and two more doses 4 and 12 hours after onset. Treatment was administered by a care giver, such as a parent, who had received special training. The number of seizures after the first dose was counted for 12 hours in children and for 24 hours in adults.
Results Of 125 study patients (64 assigned to diazepam and 61 to placebo) with a history of acute repetitive seizures, 91 (47 children and 44 adults) were treated for an exacerbation of seizures during the study period. Diazepam treatment was superior to placebo with regard to the outcome variables related to efficacy: reduced seizure frequency (P<0.001) and improved global assessment of treatment outcome by the care giver (frequency and severity of seizures and drug toxicity) (P<0.001). Post hoc analysis showed diazepam to be superior to placebo in reducing seizure frequency in both children (P<0.001) and adults (P=0.02), but only in children was it superior with regard to improvement in global outcome (P<0.001). The time to the first recurrence of seizures after initial treatment was longer for the patients receiving diazepam (P<0.001). Thirty-five patients reported at least one adverse effect of treatment; somnolence was the most frequent. Respiratory depression was not reported.
Conclusions Rectal diazepam gel, administered at home by trained care givers, is an effective and well-tolerated treatment for acute repetitive seizures. (N Engl J Med 1998;338:1869-75.)