    Revealing the structure of language model capabilities

    Building a theoretical understanding of the capabilities of large language models (LLMs) is vital for our ability to predict and explain the behavior of these systems. Here, we investigate the structure of LLM capabilities by extracting latent capabilities from patterns of individual differences across a varied population of LLMs. Using a combination of Bayesian and frequentist factor analysis, we analyzed data from 29 different LLMs across 27 cognitive tasks. We found evidence that LLM capabilities are not monolithic. Instead, they are better explained by three well-delineated factors that represent reasoning, comprehension and core language modeling. Moreover, we found that these three factors can explain a high proportion of the variance in model performance. These results reveal a consistent structure in the capabilities of different LLMs and demonstrate the multifaceted nature of these capabilities. We also found that the three abilities show different relationships to model properties such as model size and instruction tuning. These patterns help refine our understanding of scaling laws and indicate that changes to a model that improve one ability might simultaneously impair others. Based on these findings, we suggest that benchmarks could be streamlined by focusing on tasks that tap into each broad model ability.
    Comment: 10 pages, 3 figures + references and appendices, for data and analysis code see https://github.com/RyanBurnell/revealing-LLM-capabilitie

    Working memory for social information: Chunking or domain-specific buffer?

    Humans possess unique social abilities that set us apart from other species. These abilities may be partially supported by a large capacity for maintaining and manipulating social information. Efficient social working memory might arise from two different sources: chunking of social information or a domain-specific buffer. We test these hypotheses with functional magnetic resonance imaging (fMRI) by manipulating sociality and working memory load in an n-back paradigm. We observe (i) an effect of load in the frontoparietal control network, (ii) an effect of sociality in regions associated with social cognition and face processing, and (iii) an interaction within the frontoparietal network such that social load has a smaller effect than nonsocial load. These results support the hypothesis that working memory is more efficient for social information than for nonsocial information, and suggest that chunking, rather than a domain-specific buffer, is the mechanism of this greater efficiency.

    Psychometric Network Analysis of the Hungarian WAIS

    The positive manifold (the finding that cognitive ability measures demonstrate positive correlations with one another) has led to models of intelligence that include a general cognitive ability or general intelligence (g). This view has been reinforced using factor analysis and reflective, higher-order latent variable models. However, a new theory of intelligence, Process Overlap Theory (POT), posits that g is not a psychological attribute but an index of cognitive abilities that results from an interconnected network of cognitive processes. These competing theories of intelligence are compared using two different statistical modeling techniques: (a) latent variable modeling and (b) psychometric network analysis. Network models display partial correlations between pairs of observed variables that demonstrate direct relationships among observations. Secondary data analysis was conducted using the Hungarian Wechsler Adult Intelligence Scale Fourth Edition (H-WAIS-IV). The underlying structure of the H-WAIS-IV was first assessed using confirmatory factor analysis assuming a reflective, higher-order model and then reanalyzed using psychometric network analysis. The compatibility (or lack thereof) of these theoretical accounts of intelligence with the data is discussed.

    In defense of the personal/impersonal distinction in moral psychology research: Cross-cultural validation of the dual process model of moral judgment

    The dual process model of moral judgment (DPM; Greene et al., 2004) argues that such judgments are influenced by both emotion-laden intuition and controlled reasoning. These influences are associated with distinct neural circuitries and different response tendencies. After reanalyzing data from an earlier study, McGuire et al. (2009) questioned the level of support for the dual process model and asserted that the distinction between emotion-evoking moral dilemmas (personal dilemmas) and those that do not trigger such intuitions (impersonal dilemmas) is spurious. Using similar reanalysis methods on data reported by Moore, Clark, and Kane (2008), we show that the personal/impersonal distinction is reliable. Furthermore, new data show that this distinction is fundamental to moral judgment across widely different cultures (U.S. and China) and supports claims made by the DPM.

    Radio Observations of the Hubble Deep Field South region: I. Survey Description and Initial Results

    This paper is the first of a series describing the results of the Australia Telescope Hubble Deep Field South (ATHDFS) radio survey. The survey was conducted at four wavelengths (20, 11, 6, and 3 cm) over a 4-year period, and achieves an rms sensitivity of about 10 microJy at each wavelength. We describe the observations and data reduction processes, and present data on radio sources close to the centre of the HDF-S. We discuss in detail the properties of a subset of these sources. The sources include both starburst galaxies and galaxies powered by an active galactic nucleus, and range in redshift from 0.1 to 2.2. Some of them are characterised by unusually high radio-to-optical luminosities, presumably caused by dust extinction.
    Comment: Accepted by AJ. 32 pages, 4 tables, 3 figures. PDF with full-resolution figures is on http://www.atnf.csiro.au/people/rnorris/N197.pd

    Seasonal Variations of Glaciochemical, Isotopic and Stratigraphic Properties in Siple Dome (Antarctica) Surface Snow

    Six snow-pit records recovered from Siple Dome, West Antarctica, during 1994 are used to study seasonal variations in chemical (major ion and H2O2), isotopic (deuterium) and physical stratigraphic properties during the 1988-94 period. Comparison of dD measurements and satellite-derived brightness temperature for the Siple Dome area suggests that most seasonal dD maxima occur within ±4 weeks of each 1 January. Several other chemical species (H2O2, non-sea-salt (nss) SO42-, methanesulfonic acid and NO3-) show coeval peaks with dD, together providing an accurate method for identifying summer accumulation. Sea-salt-derived species generally peak during winter/spring, but episodic input is noted throughout some years. No reliable seasonal signal is identified in species with continental sources (nssCa2+, nssMg2+), NH4+ or nssCl-. Visible strata such as large depth-hoar layers (>5 cm) are associated with summer accumulation and its metamorphosis, but smaller hoar layers and crusts are more difficult to interpret. A multi-parameter approach is found to provide the most accurate dating of these snow-pit records, and is used to determine annual layer thicknesses at each site. Significant spatial accumulation variability exists on an annual basis, but mean accumulation in the sampled 10 km² grid for the 1988-94 period is fairly uniform.

    Intravital FRAP imaging using an E-cadherin-GFP mouse reveals disease- and drug-dependent dynamic regulation of cell-cell junctions in live tissue

    E-cadherin-mediated cell-cell junctions play a prominent role in maintaining the epithelial architecture. The disruption or deregulation of these adhesions in cancer can lead to the collapse of tumor epithelia that precedes invasion and subsequent metastasis. Here we generated an E-cadherin-GFP mouse that enables intravital photobleaching and quantification of E-cadherin mobility in live tissue without affecting normal biology. We demonstrate the broad applications of this mouse by examining E-cadherin regulation in multiple tissues, including mammary, brain, liver, and kidney tissue, while specifically monitoring E-cadherin mobility during disease progression in the pancreas. We assess E-cadherin stability in native pancreatic tissue upon genetic manipulation involving Kras and p53 or in response to anti-invasive drug treatment and gain insights into the dynamic remodeling of E-cadherin during in situ cancer progression. FRAP in the E-cadherin-GFP mouse, therefore, promises to be a valuable tool to fundamentally expand our understanding of E-cadherin-mediated events in native microenvironments.

    Mineralocorticoid Excess or Glucocorticoid Insufficiency: Renal and Metabolic Phenotypes in a Rat Hsd11b2 Knockout Model

    Obesity and hypertension are 2 major health issues of the 21st century. The syndrome of apparent mineralocorticoid excess is caused by deficiency of 11β-hydroxysteroid dehydrogenase type 2 (Hsd11b2), which normally inactivates glucocorticoids, rendering the mineralocorticoid receptor aldosterone-specific. The metabolic consequences of Hsd11b2 knockout in the rat are investigated in parallel with electrolyte homeostasis. Hsd11b2 was knocked out by pronuclear microinjection of targeted zinc-finger nuclease mRNAs, and 1 line was characterized for its response to renal and metabolic challenges. Plasma 11-dehydrocorticosterone was below detection thresholds, and Hsd11b2 protein was undetected by Western blot, indicating complete ablation. Homozygotes were 13% smaller than wild-type littermates, and were polydipsic and polyuric. Their kidneys, adrenals, and hearts were significantly enlarged, but mesenteric fat pads and liver were significantly smaller. On a 0.3% Na diet, mean arterial blood pressure was ≈65 mm Hg higher than controls but only 25 mm Hg higher on a 0.03% Na(+) diet. Urinary Na/K ratio of homozygotes was similar to controls on 0.3% Na(+) diet but urinary albumin and calcium were elevated. Corticosterone and aldosterone levels showed normal circadian variation on both a 0.3% and 0.03% Na(+) diet, but plasma renin was suppressed in homozygotes on both diets. Plasma glucose responses to an oral glucose challenge were reduced despite low circulating insulin, indicating much greater sensitivity to insulin in homozygotes. The rat model reveals mechanisms linking electrolyte homeostasis and metabolic control through the restriction of Hsd11b1 substrate availability.