    Toy Models of Superposition

    Neural networks often pack many unrelated concepts into a single neuron, a puzzling phenomenon known as 'polysemanticity' that makes interpretability much more challenging. This paper provides a toy model in which polysemanticity can be fully understood, arising as a result of models storing additional sparse features in "superposition." We demonstrate the existence of a phase change, a surprising connection to the geometry of uniform polytopes, and evidence of a link to adversarial examples. We also discuss potential implications for mechanistic interpretability.
    Comment: Also available at https://transformer-circuits.pub/2022/toy_model/index.htm
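The superposition idea above can be illustrated with a minimal sketch, assuming a ReLU output model of the form x' = ReLU(WᵀWx + b) in which 5 sparse features are squeezed into a 2-dimensional hidden layer. The weights are hand-picked (feature directions placed at the corners of a regular pentagon) rather than trained, the bias of -0.4 is an arbitrary illustrative value, and the helper names `pentagon_weights` and `reconstruct` are hypothetical, not taken from the paper's code.

```python
import math


def pentagon_weights(n_features=5):
    """Hypothetical hand-picked weights: one unit vector per feature, spread evenly around a circle in 2-D."""
    return [(math.cos(2 * math.pi * i / n_features),
             math.sin(2 * math.pi * i / n_features)) for i in range(n_features)]


def relu(v):
    return max(0.0, v)


def reconstruct(x, W, bias=-0.4):
    """Toy forward pass: h = W x (compress to 2-D), then x' = ReLU(W^T h + b) (decompress)."""
    # Compress: h is the weighted sum of the active features' 2-D directions.
    h = [sum(xi * w[d] for xi, w in zip(x, W)) for d in range(2)]
    # Decompress: project h onto every feature direction, add the (negative) bias, apply ReLU.
    return [relu(w[0] * h[0] + w[1] * h[1] + bias) for w in W]


W = pentagon_weights()
print([round(v, 3) for v in reconstruct([1, 0, 0, 0, 0], W)])  # [0.6, 0.0, 0.0, 0.0, 0.0]
print([round(v, 3) for v in reconstruct([1, 0, 1, 0, 0], W)])  # [0.0, 0.218, 0.0, 0.0, 0.0]
```

With one active feature, the negative bias and the ReLU filter out the interference from the other four directions, so the input is recovered (scaled by 0.6) even though 5 features share 2 dimensions. With two non-adjacent features active at once, the interference no longer cancels and the output wrongly points at feature 1, which is why this kind of packing only pays off when features are sparse.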

    Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds

    Background: Since 2004, public cheminformatics databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen for their bioactive content, availability of downloads and facility to select informative subsets.
    Results: Where they could be calculated, the numbers of compounds extracted per journal article were in the range of 12 to 19, whereas compound-per-protein counts increased with the number of documents. Chemical structure filtration to facilitate standardised comparisons typically reduced source counts by between 5% and 30%. The pair-wise overlaps between 23 databases and subsets were determined, as well as changes between 2006 and 2008. While all compound sets have increased, PubChem has doubled to 14.2 million. The 2008 comparison matrix shows not only overlap but also unique content across all sources. Many of the detailed differences could be attributed to individual strategies for data selection and extraction. Although the number of patent-derived structures entering PubChem has increased substantially since 2006, GVKBIO contains over 0.8 million unique structures from this source. Venn diagrams showed extensive overlap between compounds extracted by independent expert curation from journals by GVKBIO, WOMBAT (both commercial) and BindingDB (public), but each included unique content. In contrast, the approved drug collections from GVKBIO, MDDR (commercial) and DrugBank (public) showed surprisingly low overlap. Aggregating all commercial sources established that while 1 million compounds overlapped with PubChem, 1.2 million did not.
    Conclusion: On the basis of chemical structure content per se, public sources have covered an increasing proportion of commercial databases over the last two years. However, the commercial products included in this study provide links between compounds and information from patents and journals at a larger scale than current public efforts. They also continue to capture a significant proportion of unique content. Our results thus demonstrate not only an encouraging overall expansion of data-supported bioactive chemical space but also that commercial and public sources are complementary for its exploration.
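The pair-wise overlap, unique-content and aggregate comparisons described in this abstract reduce to set operations once each source is represented as a set of standardised structure identifiers. The sketch below assumes such identifiers (for example, InChIKey strings produced after a structure filtration step); the function names, source labels and toy data are hypothetical and do not reproduce the study's actual pipeline or counts.

```python
from itertools import combinations


def pairwise_overlaps(sources):
    """Map each unordered pair of source names to the number of shared identifiers."""
    return {(a, b): len(sources[a] & sources[b])
            for a, b in combinations(sorted(sources), 2)}


def unique_content(sources):
    """Map each source name to the number of identifiers found in no other source."""
    result = {}
    for name, ids in sources.items():
        others = set().union(*(s for n, s in sources.items() if n != name))
        result[name] = len(ids - others)
    return result


def aggregate_vs_reference(group, reference):
    """Pool several sources, then count pooled identifiers inside and outside a reference set."""
    pooled = set().union(*group)
    return len(pooled & reference), len(pooled - reference)


# Hypothetical toy data: three sources holding placeholder identifiers.
sources = {
    "db_a": {"K1", "K2", "K3"},
    "db_b": {"K2", "K3", "K4"},
    "db_c": {"K3", "K5"},
}
print(pairwise_overlaps(sources))
# {('db_a', 'db_b'): 2, ('db_a', 'db_c'): 1, ('db_b', 'db_c'): 1}
print(unique_content(sources))
# {'db_a': 1, 'db_b': 1, 'db_c': 1}
print(aggregate_vs_reference([sources["db_a"], sources["db_b"]], sources["db_c"]))
# (1, 3)
```

The last call mirrors the shape of the "overlapped with PubChem / did not" comparison in the Results, here on toy data only.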

    BRCA2 polymorphic stop codon K3326X and the risk of breast, prostate, and ovarian cancers

    Background: The K3326X variant in BRCA2 (BRCA2*c.9976A>T; p.Lys3326*; rs11571833) has been found to be associated with small increased risks of breast cancer. However, it is not clear to what extent linkage disequilibrium with fully pathogenic mutations might account for this association. There is scant information about the effect of K3326X in other hormone-related cancers. Methods: Using weighted logistic regression, we analyzed data from the large iCOGS study, including 76 637 cancer case patients and 83 796 control patients, to estimate weighted odds ratios (ORw) and 95% confidence intervals (CIs) for K3326X variant carriers in relation to breast, ovarian, and prostate cancer risks, with weights defined as the probability of not having a pathogenic BRCA2 variant. Using Cox proportional hazards modeling, we also examined the associations of K3326X with breast and ovarian cancer risks among 7183 BRCA1 variant carriers. All statistical tests were two-sided. Results: The K3326X variant was associated with breast cancer (ORw = 1.28, 95% CI = 1.17 to 1.40, P = 5.9 × 10⁻⁶) and invasive ovarian cancer (ORw = 1.26, 95% CI = 1.10 to 1.43, P = 3.8 × 10⁻³). These associations were stronger for serous ovarian cancer and for estrogen receptor–negative breast cancer (ORw = 1.46, 95% CI = 1.2 to 1.70, P = 3.4 × 10⁻⁵ and ORw = 1.50, 95% CI = 1.28 to 1.76, P = 4.1 × 10⁻⁵, respectively). For BRCA1 mutation carriers, there was a statistically significant inverse association of the K3326X variant with risk of ovarian cancer (HR = 0.43, 95% CI = 0.22 to 0.84, P = .013) but no association with breast cancer. No association with prostate cancer was observed. Conclusions: Our study provides evidence that the K3326X variant is associated with risk of developing breast and ovarian cancers independent of other pathogenic variants in BRCA2. Further studies are needed to determine the biological mechanism of action responsible for these associations.
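For readers less familiar with odds ratios and 95% CIs like those reported in this abstract, the sketch below shows the textbook unweighted calculation from a 2×2 table of carrier status against case/control status, with a Wald interval on the log-odds scale. It is a simplification: the study itself used weighted logistic regression (and Cox models for BRCA1 carriers), the function name is hypothetical, and the counts in the example are made up, not iCOGS data.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unweighted odds ratio with a Wald confidence interval from a 2x2 table.

    a: carrier cases      b: carrier controls
    c: non-carrier cases  d: non-carrier controls
    """
    # Odds of carrying the variant among cases (a/c) divided by the same odds among controls (b/d).
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); z = 1.96 gives an approximate 95% interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to illustrate the arithmetic.
or_, lo, hi = odds_ratio_wald_ci(a=30, b=20, c=970, d=980)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

With these made-up counts the interval includes 1, a reminder that odds ratios of modest size are hard to distinguish from no effect without large samples.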

    Synthetic strategies to nanostructured photocatalysts for CO2 reduction to solar fuels and chemicals

    Artificial photosynthesis represents one of the great scientific challenges of the 21st century, offering the possibility of clean energy through water photolysis and of renewable chemicals through the use of CO2 as a sustainable feedstock. Catalysis will undoubtedly play a key role in delivering technologies able to meet these goals, mediating solar energy conversion via photogenerated charge carriers that selectively activate molecular bonds under ambient conditions. This review describes recent synthetic approaches adopted to engineer nanostructured photocatalytic materials for efficient light harvesting, charge separation and the photoreduction of CO2 to products ranging from methane and methanol to higher hydrocarbons such as olefins.

    Leaky doors: private captivity as a prominent source of bird introductions in Australia

    The international pet trade is a major source of emerging invasive vertebrate species. We used online resources as a novel source of information on accidental bird escapes, and we investigated the factors that influence the frequency and distribution of bird escapes at a continental scale. We collected information on over 5,000 pet birds reported missing on animal websites in Australia during the last 15 years. We investigated whether variables linked to pet ownership successfully predicted bird escapes, and we assessed the potential distribution of these escapes. Most of the reported birds were parrots (>90%), so we analysed factors associated with the frequency of parrot escapes. We found that bird escapes in Australia are much more frequent than previously acknowledged. Bird escapes were reported more frequently within, or around, large Australian capital cities. Socio-economic factors, such as the average personal income level of the community, and the level of human modification to the environment were the best predictors of bird escapes. Cheaper parrot species, Australian natives, and parrot species regarded as peaceful or playful were the most frequently reported escapees. Accidental introductions have been overlooked as an important source of animal incursions. Information on bird escapes is available online in many higher-income countries, and in Australia this is particularly apparent for parrot species. We believe that online resources may provide useful tools for passive surveillance of non-native pet species. Online surveillance will be particularly relevant for species that are highly reported, such as parrots, and for species that are either valuable or highly commensal.
    Miquel Vall-llosera, Phillip Casse