31 research outputs found

    Word-Level Representation From Bytes For Language Modeling

    Modern language models mostly take sub-words as input, a design that balances the trade-off between vocabulary size, number of parameters, and performance. However, sub-word tokenization still has disadvantages, such as not being robust to noise and being difficult to generalize to new languages. Also, the current trend of scaling up models reveals that larger models require larger embeddings, which makes parallelization hard. Previous work on image classification shows that splitting raw input into a sequence of chunks is a strong, model-agnostic inductive bias. Based on this observation, we rethink the existing character-aware method that takes character-level inputs but performs word-level sequence modeling and prediction. We overhaul this method by introducing a cross-attention network that builds word-level representations directly from bytes, and a sub-word-level prediction based on word-level hidden states to avoid the time and space requirements of word-level prediction. With these two improvements combined, we have a token-free model with slim input embeddings for downstream tasks. We name our method Byte2Word and perform evaluations on language modeling and text classification. Experiments show that Byte2Word is on par with the strong sub-word baseline BERT but takes up only 10% of the embedding size. We further test our method on synthetic noise and cross-lingual transfer and find it competitive with baseline methods in both settings.
    Comment: preprint

    gCAPjoint, A Software Package for Full Moment Tensor Inversion of Moderately Strong Earthquakes with Local and Teleseismic Waveforms

    Earthquake moment tensors and focal depths are crucial to assessing seismic hazards and studying active tectonic and volcanic processes. Although less powerful than strong earthquakes (M 7+), moderately strong earthquakes (M 5–6.5) occur more frequently and extensively, and can cause severe damage in populated areas. The inversion of moment tensors is usually affected by insufficient local waveform data (epicentral distance <5°) in sparse seismic networks. It would be necessary to combine local and teleseismic data (epicentral distance 30°–90°) for a joint inversion. In this study, we present the generalized cut‐and‐paste joint (gCAPjoint) algorithm to jointly invert full moment tensor and centroid depth with local and teleseismic broadband waveforms. To demonstrate the effectiveness and explore the limitations of this algorithm, we perform case studies on three earthquakes with different tectonic settings and source properties. Comparison of our results with global centroid moment tensor and other catalog solutions illustrates that both non‐double‐couple compositions of the focal mechanisms and centroid depths can be reliably recovered for very shallow (<10 km) earthquakes and intermediate‐depth events with this software package.

    Aridity-driven shift in biodiversity–soil multifunctionality relationships

    From Springer Nature via Jisc Publications Router
    History: received 2021-01-07, accepted 2021-08-12, registration 2021-08-25, pub-electronic 2021-09-09, online 2021-09-09, collection 2021-12
    Publication status: Published
    Funder: National Natural Science Foundation of China (National Science Foundation of China); doi: https://doi.org/10.13039/501100001809; Grant(s): 31770430
    Abstract: Relationships between biodiversity and multiple ecosystem functions (that is, ecosystem multifunctionality) are context-dependent. Both plant and soil microbial diversity have been reported to regulate ecosystem multifunctionality, but how their relative importance varies along environmental gradients remains poorly understood. Here, we relate plant and microbial diversity to soil multifunctionality across 130 dryland sites along a 4,000 km aridity gradient in northern China. Our results show a strong positive association between plant species richness and soil multifunctionality in less arid regions, whereas microbial diversity, in particular of fungi, is positively associated with multifunctionality in more arid regions. This shift in the relationships between plant or microbial diversity and soil multifunctionality occurs at an aridity level of ∼0.8, the boundary between semiarid and arid climates, which is predicted to advance geographically ∼28% by the end of the current century. Our study highlights that biodiversity loss of plants and soil microorganisms may have especially strong consequences under low and high aridity conditions, respectively, which calls for climate-specific biodiversity conservation strategies to mitigate the effects of aridification.

    Search for dark matter produced in association with bottom or top quarks in √s = 13 TeV pp collisions with the ATLAS detector

    A search for weakly interacting massive particle dark matter produced in association with bottom or top quarks is presented. Final states containing third-generation quarks and missing transverse momentum are considered. The analysis uses 36.1 fb⁻¹ of proton–proton collision data recorded by the ATLAS experiment at √s = 13 TeV in 2015 and 2016. No significant excess of events above the estimated backgrounds is observed. The results are interpreted in the framework of simplified models of spin-0 dark-matter mediators. For colour-neutral spin-0 mediators produced in association with top quarks and decaying into a pair of dark-matter particles, mediator masses below 50 GeV are excluded assuming a dark-matter candidate mass of 1 GeV and unitary couplings. For scalar and pseudoscalar mediators produced in association with bottom quarks, the search sets limits on the production cross-section of 300 times the predicted rate for mediators with masses between 10 and 50 GeV and assuming a dark-matter mass of 1 GeV and unitary coupling. Constraints on colour-charged scalar simplified models are also presented. Assuming a dark-matter particle mass of 35 GeV, mediator particles with mass below 1.1 TeV are excluded for couplings yielding a dark-matter relic density consistent with measurements.

    Robust estimation of bacterial cell count from optical density

    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.

    Measurement of jet fragmentation in Pb+Pb and pp collisions at √s_NN = 2.76 TeV with the ATLAS detector at the LHC

    What Dense Graph Do You Need for Self-Attention?

    Transformers have made progress in miscellaneous tasks, but suffer from quadratic computational and memory complexities. Recent works propose sparse Transformers with attention on sparse graphs to reduce complexity while retaining strong performance. While effective, the crucial question of how dense a graph needs to be to perform well is not fully explored. In this paper, we propose Normalized Information Payload (NIP), a graph scoring function measuring information transfer on a graph, which provides an analysis tool for trade-offs between performance and complexity. Guided by this theoretical analysis, we present Hypercube Transformer, a sparse Transformer that models token interactions in a hypercube and shows results comparable to or even better than the vanilla Transformer while yielding O(N log N) complexity with sequence length N. Experiments on tasks requiring various sequence lengths validate our graph function.
    Comment: Accepted by ICML 2022. Code is available at https://github.com/yxzwang/Normalized-Information-Payloa

    Butterfly-like enantiomerically homochiral {CoII6CoIII4} clusters exhibiting both slow magnetic relaxation and ferroelectric property

    A pair of enantiomerically homochiral {CoII6CoIII4} clusters featuring a butterfly-like structure, [CoII6CoIII4(μ3-OH)(μ3-X)(S-pa)4(pdm)6(pdmH)2](ClO4)4·3.5H2O (S-1) and [CoII6CoIII4(μ3-OH)(μ3-X)(R-pa)4(pdm)6(pdmH)2](ClO4)4·3.5H2O (R-1) (X = OH or OMe, S- or R-paH = S- or R-phenylalaninol and pdmH2 = pyridine-2,6-diyldimethanol), have been synthesized and structurally characterized. They are the second largest homochiral Co clusters constructed by chiral ligands. They are also the first example of high-nuclearity homochiral Co clusters having both slow magnetic relaxation behaviour and ferroelectric property.

    A new route to prepare multiresponsive organogels from a block ionomer via charge-driven assembly

    We report a novel route to prepare multiresponsive organogels through charge-driven assembly between a block ionomer and a diblock copolymer. The ionic complex aggregates to form spherical cores, which are connected by the middle block of the block ionomer to form gels. The organogels are responsive to acids, amines and salts.