695 research outputs found

    The Locality and Symmetry of Positional Encodings

    Positional Encodings (PEs) are used to inject word-order information into transformer-based language models. While they can significantly enhance the quality of sentence representations, their specific contribution to language models is not fully understood, especially given recent findings that various positional encodings are insensitive to word order. In this work, we conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style), which complements existing work in three aspects: (1) We uncover the core function of PEs by identifying two common properties, Locality and Symmetry; (2) We show that the two properties are closely correlated with the performance of downstream tasks; (3) We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly. We believe that these results are the basis for developing better PEs for transformer-based language models. The code is available at https://github.com/tigerchen52/locality_symmetry. Comment: Long Paper in Findings of EMNLP2

    STAR: Steiner tree approximation in relationship-graphs

    Large-scale graphs and networks are abundant in modern information systems: entity-relationship graphs over relational data or Web-extracted entities, biological networks, social online communities, knowledge bases, and many more. Often such data comes with expressive node and edge labels that allow an interpretation as a semantic graph, and edge weights that reflect the strengths of semantic relations between entities. Finding close relationships between a given set of two, three, or more entities is an important building block for many search, ranking, and analysis tasks. From an algorithmic point of view, this translates into computing the best Steiner trees between the given nodes, a classical NP-hard problem. In this paper, we present a new approximation algorithm, coined STAR, for relationship queries over large graphs that do not fit into memory. We prove that for n query entities, STAR yields an O(log(n))-approximation of the optimal Steiner tree, and show that in practical cases the results returned by STAR are qualitatively better than the results returned by a classical 2-approximation algorithm. We then describe an extension to our algorithm to return the top-k Steiner trees. Finally, we evaluate our algorithm over both main-memory as well as completely disk-resident graphs containing millions of nodes. Our experiments show that STAR outperforms the best state-of-the returns qualitatively better results

    A la recherche des connaissances du Web...

    International audience. The Web contains an impressive mass of data, more or less explicit and more or less accessible to machines. We discuss here the major trends in managing this data: extracting knowledge from the Web, enriching that knowledge through the community of Web users, representing it in logical form, and distributing it across all facets of the Web. We show how these developments make data on the Web more semantic, easier for machines to handle, more accessible to applications, and thus ultimately more useful to humans.

    TrainMiC, Training in Metrology in Chemistry.

    Abstract not available. JRC.D-Institute for Reference Materials and Measurements (Geel

    A mineralogical record of ocean change: decadal and centennial patterns in the California mussel

    Ocean acidification, a product of increasing atmospheric carbon dioxide, may already have affected calcified organisms in the coastal zone, such as bivalves and other shellfish. Understanding species’ responses to climate change requires the context of long-term dynamics. This can be particularly difficult given the longevity of many important species in contrast with the relatively rapid onset of environmental changes. Here, we present a unique archival dataset of mussel shells from a locale with recent environmental monitoring and historical climate reconstructions. We compare shell structure and composition in modern mussels, mussels from the 1970s, and mussel shells dating back to 1000–2420 years BP. Shell mineralogy has changed dramatically over the past 15 years, despite evidence for consistent mineral structure in the California mussel, Mytilus californianus, over the prior 2500 years. We present evidence for increased disorder in the calcium carbonate shells of mussels and greater variability between individuals. These changes in the last decade contrast markedly with a background of consistent shell mineralogy for centuries. Our results use an archival record of natural specimens to provide centennial-scale context for altered mineralogy and variability in shell features as a response to acidification stress and illustrate the utility of long-term studies and archival records in global change ecology. Increased variability between individuals is an emerging pattern in climate change responses, which may equally expose the vulnerability of organisms and the potential of populations for resilience.

    Factors associated with comprehensive medication review completion rates: A national survey of community pharmacists

    Background: Completion rates for medication therapy management (MTM) services have been lower than desired and the Centers for Medicare and Medicaid Services has added MTM comprehensive medication review (CMR) completion rates as a Part D plan star measure. Over half of plans utilize community pharmacists via contracts with MTM vendors. Objectives: The primary objective of this survey study was to identify factors associated with the CMR completion rates of community pharmacies contracted with a national MTM vendor. Methods: Representatives from 27,560 pharmacy locations contracted with a national MTM vendor were surveyed. The dependent variable of interest was the pharmacies' CMR completion rate. Independent variables included the pharmacy's progressiveness stratum and number of CMRs assigned by the MTM vendor during the time period, as well as self-reported data to characterize MTM facilitators, barriers, delivery strategies, staffing, selected items from a modified Assessment of Chronic Illness Care, and pharmacist/pharmacy demographics. Univariate negative binomial models were fit for each independent variable, and variables significant at p < 0.05 were entered into a multivariable model. Results: Representatives from 3836 (13.9%) pharmacy locations responded; of these responses, 90.9% (n = 3486) were usable. The median CMR completion rate was 0.42. Variables remaining significant at p < 0.05 in the multivariable model included: progressiveness strata; pharmacy type; scores on the facilitators scale; responses to two potential barriers items; scores on the patient/caregiver delivery strategies sub-scale; providing MTM at multiple locations; reporting that the MTM vendor sending the survey link is the primary MTM vendor for which the respondent provides MTM; and the number of hours per week that the pharmacy is open. Conclusions: Factors at the respondent (e.g., responses to facilitators scale) and pharmacy (e.g., pharmacy type) levels were associated with CMR completion rates. These findings could be used by MTM stakeholders to improve CMR completion rates.

    Multidimensional integration of RDF datasets

    Data providers have been uploading RDF datasets on the web to aid researchers and analysts in finding insights. These datasets, made available by different data providers, contain common characteristics that enable their integration. However, since each provider has their own data dictionary, identifying common concepts is not trivial and we require costly and complex entity resolution and transformation rules to perform such integration. In this paper, we propose a novel method that, given a set of independent RDF datasets, provides a multidimensional interpretation of these datasets and integrates them based on a common multidimensional space (if any) identified. To do so, our method first identifies potential dimensional and factual data on the input datasets and performs entity resolution to merge common dimensional and factual concepts. As a result, we generate a common multidimensional space and identify each input dataset as a cuboid of the resulting lattice. With such output, we are able to exploit open data with OLAP operators in a richer fashion than dealing with them separately. This research has been funded by the European Commission through the Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence-Doctoral College (IT4BI-DC) program. Peer Reviewed. Postprint (author's final draft