Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity
Vector embeddings have become ubiquitous tools for many language-related
tasks. A leading embedding model is OpenAI's text-ada-002 which can embed
approximately 6,000 words into a 1,536-dimensional vector. While powerful,
text-ada-002 is not open source and is only available via API. We trained a
simple neural network to convert open-source 768-dimensional MPNet embeddings
into text-ada-002 embeddings. We compiled a subset of 50,000 online food
reviews. We calculated MPNet and text-ada-002 embeddings for each review and
trained a simple neural network for 75 epochs. The neural network was
designed to predict the corresponding text-ada-002 embedding for a given MPNet
embedding. Our model achieved an average cosine similarity of 0.932 on 10,000
unseen reviews in our held-out test dataset. We manually assessed the quality
of our predicted embeddings for vector search over text-ada-002-embedded
reviews. While not as good as real text-ada-002 embeddings, predicted
embeddings were able to retrieve highly relevant reviews. Our final model,
Vec2Vec, is lightweight (<80 MB) and fast. Future steps include training a
neural network with a more sophisticated architecture and a larger dataset of
paired embeddings to achieve greater performance. The ability to convert
between and align embedding spaces may be helpful for interoperability,
limiting dependence on proprietary models, protecting data privacy, reducing
costs, and offline operations.
Comment: 14 pages, 6 figures, 5 tables
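The workflow described in this abstract (fit a small network on paired embeddings, score it by average cosine similarity, then use the predicted vectors for search) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-hidden-layer ReLU architecture, the hidden width, the learning rate, the full-batch gradient descent, the function names `train_mapper`, `mean_cosine_similarity`, and `top_k`, and the small synthetic arrays that stand in for the 768-dimensional MPNet and 1,536-dimensional text-ada-002 embeddings. Only the 75-epoch default follows the abstract.

```python
import numpy as np


def mean_cosine_similarity(pred, target):
    """Average row-wise cosine similarity between two (n, d) arrays."""
    pred_unit = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target_unit = target / np.linalg.norm(target, axis=1, keepdims=True)
    return float(np.mean(np.sum(pred_unit * target_unit, axis=1)))


def train_mapper(x, y, hidden=32, epochs=75, lr=0.05, seed=0):
    """Fit a one-hidden-layer ReLU network x -> y with full-batch gradient
    descent on mean squared error (hypothetical architecture, not the
    paper's). Returns (predict_fn, loss_history)."""
    rng = np.random.default_rng(seed)
    d_in, d_out = x.shape[1], y.shape[1]
    w1 = rng.normal(0.0, np.sqrt(2.0 / d_in), (d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, d_out))
    b2 = np.zeros(d_out)
    losses = []
    for _ in range(epochs):
        pre = x @ w1 + b1                      # hidden pre-activations
        h = np.maximum(pre, 0.0)               # ReLU
        err = h @ w2 + b2 - y                  # prediction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size           # d(loss)/d(prediction)
        g_h = (g_out @ w2.T) * (pre > 0)       # backprop through ReLU
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)

    def predict(x_new):
        return np.maximum(x_new @ w1 + b1, 0.0) @ w2 + b2

    return predict, losses


def top_k(query, corpus, k=3):
    """Indices of the k corpus rows most cosine-similar to the query vector."""
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = corpus_unit @ (query / np.linalg.norm(query))
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    # Synthetic stand-ins for paired source/target embeddings (assumption).
    rng = np.random.default_rng(1)
    source = rng.normal(size=(250, 8))
    target = source @ rng.normal(size=(8, 16)) / np.sqrt(8)
    train_x, test_x = source[:200], source[200:]
    train_y, test_y = target[:200], target[200:]

    predict, losses = train_mapper(train_x, train_y)
    print("held-out mean cosine similarity:",
          mean_cosine_similarity(predict(test_x), test_y))
    # Search the real target vectors with one predicted vector as the query.
    print("nearest held-out targets:", top_k(predict(test_x[:1])[0], test_y))
```

With real data, `source` and `target` would be the MPNet and text-ada-002 embeddings of the same reviews, and the held-out score would play the role of the cosine similarity the abstract reports on unseen reviews.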
Clinical Outcomes and Bacterial Characteristics of Carbapenem-Resistant Acinetobacter baumannii Among Patients from Different Global Regions
Abstract
Background: Carbapenem-resistant Acinetobacter baumannii (CRAb) is one of the most problematic antimicrobial-resistant bacteria. We sought to elucidate the international epidemiology and clinical impact of CRAb.
Methods: In a prospective observational cohort study, 842 hospitalized patients with a clinical CRAb culture were enrolled at 46 hospitals in five global regions between 2017 and 2019. The primary outcome was all-cause mortality at 30 days from the index culture. The strains underwent whole-genome analysis.
Results: Of 842 cases, 536 (64%) represented infection. By 30 days, 128 (24%) of the infected patients died, ranging from 1 (6%) of 18 in Australia-Singapore to 54 (25%) of 216 in the United States and 24 (49%) of 49 in South-Central America, whereas 42 (14%) of non-infected patients died. Bacteremia was associated with a higher risk of death compared with other types of infection (40 [42%] of 96 vs. 88 [20%] of 440). In a multivariable logistic regression analysis, bloodstream infection and higher age-adjusted Charlson comorbidity index were independently associated with 30-day mortality. Clonal group 2 (CG2) strains predominated except in South-Central America, ranging from 216 (59%) of 369 in the United States to 282 (97%) of 291 in China. Acquired carbapenemase genes were carried by 769 (91%) of the 842 isolates. CG2 strains were significantly associated with higher levels of meropenem resistance, yet non-CG2 cases were over-represented among the deaths compared with CG2 cases.
Conclusions: CRAb infection types and clinical outcomes differed significantly across regions. While CG2 strains remained predominant, non-CG2 strains were associated with higher mortality.
ClinicalTrials.gov #NCT0364622