49 research outputs found

    Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations

    We present a framework for building unsupervised representations of entities and their compositions, where each entity is viewed as a probability distribution rather than a vector embedding. In particular, this distribution is supported over the contexts which co-occur with the entity and are embedded in a suitable low-dimensional space. This enables us to consider representation learning from the perspective of Optimal Transport and take advantage of its tools such as Wasserstein distance and barycenters. We elaborate how the method can be applied for obtaining unsupervised representations of text and illustrate the performance (quantitatively as well as qualitatively) on tasks such as measuring sentence similarity, word entailment and similarity, where we empirically observe significant gains (e.g., 4.1% relative improvement over Sent2vec, GenSen). The key benefits of the proposed approach include: (a) capturing uncertainty and polysemy via modeling the entities as distributions, (b) utilizing the underlying geometry of the particular task (with the ground cost), (c) simultaneously providing interpretability with the notion of optimal transport between contexts and (d) easy applicability on top of existing point embedding methods. The code, as well as prebuilt histograms, are available under https://github.com/context-mover/. Comment: AISTATS 2020. Also accepted previously at the ICLR 2019 DeepGenStruct Workshop.
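The idea of comparing entities as distributions over embedded contexts can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy setting with two context points, uses plain Euclidean distance as the ground cost, and approximates the transport cost with Sinkhorn iterations (a standard entropy-regularised solver chosen here for brevity). The names `ground_cost`, `sinkhorn_cost`, `entity_x` and `entity_y` are hypothetical.

```python
import math


def ground_cost(points):
    """Pairwise Euclidean distances between embedded context points."""
    return [[math.dist(p, q) for q in points] for p in points]


def sinkhorn_cost(a, b, cost, eps=0.1, n_iter=200):
    """Approximate transport cost between histograms a and b.

    Runs a fixed number of Sinkhorn scaling steps on the kernel
    exp(-cost / eps); eps and n_iter are illustrative defaults.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows, then columns, towards the target marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan entry is u[i] * kernel[i][j] * v[j]; sum plan * cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


# Two hypothetical contexts embedded in 2-D, one unit apart.
contexts = [(0.0, 0.0), (1.0, 0.0)]
cost = ground_cost(contexts)

# Each entity is a histogram over the same contexts (assumed toy values).
entity_x = [0.9, 0.1]
entity_y = [0.1, 0.9]

d_xy = sinkhorn_cost(entity_x, entity_y, cost)  # most mass must move one unit
d_xx = sinkhorn_cost(entity_x, entity_x, cost)  # almost nothing needs to move
```

In this toy case 0.8 of the mass has to travel one unit, so `d_xy` is close to 0.8, while `d_xx` stays near zero.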

    Gene expression profiling reveals consistent differences between clinical samples of human leukaemias and their model cell lines

    Microarray gene expression profiles of fresh clinical samples of chronic myeloid leukaemia in chronic phase, acute promyelocytic leukaemia and acute monocytic leukaemia were compared with profiles from cell lines representing the corresponding types of leukaemia (K562, NB4, HL60). In a hierarchical clustering analysis, all clinical samples clustered separately from the cell lines, regardless of leukaemic subtype. Gene ontology analysis showed that cell lines chiefly overexpressed genes related to macromolecular metabolism, whereas in clinical samples genes related to the immune response were abundantly expressed. These findings must be taken into consideration when conclusions from cell line-based studies are extrapolated to patients.

    Characterization of molecular scores and gene expression signatures in primary breast cancer, local recurrences and brain metastases.

    BACKGROUND Breast cancer is a leading cause of cancer-related death in women worldwide. Despite extensive studies in all areas of basic, clinical and applied research, accurate prognosis remains elusive, thus leading to overtreatment of many patients. Diagnosis could be improved by introducing multigene molecular scores in standard clinical practice. Several tests that work with formalin-fixed tissue have become routine. Molecular scores usually include several genes representing processes, response to oestrogens, progestogens and human epidermal growth factor receptor 2 (Her2), respectively, which are combined additively in single values. These multi-gene scores have the advantage of being more robust and reproducible than single-gene scores. Their utility may be further enhanced by combining them with classical diagnostic parameters. Here, we present an exploratory study comparing the RISK and research versions of Oncotype DX recurrence score (RS), Prosigna Risk of Recurrence (ROR) and EndoPredict (EP) with respect to their prognostic potential for ipsilateral recurrence and/or distant relapse in brain, and we compared the scores to the intrinsic subtypes based on PAM50. METHODS RNA was extracted from formalin-fixed, paraffin-embedded (FFPE) tissue cores of primary tumours, local recurrences and brain metastases. Gene expression was measured on a NanoString nCounter Analysis System. Intrinsic subtypes and molecular scores were computed according to published literature and RISK, RS, ROR and EP were compared against each other and to the intrinsic subtypes Luminal A (lumA), Luminal B (lumB), Her2-enriched (Her2↑), Basal-like (basal), and Normal-like (normal) of PAM50. Local recurrences and brain metastases were compared to their corresponding primary tumours. RESULTS All four molecular scores were highly correlated. Highest correlations were observed among genes related to proliferation while lower correlations were found among oestrogen-related genes. 
The scores were significantly higher in primary tumours progressing to brain metastases as compared to recurrence-free primary tumours and primary tumours that relapsed as local recurrences. CONCLUSIONS RISK and ROR-P are prognostic for primary tumours metastasizing to the brain. All four scores, RISK, RS, EP and ROR-P, failed to discriminate between primary tumours that remained recurrence-free and primary tumours relapsing as local recurrences.

    MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

    Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.

    Gene expression variation between distinct areas of breast cancer measured from paraffin-embedded tissue cores

    BACKGROUND: Diagnosis and prognosis in breast cancer are mainly based on histology and immunohistochemistry of formalin-fixed, paraffin-embedded (FFPE) material. Recently, gene expression analysis was shown to elucidate the biological variance between tumors and molecular markers were identified that led to new classification systems that provided better prognostic and predictive parameters. Archived FFPE samples represent an ideal source of tissue for translational research, as millions of tissue blocks exist from routine diagnostics and from clinical studies. These should be exploited to provide clinicians with more accurate prognostic and predictive information. Unfortunately, RNA derived from FFPE material is partially degraded and chemically modified and reliable gene expression measurement has only become successful after implementing novel and optimized procedures for RNA isolation, demodification and detection. METHODS: In this study we used tissue cylinders as known from the construction of tissue microarrays. RNA was isolated with a robust protocol recently developed for RNA derived from FFPE material. Gene expression was measured by quantitative reverse transcription PCR. RESULTS: Sixteen tissue blocks from 7 patients diagnosed with multiple histological subtypes of breast cancer were available for this study. After verification of appropriate localization, sufficient RNA yield and quality, 30 tissue cores were available for gene expression measurement on TaqMan(R) Low Density Arrays (16 invasive ductal carcinoma (IDC), 8 ductal carcinoma in situ (DCIS) and 6 normal tissue), and 14 tissue cores were lost. Gene expression values were used to calculate scores representing the proliferation status (PRO), the estrogen receptor status and the HER2 status. The PRO scores measured from entire sections were similar to PRO scores determined from IDC tissue cores. 
Normal tissue cores consistently yielded lower PRO scores than cores derived from IDC or DCIS of the same block or from different blocks of the same patient. CONCLUSION: We have developed optimized protocols for RNA isolation from histologically distinct areas. RNA prepared from FFPE tissue cores is suitable for gene expression measurement by quantitative PCR. Distinct molecular scores could be determined from different cores of the same tumor specimen.

    City branding as economic necessity

    Effective city branding is a precondition for a city's recognisability, strong positioning and the creation of added value. Practice and numerous examples confirm this thesis. City branding is necessary to strengthen competitiveness, increase profit and ensure the development of the place. This is not only a matter of economic categories, however, because place development also includes positive demographic trends, the enrichment of cultural offerings and other factors that raise the overall quality of life. For cities in Croatia it is both a challenge and a necessity if they are to remain competitive in a fiercely competitive market.

    Global age-sex-specific mortality, life expectancy, and population estimates in 204 countries and territories and 811 subnational locations, 1950–2021, and the impact of the COVID-19 pandemic: a comprehensive demographic analysis for the Global Burden of Disease Study 2021

    Background: Estimates of demographic metrics are crucial to assess levels and trends of population health outcomes. The profound impact of the COVID-19 pandemic on populations worldwide has underscored the need for timely estimates to understand this unprecedented event within the context of long-term population health trends. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 provides new demographic estimates for 204 countries and territories and 811 additional subnational locations from 1950 to 2021, with a particular emphasis on changes in mortality and life expectancy that occurred during the 2020–21 COVID-19 pandemic period. Methods: 22 223 data sources from vital registration, sample registration, surveys, censuses, and other sources were used to estimate mortality, with a subset of these sources used exclusively to estimate excess mortality due to the COVID-19 pandemic. 2026 data sources were used for population estimation. Additional sources were used to estimate migration; the effects of the HIV epidemic; and demographic discontinuities due to conflicts, famines, natural disasters, and pandemics, which are used as inputs for estimating mortality and population. Spatiotemporal Gaussian process regression (ST-GPR) was used to generate under-5 mortality rates, which synthesised 30 763 location-years of vital registration and sample registration data, 1365 surveys and censuses, and 80 other sources. ST-GPR was also used to estimate adult mortality (between ages 15 and 59 years) based on information from 31 642 location-years of vital registration and sample registration data, 355 surveys and censuses, and 24 other sources. Estimates of child and adult mortality rates were then used to generate life tables with a relational model life table system. 
For countries with large HIV epidemics, life tables were adjusted using independent estimates of HIV-specific mortality generated via an epidemiological analysis of HIV prevalence surveys, antenatal clinic serosurveillance, and other data sources. Excess mortality due to the COVID-19 pandemic in 2020 and 2021 was determined by subtracting observed all-cause mortality (adjusted for late registration and mortality anomalies) from the mortality expected in the absence of the pandemic. Expected mortality was calculated based on historical trends using an ensemble of models. In location-years where all-cause mortality data were unavailable, we estimated excess mortality rates using a regression model with covariates pertaining to the pandemic. Population size was computed using a Bayesian hierarchical cohort component model. Life expectancy was calculated using age-specific mortality rates and standard demographic methods. Uncertainty intervals (UIs) were calculated for every metric using the 25th and 975th ordered values from a 1000-draw posterior distribution. Findings: Global all-cause mortality followed two distinct patterns over the study period: age-standardised mortality rates declined between 1950 and 2019 (a 62·8% [95% UI 60·5–65·1] decline), and increased during the COVID-19 pandemic period (2020–21; 5·1% [0·9–9·6] increase). In contrast with the overall reverse in mortality trends during the pandemic period, child mortality continued to decline, with 4·66 million (3·98–5·50) global deaths in children younger than 5 years in 2021 compared with 5·21 million (4·50–6·01) in 2019. An estimated 131 million (126–137) people died globally from all causes in 2020 and 2021 combined, of which 15·9 million (14·7–17·2) were due to the COVID-19 pandemic (measured by excess mortality, which includes deaths directly due to SARS-CoV-2 infection and those indirectly due to other social, economic, or behavioural changes associated with the pandemic). 
Excess mortality rates exceeded 150 deaths per 100 000 population during at least one year of the pandemic in 80 countries and territories, whereas 20 nations had a negative excess mortality rate in 2020 or 2021, indicating that all-cause mortality in these countries was lower during the pandemic than expected based on historical trends. Between 1950 and 2021, global life expectancy at birth increased by 22·7 years (20·8–24·8), from 49·0 years (46·7–51·3) to 71·7 years (70·9–72·5). Global life expectancy at birth declined by 1·6 years (1·0–2·2) between 2019 and 2021, reversing historical trends. An increase in life expectancy was only observed in 32 (15·7%) of 204 countries and territories between 2019 and 2021. The global population reached 7·89 billion (7·67–8·13) people in 2021, by which time 56 of 204 countries and territories had peaked and subsequently populations have declined. The largest proportion of population growth between 2020 and 2021 was in sub-Saharan Africa (39·5% [28·4–52·7]) and south Asia (26·3% [9·0–44·7]). From 2000 to 2021, the ratio of the population aged 65 years and older to the population aged younger than 15 years increased in 188 (92·2%) of 204 nations. Interpretation: Global adult mortality rates markedly increased during the COVID-19 pandemic in 2020 and 2021, reversing past decreasing trends, while child mortality rates continued to decline, albeit more slowly than in earlier years. Although COVID-19 had a substantial impact on many demographic indicators during the first 2 years of the pandemic, overall global health progress over the 72 years evaluated has been profound, with considerable improvements in mortality and life expectancy. Additionally, we observed a deceleration of global population growth since 2017, despite steady or increasing growth in lower-income countries, combined with a continued global shift of population age structures towards older ages. 
These demographic changes will likely present future challenges to health systems, economies, and societies. The comprehensive demographic estimates reported here will enable researchers, policy makers, health practitioners, and other key stakeholders to better understand and address the profound changes that have occurred in the global health landscape following the first 2 years of the COVID-19 pandemic, and longer-term trends beyond the pandemic.
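Two of the computations described in this abstract are simple enough to sketch. The example below is illustrative only, not the GBD pipeline: it takes the 25th and 975th ordered values of 1000 draws as the bounds of a 95% uncertainty interval, and it computes excess mortality as observed minus expected deaths, the sign convention under which a negative value means mortality was lower than expected (as the abstract reports for 20 nations). The function names and the toy draws are assumptions.

```python
def uncertainty_interval(draws):
    """Return the 25th and 975th ordered values of 1000 draws."""
    if len(draws) != 1000:
        raise ValueError("expected exactly 1000 posterior draws")
    ordered = sorted(draws)
    # Ordered values are counted from 1, list indices from 0.
    return ordered[24], ordered[974]


def excess_mortality(observed, expected):
    """Observed minus expected deaths; negative means fewer than expected."""
    return observed - expected


# Toy draws 1.0 .. 1000.0 in reverse order (hypothetical, for illustration).
draws = [float(i) for i in range(1000, 0, -1)]
lower, upper = uncertainty_interval(draws)
```

With these toy draws the interval is (25.0, 975.0), and `excess_mortality(120, 100)` returns 20.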