7 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents covering a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings (MeSH) descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The database server at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will stimulate the development of powerful new techniques for title- and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed.
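    The abstract names TF-IDF as one of the three baselines used to generate candidate recommendations. As a rough illustration of how such a baseline can rank articles against a seed, the Python sketch below scores candidate titles by the cosine similarity of their TF-IDF vectors to the seed title. It is a minimal sketch, not the RELISH pipeline: the tokenizer, the smoothed IDF formula, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_candidates`) and the example titles are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors: raw term counts weighted by a smoothed IDF (an assumed variant)."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF, so a term present in every document still gets a small positive weight.
    idf = {term: math.log((1 + n) / (1 + k)) + 1 for term, k in df.items()}
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(seed: str, candidates: list[str]) -> list[tuple[int, float]]:
    """Return (candidate index, similarity to seed) pairs, most similar first."""
    vectors = tfidf_vectors([seed] + candidates)
    seed_vec, cand_vecs = vectors[0], vectors[1:]
    scored = [(i, cosine(seed_vec, v)) for i, v in enumerate(cand_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical titles, invented for illustration only.
    seed = "benchmark database for document similarity in biomedical literature search"
    candidates = [
        "ranking biomedical literature search results",
        "real wages of colonial quebec",
        "document similarity benchmark for search engines",
    ]
    for index, score in rank_candidates(seed, candidates):
        print(f"{score:.3f}  {candidates[index]}")
```

    A candidate that shares no terms with the seed scores 0.0 and sorts last. Okapi BM25 differs mainly in saturating term frequency and normalizing for document length, so the two baselines need not rank the same articles identically.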

    The seeds of divergence: the economy of French North America, 1688 to 1760

    Canada has generally been ignored in the literature on the colonial origins of divergence, with most of the attention going to the United States. Late-nineteenth-century estimates of income per capita show that Canada was poorer than the United States and that, within Canada, the French and Catholic population of Quebec was considerably poorer. Was this gap long-standing? Some evidence has been advanced for earlier periods, but it is quite limited and not well suited to comparison with other societies. This thesis aims to contribute both to Canadian economic history and to comparative work on inequality across nations during the early modern period. Using novel price and wage data from Quebec, which was then the largest settlement in Canada and under French rule, it constructs a price index, a series of real wages and a measure of Gross Domestic Product (GDP). These are used to shed light both on the course of economic development until the French were defeated by the British in 1760 and on standards of living in the colony relative to the mother country, France, as well as to the American colonies. The work is divided into three components. The first component is the construction of a price index. The absence of such an index has been a thorn in the side of Canadian historians, as it has limited their ability to obtain real values of wages, output and living standards. The index shows that prices followed no long-run trend and remained broadly stable. However, there were episodes of wide swings, mostly due to wars and the monetary experiment with playing-card money. This index lays the foundation for the next component. The second component constructs a standardized real wage series in the form of welfare ratios (the nominal wage rate multiplied by the length of the work year, divided by the cost of a consumption basket) to compare Canada with France, England and Colonial America. Two measures are derived.
The first relies on a “bare bones” definition of consumption with a large share of land-intensive goods. This measure indicates that Canada was poorer than England and Colonial America and not appreciably richer than France. However, it overstates Canada’s position relative to the Old World because of the heavy weight of land-intensive goods. A second measure uses a “respectable” definition of consumption, in which the basket includes a larger share of manufactured and capital-intensive goods. This second basket better reflects differences in living standards: the abundance of land in Canada (and Colonial America) made bare subsistence easy to achieve, but the scarcity of capital and skilled labor made luxuries and manufactured goods (clothing, lighting, imported goods) very expensive. With this measure, New France’s advantage over France evaporates and turns slightly negative, and the gap with Britain and Colonial America widens appreciably. This result is the most important for future research: because the ranking reverses when the basket changes, Old World and New World comparisons are evidently very sensitive to how the cost of living is measured. Furthermore, there are no sustained improvements in living standards over the period, regardless of the measure used; gaps in living standards observed later in the nineteenth century already existed in the seventeenth century. In a wider American perspective that includes the Spanish colonies, Canada fares better. The third component computes a new GDP series. This avoids problems associated with real wages in the form of welfare ratios, which assume a constant labor supply. That assumption is hard to defend for Colonial Canada, as there were many signs of increasing industriousness during the eighteenth and nineteenth centuries.
The GDP series suggests no long-run trend in living standards (from 1688 to circa 1765). The long peace of 1713 to 1740 was marked by modest economic growth, which offset a steady decline that had started in 1688, but by 1760, as a result of constant warfare, living standards had sunk below their 1688 levels. These developments are accompanied by evidence that other indicators of living standards also declined: the flat-lining of incomes coincides with substantial increases in the amount of time worked, rising mortality and rising infant mortality. In addition, income comparisons with the American colonies confirm the results obtained with wages: Canada was considerably poorer. Finally, a long conclusion offers an exploratory discussion of why Canada diverged so early. In structural terms, it is argued that the French colony was plagued by a small population that precluded scale effects. Because it was also dispersed throughout the territory, the small population of New France limited the scope for specialization and economies of scale. However, this problem was partly created, and partly aggravated, by institutional factors such as seigneurial tenure. The colonial origins of French America’s divergence from the rest of North America are thus partly institutional.
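    The welfare ratio used in the second component is annual earnings divided by the cost of a consumption basket, so a value of 1 means a full year’s work just pays for the basket. The sketch below computes it for one wage under two baskets to show how switching from a “bare bones” to a “respectable” basket can move the same wage from above 1 to below 1. The function name and every figure (daily wage, days worked, basket costs, the livre as unit) are hypothetical and not taken from the thesis; the basket cost is assumed to already cover the whole household.

```python
def welfare_ratio(daily_wage: float, days_worked: float, basket_cost: float) -> float:
    """Annual earnings (daily wage x days worked) divided by the annual cost of a basket."""
    if basket_cost <= 0:
        raise ValueError("basket_cost must be positive")
    return daily_wage * days_worked / basket_cost


if __name__ == "__main__":
    # Hypothetical figures for illustration only, not data from the thesis.
    wage, days = 1.5, 250                    # daily wage (livres) and length of the work year
    bare_bones, respectable = 300.0, 500.0   # assumed annual basket costs (livres)
    print(welfare_ratio(wage, days, bare_bones))   # 1.25
    print(welfare_ratio(wage, days, respectable))  # 0.75
```

    The same 375 livres of annual earnings comfortably covers the cheaper, land-intensive basket but falls short of the basket weighted towards manufactured goods, which is the kind of reversal the thesis describes.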

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    © The Author(s) 2019. Published by Oxford University Press.

    Whole-genome sequencing reveals host factors underlying critical COVID-19

    Other funding: Department of Health and Social Care (DHSC); Illumina; LifeArc; Medical Research Council (MRC); UKRI; Sepsis Research (the Fiona Elizabeth Agnew Trust); the Intensive Care Society; Wellcome Trust Senior Research Fellowship (223164/Z/21/Z); BBSRC Institute Program Support Grant to the Roslin Institute (BBS/E/D/20002172, BBS/E/D/10002070, BBS/E/D/30002275); UKRI grants (MC_PC_20004, MC_PC_19025, MC_PC_1905, MRNO2995X/1); UK Research and Innovation (MC_PC_20029); the Wellcome PhD training fellowship for clinicians (204979/Z/16/Z); the Edinburgh Clinical Academic Track (ECAT) programme; the National Institute for Health Research; the Wellcome Trust; the MRC; Cancer Research UK; the DHSC; NHS England; the Smilow family; the National Center for Advancing Translational Sciences of the National Institutes of Health (CTSA award number UL1TR001878); the Perelman School of Medicine at the University of Pennsylvania; National Institute on Aging (NIA U01AG009740); the National Institute on Aging (RC2 AG036495, RC4 AG039029); the Common Fund of the Office of the Director of the National Institutes of Health; NCI; NHGRI; NHLBI; NIDA; NIMH; NINDS.

    Critical COVID-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care or hospitalization after infection with SARS-CoV-2. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from critically ill individuals with those of population controls to find underlying disease mechanisms. Here we use whole-genome sequencing in 7,491 critically ill individuals compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical COVID-19.
We identify 16 new independent associations, including variants within genes involved in interferon signalling (IL10RB and PLSCR1), leucocyte differentiation (BCL11A) and blood-type antigen secretor status (FUT2). Using transcriptome-wide association and colocalization to infer the effect of gene expression on disease severity, we find evidence implicating multiple genes in critical disease, including reduced expression of a membrane flippase (ATP11A) and increased expression of a mucin (MUC1). Mendelian randomization provides evidence supporting causal roles for myeloid cell adhesion molecules (SELE, ICAM5 and CD209) and the coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of COVID-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication, or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparing cases of critical illness with population controls is highly efficient for detecting therapeutically relevant mechanisms of disease.
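    The case-control comparison described above can be illustrated, in a much simplified form, by a single-variant allelic test that compares allele counts in critically ill cases with those in population controls. This Python sketch computes an allelic odds ratio and a 1-degree-of-freedom Pearson chi-square test. It does not reproduce the GenOMICC analysis, whose association model is not described here; the function name `allelic_test` and the allele counts are hypothetical.

```python
import math


def allelic_test(case_alt: int, case_ref: int,
                 ctrl_alt: int, ctrl_ref: int) -> tuple[float, float, float]:
    """Allelic odds ratio, Pearson chi-square (1 df) and p-value for a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of allele and case/control status.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the alternate allele in cases relative to controls.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts: 1000 alleles in cases, 1000 in controls.
    odds_ratio, chi2, p_value = allelic_test(300, 700, 200, 800)
    print(f"OR={odds_ratio:.3f} chi2={chi2:.3f} p={p_value:.2e}")
```

    With these hypothetical counts the odds ratio is 12/7 and the chi-square statistic is 80/3. A genome-wide study repeats such a test across millions of variants, applies a stringent significance threshold, and usually adjusts for ancestry and other covariates in a regression model rather than using a bare 2x2 table.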