Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
Peer reviewed
Spectral and atmospheric characterization of 51 Eridani b using VLT/SPHERE
51 Eridani b is an exoplanet around a young (20 Myr) nearby (29.4 pc) F0-type
star, recently discovered by direct imaging. Being only 0.5" away from its host
star it is well suited for spectroscopic analysis using integral field
spectrographs. We aim to refine the atmospheric properties of this planet and to
further constrain the architecture of the system by searching for additional
companions. Using the SPHERE instrument at the VLT we extend the spectral
coverage of the planet to the complete Y- to H-band range and provide
photometry in the K12-bands (2.11, 2.25 micron). The object is compared to
other cool and peculiar dwarfs. Furthermore, the posterior probability
distributions of cloudy and clear atmospheric models are explored using MCMC.
We verified our methods by determining atmospheric parameters for the two
benchmark brown dwarfs Gl 570D and HD 3651B. For probing the innermost region
for additional companions, archival VLT-NACO (L') SAM data is used. We present
the first spectrophotometric measurements in the Y- and K-bands for the planet
and revise its J-band flux to values 40% fainter than previous measurements.
Cloudy models with uniform cloud coverage provide a good match to the data. We
derive the temperature, radius, surface gravity, metallicity and cloud
sedimentation parameter f_sed. We find that the atmosphere is highly
super-solar ([Fe/H] ~ 1.0) with an extended, thick cloud cover of small particles.
The model radius and surface gravity suggest planetary masses of about 9 M_jup.
The evolutionary model only provides a lower mass limit of >2 M_jup (for pure
hot-start). The cold-start model cannot explain the planet's luminosity. The
SPHERE and NACO/SAM detection limits probe the 51 Eri system at Solar System
scales and exclude brown-dwarf companions more massive than 20 M_jup beyond
separations of ~2.5 au and giant planets more massive than 2 M_jup beyond 9 au.
Comment: 29 pages, 31 figures, accepted for publication in A&