    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed
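
    For readers unfamiliar with the Okapi Best Matching 25 (BM25) baseline named in the abstract above, the following is a minimal sketch of BM25 ranking over a toy corpus. It is not the consortium's implementation: the whitespace tokenizer, the example texts, the parameter values `k1=1.5` and `b=0.75`, and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption; real systems do more).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # A commonly used non-negative IDF variant (note the 1 + ... inside the log).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical seed text and candidate titles, for illustration only.
seed = tokenize("aerosol lidar denoising")
corpus = [tokenize(t) for t in [
    "lidar signal denoising with variational decomposition",
    "document similarity benchmark for literature search",
    "aerosol extinction coefficient inversion from lidar",
]]
scores = bm25_scores(seed, corpus)
# Candidate indices ordered from most to least similar to the seed.
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
```

    A candidate that shares no terms with the seed scores zero and ranks last, which is the behaviour a purely lexical baseline would be expected to show.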

    Dust Particle Size Distribution Inversion Based on the Multi Population Genetic Algorithm

    The aerosol number size distribution is the main parameter for characterizing aerosol optical and physical properties, and it has a major influence on radiative forcing. To address some disadvantages of the traditional methods, a method based on the multi-population genetic algorithm (MPGA) is proposed and employed to retrieve the aerosol size distribution of dust particles. The MPGA principles and design are presented in detail. The MPGA performs better than conventional methods. To verify the feasibility of the inversion method, measured aerosol optical thickness (AOT) data for dust particles taken by a sun photometer are used, and a series of comparisons between the simple genetic algorithm (SGA) and the MPGA are carried out. The results show that the MPGA outperforms the SGA, retrieving the aerosol size distribution with smaller inversion errors, a smaller population size and fewer generations. The MPGA inversion method is analyzed using a background day, a dust storm event and seasonal size distributions. The method proposed in this study has important applications and reference value for aerosol particle size distribution inversion.
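
    The abstract above does not give the MPGA design, so the following is only a generic sketch of the multi-population idea: several subpopulations evolve with different crossover and mutation probabilities and exchange their best individuals. The toy objective, the operators, the ring migration scheme and every parameter value are assumptions, not the authors' method.

```python
import random


def fitness(x, target):
    # Toy objective (an assumption): negative squared error to a known target.
    return -sum((a - b) ** 2 for a, b in zip(x, target))


def evolve(pop, target, p_cross, p_mut, rng):
    """One generation of a simple real-coded GA with elitism."""
    ranked = sorted(pop, key=lambda x: fitness(x, target), reverse=True)
    next_pop = [ranked[0][:]]  # keep the best individual unchanged
    while len(next_pop) < len(pop):
        # Binary tournament selection of two parents.
        p1 = max(rng.sample(pop, 2), key=lambda x: fitness(x, target))
        p2 = max(rng.sample(pop, 2), key=lambda x: fitness(x, target))
        child = p1[:]
        if rng.random() < p_cross:
            # Arithmetic (blend) crossover between the two parents.
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
        # Gaussian mutation applied gene by gene.
        child = [g + rng.gauss(0, 0.1) if rng.random() < p_mut else g for g in child]
        next_pop.append(child)
    return next_pop


def mpga(target, n_pops=4, pop_size=20, generations=60, seed=0):
    rng = random.Random(seed)
    dim = len(target)
    pops = [[[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
            for _ in range(n_pops)]
    # Each subpopulation gets its own crossover and mutation probabilities.
    params = [(rng.uniform(0.6, 0.9), rng.uniform(0.05, 0.3)) for _ in range(n_pops)]

    def global_best():
        return max((x for p in pops for x in p), key=lambda x: fitness(x, target))

    history = [fitness(global_best(), target)]
    for _ in range(generations):
        pops = [evolve(p, target, pc, pm, rng) for p, (pc, pm) in zip(pops, params)]
        # Ring migration: the best of each subpopulation replaces the worst of the next.
        bests = [max(p, key=lambda x: fitness(x, target)) for p in pops]
        for i, p in enumerate(pops):
            worst = min(range(pop_size), key=lambda j: fitness(p[j], target))
            p[worst] = bests[i - 1][:]
        history.append(fitness(global_best(), target))
    return global_best(), history


target = [1.0, -2.0, 0.5]  # hypothetical parameters to recover
best, history = mpga(target)
```

    Because each subpopulation keeps its elite and migration only overwrites the worst individual, the best fitness recorded in `history` never decreases from one generation to the next.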

    Novel Simulation and Analysis of Mie-Scattering Lidar for Detecting Atmospheric Turbulence Based on Non-Kolmogorov Turbulence Power Spectrum Model

    The Mie-scattering lidar can detect atmospheric turbulence intensity by using the return signals of Gaussian beams at different heights. The power spectrum method and the Zernike polynomial method are each used to simulate the non-Kolmogorov turbulent phase plate, and the power spectrum method, which runs faster, is selected for the subsequent simulation. To verify the possibility of detecting atmospheric turbulence with the Mie-scattering lidar, numerical simulations are carried out. The power spectrum method is used to simulate the propagation of the Gaussian beam from the Mie-scattering lidar along a vertical path. The propagation characteristics of the Gaussian beam under a non-Kolmogorov turbulence model are obtained by analyzing the intensity distribution and the spot drift effect. The simulation results show that the simulated scintillation index follows the trend of the theoretical values with high accuracy, indicating that atmospheric turbulence detection using the Mie-scattering lidar is effective. The simulation can guide the subsequent construction of the experimental platform and the equipment design.
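
    The scintillation index mentioned above is conventionally defined as the normalized variance of the received intensity, ⟨I²⟩/⟨I⟩² − 1. A minimal sketch of that computation follows; the sample values are made up for illustration and are not taken from the paper.

```python
def scintillation_index(intensities):
    """Normalized intensity variance: <I^2> / <I>^2 - 1."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / mean_i ** 2 - 1.0


# Hypothetical intensity samples at one receiver point.
steady = [2.0, 2.0, 2.0, 2.0]        # no fluctuation
fluctuating = [1.0, 3.0, 1.0, 3.0]   # mean 2, mean square 5

si_steady = scintillation_index(steady)            # 0.0
si_fluctuating = scintillation_index(fluctuating)  # 5/4 - 1 = 0.25
```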

    A Novel Lidar Signal-Denoising Algorithm Based on Sparrow Search Algorithm for Optimal Variational Modal Decomposition

    Atmospheric lidar is susceptible to the influence of light attenuation, sky background light, and detector dark currents during the detection process. This results in a large amount of noise in the lidar return signal. To reduce noise and extract a useful signal, a novel denoising method combining variational modal decomposition (VMD), the sparrow search algorithm (SSA) and singular value decomposition (SVD) is proposed. The SSA is used to optimize the number of decomposition layers K and the quadratic penalty factor α of the VMD algorithm. Some intrinsic mode function (IMF) components obtained from the VMD-SSA decomposition are grouped and reconstructed according to the interrelationship number selection criterion. Then, the reconstructed signal is further denoised using the strong noise-reduction ability of SVD to obtain a clean lidar return signal. To verify the effectiveness of the VMD-SSA-SVD method, it is compared with wavelet packet decomposition, empirical modal decomposition (EMD), ensemble empirical modal decomposition (EEMD), and adaptive noise-complete ensemble empirical modal decomposition (CEEMD); its noise-reduction effect is considerably better than that of the other four methods. The method can eliminate the complex noise in the lidar return signal while retaining all the details of the signal. The signal is not distorted, the waveform is smoother, and far-field noise interference can be suppressed. The denoised signal is closer to the real signal with higher accuracy, which shows the feasibility and the practicality of the proposed method.

    Novel Inversion Algorithm for the Atmospheric Aerosol Extinction Coefficient Based on an Improved Genetic Algorithm

    As an important atmospheric component, aerosols play a very important role in the radiation budget balance of the earth–atmosphere system. To study the optical characteristics of aerosols, it is necessary to use an inversion algorithm to process the lidar return signal to obtain both the aerosol extinction coefficient and the backscattering coefficient. However, the lidar return power equation is ill-conditioned and contains two unknown parameters, meaning that traditional inversion algorithms can solve it only by adopting certain assumptions (e.g., a uniform atmosphere and an assumed lidar ratio), which can seriously affect the inversion accuracy. Here, to improve the accuracy of the aerosol extinction coefficient inversion, an inversion method based on an improved genetic algorithm is proposed. Using the U.S. Standard Atmosphere model and the return power equation, the aerosol extinction coefficient and the backscattering coefficient are treated as independent variables and given random initial values to simulate the theoretical lidar power. Then, the genetic algorithm is used to drive the theoretical lidar power toward the measured lidar return power at each height; when the two are sufficiently close, the corresponding values of the two independent variables (i.e., the extinction and backscattering coefficients) are taken as the inversion result. Experiments comparing a simple genetic algorithm with the improved genetic algorithm showed that the proposed method can invert the aerosol extinction coefficient without relying on traditional inversion methods, representing a novel approach to the inversion of the aerosol extinction coefficient and the backscattering coefficient.
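
    For context, the return power equation referred to above is usually written in the single-scattering elastic form shown first below, which makes the two unknowns explicit. The symbols are assumptions introduced here, not notation from the paper: P(z) is the return power from height z, C is a system constant, β is the total backscattering coefficient and α is the total extinction coefficient. The second line is a hypothetical least-squares misfit that a genetic algorithm could minimize; the abstract does not state the fitness function actually used.

```latex
P(z) = C \, \frac{\beta(z)}{z^{2}} \, \exp\!\left(-2 \int_{0}^{z} \alpha(z') \, \mathrm{d}z'\right)

J(\alpha, \beta) = \sum_{k} \left[ P_{\text{model}}(z_k; \alpha, \beta) - P_{\text{meas}}(z_k) \right]^{2}
```

    With one measured profile P(z) and two unknown profiles α(z) and β(z), the problem is underdetermined, which is why traditional methods close it with an assumed relation between the two, such as a lidar ratio.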

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical science. © The Author(s) 2019. Published by Oxford University Press