Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) articles. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings (MeSH) descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that each method tends to produce a distinct collection of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The database server, located at https://relishdb.ict.griffith.edu.au, is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title- and title/abstract-based search engines for relevant articles in biomedical research.
Peer reviewed
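Two of the three baselines named above, Okapi BM25 and TF-IDF, are standard term-weighting schemes. The sketch below is an illustration only, not the RELISH evaluation code: it scores candidate titles against a seed title with textbook BM25 (assuming the common defaults k1 = 1.2 and b = 0.75 and a Lucene-style IDF) and with TF-IDF cosine similarity, then forms one simple hybrid by taking the union of each method's top-k list. The tokenizer, the function names, the union rule and the sample titles are all hypothetical; PubMed Related Articles is not reproduced.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase tokenizer; real search engines add stemming and stop-word removal.
    return [t.strip(".,;:()").lower() for t in text.split() if t.strip(".,;:()")]

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every document for the query (k1 and b are assumed defaults)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Lucene-style IDF variant, which stays positive even for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

def tfidf_scores(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for d in tokenized for term in set(d))
    # Smoothed IDF (+1) so terms shared by every document keep a little weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        # Query terms absent from the collection get zero weight.
        return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(tokenize(query))
    return [cosine(q, vec(d)) for d in tokenized]

def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def hybrid_union(query, docs, k=2):
    # Hypothetical hybrid: keep anything that either method ranks in its top k.
    return sorted(set(top_k(bm25_scores(query, docs), k)) | set(top_k(tfidf_scores(query, docs), k)))

if __name__ == "__main__":
    seed_title = "protein structure prediction"  # made-up seed article title
    candidates = [
        "deep learning for protein structure prediction",
        "protein folding simulation with molecular dynamics",
        "gene expression analysis in cancer cells",
        "protein structure alignment methods",
    ]
    print("BM25 top-2:", top_k(bm25_scores(seed_title, candidates), 2))
    print("TF-IDF top-2:", top_k(tfidf_scores(seed_title, candidates), 2))
    print("hybrid:", hybrid_union(seed_title, candidates, k=2))
```

The union is only one way to combine rankings; a weighted fusion of normalized scores would be an equally plausible hybrid under the same assumptions.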
A neural network algorithm for queue length estimation based on the concept of k-leader connected vehicles
This paper presents a novel method to estimate queue length at signalised intersections using connected vehicle (CV) data. The proposed method does not depend on conventional information such as the arrival flow rate or traffic signal controller parameters. The model is suitable for real-time applications when sufficient data are available to train it. To this end, we propose the idea of “k-leader CVs” to predict the part of the queue that propagates beyond the communication range of dedicated short-range communication (the communication platform used in CV systems). The k-leader CV idea could also reduce the risk of communication failure, which is a serious concern in CV ecosystems. Furthermore, a linear regression model is applied to weigh the importance of the input variables used in a neural network model. The Vissim traffic simulator is employed to train the model and to evaluate its effectiveness and robustness under different travel demand conditions, varying numbers of CVs (i.e. CV market penetration rates) and various traffic signal control scenarios. As expected, the accuracy of the model improves as the market penetration rate increases. In congested (saturated) traffic conditions, the proposed model is more accurate than in undersaturated conditions at the same market penetration rate. Although the proposed method does not depend on arrival pattern information or traffic signal control parameters, its queue length estimates are comparable with those of methods that rely heavily on such information. The proposed algorithm is also tested using large-scale data from a CV test bed (i.e. the Australian Integrated Multimodal Ecosystem) currently underway in Melbourne, Australia.
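The abstract states that a linear regression model weighs the importance of the input variables fed to the neural network, but it does not say how those weights are derived. A minimal sketch, assuming importance is read off as the magnitude of ordinary least-squares coefficients fitted on z-scored features (so that variables on different scales are comparable): the function names, the feature names (`num_stopped_cvs`, `last_cv_position`, `random_noise`) and the synthetic data are hypothetical, and the solver is plain Gauss-Jordan elimination to keep the example dependency-free.

```python
import statistics

def standardize(column):
    """Z-score a column; a constant column becomes all zeros (drop such features before fitting)."""
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mu) / sd if sd else 0.0 for x in column]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def regression_weights(features, target):
    """Fit OLS on standardized features and target; return {feature name: |coefficient|}.

    Centering both sides removes the need for an intercept term.
    """
    names = list(features)
    cols = [standardize(features[k]) for k in names]
    y = standardize(target)
    rows = list(zip(*cols))
    p = len(names)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return {name: abs(coef) for name, coef in zip(names, beta)}

if __name__ == "__main__":
    # Synthetic, hypothetical data: the target depends on the first two features only.
    feats = {
        "num_stopped_cvs": [1, 2, 3, 4, 5, 6, 7, 8],
        "last_cv_position": [3, 1, 4, 1, 5, 9, 2, 6],
        "random_noise": [2, 7, 1, 8, 2, 8, 1, 8],
    }
    queue = [2 * a + 0.5 * b for a, b in zip(feats["num_stopped_cvs"], feats["last_cv_position"])]
    weights = regression_weights(feats, queue)
    for name in sorted(weights, key=weights.get, reverse=True):
        print(f"{name}: {weights[name]:.3f}")
```

In this sketch the weights could then be used to rank or filter inputs before training a network; whether the paper scales, filters or otherwise uses them is not stated in the abstract.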
The simulation results show that the model performs well irrespective of intersection layout, traffic signal plan and vehicle arrival pattern. Based on the numerical results, a CV penetration rate of 20% is a critical threshold: below 20%, the prediction algorithms fail to produce reliable outcomes.
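Why accuracy rises with market penetration rate can be illustrated with a deliberately naive estimator that is not the paper's neural network or k-leader method: it simply takes the farthest-back connected vehicle in a fully stopped queue as the queue tail. The queue of 20 vehicles, the 7 m jam spacing and the function names are assumptions for illustration, and the toy makes no attempt to reproduce the 20% threshold reported above.

```python
import random

# Assumed jam spacing (m) per stopped vehicle; illustrative only.
JAM_SPACING_M = 7.0

def naive_queue_estimate(is_cv, spacing_m=JAM_SPACING_M):
    """Queue length implied by the farthest-back connected vehicle (0 m if none is observed).

    is_cv[i] is True when the i-th vehicle counted from the stop line is a CV.
    """
    last = max((i for i, cv in enumerate(is_cv) if cv), default=-1)
    return (last + 1) * spacing_m

def mean_abs_error(penetration, queue_len=20, trials=2000, seed=0, spacing_m=JAM_SPACING_M):
    """Monte Carlo mean absolute error (m) of the naive estimate for a fully stopped queue."""
    rng = random.Random(seed)
    true_len = queue_len * spacing_m
    total = 0.0
    for _ in range(trials):
        # Each vehicle is independently a CV with probability equal to the penetration rate.
        is_cv = [rng.random() < penetration for _ in range(queue_len)]
        total += abs(true_len - naive_queue_estimate(is_cv, spacing_m))
    return total / trials

if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2, 0.4, 0.8):
        print(f"penetration {p:.0%}: mean abs error {mean_abs_error(p):.1f} m")
```

The error shrinks as the penetration rate p grows because, for a long queue, the expected number of unobserved conventional vehicles behind the last CV falls roughly as (1 - p)/p.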