8 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed
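The abstract names Okapi BM25 as one of its three baseline methods. As a rough illustration of how such a baseline ranks candidate texts against a seed text, here is a minimal sketch. It assumes whitespace tokenization, the commonly used defaults k1 = 1.5 and b = 0.75, and three toy documents; none of these choices are taken from the RELISH pipeline, and the function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each text in `docs` against `query` with Okapi BM25.

    Assumptions (not from the source): whitespace tokenization,
    k1 = 1.5, b = 0.75, and a smoothed, always-positive IDF.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = set(query.lower().split())
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy documents for illustration only.
docs = [
    "chloroplast genome of olive cultivar",
    "mitochondrial genome of sheep breed",
    "malware detection using dns records",
]
scores = bm25_scores("sheep mitochondrial genome", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Here `ranking` lists document indices from most to least similar to the seed text. A TF-IDF baseline would differ mainly in dropping the saturation and length-normalization terms.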

    Tuberous sclerosis presenting as neonatal cyanosis because of rhabdomyoma causing tricuspid valve obstruction needing a Blalock-Taussig shunt

    We report a newborn female baby who presented at 6 hours of age with cyanosis without any signs of respiratory distress. Cardiovascular and systemic examination was unremarkable apart from cyanosis (saturation 75%). An echocardiogram showed multiple echogenic and homogeneous masses in the interventricular septum, one of which was large and protruded through the tricuspid valve, causing right ventricular inflow obstruction. There was a small atrial septal defect (ASD) shunting right to left and a patent ductus arteriosus (PDA) shunting left to right. The provisional diagnosis was rhabdomyoma. A Blalock-Taussig shunt was performed to preserve the tricuspid valve, because these masses tend to regress spontaneously, as they did after a few months. Subsequently, the patient was diagnosed with tuberous sclerosis.

    Complete mitochondrial genome sequence of Awassi-Jo sheep breed (Ovis aries) in Jordan

    Using high-throughput sequencing technology, the complete mitochondrial genome of the Awassi-Jo breed (Ovis aries) was decoded. The mitochondrial genome was 16,617 bp in length. The genome contained 37 genes (13 protein-coding, 22 tRNA, and 2 rRNA) and a control region (D-loop region). The genes were encoded on the H-strand, except for the ND6 gene and 8 tRNA genes, which were encoded on the L-strand. The GC content was 38.9%. Phylogenetic analysis was performed to compare Awassi-Jo with other sheep breeds. The phylogenetic tree showed that Awassi-Jo diverged earlier than related breeds (from Turkey, Italy, Germany, and the Netherlands), with a common ancestor in haplogroup HB. The results revealed the importance of mitochondrial data in studying sheep evolution and domestication.
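GC content, as reported in the abstract, is the percentage of guanine and cytosine bases in a sequence. A minimal sketch of the calculation follows. It assumes ambiguous bases such as N are excluded from the denominator, and it uses a short toy sequence rather than the Awassi-Jo genome; the function name `gc_content` is hypothetical.

```python
def gc_content(sequence):
    """Return the percentage of G and C among the A/C/G/T bases.

    Assumption (not from the source): ambiguous bases such as N
    are ignored rather than counted in the denominator.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted

# Toy sequence for illustration only: 4 of its 10 bases are G or C.
toy_gc = gc_content("ATGCGCATTA")
```

Applied to a full genome sequence read from a FASTA file, the same function would yield the kind of figure quoted in the abstract.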

    Malware detection using DNS records and domain name features

    © 2018 ACM. As billions of people depend on Internet applications to perform day-to-day tasks, the prevalence of malware and online attacks causes huge losses to the global Internet economy. The Domain Name System (DNS) is one of the core components of the Internet; it allows users to type in website names and resolves them to Internet addresses. Several studies have proposed using DNS for malware detection, because resolution is the first step before visiting a specific website. Unfortunately, the majority focused on malicious URL blacklisting, botnets, top-level domains, DNS and resolvers. This paper proposes a system to detect malicious domain names by using eight unique features that accurately identify malicious websites before they are visited. We implemented our approach to malicious domain name detection in Python and experimented with five weeks of real-world data using Weka. The experimental results report 77.5% and a low false-positive rate of 22.4%. That is very promising, considering that the approach detects websites using features calculated from the URL, without downloading the file.
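The abstract does not list its eight features, so the following is only a sketch of what features calculated from a domain name can look like. The four features below (length, label count, digit ratio, character entropy) are hypothetical examples chosen for illustration, not the paper's feature set, and the function name `domain_features` is likewise an assumption.

```python
import math
from collections import Counter

def domain_features(domain):
    """Compute illustrative lexical features of a domain name.

    These four features are hypothetical examples, NOT the eight
    features used in the paper, which the abstract does not list.
    """
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    chars = name.replace(".", "")
    total = len(chars)

    # Shannon entropy of the character distribution, in bits.
    entropy = 0.0
    if total:
        for count in Counter(chars).values():
            p = count / total
            entropy -= p * math.log2(p)

    return {
        "length": len(name),
        "num_labels": len(labels),
        "digit_ratio": sum(c.isdigit() for c in chars) / total if total else 0.0,
        "entropy": entropy,
    }

# Placeholder domain for illustration only.
features = domain_features("a1b2.example.com")
```

A feature dictionary of this kind can be computed without fetching any page content and then exported as a table for a classifier toolkit such as Weka, which the abstract mentions.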

    Complete chloroplast genome sequence of historical olive (Olea europaea subsp. europaea) cultivar Mehras, in Jordan

    The complete chloroplast genome sequence of Olea europaea subsp. europaea cultivar Mehras was determined using high-throughput sequencing technology. The chloroplast genome was 155,897 bp in length, containing a pair of 25,742 bp inverted repeat (IR) regions, which were separated by large and small single-copy regions (LSC and SSC) of 86,622 and 17,791 bp, respectively. The chloroplast genome contained 130 genes (85 protein-coding, 37 tRNA, and eight rRNA). The GC content was 37.8%. We performed phylogenetic analysis with other isolates. The analysis showed that O. e. subsp. europaea cultivar Mehras has an ancient common ancestor with cultivated olives in Italy, Spain, and Cyprus.
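The region lengths reported in the abstract can be cross-checked against the total genome length. The sketch below assumes the layout the abstract describes: one LSC, one SSC, and the IR counted twice because it occurs as a pair. The function name `quadripartite_length` is hypothetical.

```python
# Region lengths in base pairs, as reported in the abstract.
LSC = 86_622
SSC = 17_791
IR = 25_742
REPORTED_TOTAL = 155_897

def quadripartite_length(lsc, ssc, ir):
    """Total length of a genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir

computed = quadripartite_length(LSC, SSC, IR)
consistent = computed == REPORTED_TOTAL

# Share of the genome occupied by the two inverted repeat copies.
ir_share = 100 * 2 * IR / computed
```

The computed sum equals the reported 155,897 bp, so the four region lengths and the total are mutually consistent.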

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search


    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical science. © The Author(s) 2019. Published by Oxford University Press