Special Libraries, December 1964
Volume 55, Issue 10
Classification and Retrieval of Digital Pathology Scans: A New Dataset
In this paper, we introduce a new dataset, Kimia Path24, for image
classification and retrieval in digital pathology. We use the whole scan images
of 24 different tissue textures to generate 1,325 test patches of size
1000×1000 (0.5 mm × 0.5 mm). Training data can be generated according
to the preferences of the algorithm designer and can range from approximately 27,000 to
over 50,000 patches if the preset parameters are adopted. We propose a compound
patch-and-scan accuracy measurement that makes achieving high accuracies quite
challenging. In addition, we set the benchmarking line by applying LBP, a
dictionary approach, and convolutional neural nets (CNNs), and report their
results. The highest accuracy was 41.80% for CNN.
Comment: Accepted for presentation at the Workshop for Computer Vision for
Microscopy Image Analysis (CVMI 2017) @ CVPR 2017, Honolulu, Hawaii
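Making a compound accuracy of this kind concrete: the sketch below combines an overall patch accuracy with a mean per-scan accuracy and multiplies the two, so a classifier must do well both on individual patches and consistently across whole scans. The function name, data layout, and the product form are assumptions for illustration, not the paper's exact formula.

```python
from collections import defaultdict

def compound_accuracy(patches):
    """patches: iterable of (scan_id, true_label, predicted_label).

    Returns (patch_acc, scan_acc, compound): overall patch accuracy,
    mean per-scan accuracy, and their product -- one plausible form of
    a compound patch-and-scan score (illustrative sketch only).
    """
    patches = list(patches)
    correct = 0
    per_scan = defaultdict(lambda: [0, 0])  # scan_id -> [correct, total]
    for scan_id, truth, pred in patches:
        hit = int(truth == pred)
        correct += hit
        per_scan[scan_id][0] += hit
        per_scan[scan_id][1] += 1
    patch_acc = correct / len(patches)
    scan_acc = sum(c / t for c, t in per_scan.values()) / len(per_scan)
    return patch_acc, scan_acc, patch_acc * scan_acc
```

Because the compound score is a product, a method that is accurate on a few scans but fails on others is penalized more heavily than plain patch accuracy would suggest.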
An exploratory study of user-centered indexing of published biomedical images
User-centered image indexing (often studied under the labels collaborative tagging, social classification, folksonomy, or personal tagging) has received considerable attention [1-7]. The general themes in more recent studies on this topic include user-centered tagging behavior by type of image; the pros and cons of user-created tags as compared to controlled index terms; assessment of the value added by user-generated tags; and comparison of automatic versus human indexing in the context of web digital image collections such as Flickr. For instance, Golbeck's findings restate the importance of indexer experience, order, and type of image [8]. Rorissa found a significant difference in the number of terms assigned when using Flickr tags or index terms on the same image collection, which might suggest a difference in the level of indexing by professional indexers and Flickr taggers [9]. Studies focusing on users, their tagging experiences, and user-generated tags suggest ideas to be implemented as part of a personalized, customizable tagging system. Additionally, Stvilia and her colleagues found that tagger age and image familiarity were negatively related, while indexing and tagging experience were positively associated [10]. A major question for biomedical image indexing is whether the results of the aforementioned studies, all of which dealt with general image collections, apply to images in the medical domain. In spite of the importance of visual material in medical education and the prevalence of digitized images in formal medical practice and education, medical students have few opportunities to annotate biomedical images. End-user training could improve the quality of image indexing and thereby improve retrieval.
In a pilot assessment of image indexing and retrieval quality by medical students, this study compared concept completion and retrieval effectiveness of indexing terms generated by medical students on thirty-nine histology images selected from the PubMed Central (PMC) database. Indexing instruction was given only to an intervention group to test its impact on the quality of end-user image indexing.
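As an illustration of how a concept-completion score might be computed, the sketch below measures the fraction of reference concepts covered by a student's index terms. The recall-style formula and case-insensitive exact matching are assumptions for illustration, not the study's published metric.

```python
def concept_completion(assigned_terms, reference_concepts):
    """Fraction of reference concepts covered by the assigned index
    terms (a recall-style measure, illustrative only). Matching is
    case-insensitive exact string comparison; a real scoring scheme
    might instead map terms to a controlled vocabulary such as MeSH.
    """
    assigned = {t.lower() for t in assigned_terms}
    covered = sum(1 for c in reference_concepts if c.lower() in assigned)
    return covered / len(reference_concepts)
```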
Improving average ranking precision in user searches for biomedical research datasets
The availability of research datasets is a keystone of study reproducibility
and scientific progress in the health and life sciences. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, +22.3% higher than the median infAP of the participants'
best submissions. Overall, it ranks in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method had a positive impact on the system's performance, increasing our
baseline by up to +5.0% and +3.4% on the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm appears robust, in particular compared to
the Divergence From Randomness framework, showing smaller performance variations
under different training conditions. Finally, the result categorisation did not
have significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data-driven query expansion methods could be an
alternative to the complexity of biomedical terminologies.
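A minimal sketch of embedding-based query expansion of the kind described above: each query term is expanded with its nearest vocabulary neighbours by cosine similarity. The toy embedding table, threshold, and function name are illustrative assumptions; a real system would use vectors trained on a biomedical corpus (e.g. word2vec over PubMed abstracts).

```python
import numpy as np

def expand_query(query_terms, embeddings, k=2, min_sim=0.5):
    """Expand a query with up to k nearest vocabulary terms per query
    term, ranked by cosine similarity in the embedding space.

    `embeddings` maps term -> vector; names, k, and min_sim are
    illustrative, not the bioCADDIE system's actual configuration.
    """
    vocab = list(embeddings)
    mat = np.stack([np.asarray(embeddings[w], dtype=float) for w in vocab])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)  # unit rows
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        v = np.asarray(embeddings[term], dtype=float)
        sims = mat @ (v / np.linalg.norm(v))  # cosine vs. whole vocab
        neighbours = [vocab[i] for i in np.argsort(-sims)
                      if vocab[i] != term and sims[i] >= min_sim][:k]
        expanded.extend(w for w in neighbours if w not in expanded)
    return expanded
```

The expanded term list would then feed the downstream similarity and categorisation stages, with the original query terms typically weighted higher than the expansion terms.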
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection
Objective: Develop and analyze results from an image retrieval test collection. Methods: After participating research groups obtained and assessed results from their systems in the image retrieval task of the Cross-Language Evaluation Forum, we assessed the results for common themes and trends. In addition to overall performance, results were analyzed by topic category (those most amenable to visual, textual, or mixed approaches) and run category (those employing queries entered by automated or manual means, as well as those using visual, textual, or mixed indexing and retrieval methods). We also assessed results on the different topics and compared the impact of duplicate relevance judgments. Results: A total of 13 research groups participated. Analysis was limited to the best run submitted by each group in each run category. The best results were obtained by systems that combined visual and textual methods. There was substantial variation in performance across topics. Systems employing textual methods were more resilient to visually oriented topics than those using visual methods were to textually oriented topics. The primary performance measure, mean average precision (MAP), was not necessarily associated with other measures, including those possibly more pertinent to real users, such as precision at 10 or 30 images. Conclusions: We developed a test collection amenable to assessing visual and textual methods for image retrieval. Future work must focus on how varying topic and run types affect retrieval performance. User studies also are necessary to determine the best measures for evaluating the efficacy of image retrieval systems.
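The observation that MAP need not track user-oriented cutoff measures can be made concrete with the two standard formulas, sketched here for binary relevance judgments:

```python
def average_precision(rels):
    """Average precision for a ranked list of 0/1 relevance labels:
    mean of precision values at the ranks of the relevant items."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def precision_at_k(rels, k):
    """Fraction of relevant items among the top k results."""
    return sum(rels[:k]) / k
```

For example, a ranking whose only relevant images sit at ranks 1 and 30 has an average precision of about 0.53 while precision at 10 is only 0.1, illustrating how the two measures can diverge.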