2,083 research outputs found
The MeSH-gram Neural Network Model: Extending Word Embedding Vectors with MeSH Concepts for UMLS Semantic Similarity and Relatedness in the Biomedical Domain
Eliciting semantic similarity between concepts in the biomedical domain
remains a challenging task. Recent approaches founded on embedding vectors have
gained in popularity as they have proven able to capture semantic
relationships efficiently. The underlying idea is that two words with close meanings
gather similar contexts. In this study, we propose a new neural network model
named MeSH-gram which relies on a straightforward approach that extends the
skip-gram neural network model by considering MeSH (Medical Subject Headings)
descriptors instead of words. Trained on the publicly available PubMed MEDLINE corpus,
MeSH-gram is evaluated on reference standards manually annotated for semantic
similarity. MeSH-gram is first compared to skip-gram with vectors of size 300
and at several context window sizes. A deeper comparison is performed with twenty
existing models. All the obtained Spearman's rank correlations
between human scores and computed similarities show that MeSH-gram outperforms
the skip-gram model and is comparable to the best methods, which however need more
computation and external resources. Comment: 6 pages, 2 tables
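The evaluation described in this abstract, Spearman's rank correlation between human scores and computed similarities, can be illustrated with a minimal sketch. The vectors, term pairs and human ratings below are invented toy values, not data from the paper, and cosine similarity is assumed as the similarity measure; only the Python standard library is used.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy embeddings and human similarity ratings (assumptions).
vectors = {
    "heart": [0.9, 0.1, 0.0],
    "myocardium": [0.8, 0.2, 0.1],
    "kidney": [0.1, 0.9, 0.1],
    "nephron": [0.2, 0.8, 0.0],
    "aspirin": [0.0, 0.1, 0.9],
}
pairs = [("heart", "myocardium"), ("kidney", "nephron"),
         ("heart", "kidney"), ("heart", "aspirin")]
human_scores = [3.8, 3.5, 2.0, 1.0]

model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rho = spearman(human_scores, model_scores)
print(f"Spearman rho = {rho:.3f}")
```

A rho close to 1 means the model orders the term pairs almost exactly as the human annotators do, which is why rank correlation is a natural fit for reference standards of this kind.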
Content Based Image Retrieval (CBIR) in Remote Clinical Diagnosis and Healthcare
Content-Based Image Retrieval (CBIR) locates, retrieves and displays images
similar to one given as a query, using a set of features. It demands accessible
data in medical archives and from medical equipment, to infer meaning after
some processing. A case that is similar in some sense to the target image can aid
clinicians. CBIR complements text-based retrieval and improves evidence-based
diagnosis, administration, teaching, and research in healthcare. It facilitates
visual/automatic diagnosis and decision-making in real-time remote
consultation/screening, store-and-forward tests, home care assistance and
overall patient surveillance. Metrics help compare visual data and improve
diagnosis. Specially designed architectures can benefit from the application
scenario. CBIR use calls for file storage standardization, querying procedures,
efficient image transmission, realistic databases, global availability, access
simplicity, and Internet-based structures. This chapter recommends important
and complex aspects required to handle visual content in healthcare. Comment: 28 pages, 6 figures, Book Chapter from "Encyclopedia of E-Health and
Telemedicine"
Hybrid Query Expansion on Ontology Graph in Biomedical Information Retrieval
Nowadays, biomedical researchers publish thousands of papers and journals every day. Searching through biomedical literature to keep up with the state of the art is a task of increasing difficulty for many individual researchers. The continuously increasing amount of biomedical text data has resulted in high demand for an efficient and effective biomedical information retrieval (BIR) system. Though many existing information retrieval techniques can be directly applied in BIR, BIR distinguishes itself in the extensive use of biomedical terms and abbreviations, which present high ambiguity. First of all, we studied a fundamental yet simpler problem of word semantic similarity. We proposed a novel semantic word similarity algorithm and related tools called Weighted Edge Similarity Tools (WEST). WEST was motivated by our discovery that humans are more sensitive to the semantic difference due to categorization than to that due to generalization/specification. Unlike most existing methods, which model the semantic similarity of words based on either the depth of their Lowest Common Ancestor (LCA) or the traversal distance between the word pair in WordNet, WEST also considers the joint contribution of the weighted distance between two words and the weighted depth of their LCA in WordNet. Experiments show that the weighted-edge-based word similarity method achieves 83.5% accuracy against human judgments. The query expansion problem can be viewed as selecting the top k words which have the maximum accumulated similarity to a given word set. It has proven to be an effective method in BIR and has been studied for over two decades. However, most previous studies focus on only one controlled vocabulary: MeSH. In addition, early studies found that applying an ontology won't necessarily improve search performance.
In this dissertation, we propose a novel graph-based query expansion approach which is able to take advantage of the global information from multiple controlled vocabularies by building a biomedical ontology graph from selected vocabularies in the Metathesaurus. We apply the Personalized PageRank algorithm on the ontology graph to rank and identify top terms which are highly relevant to the original user query, yet not present in that query. Those new terms are reordered by a weighting scheme to prioritize specialized concepts. We apply a scaling factor to the final selected terms to prevent query drift and append them to the original query in the search. Experiments show that our approach achieves a 17.7% improvement in 11-point average precision and recall over Lucene's default indexing and searching strategy, and is 24.8% better than all the other strategies on average. Furthermore, we observe that expanding with specialized concepts rather than generalized concepts can substantially improve the recall-precision performance. In addition, we have successfully transferred WEST from the underlying WordNet graph to the biomedical ontology graph constructed from multiple controlled vocabularies in the Metathesaurus. Experiments indicate that WEST further improves the recall-precision performance. Finally, we have developed a Graph-based Biomedical Search Engine (G-Bean) for retrieving and visualizing information from literature using our proposed query expansion algorithm. G-Bean accepts any medical-related user query and processes it with the expanded medical query to search the MEDLINE database
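The graph-based expansion step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the dissertation's implementation: the toy ontology graph, the damping value, the number of iterations and the fixed boost of 0.3 for expansion terms are all assumptions, and the reordering step that prioritizes specialized concepts is omitted.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=100):
    """Power iteration for PageRank with restarts restricted to the seed nodes.

    graph maps each node to a list of neighbor nodes; every neighbor must
    also be a key of graph. seeds must be a non-empty subset of the nodes.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if not neighbors:
                # Dangling node: hand its mass back to the seeds.
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
                continue
            share = damping * rank[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        rank = nxt
    return rank

def expand_query(query_terms, graph, k=2, scale=0.3):
    """Return (term, weight) pairs: original terms plus k down-weighted new terms."""
    seeds = set(query_terms)
    rank = personalized_pagerank(graph, seeds)
    candidates = [(term, score) for term, score in rank.items() if term not in seeds]
    candidates.sort(key=lambda item: item[1], reverse=True)
    expanded = [(term, 1.0) for term in query_terms]
    # The scaling factor keeps expansion terms from dominating the query.
    expanded += [(term, scale) for term, _ in candidates[:k]]
    return expanded

# Hypothetical toy ontology graph (undirected, stored as adjacency lists).
ontology = {
    "myocardial infarction": ["heart attack", "coronary disease", "troponin"],
    "heart attack": ["myocardial infarction"],
    "coronary disease": ["myocardial infarction", "atherosclerosis"],
    "troponin": ["myocardial infarction"],
    "atherosclerosis": ["coronary disease", "cholesterol"],
    "cholesterol": ["atherosclerosis"],
}

weighted_query = expand_query(["myocardial infarction"], ontology, k=2)
for term, weight in weighted_query:
    print(f"{term}^{weight}")
```

Restricting the restart distribution to the query terms is what makes the ranking "personalized": terms close to the query in the graph accumulate more score than globally well-connected but unrelated terms.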
Overview of BioASQ 2021-MESINESP track. Evaluation of advance hierarchical classification techniques for scientific literature, patents and clinical trials
CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania. There is a pressing need to exploit recent advances in natural language processing technologies, in
particular language models and deep learning approaches, to enable improved retrieval, classification
and ultimately access to information contained in multiple, heterogeneous types of documents. This is
particularly true for the field of biomedicine and clinical research, where medical experts and scientists
need to carry out complex search queries against a variety of document collections, including literature,
patents, clinical trials or other kinds of content like EHRs. Indexing documents with structured controlled
vocabularies used for semantic search engines and query expansion purposes is a critical task for enabling
sophisticated user queries and even cross-language retrieval. Due to the complexity of the medical domain
and the use of very large hierarchical indexing terminologies, implementing efficient automatic systems
to aid manual indexing is extremely difficult. This paper provides a summary of the MESINESP task
results on medical semantic indexing in Spanish (BioASQ/ CLEF 2021 Challenge). MESINESP was carried
out in direct collaboration with literature content databases and medical indexing experts using the DeCS
vocabulary, a resource similar to MeSH terms. Seven participating teams used advanced technologies
including extreme multilabel classification and deep language models to solve this challenge which can
be viewed as a multi-label classification problem. As part of the MESINESP resources, we have released a Gold Standard
collection of 243,000 documents with a total of 2179 manual annotations divided in train, development
and test subsets covering literature, patents as well as clinical trial summaries, under a cross-genre
training and data labeling scenario. Manual indexing of the evaluation subsets was carried out by three
independent experts using a specially developed indexing interface called ASIT. Additionally, we have
published a collection of large-scale automatic semantic annotations based on NER systems of these
documents with mentions of drugs/medications (170,000), symptoms (137,000), diseases (840,000) and
clinical procedures (415,000). In addition to a summary of the technologies used by the teams, this paper
Next Generation of Product Search and Discovery
Online shopping has become an important part of people’s daily life with the rapid development of e-commerce. In some domains such as books, electronics, and CD/DVDs, online shopping has surpassed or even replaced the traditional shopping method. Compared with traditional retailing, e-commerce is information intensive. One of the key factors for success in e-business is making it easier for consumers to discover a product. Conventionally, a product search engine based on keyword search or a category browser is provided to help users find the product information they need. The general goal of a product search system is to enable users to quickly locate information of interest and to minimize users’ efforts in search and navigation. In this process human factors play a significant role. Finding product information can be a tricky task and may require an intelligent use of search engines and a non-trivial navigation of multilayer categories. Searching for useful product information can be frustrating for many users, especially inexperienced ones.
This dissertation focuses on developing a new visual product search system that effectively extracts the properties of unstructured products and presents the possible items of attraction to users so that they can quickly locate the ones they would most likely be interested in. We designed and developed a feature extraction algorithm that retains product color and local pattern features, and the experimental evaluation on the benchmark dataset demonstrated that it is robust against common geometric and photometric visual distortions. In addition, instead of ignoring product text information, we investigated and developed a ranking model learned via a unified probabilistic hypergraph that is capable of capturing correlations among product visual content and textual content. Moreover, we proposed and designed a fuzzy hierarchical co-clustering algorithm for collaborative filtering product recommendation. Via this method, users can be automatically grouped into different interest communities based on their behaviors. Then, a customized recommendation can be performed according to these implicitly detected relations. In summary, the developed search system performs much better in visual unstructured product search when compared with state-of-the-art approaches. With the comprehensive ranking scheme and the collaborative filtering recommendation module, the user’s overhead in locating the information of value is reduced, and the user’s experience of seeking useful product information is optimized
Semantic Concept Co-Occurrence Patterns for Image Annotation and Retrieval.
Describing visual image contents by semantic concepts is an effective and straightforward way to facilitate various high-level applications. Inferring semantic concepts from low-level pictorial feature analysis is challenging due to the semantic gap problem, while manually labeling concepts is impractical because of the large number of images in both online and offline collections. In this paper, we present a novel approach to automatically generate intermediate image descriptors by exploiting concept co-occurrence patterns in the pre-labeled training set, which makes it possible to depict complex scene images semantically. Our work is motivated by the fact that multiple concepts that frequently co-occur across images form patterns which could provide contextual cues for individual concept inference. We discover the co-occurrence patterns as hierarchical communities by graph modularity maximization in a network with nodes and edges representing concepts and co-occurrence relationships, respectively. A random walk process working on the inferred concept probabilities with the discovered co-occurrence patterns is applied to acquire the refined concept signature representation. Through experiments in automatic image annotation and semantic image retrieval on several challenging datasets, we demonstrate the effectiveness of the proposed concept co-occurrence patterns as well as the concept signature representation in comparison with state-of-the-art approaches
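The refinement idea in this abstract, a random walk that lets co-occurring concepts reinforce each other, can be illustrated with a simplified sketch. The abstract does not give the exact formulation, so the version below is an assumption: a random walk with restart over raw co-occurrence counts, without the community discovery by modularity maximization. The concept names, detector probabilities, counts and the mixing weight `alpha` are invented for illustration.

```python
def refine_concepts(initial, cooccurrence, alpha=0.5, iterations=50):
    """Refine per-concept probabilities with a random walk with restart.

    initial maps concept -> detector probability for one image.
    cooccurrence maps concept -> {other concept: co-occurrence count}.
    alpha balances propagated context against the initial detector output.
    """
    concepts = list(initial)
    # Row-normalize co-occurrence counts into transition probabilities.
    transition = {}
    for c in concepts:
        total = sum(cooccurrence[c].get(d, 0.0) for d in concepts if d != c)
        transition[c] = {
            d: (cooccurrence[c].get(d, 0.0) / total if total else 0.0)
            for d in concepts if d != c
        }
    scores = dict(initial)
    for _ in range(iterations):
        scores = {
            c: (1.0 - alpha) * initial[c]
            + alpha * sum(scores[d] * transition[d][c] for d in concepts if d != c)
            for c in concepts
        }
    return scores

# Hypothetical detector outputs for one image (assumed values).
initial = {"beach": 0.8, "sea": 0.7, "sand": 0.3, "car": 0.2}

# Hypothetical symmetric co-occurrence counts from a labeled training set.
cooccurrence = {
    "beach": {"sea": 30, "sand": 25, "car": 1},
    "sea": {"beach": 30, "sand": 10, "car": 1},
    "sand": {"beach": 25, "sea": 10},
    "car": {"beach": 1, "sea": 1},
}

refined = refine_concepts(initial, cooccurrence)
for concept, score in refined.items():
    print(f"{concept}: {initial[concept]:.2f} -> {score:.2f}")
```

In this toy setting, "sand" gains support from the confidently detected "beach" and "sea", while "car", which rarely co-occurs with them, loses score.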
CONTENT BASED IMAGE RETRIEVAL (CBIR) SYSTEM
Advances in hardware and telecommunication technology have boosted the creation
and distribution of digital visual content. However, this rapid growth in visual content
creation has not been matched by the simultaneous emergence of technologies to support
efficient image analysis and retrieval. Although there are attempts to solve this problem by
using metadata text annotation, this approach is not practical when it comes to
large data collections.
This system uses 7 different feature vectors that focus on 3 main low-level feature
groups (color, shape and texture). The system takes an image supplied by the user and
searches the database for images with similar features, subject to a
threshold value. One of the most important aspects of CBIR is determining the correct
threshold value. Setting the correct threshold value is important in CBIR because setting
it too low will result in fewer images being retrieved, which might exclude relevant data. Setting
too high a threshold value might cause irrelevant data to be retrieved and increase the
search time for image retrieval.
Results show that this project is able to increase the image accuracy to an average of 70% by
combining 7 different feature vectors at the correct threshold value.
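The threshold behaviour described above can be illustrated with a small sketch. It is not the project's implementation: it assumes the threshold is applied to a weighted sum of per-feature Euclidean distances, uses one toy vector per feature group instead of the 7 feature vectors, and all image names, values and weights are invented.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def combined_distance(query, candidate, weights):
    """Weighted sum of per-feature distances (color, shape, texture, ...)."""
    return sum(
        weights[name] * euclidean(query[name], candidate[name])
        for name in weights
    )

def retrieve(query, database, weights, threshold):
    """Return (image_id, distance) pairs within the threshold, closest first."""
    hits = []
    for image_id, features in database.items():
        distance = combined_distance(query, features, weights)
        if distance <= threshold:
            hits.append((image_id, distance))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical feature vectors, one per low-level feature group.
database = {
    "sunset_1": {"color": [0.9, 0.4, 0.1], "shape": [0.2, 0.8], "texture": [0.5]},
    "sunset_2": {"color": [0.8, 0.5, 0.1], "shape": [0.3, 0.7], "texture": [0.6]},
    "forest_1": {"color": [0.1, 0.8, 0.2], "shape": [0.7, 0.2], "texture": [0.9]},
}
query = {"color": [0.9, 0.4, 0.1], "shape": [0.2, 0.8], "texture": [0.5]}
weights = {"color": 1 / 3, "shape": 1 / 3, "texture": 1 / 3}

# A low threshold returns few images; a high one also lets dissimilar images in.
for threshold in (0.05, 0.2, 1.0):
    ids = [image_id for image_id, _ in retrieve(query, database, weights, threshold)]
    print(threshold, ids)
```

The loop at the end shows the trade-off the abstract describes: raising the threshold increases the number of retrieved images, and past some point the additional hits are no longer visually similar to the query.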