4,156 research outputs found
EXPLOITING N-GRAM IMPORTANCE AND ADDITIONAL KNOWEDGE BASED ON WIKIPEDIA FOR IMPROVEMENTS IN GAAC BASED DOCUMENT CLUSTERING
This paper provides a solution to the issue: “How can we use Wikipedia based concepts in document\ud
clustering with lesser human involvement, accompanied by effective improvements in result?” In the\ud
devised system, we propose a method to exploit the importance of N-grams in a document and use\ud
Wikipedia based additional knowledge for GAAC based document clustering. The importance of N-grams\ud
in a document depends on several features including, but not limited to: frequency, position of their\ud
occurrence in a sentence and the position of the sentence in which they occur, in the document. First, we\ud
introduce a new similarity measure, which takes the weighted N-gram importance into account, in the\ud
calculation of similarity measure while performing document clustering. As a result, the chances of topical similarity in clustering are improved. Second, we use Wikipedia as an additional knowledge base both, to remove noisy entries from the extracted N-grams and to reduce the information gap between N-grams that are conceptually-related, which do not have a match owing to differences in writing scheme or strategies. Our experimental results on the publicly available text dataset clearly show that our devised system has a significant improvement in performance over bag-of-words based state-of-the-art systems in this area
Deriving query suggestions for site search
Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T
Deep Lesion Graphs in the Wild: Relationship Learning and Organization of Significant Radiology Image Findings in a Diverse Large-scale Lesion Database
Radiologists in their daily work routinely find and annotate significant
abnormalities on a large number of radiology images. Such abnormalities, or
lesions, have collected over years and stored in hospitals' picture archiving
and communication systems. However, they are basically unsorted and lack
semantic annotations like type and location. In this paper, we aim to organize
and explore them by learning a deep feature representation for each lesion. A
large-scale and comprehensive dataset, DeepLesion, is introduced for this task.
DeepLesion contains bounding boxes and size measurements of over 32K lesions.
To model their similarity relationship, we leverage multiple supervision
information including types, self-supervised location coordinates and sizes.
They require little manual annotation effort but describe useful attributes of
the lesions. Then, a triplet network is utilized to learn lesion embeddings
with a sequential sampling strategy to depict their hierarchical similarity
structure. Experiments show promising qualitative and quantitative results on
lesion retrieval, clustering, and classification. The learned embeddings can be
further employed to build a lesion graph for various clinically useful
applications. We propose algorithms for intra-patient lesion matching and
missing annotation mining. Experimental results validate their effectiveness.Comment: Accepted by CVPR2018. DeepLesion url adde
- …