4,221 research outputs found

    Case Retrieval using Bhattacharya Coefficient with Particle Swarm Optimization

    Get PDF
    Now a day, health information management and utilization is the demanding task to health informaticians for delivering the eminence healthcare services. Extracting the similar cases from the case database can aid the doctors to recognize the same kind of patients and their treatment details. Accordingly, this paper introduces the method called H-BCF for retrieving the similar cases from the case database. Initially, the patient’s case database is constructed with details of different patients and their treatment details. If the new patient comes for treatment, then the doctor collects the information about that patient and sends the query to the H-BCF. The H-BCF system matches the input query with the patient’s case database and retrieves the similar cases. Here, the PSO algorithm is used with the BCF for retrieving the most similar cases from the patient’s case database. Finally, the Doctor gives treatment to the new patient based on the retrieved cases. The performance of the proposed method is analyzed with the existing methods, such as PESM, FBSO-neural network, and Hybrid model for the performance measures accuracy and F-Measure. The experimental results show that the proposed method attains the higher accuracy of 99.5% and the maximum F-Measure of 99% when compared to the existing methods

    Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration

    Full text link
    Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system, where we combine a query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our system is highly dependent on the quality of the translation of technical terms. However, the technical term translation is still problematic in that technical terms are often compound words, and thus new terms are progressively created by combining existing base words. In addition, Japanese often represents loanwords based on its special phonogram. Consequently, existing dictionaries find it difficult to achieve sufficient coverage. To counter the first problem, we produce a Japanese/English dictionary for base words, and translate compound words on a word-by-word basis. We also use a probabilistic method to resolve translation ambiguity. For the second problem, we use a transliteration method, which corresponds words unlisted in the base word dictionary to their phonetic equivalents in the target language. We evaluate our system using a test collection for CLIR, and show that both the compound word translation and transliteration methods improve the system performance

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    Boolean versus ranked querying for biomedical systematic reviews

    Get PDF
    Background: The process of constructing a systematic review, a document that compiles the published evidence pertaining to a specified medical topic, is intensely time-consuming, often taking a team of researchers over a year, with the identification of relevant published research comprising a substantial portion of the effort. The standard paradigm for this information-seeking task is to use Boolean search; however, this leaves the user(s) the requirement of examining every returned result. Further, our experience is that effective Boolean queries for this specific task are extremely difficult to formulate and typically require multiple iterations of refinement before being finalized. Methods: We explore the effectiveness of using ranked retrieval as compared to Boolean querying for the purpose of constructing a systematic review. We conduct a series of experiments involving ranked retrieval, using queries defined methodologically, in an effort to understand the practicalities of incorporating ranked retrieval into the systematic search task. Results: Our results show that ranked retrieval by itself is not viable for this search task requiring high recall. However, we describe a refinement of the standard Boolean search process and show that ranking within a Boolean result set can improve the overall search performance by providing early indication of the quality of the results, thereby speeding up the iterative query-refinement process. Conclusions: Outcomes of experiments suggest that an interactive query-development process using a hybrid ranked and Boolean retrieval system has the potential for significant time-savings over the current search process in the systematic reviewing

    Hybrid Query Expansion on Ontology Graph in Biomedical Information Retrieval

    Get PDF
    Nowadays, biomedical researchers publish thousands of papers and journals every day. Searching through biomedical literature to keep up with the state of the art is a task of increasing difficulty for many individual researchers. The continuously increasing amount of biomedical text data has resulted in high demands for an efficient and effective biomedical information retrieval (BIR) system. Though many existing information retrieval techniques can be directly applied in BIR, BIR distinguishes itself in the extensive use of biomedical terms and abbreviations which present high ambiguity. First of all, we studied a fundamental yet simpler problem of word semantic similarity. We proposed a novel semantic word similarity algorithm and related tools called Weighted Edge Similarity Tools (WEST). WEST was motivated by our discovery that humans are more sensitive to the semantic difference due to the categorization than that due to the generalization/specification. Unlike most existing methods which model the semantic similarity of words based on either the depth of their Lowest Common Ancestor (LCA) or the traversal distance of between the word pair in WordNet, WEST also considers the joint contribution of the weighted distance between two words and the weighted depth of their LCA in WordNet. Experiments show that weighted edge based word similarity method has achieved 83.5% accuracy to human judgments. Query expansion problem can be viewed as selecting top k words which have the maximum accumulated similarity to a given word set. It has been proved as an effective method in BIR and has been studied for over two decades. However, most of the previous researches focus on only one controlled vocabulary: MeSH. In addition, early studies find that applying ontology won\u27t necessarily improve searching performance. In this dissertation, we propose a novel graph based query expansion approach which is able to take advantage of the global information from multiple controlled vocabularies via building a biomedical ontology graph from selected vocabularies in Metathesaurus. We apply Personalized PageRank algorithm on the ontology graph to rank and identify top terms which are highly relevant to the original user query, yet not presented in that query. Those new terms are reordered by a weighted scheme to prioritize specialized concepts. We multiply a scaling factor to those final selected terms to prevent query drifting and append them to the original query in the search. Experiments show that our approach achieves 17.7% improvement in 11 points average precision and recall value against Lucene\u27s default indexing and searching strategy and by 24.8% better against all the other strategies on average. Furthermore, we observe that expanding with specialized concepts rather than generalized concepts can substantially improve the recall-precision performance. Furthermore, we have successfully applied WEST from the underlying WordNet graph to biomedical ontology graph constructed by multiple controlled vocabularies in Metathesaurus. Experiments indicate that WEST further improve the recall-precision performance. Finally, we have developed a Graph-based Biomedical Search Engine (G-Bean) for retrieving and visualizing information from literature using our proposed query expansion algorithm. G-Bean accepts any medical related user query and processes them with expanded medical query to search for the MEDLINE database

    Constructing a biodiversity terminological inventory.

    Get PDF
    The increasing growth of literature in biodiversity presents challenges to users who need to discover pertinent information in an efficient and timely manner. In response, text mining techniques offer solutions by facilitating the automated discovery of knowledge from large textual data. An important step in text mining is the recognition of concepts via their linguistic realisation, i.e., terms. However, a given concept may be referred to in text using various synonyms or term variants, making search systems likely to overlook documents mentioning less known variants, which are albeit relevant to a query term. Domain-specific terminological resources, which include term variants, synonyms and related terms, are thus important in supporting semantic search over large textual archives. This article describes the use of text mining methods for the automatic construction of a large-scale biodiversity term inventory. The inventory consists of names of species, amongst which naming variations are prevalent. We apply a number of distributional semantic techniques on all of the titles in the Biodiversity Heritage Library, to compute semantic similarity between species names and support the automated construction of the resource. With the construction of our biodiversity term inventory, we demonstrate that distributional semantic models are able to identify semantically similar names that are not yet recorded in existing taxonomies. Such methods can thus be used to update existing taxonomies semi-automatically by deriving semantically related taxonomic names from a text corpus and allowing expert curators to validate them. We also evaluate our inventory as a means to improve search by facilitating automatic query expansion. Specifically, we developed a visual search interface that suggests semantically related species names, which are available in our inventory but not always in other repositories, to incorporate into the search query. An assessment of the interface by domain experts reveals that our query expansion based on related names is useful for increasing the number of relevant documents retrieved. Its exploitation can benefit both users and developers of search engines and text mining applications
    • …
    corecore