8 research outputs found

    Design of optimal search engine using text summarization through artificial intelligence techniques

    Natural language processing is a trending research topic that enables developers to build human-computer interactions. It integrates artificial intelligence, computer science and computational linguistics. Research in natural language processing focuses on creating devices or machines that operate on a single human command. It enables bots that relay instructions from mobile devices to control physical devices by means of speech tagging. In this paper, we design a search engine that not only displays data matching the user's query but also presents a detailed view of the content or topic the user is interested in, using summarization. We find that the designed search engine has an optimal response time for user queries, as analyzed with the number of transactions as input. The performance analysis also shows that text summarization is an efficient way to improve response time in search engine optimization.

    Clustering cliques for graph-based summarization of the biomedical research literature

    BACKGROUND: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS: SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. CONCLUSIONS: For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10 percentage points higher than the baseline (43% versus 33%). When compared to the reference standard from MeSH headings, the results for recall, precision and F-score were 0.64, 0.65, and 0.65 respectively.
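
    As an illustration of the recall, precision and F-score measures reported above, the following is a minimal Python sketch that compares a set of system theme labels against a reference set. The function name `label_scores` and the example labels are hypothetical; the abstract does not describe how the authors matched labels to MeSH headings, so exact string matching is assumed here.

```python
def label_scores(system_labels, reference_labels):
    """Return (precision, recall, f1) for two collections of labels.

    Assumption: a system label counts as correct only if it matches a
    reference label exactly; the original study may have matched differently.
    """
    system = set(system_labels)
    reference = set(reference_labels)
    true_positives = len(system & reference)

    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    system = ["Hypertension", "Stroke", "Aspirin"]
    reference = ["Stroke", "Aspirin", "Diabetes Mellitus", "Obesity"]
    print(label_scores(system, reference))
```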

    Text summarization in the biomedical domain: A systematic review of recent research

    The amount of information available to clinicians and clinical researchers is growing exponentially. Text summarization condenses information so that users can find and understand relevant source texts more quickly and effortlessly. In recent years, substantial research has been conducted to develop and evaluate various summarization techniques in the biomedical domain. The goal of this study was to systematically review recently published research on summarization of textual documents in the biomedical domain.

    Semantic annotation and summarization of biomedical text

    Advancements in the biomedical community are largely documented and published in text format in scientific forums such as conference papers and journals. To address the scalability of utilizing the large volume of text-based information generated by continuing advances in the biomedical field, two complementary areas are studied. The first area is Semantic Annotation, which is a method for providing machine-understandable information based on domain-specific resources. A novel semantic annotator, CONANN, is implemented for online matching of concepts defined by a biomedical metathesaurus. CONANN uses a multi-level filter based on both information retrieval and shallow natural language processing techniques. CONANN is evaluated against a state-of-the-art biomedical annotator using the performance measures of time (e.g. number of milliseconds per noun phrase) and precision/recall of the resulting concept matches. CONANN shows that annotation can be performed online, rather than offline, without a significant loss of precision and recall as compared to current offline systems. The second area of study is Text Summarization, which is used as a way to perform data reduction of clinical trial texts while still describing the main themes of a biomedical document. The text summarization work is unique in that it focuses exclusively on summarizing biomedical full-text sources as opposed to abstracts, and also exclusively uses domain-specific concepts, rather than terms, to identify important information within a biomedical text. Two novel text summarization algorithms are implemented: one using a concept chaining method based on existing work in lexical chaining (BioChain), and the other using concept distribution to match important sentences between a source text and a generated summary (FreqDist). The BioChain and FreqDist summarizers are evaluated using the publicly-available ROUGE summary evaluation tool. ROUGE compares n-gram co-occurrences between a system summary and one or more model summaries. The text summarization evaluation shows that the two approaches outperform nearly all of the existing term-based approaches. Ph.D., Information Science and Technology -- Drexel University, 200
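
    To make the n-gram co-occurrence idea concrete, here is a minimal Python sketch of a ROUGE-N style recall score. It is not the ROUGE tool used in the thesis: the function names are hypothetical, tokenization is assumed to be lowercased whitespace splitting, and the stemming and stop-word options of the real tool are omitted.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(system_summary, model_summaries, n=1):
    """Fraction of model-summary n-grams that also occur in the system summary.

    Assumption: tokens are lowercased, whitespace-separated words.
    Matches are clipped so an n-gram is credited at most as often as it
    appears in the system summary.
    """
    system_counts = ngrams(system_summary.lower().split(), n)
    matched = 0
    total = 0
    for model in model_summaries:
        model_counts = ngrams(model.lower().split(), n)
        matched += sum(min(count, system_counts[gram])
                       for gram, count in model_counts.items())
        total += sum(model_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    system = "the cat sat on the mat"
    models = ["the cat lay on the mat"]
    print(rouge_n_recall(system, models, n=1))
    print(rouge_n_recall(system, models, n=2))
```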

    Concept frequency distribution in biomedical text summarization

    Text summarization is a data reduction process. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The data reduction offered by text summarization is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts. Such efforts are often hampered by the high volume of publications. Our contribution is two-fold: 1) to propose the frequency of domain concepts as a method to identify important sentences within a full text; and 2) to propose a novel frequency distribution model and algorithm for identifying important sentences based on term or concept frequency distribution. An evaluation of several existing summarization systems using biomedical texts is presented in order to determine a performance baseline. For domain concept comparison, a recent high-performing frequency-based algorithm using terms is adapted to use concepts and evaluated using both terms and concepts. It is shown that the use of concepts performs closely with the use of terms for sentence selection. Our proposed frequency distribution model and algorithm outperform a state-of-the-art approach.
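
    The general idea of frequency-based sentence selection can be sketched as follows. This is a simple term-frequency baseline, not the frequency distribution algorithm proposed in the paper, and it counts plain words rather than domain concepts (mapping text to concepts would require a biomedical vocabulary that is not modeled here). The function names and the regular-expression sentence splitter are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(text, k=2):
    """Return the k sentences whose tokens are most frequent in the document.

    Each sentence is scored by the mean document frequency of its tokens;
    the selected sentences are returned in their original order.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequencies = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(frequencies[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    # Hypothetical input text, for illustration only.
    document = "Aspirin reduces pain. Aspirin reduces fever. Cats sleep."
    print(summarize(document, k=2))
```

    Swapping `tokenize` for a function that returns concept identifiers would turn this term-based baseline into a concept-based one, which is the kind of comparison the abstract describes.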