444 research outputs found
Multi Domain Semantic Information Retrieval Based on Topic Model
Over the last decades, there have been remarkable shifts in the area of Information Retrieval (IR) as huge amount of information is increasingly accumulated on the Web. The gigantic information explosion increases the need for discovering new tools that retrieve meaningful knowledge from various complex information sources. Thus, techniques primarily used to search and extract important information from numerous database sources have been a key challenge in current IR systems.
Topic modeling is one of the most recent techniquesthat discover hidden thematic structures from large data collections without human supervision. Several topic models have been proposed in various fields of study and have been utilized extensively for many applications. Latent Dirichlet Allocation (LDA) is the most well-known topic model that generates topics from large corpus of resources, such as text, images, and audio.It has been widely used in many areas in information retrieval and data mining, providing efficient way of identifying latent topics among document collections. However, LDA has a drawback that topic cohesion within a concept is attenuated when estimating infrequently occurring words. Moreover, LDAseems not to consider the meaning of words, but rather to infer hidden topics based on a statisticalapproach. However, LDA can cause either reduction in the quality of topic words or increase in loose relations between topics.
In order to solve the previous problems, we propose a domain specific topic model that combines domain concepts with LDA. Two domain specific algorithms are suggested for solving the difficulties associated with LDA. The main strength of our proposed model comes from the fact that it narrows semantic concepts from broad domain knowledge to a specific one which solves the unknown domain problem. Our proposed model is extensively tested on various applications, query expansion, classification, and summarization, to demonstrate the effectiveness of the model. Experimental results show that the proposed model significantly increasesthe performance of applications
The best of both worlds: highlighting the synergies of combining manual and automatic knowledge organization methods to improve information search and discovery.
Research suggests organizations across all sectors waste a significant amount of time looking for information and often fail to leverage the information they have. In response, many organizations have deployed some form of enterprise search to improve the 'findability' of information. Debates persist as to whether thesauri and manual indexing or automated machine learning techniques should be used to enhance discovery of information. In addition, the extent to which a knowledge organization system (KOS) enhances discoveries or indeed blinds us to new ones remains a moot point. The oil and gas industry was used as a case study using a representative organization. Drawing on prior research, a theoretical model is presented which aims to overcome the shortcomings of each approach. This synergistic model could help to re-conceptualize the 'manual' versus 'automatic' debate in many enterprises, accommodating a broader range of information needs. This may enable enterprises to develop more effective information and knowledge management strategies and ease the tension between what arc often perceived as mutually exclusive competing approaches. Certain aspects of the theoretical model may be transferable to other industries, which is an area for further research
Semi-automated Ontology Generation for Biocuration and Semantic Search
Background:
In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed.
Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing.
Motivation:
The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences.
Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods.
Results:
The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun-phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high quality terms also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, child ancestor relations for up to 54%. No other validated system exists that achieves comparable results.
To improve the search for information on alternative methods to animal testing an ontology has been developed that contains 17,151 terms of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands this request using the structure and terminology of the ontology. The machine classification employed in Go3R is capable of distinguishing documents related to alternative methods from those which are not with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because a specific term was contained or due to the automatic classification. The Go3R search engine is available on-line under www.Go3R.org
Semi-automated Ontology Generation for Biocuration and Semantic Search
Background:
In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed.
Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing.
Motivation:
The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences.
Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods.
Results:
The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun-phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high quality terms also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, child ancestor relations for up to 54%. No other validated system exists that achieves comparable results.
To improve the search for information on alternative methods to animal testing an ontology has been developed that contains 17,151 terms of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands this request using the structure and terminology of the ontology. The machine classification employed in Go3R is capable of distinguishing documents related to alternative methods from those which are not with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because a specific term was contained or due to the automatic classification. The Go3R search engine is available on-line under www.Go3R.org
Recommended from our members
Automatic message annotation and semantic interface for context aware mobile computing
This thesis was submitted for the degree of Docter of Philosophy and awarded by Brunel University.In this thesis, the concept of mobile messaging awareness has been investigated by designing and implementing a framework which is able to annotate the short text messages with context ontology for semantic reasoning inference and classification purposes. The annotated metadata of text message keywords are identified and annotated with concepts, entities and knowledge that drawn from ontology without the need of learning process and the proposed framework supports semantic reasoning based messages awareness for categorization purposes. The first stage of the research is developing the framework of facilitating mobile communication with short text annotated messages (SAMS), which facilitates annotating short text message with part of speech tags augmented with an internal and external metadata. In the SAMS framework the annotation process is carried out automatically at the time of composing a message. The obtained metadata is collected from the device’s file system and the message header information which is then accumulated with the message’s tagged keywords to form an XML file, simultaneously. The significance of annotation process is to assist the proposed framework during the search and retrieval processes to identify the tagged keywords and The Semantic Web Technologies are utilised to improve the reasoning mechanism. Later, the proposed framework is further improved “Contextual Ontology based Short Text Messages reasoning (SOIM)”. SOIM further enhances the search capabilities of SAMS by adopting short text message annotation and semantic reasoning capabilities with domain ontology as Domain ontology is modeled into set of ontological knowledge modules that capture features of contextual entities and features of particular event or situation. Fundamentally, the framework SOIM relies on the hierarchical semantic distance to compute an approximated match degree of new set of relevant keywords to their corresponding abstract class in the domain ontology. Adopting contextual ontology leverages the framework performance to enhance the text comprehension and message categorization. Fuzzy Sets and Rough Sets theory have been integrated with SOIM to improve the inference capabilities and system efficiency. Since SOIM is based on the degree of similarity to choose the matched pattern to the message, the issue of choosing the best-retrieved pattern has arisen during the stage of decision-making. Fuzzy reasoning classifier based rules that adopt the Fuzzy Set theory for decision making have been applied on top of SOIM framework in order to increase the accuracy of the classification process with clearer decision. The issue of uncertainty in the system has been addressed by utilising the Rough Sets theory, in which the irrelevant and indecisive properties which affect the framework efficiency negatively have been ignored during the matching process.The Ministry of Higher Education and Scientific Research (IRAQ
- …