    Automatic Extraction and Assessment of Entities from the Web

    The search for information about entities, such as people or movies, plays an increasingly important role on the Web. This information is still scattered across many Web pages, making it time-consuming for a user to find all relevant information about an entity. This thesis describes techniques to extract entities and information about these entities from the Web, such as facts, opinions, questions and answers, interactive multimedia objects, and events. The thesis finds that it is possible to create a large knowledge base automatically using a manually crafted ontology. After applying assessment algorithms, the precision of the extracted information was 75% for facts and 90% for entities. The algorithms from this thesis can be used to create such a knowledge base, which in turn can serve various research fields, such as question answering, named entity recognition, and information retrieval.

    Semantic approaches to domain template construction and opinion mining from natural language

    Most of the text mining algorithms in use today are based on lexical representations of input texts, for example bag of words. A possible alternative is to first convert text into a semantic representation, one that captures the text content in a structured way using only a set of pre-agreed labels. This thesis explores the feasibility of such an approach for two tasks on collections of documents: identifying common structure in input documents (»domain template construction«), and helping users find differing opinions in input documents (»opinion mining«). We first discuss ways of converting natural text to a semantic representation. We propose and compare two new methods with varying degrees of target representation complexity. The first method, which shows more promise, is based on dependency parser output, which it converts into lightweight semantic frames with role fillers aligned to WordNet. The second method structures text using Semantic Role Labeling techniques and aligns the output to the Cyc ontology. Based on the first of these representations, we next propose and evaluate two methods for constructing frame-based templates for documents from a given domain (e.g. bombing attack news reports). A template is the set of all salient attributes (e.g. attacker, number of casualties, ...). The idea of both methods is to construct abstract frames for which more specific instances (according to the WordNet hierarchy) can be found in the input documents. Fragments of these abstract frames represent the sought-for attributes. We achieve state-of-the-art performance and additionally provide detailed type constraints for the attributes, something not possible with competing methods. Finally, we propose a software system for exposing differing opinions in the news. For any given event, we present the user with all known articles on the topic and let them navigate the articles by three semantic properties simultaneously: sentiment, topical focus, and geography of origin. The result is a dynamically reranked set of relevant articles and a near-real-time focused summary of those articles. The summary, too, is computed from the semantic text representation discussed above. We conducted a user study of the whole system with very positive results.
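
As a rough illustration of the general idea behind template construction (abstract frames whose more specific instances appear in the input documents), the Python sketch below generalizes each frame's role filler up a hypernym hierarchy and keeps, for every (predicate, role) slot, the most specific concept supported by at least `min_docs` documents. This is not the thesis's algorithm: the tiny `HYPERNYMS` map stands in for WordNet, and the frame triples, the `min_docs` threshold, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy hypernym map (child -> parent), standing in for WordNet.
HYPERNYMS = {
    "suicide_bomber": "attacker",
    "gunman": "attacker",
    "attacker": "person",
    "civilian": "person",
    "person": "entity",
    "car_bomb": "explosive",
    "explosive": "weapon",
    "weapon": "entity",
}


def ancestors(concept):
    """Return the concept followed by its hypernyms, most specific first."""
    chain = [concept]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def build_template(docs, min_docs=2):
    """Map each (predicate, role) slot to the most specific filler concept
    that subsumes a concrete filler in at least `min_docs` documents."""
    support = defaultdict(set)  # abstract frame -> ids of supporting docs
    for doc_id, frames in enumerate(docs):
        for predicate, role, filler in frames:
            # A concrete frame is a specific instance of every abstract frame
            # obtained by replacing its filler with one of its hypernyms.
            for concept in ancestors(filler):
                support[(predicate, role, concept)].add(doc_id)

    template = {}
    for (predicate, role, concept), doc_ids in support.items():
        if len(doc_ids) < min_docs:
            continue
        key = (predicate, role)
        # A longer hypernym chain means a deeper, more specific concept.
        if key not in template or len(ancestors(concept)) > len(ancestors(template[key])):
            template[key] = concept
    return template


# Hypothetical frames extracted from three bombing-attack reports.
docs = [
    [("attack", "agent", "suicide_bomber"), ("attack", "instrument", "car_bomb")],
    [("attack", "agent", "gunman"), ("kill", "patient", "civilian")],
    [("attack", "agent", "attacker"), ("attack", "instrument", "car_bomb")],
]
print(build_template(docs))
# {('attack', 'agent'): 'attacker', ('attack', 'instrument'): 'car_bomb'}
```

Each surviving slot plays the role of a template attribute, and the chosen concept acts as a rough type constraint on its fillers; the one-off `kill`/`patient` slot is dropped because only one document supports it.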

    Spatiotemporal enabled Content-based Image Retrieval

    Geographic information extraction from texts

    A large volume of unstructured texts containing valuable geographic information is available online. This information, provided implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been made in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. This workshop will therefore provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.