
    Comparing Statistical and Data Mining Techniques for Enrichment Ontology with Instances

    Enriching an ontology with instances is an important task because it extends the ontology's knowledge to cover the domain of interest more fully, yielding greater benefits. Many techniques exist for classifying instances of concepts, two popular ones being statistical and data mining methods. This paper compares the two approaches for classifying instances to enrich an ontology with broader domain knowledge, selecting conditional random fields for the statistical method and feature-weight k-nearest neighbor classification for the data mining method. The experiments are conducted on a tourism ontology. The results show that conditional random fields provide greater precision and recall than the alternative; specifically, the F1-measure is 74.09% for conditional random fields and 60.04% for feature-weight k-nearest neighbor classification.
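
    As a rough illustration of the data mining side of this comparison, the sketch below sets up a feature-weight k-nearest neighbor classifier and scores it with the F1-measure. The features, weights, and labels are synthetic placeholders, not the paper's tourism data; the CRF side would be trained separately as a sequence labeler over annotated text.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.metrics import f1_score

        # Synthetic stand-ins for the candidate-instance features and concept labels
        # (the paper's actual features come from the tourism corpus).
        X = np.random.rand(200, 4)             # 4 hypothetical features per candidate
        y = np.random.randint(0, 3, size=200)  # 3 hypothetical ontology concepts

        # Feature weighting: scaling each column makes the Euclidean distance used
        # by kNN behave as a weighted distance.
        feature_weights = np.array([2.0, 1.0, 0.5, 1.5])  # assumed weights
        Xw = X * feature_weights

        X_train, X_test, y_train, y_test = train_test_split(Xw, y, test_size=0.3, random_state=0)
        knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
        print("FWKNN macro F1:", f1_score(y_test, knn.predict(X_test), average="macro"))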

    Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams

    Neogeography is the combination of user-generated data and experiences with mapping technologies. In this article we present a research project to extract valuable structured information with a geographic component from unstructured user-generated text in wikis, forums, or SMS messages. The extracted information is integrated to form collective knowledge about a given domain. This structured information can then be used to help users from the same domain retrieve information through a simple question answering system. The project aims to help worker communities in developing countries share their knowledge, providing a simple and cheap way to contribute and benefit using the available communication technology.
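
    A minimal sketch of the kind of extraction such a project implies, assuming a simple gazetteer lookup over user-generated text; the place names, coordinates, and message are illustrative only and not taken from the project.

        import re

        # Tiny illustrative gazetteer: place name -> (lat, lon)
        gazetteer = {"Cairo": (30.044, 31.236), "Nairobi": (-1.286, 36.817)}

        def extract_places(text):
            """Return structured records for gazetteer places mentioned in the text."""
            records = []
            for name, (lat, lon) in gazetteer.items():
                if re.search(rf"\b{re.escape(name)}\b", text):
                    records.append({"place": name, "lat": lat, "lon": lon, "source": text})
            return records

        print(extract_places("Fertilizer prices doubled in Nairobi this week"))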

    Online event-based conservation documentation: A case study from the IIC website

    There is a wealth of conservation-related resources published online on institutional and personal websites. There would be value in searching across these websites, but this is currently impossible because the published data do not conform to any universal standard. This paper begins with a review of the types of classifications employed for conservation content on several conservation websites. It continues with an analysis of these classifications and identifies some of their limitations, which stem from the lack of a conceptual basis for the classification terms used. The paper then draws parallels with similar problems in other professional fields and investigates the technologies used to resolve them. Solutions developed in the fields of computer science and knowledge organization are then described. The paper continues with a survey of two important resources in cultural heritage, the ICOM-CIDOC-CRM and the Getty vocabularies, and explains how these resources can be combined in the field of conservation documentation to support the implementation of a common publication framework across different resources. A case study of the proposed implementation is then presented, based on recent work on the IIC website. The paper concludes with a summary of the benefits of the recommended approach. An appendix with a selection of classification terms with reasonable coverage for conservation content is included.
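
    A minimal sketch, assuming rdflib, of how a conservation treatment might be published as an event-based CIDOC-CRM record typed with a Getty AAT concept; all record URIs and the AAT identifier are placeholders rather than actual IIC data.

        from rdflib import Graph, Namespace, RDF

        CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
        AAT = Namespace("http://vocab.getty.edu/aat/")
        EX = Namespace("http://example.org/conservation/")   # placeholder namespace

        g = Graph()
        g.bind("crm", CRM)

        treatment = EX["treatment/42"]          # hypothetical treatment record
        obj = EX["object/panel-painting-7"]     # hypothetical object record

        g.add((treatment, RDF.type, CRM.E11_Modification))      # the treatment as an event
        g.add((treatment, CRM.P31_has_modified, obj))            # event modified this object
        g.add((treatment, CRM.P2_has_type, AAT["300000000"]))    # placeholder AAT concept id
        g.add((obj, RDF.type, CRM["E22_Human-Made_Object"]))

        print(g.serialize(format="turtle"))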

    Information Extraction based on Named Entity for Tourism Corpus

    Tourism information is scattered across many sources nowadays. Searching for it is usually time consuming: one has to browse through search engine results and then select and view the details of each accommodation. In this paper, we present a methodology to extract particular information from the full text returned by a search engine, so that users can look directly at the desired relevant information. The approach can be reused for the same task in other domains. The main steps are 1) building training data and 2) building a recognition model. First, the tourism data is gathered and the vocabularies are built. The raw corpus is used to train vocabulary embeddings and to create annotated data; the process of creating named entity annotations is presented. A recognition model for a given entity type can then be built. In the experiments, given a hotel description, the model can extract the desired entities, i.e., name, location, and facility. The extracted data can further be stored as structured information, e.g., in ontology format, for future querying and inference. The machine learning model for automatic named entity identification yields an error rate ranging from 8% to 25%. Comment: 6 pages, 9 figures
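
    A minimal sketch of the extraction step, using spaCy's rule-based EntityRuler as a stand-in for the paper's learned recognition model; the entity labels (NAME, LOCATION, FACILITY), patterns, and hotel description are illustrative assumptions, not the paper's corpus.

        import spacy

        nlp = spacy.blank("en")
        ruler = nlp.add_pipe("entity_ruler")
        ruler.add_patterns([
            {"label": "NAME", "pattern": "Grand Palace Hotel"},                        # hotel name
            {"label": "LOCATION", "pattern": [{"LOWER": "chiang"}, {"LOWER": "mai"}]},
            {"label": "FACILITY", "pattern": [{"LOWER": "swimming"}, {"LOWER": "pool"}]},
        ])

        doc = nlp("Grand Palace Hotel in Chiang Mai offers a swimming pool and free Wi-Fi.")
        print([(ent.text, ent.label_) for ent in doc.ents])
        # -> [('Grand Palace Hotel', 'NAME'), ('Chiang Mai', 'LOCATION'), ('swimming pool', 'FACILITY')]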

    Big data warehouse framework for smart revenue management

    One of Revenue Management's most cited definitions is probably "to sell the right accommodation to the right customer, at the right time and the right price, with optimal satisfaction for customers and hoteliers". Smart Revenue Management (SRM) is a project which aims at the development of smart automatic techniques for efficient optimization of occupancy and rates of hotel accommodations, commonly referred to as revenue management. One of the objectives of this project is to demonstrate that the collection of Big Data, followed by an appropriate assembly of functionalities, makes it possible to generate the Data Warehouse necessary to produce high-quality business intelligence and analytics. This is achieved by collecting data extracted from a variety of sources, including the web. This paper proposes a three-stage framework to develop the Big Data Warehouse for the SRM: first, the compilation of all available information, which in the present case focused only on information extracted from the web by a web crawler (raw data); second, the storing of that raw data in a primary NoSQL database; and third, the conception of a set of functionalities, rules, principles, and semantics to select, combine, and store in a secondary relational database the meaningful information for Revenue Management (the Big Data Warehouse). The last stage is the principal focus of the paper. In this context, clues are also given on how to compile information for Business Intelligence. All these functionalities contribute to a holistic framework that, in the future, will make it possible to anticipate customers' and competitors' behavior, fundamental elements to fulfill Revenue Management.
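
    A minimal sketch of the three-stage flow described above, assuming a local MongoDB instance as the primary NoSQL store and SQLite standing in for the secondary relational database; the collection names, fields, and table schema are illustrative, not the SRM project's actual design.

        import sqlite3
        from pymongo import MongoClient

        # Stages 1-2: a raw crawled record is kept as-is in the primary NoSQL store
        # (assumes a MongoDB server on localhost; names and fields are illustrative).
        raw = {"hotel": "Hotel Exemplo", "date": "2024-08-01", "rate_eur": 120.0,
               "source_html": "<html>...</html>"}   # full page kept for provenance
        mongo = MongoClient("mongodb://localhost:27017")
        mongo.srm_raw.rates.insert_one(raw)

        # Stage 3: only the meaningful fields are selected into the relational warehouse.
        db = sqlite3.connect("srm_warehouse.db")
        db.execute("CREATE TABLE IF NOT EXISTS rates (hotel TEXT, date TEXT, rate_eur REAL)")
        for doc in mongo.srm_raw.rates.find():
            db.execute("INSERT INTO rates VALUES (?, ?, ?)",
                       (doc["hotel"], doc["date"], doc["rate_eur"]))
        db.commit()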