Comparing Statistical and Data Mining Techniques for Enrichment Ontology with Instances
Enriching an ontology with instances is an important task because it extends the knowledge in the ontology to cover the domain of interest more fully, so that greater benefits can be obtained. Many techniques exist for classifying instances of concepts, with two popular families being statistical and data mining methods. This paper compares the two for classifying instances to enrich an ontology with broader domain knowledge, selecting conditional random fields for the statistical method and feature-weighted k-nearest neighbor classification for the data mining method. The experiments are conducted on a tourism ontology. The results show that the conditional random fields method provides greater precision and recall than the other; specifically, the F1-measure is 74.09% for conditional random fields and 60.04% for feature-weighted k-nearest neighbor classification.
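As a rough illustration of the data mining side of the comparison, the following is a minimal sketch of feature-weighted k-nearest neighbor classification. The feature vectors, class labels, and weights are invented for illustration; the paper's actual features and weighting scheme are not given here.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    # Euclidean distance with a per-feature weight, so more relevant
    # features contribute more to the distance.
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_classify(query, train, labels, weights, k=3):
    # Rank training vectors by weighted distance, then vote among the k nearest.
    ranked = sorted(range(len(train)),
                    key=lambda i: weighted_distance(query, train[i], weights))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy example: classify a candidate instance into an ontology concept.
train = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = ["Hotel", "Hotel", "Attraction", "Attraction"]
weights = (2.0, 1.0)  # hypothetical feature weights
print(knn_classify((0.8, 0.2), train, labels, weights, k=3))  # → Hotel
```

In a real enrichment pipeline the weights would be learned from feature relevance rather than set by hand.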
Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams
Neogeography is the combination of user-generated data and experiences with mapping technologies. In this article we present a research project to extract valuable structured information with a geographic component from unstructured user-generated text in wikis, forums, or SMS messages. The extracted information is integrated to form collective knowledge about a given domain. This structured information can then be used to help users from the same domain obtain information through a simple question answering system. The project aims to help worker communities in developing countries share their knowledge, providing a simple and cheap way to contribute and to benefit using the available communication technology.
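One simple way to attach a geographic component to unstructured text, as the project describes, is a gazetteer lookup. The sketch below is a deliberately naive version; the place names, coordinates, and message are illustrative, not project data.

```python
# Minimal gazetteer lookup: map place-name mentions in free text to coordinates.
# Names and coordinates are illustrative placeholders.
GAZETTEER = {
    "cairo": (30.04, 31.24),
    "alexandria": (31.20, 29.92),
}

def extract_locations(text):
    # Naive single-token match against the gazetteer; a real system would
    # handle multi-word names, ambiguity, and spelling variants.
    found = []
    for token in text.lower().replace(",", " ").split():
        if token in GAZETTEER:
            found.append((token, GAZETTEER[token]))
    return found

msg = "Wheat prices rose in Cairo this week"
print(extract_locations(msg))  # → [('cairo', (30.04, 31.24))]
```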
Online event-based conservation documentation: A case study from the IIC website
There is a wealth of conservation-related resources published online on institutional and personal websites. There is value in searching across these websites, but this is currently impossible because the published data do not conform to any universal standard. This paper begins with a review of the types of classifications employed for conservation content on several conservation websites. It continues with an analysis of these classifications and identifies limitations related to the lack of a conceptual basis for the classification terms used. The paper then draws parallels with similar problems in other professional fields and investigates the technologies used to resolve them. Solutions developed in the fields of computer science and knowledge organization are then described. The paper continues with a survey of two important resources in cultural heritage, the ICOM CIDOC-CRM and the Getty vocabularies, and explains how these resources can be combined in the field of conservation documentation to support the implementation of a common publication framework across different resources. A case study of the proposed implementation is then presented, based on recent work on the IIC website. The paper concludes with a summary of the benefits of the recommended approach. An appendix with a selection of classification terms with reasonable coverage for conservation content is included.
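To make the idea of event-based documentation concrete, here is a hedged sketch of how a conservation treatment might be recorded as an event-style record that combines a CIDOC-CRM class (E11 Modification is the CRM class for modification/treatment events) with a Getty-vocabulary concept URI. The object, actor, date, and the specific AAT identifier are all hypothetical.

```python
# Event-based conservation record sketch. "crm:E11_Modification" is the
# CIDOC-CRM class for treatment-like events; the AAT URI shows the kind of
# controlled term the Getty vocabularies provide (the numeric id is made up).
record = {
    "type": "crm:E11_Modification",                      # the treatment event
    "carried_out_by": "Jane Doe (conservator)",          # hypothetical actor
    "modified": "Panel painting, inv. no. 1234",         # hypothetical object
    "technique": "http://vocab.getty.edu/aat/300053027", # hypothetical AAT id
    "date": "2015-06-01",
}

def summarize(rec):
    # A human-readable digest, e.g. for a website listing.
    return f'{rec["type"]}: {rec["modified"]} on {rec["date"]}'

print(summarize(record))
```

Because every record is typed against shared vocabularies, records published by different websites could be searched and aggregated together, which is the cross-site search benefit the paper argues for.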
Information Extraction based on Named Entity for Tourism Corpus
Tourism information is scattered across the web. Searching for it is usually
time-consuming: one must browse through search-engine results and select and
view the details of each accommodation. In this paper, we present a
methodology to extract particular information from the full text returned by
the search engine, so that users can look specifically at the desired,
relevant information. The approach can be reused for the same task in other
domains. The main steps are 1) building training data and 2) building a
recognition model. First, the tourism data is gathered and the vocabularies
are built. The raw corpus is used to train vocabulary embeddings and also to
create annotated data. The process of creating named entity annotations is
presented; the recognition model for a given entity type can then be built.
In the experiments, given a hotel description, the model can extract the
desired entities, i.e., name, location, and facility. The extracted data can
further be stored as structured information, e.g., in ontology format, for
future querying and inference. The model for automatic named entity
identification, based on machine learning, yields errors ranging from 8% to
25%. Comment: 6 pages, 9 figures
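The recognition step described above typically produces per-token labels that must be decoded into entities. The following sketch assumes the common BIO tagging scheme; the tokens, tags, and entity types (NAME, LOCATION) are an invented hotel-description example, not the paper's data.

```python
# Decode BIO tags (a standard NER annotation scheme) into (type, text) entities.
def decode_bio(tokens, tags):
    entities, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # beginning of a new entity
            if current:
                entities.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(tok)          # continuation of the same entity
        else:                               # "O" or inconsistent tag: close out
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(etype, " ".join(words)) for etype, words in entities]

tokens = ["Grand", "Palace", "Hotel", "is", "in", "Chiang", "Mai"]
tags   = ["B-NAME", "I-NAME", "I-NAME", "O", "O", "B-LOCATION", "I-LOCATION"]
print(decode_bio(tokens, tags))
# → [('NAME', 'Grand Palace Hotel'), ('LOCATION', 'Chiang Mai')]
```

The decoded entities are what would then be stored as structured information, e.g. as ontology instances.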
Big data warehouse framework for smart revenue management
Revenue Management's most cited definition is probably "to sell the right accommodation to the
right customer, at the right time and at the right price, with optimal satisfaction for customers and hoteliers".
Smart Revenue Management (SRM) is a project that aims at developing smart automatic techniques
for efficiently optimizing the occupancy and rates of hotel accommodations, commonly referred to as
revenue management. One of the objectives of this project is to demonstrate that the collection of Big Data,
followed by an appropriate assembly of functionalities, makes it possible to generate the Data Warehouse
necessary to produce high-quality business intelligence and analytics. This is achieved through the
collection of data extracted from a variety of sources, including the web. This paper proposes a three-stage
framework to develop the Big Data Warehouse for the SRM: first, the compilation of all available
information (in the present case, focused only on information extracted from the web by a web crawler,
i.e., raw data); second, the storing of that raw data in a primary NoSQL database; and third, the conception of a
set of functionalities, rules, principles, and semantics to select, combine, and store in a secondary relational
database the information meaningful for Revenue Management (the Big Data Warehouse). The last stage is
the principal focus of the paper. In this context, clues are also given on how to compile information for
Business Intelligence. All these functionalities contribute to a holistic framework that, in the future, will make
it possible to anticipate customer and competitor behavior, fundamental elements to fulfill Revenue
Management.
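The second-to-third stage transition, applying rules to turn raw crawled documents into meaningful relational rows, can be sketched roughly as follows. The document fields, the example rule (discard entries without a numeric rate), and the target schema are all illustrative, not the SRM project's actual design.

```python
# Sketch: filter raw crawled documents (as they might sit in a NoSQL store)
# and project them onto a fixed relational schema (hotel, date, price).
# Field names and the cleaning rule are illustrative placeholders.
raw_docs = [
    {"hotel": "Hotel A", "date": "2016-07-01", "price": "120", "source": "web"},
    {"hotel": "Hotel A", "date": "2016-07-01", "price": "n/a", "source": "web"},
    {"hotel": "Hotel B", "date": "2016-07-01", "price": "95",  "source": "web"},
]

def to_rows(docs):
    rows = []
    for d in docs:
        try:
            price = float(d["price"])  # rule: keep only numeric rates
        except ValueError:
            continue                   # discard malformed raw entries
        rows.append((d["hotel"], d["date"], price))
    return rows

print(to_rows(raw_docs))
# → [('Hotel A', '2016-07-01', 120.0), ('Hotel B', '2016-07-01', 95.0)]
```

The resulting tuples are the kind of cleaned, schema-conforming records a secondary relational database could ingest for business-intelligence queries.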