
    A Survey Paper on Ontology-Based Approaches for Semantic Data Mining

    Semantic data mining refers to data mining tasks that systematically incorporate domain knowledge, especially formal semantics, into the process. Many research efforts have attested to the benefits of incorporating domain knowledge in data mining; at the same time, the growth of knowledge engineering has enriched the body of available domain knowledge, especially in the form of formal semantics and Semantic Web ontologies. An ontology is an explicit specification of a conceptualization and a formal way to define the semantics of knowledge and data, and its formal structure makes it a natural way to encode domain knowledge for data mining. This survey discusses how ontologies can support semantic data mining and how the formal semantics in ontologies can be incorporated into the data mining process. DOI: 10.17762/ijritcc2321-8169.16048
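
    As a minimal sketch of the idea the survey describes (a toy is-a taxonomy with invented item names, plain Python, not any specific system from the survey): the ontology supplies domain knowledge that lets a miner generalize raw items before counting support, so regularities invisible at the item level surface at the concept level.

```python
from collections import Counter

# A toy is-a taxonomy standing in for a formal ontology (invented concepts).
PARENT = {
    "espresso": "coffee", "latte": "coffee",
    "croissant": "pastry", "muffin": "pastry",
    "coffee": "beverage", "pastry": "baked_good",
}

def generalize(transaction, level=1):
    """Replace each item with its level-th ancestor in the taxonomy."""
    out = set()
    for item in transaction:
        chain = [item]
        while chain[-1] in PARENT:           # climb the is-a hierarchy
            chain.append(PARENT[chain[-1]])
        out.add(chain[min(level, len(chain) - 1)])
    return out

transactions = [
    {"espresso", "croissant"},
    {"latte", "muffin"},
    {"espresso", "muffin"},
]

# No raw itemset repeats, but at the concept level every basket
# pairs a coffee with a pastry -- domain knowledge exposes the pattern.
support = Counter(frozenset(generalize(t)) for t in transactions)
print(support)  # Counter({frozenset({'coffee', 'pastry'}): 3})
```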

    RuThes cloud: Towards a multilevel linguistic linked open data resource for Russian

    In this paper we present a new multi-level Linguistic Linked Open Data (LLOD) resource for Russian. It covers four linguistic levels: semantic, lexical, morphological, and syntactic. The resource has been constructed on the basis of the well-known RuThes thesaurus and the original, hitherto unpublished Extended Zaliznyak grammatical dictionary, and is represented in terms of the SKOS, Lemon, and LexInfo ontologies together with a new custom ontology. In building the resource, we automatically completed the following tasks: merging the source resources on common lexical entries, decomposing complex lexical entries, and publishing the constructed resource as an LLOD-compatible dataset. We demonstrate a use case in which the developed resource is exploited in an information retrieval task, and we hope that our work can serve as a crystallization point for the LLOD cloud in Russian.
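
    As an illustration of the representation layer only (not the authors' construction pipeline), the sketch below uses rdflib to tie a lexical entry to a thesaurus concept with SKOS and OntoLex-Lemon; OntoLex-Lemon stands in here for whichever Lemon variant the resource actually uses, and the ruthes: namespace and both URIs are invented for the example.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Standard LLOD vocabularies; the ruthes: URIs are invented for this example.
ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
LEXINFO = Namespace("http://www.lexinfo.net/ontology/2.0/lexinfo#")
RUTHES = Namespace("http://example.org/ruthes/")

g = Graph()
g.bind("skos", SKOS)
g.bind("ontolex", ONTOLEX)
g.bind("lexinfo", LEXINFO)

concept = RUTHES["concept-water"]
entry = RUTHES["entry-voda"]

# Semantic level: a thesaurus concept modelled as a skos:Concept.
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("вода", lang="ru")))

# Lexical + morphological levels: a lexical entry that evokes the concept,
# tagged with a part of speech from LexInfo.
g.add((entry, RDF.type, ONTOLEX.LexicalEntry))
g.add((entry, ONTOLEX.evokes, concept))
g.add((entry, LEXINFO.partOfSpeech, LEXINFO.noun))

print(g.serialize(format="turtle"))
```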

    Expertise Profiling in Evolving Knowledge Curation Platforms

    Expertise modeling has been the subject of extensive research in two main disciplines: Information Retrieval (IR) and Social Network Analysis (SNA). Both IR and SNA approaches build the expertise model through a document-centric approach providing a macro-perspective on the knowledge emerging from large corpora of static documents. With the emergence of the Web of Data there has been a significant shift from static to evolving documents, through micro-contributions. Thus, the existing macro-perspective is no longer sufficient to track the evolution of both knowledge and expertise. In this paper we present a comprehensive, domain-agnostic model for expertise profiling in the context of dynamic, living documents and evolving knowledge bases. We showcase its application in the biomedical domain and analyze its performance using two manually created datasets.
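
    A small illustrative sketch (plain Python, invented weighting scheme, not the paper's model) of why micro-contributions change the picture: expertise is aggregated per topic from individual revisions, with a time decay so the profile tracks the evolving knowledge base rather than a static corpus.

```python
from collections import defaultdict

def expertise_profile(contributions, half_life_days=90.0):
    """Aggregate micro-contributions into a per-topic expertise score.

    contributions: (topic, size, age_days) tuples, e.g. one tuple per
    revision an author made to a topic-annotated passage.
    half_life_days: assumed decay constant, not taken from the paper.
    """
    profile = defaultdict(float)
    for topic, size, age_days in contributions:
        weight = 0.5 ** (age_days / half_life_days)  # newer edits count more
        profile[topic] += size * weight
    return dict(profile)

edits = [
    ("gene-ontology", 120, 3),    # large, very recent contribution
    ("gene-ontology", 40, 200),   # older, smaller contribution
    ("proteomics", 300, 400),     # large but stale contribution
]
print(expertise_profile(edits))
# {'gene-ontology': ~125.8, 'proteomics': ~13.8}
```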

    Enrichment and Population of a Geospatial Ontology for Semantic Information Extraction

    The massive amount of user-generated content available today presents a new challenge for the geospatial domain and a great opportunity to delve into the linguistic, semantic, and cognitive aspects of geographic information. Ontology-based information extraction is a prominent new field in which a domain ontology guides the extraction process and the identification of pre-defined concepts, properties, and instances in natural language texts. This paper describes an approach for enriching and populating a geospatial ontology using both a top-down and a bottom-up approach in order to enable semantic information extraction: the top-down approach incorporates knowledge from existing ontologies, while the bottom-up approach enriches and populates the geospatial ontology with semantic information (concepts, relations, and instances) extracted from domain-specific web content.
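
    A toy sketch of the bottom-up step (hypothetical lexicon patterns and text, not the paper's system): patterns derived from the geospatial ontology's concept labels guide the extraction, and matched spans become candidate instances of the pre-defined concepts.

```python
import re

# Concept -> surface patterns, as they might be derived from a geospatial ontology.
LEXICON = {
    "River":    [r"[A-Z][a-z]+ River"],
    "Mountain": [r"Mount [A-Z][a-z]+", r"[A-Z][a-z]+ Mountain"],
}

def extract_instances(text):
    """Ontology-guided extraction: each match is proposed as an instance
    of the concept whose pattern produced it."""
    found = []
    for concept, patterns in LEXICON.items():
        for pattern in patterns:
            for m in re.finditer(pattern, text):
                found.append((m.group(0), concept))
    return found

text = "Hikers crossed the Jordan River before climbing toward Mount Hermon."
print(extract_instances(text))
# [('Jordan River', 'River'), ('Mount Hermon', 'Mountain')]
```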

    24th International Conference on Information Modelling and Knowledge Bases

    In the last three decades, information modelling and knowledge bases have become increasingly important subjects, not only in academic communities related to information systems and computer science but also in the business area where information technology is applied. The series of European-Japanese Conferences on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded to cover Europe and other countries. A workshop character is typical for the conference: discussion, enough time for presentations, and a limited number of participants (50) and papers (30). Suggested topics include, but are not limited to:
    1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models.
    2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling.
    3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundations of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models.
    4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioral modeling and prediction.
    5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems.
    6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Content-based multimedia data management; Content-based multimedia retrieval; Privacy and context enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems.
    Overall we received 56 submissions. After careful evaluation, 16 papers were selected as long papers, 17 as short papers, 5 as position papers, and 3 for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organising committee, and the programme coordination team. The long and short papers presented at the conference are revised after the conference and published in the series "Frontiers in Artificial Intelligence" by IOS Press (Amsterdam). The books "Information Modelling and Knowledge Bases" are edited by the Editing Committee of the conference. We believe that the conference will be productive and fruitful in the advance of research and application of information modelling and knowledge bases. Bernhard Thalheim, Hannu Jaakkola, Yasushi Kiyoki

    Moving towards the semantic web: enabling new technologies through the semantic annotation of social contents.

    Social Web technologies have caused an exponential growth of the documents available through the Web, making enormous amounts of textual electronic resources available. Users may be overwhelmed by such an amount of content, and therefore the automatic analysis and exploitation of all this information is of interest to the data mining community. Data mining algorithms exploit features of entities in order to characterise, group, or classify them according to their resemblance. Data by itself does not carry any meaning; it needs to be interpreted to convey information. Classical data analysis methods did not aim to "understand" the content: the data were treated as meaningless numbers, and statistics were calculated on them to build models that were interpreted manually by human domain experts. Nowadays, motivated by the Semantic Web, many researchers have proposed semantically grounded data classification and clustering methods that are able to exploit textual data at a conceptual level. However, they usually rely on pre-annotated inputs to be able to semantically interpret textual data such as the content of Web pages. The usability of all these methods is tied to the linkage between data and its meaning. This work focuses on the development of a general methodology able to detect the most relevant features of a particular textual resource, finding out their semantics (associating them with concepts modelled in ontologies) and detecting its main topics. The proposed methods are unsupervised (avoiding the manual annotation bottleneck), domain-independent (applicable to any area of knowledge), and flexible (able to deal with heterogeneous resources: raw text documents, semi-structured user-generated documents such as Wikipedia articles, or short and noisy tweets). The methods have been evaluated in different fields (tourism, oncology). This work is a first step towards the automatic semantic annotation of documents, needed to pave the way towards the Semantic Web vision.
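
    A toy sketch of the core linkage step (an invented micro-ontology for the tourism domain; not the thesis' actual methods): candidate terms extracted from raw text are matched against ontology concept labels, turning term occurrences into topic evidence with no manual annotation.

```python
import re
from collections import Counter

# Invented micro-ontology: concept -> surface labels (synonyms).
ONTOLOGY = {
    "Accommodation": {"hotel", "hostel", "resort"},
    "Gastronomy":    {"restaurant", "cuisine", "tapas"},
}
LABEL_TO_CONCEPT = {label: c for c, labels in ONTOLOGY.items() for label in labels}

def annotate(text):
    """Link tokens to ontology concepts and rank the document's topics."""
    tokens = re.findall(r"[a-z]+", text.lower())
    links = [(t, LABEL_TO_CONCEPT[t]) for t in tokens if t in LABEL_TO_CONCEPT]
    topics = Counter(concept for _, concept in links)
    return links, topics.most_common()

text = "The hotel was near a tapas restaurant praised for its local cuisine."
links, topics = annotate(text)
print(links)   # [('hotel', 'Accommodation'), ('tapas', 'Gastronomy'), ...]
print(topics)  # [('Gastronomy', 3), ('Accommodation', 1)]
```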

    Automated energy compliance checking in construction

    Automated energy compliance checking aims to automatically check the compliance of a building design – in a building information model (BIM) – with applicable energy requirements. A significant number of efforts in both industry and academia have been undertaken to automate the compliance checking process, achieving various levels of automation, expressivity, representativeness, accuracy, and efficiency. Despite these contributions, two main gaps remain in existing automated compliance checking (ACC) efforts. First, existing methods are not fully automated and/or not generalizable across different types of documents; they require varying degrees of manual effort to extract requirements from text into computer-processable representations and to match the concept representations of the extracted requirements to those of the BIM. Second, existing methods have focused only on code checking; there is still a lack of efforts that address contract specification checking. To address these gaps, this thesis aims to develop a fully automated ACC method for checking BIM-represented building designs for compliance with energy codes and contract specifications. The research included six primary tasks: (1) conducting a comprehensive literature review; (2) developing a semantic, domain-specific, machine learning-based text classification method and algorithm for classifying energy regulatory documents (including energy codes) and contract specifications to support energy ACC in construction; (3) developing a semantic, natural language processing (NLP)-enabled, rule-based information extraction method and algorithm for automated extraction of energy requirements from energy codes; (4) adapting the information extraction method and algorithm for automated extraction of energy requirements from contract specifications; (5) developing a fully automated, semantic information alignment method and algorithm for aligning the representations used in BIMs with those used in the energy codes and contract specifications; and (6) implementing the aforementioned methods and algorithms in a fully automated energy compliance checking prototype, called EnergyACC, and using it in a case study to identify the feasibility of, and challenges in, developing an ACC method that is fully automated and generalizable across different types of regulatory documents. Promising noncompliance detection performance was achieved for both energy code checking (95.7% recall and 85.9% precision) and contract specification checking (100% recall and 86.5% precision).
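
    The two core steps of the pipeline can be made concrete with a small sketch (an invented provision sentence, a single hand-written pattern, and made-up BIM values; EnergyACC itself is far more sophisticated): a rule-based pattern extracts a requirement tuple from code text, and the checker compares it against the value carried by the design.

```python
import re

# Hypothetical sentence in the style of an energy-code provision.
PROVISION = "The U-factor of the fenestration shall not exceed 0.35."

def extract_requirement(sentence):
    """Rule-based IE: pull (property, element, limit) out of one sentence
    pattern. A real system needs many patterns plus NLP preprocessing."""
    m = re.search(r"The (.+?) of the (.+?) shall not exceed ([\d.]+)", sentence)
    if m is None:
        return None
    return {"property": m.group(1), "element": m.group(2),
            "op": "<=", "limit": float(m.group(3))}

def check(requirement, bim_value):
    """Compliance checking: compare the BIM-reported value to the limit."""
    return "compliant" if bim_value <= requirement["limit"] else "noncompliant"

req = extract_requirement(PROVISION)
print(req)               # {'property': 'U-factor', 'element': 'fenestration', ...}
print(check(req, 0.32))  # compliant
print(check(req, 0.40))  # noncompliant
```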