    Automatic Ontology Extraction from Unstructured Texts

    Abstract. Construction of the ontology of a specific domain currently relies on the intuition of a knowledge engineer, and the typical output is a thesaurus of terms, each of which is expected to denote a concept. Ontological ‘engineers’ tend to hand-craft these thesauri on an ad hoc basis and on a relatively small scale. Workers in the specific domain create their own special language, and one device for this creation is the repetition of select keywords for consolidating or rejecting one or more concepts. A more scalable, systematic and automatic approach to ontology construction is possible through the automatic identification of these keywords. An approach for the study and extraction of keywords is outlined where a corpus of randomly collected unstructured texts (i.e. texts not containing any kind of mark-up) in a specific domain is analysed with reference to the lexical preferences of the workers in the domain. An approximation of the role of frequently used single words within multiword expressions leads us to the creation of a semantic network. The network can be asserted into a terminology database or knowledge representation formalism, and the relationship between the nodes of the network helps in the visualisation of, and automatic inference over, the frequently used words denoting important concepts in the domain. We illustrate our approach with a case study using corpora from three time periods on the emergence and consolidation of nuclear physics. The text-based approach appears to be less subjective and more suitable for introspection, and is perhaps useful in ontology evolution.
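
    The step from frequent single words to a semantic network can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tiny stopword list, the `top_n` cut-off and the sample sentences are all assumptions, and the sketch treats only two-word expressions whose right-hand word is a frequent keyword, linking each compound to that keyword.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = frozenset({"the", "of", "a", "and", "in", "is", "to"})

def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_network(texts, top_n=1):
    """Return the top_n most frequent words and compound -> keyword edges."""
    tokens_per_text = [tokenize(t) for t in texts]
    counts = Counter(
        w for toks in tokens_per_text for w in toks if w not in STOPWORDS
    )
    keywords = {w for w, _ in counts.most_common(top_n)}
    edges = Counter()
    for toks in tokens_per_text:
        # Scan adjacent word pairs; keep those that end in a keyword.
        for left, right in zip(toks, toks[1:]):
            if right in keywords and left not in STOPWORDS and left != right:
                edges[(f"{left} {right}", right)] += 1
    return keywords, edges

# Made-up sample texts, used only to exercise the functions.
texts = [
    "The nuclear reaction releases energy.",
    "A chain reaction is a nuclear reaction.",
    "The fission reaction follows the chain reaction.",
]
keywords, edges = keyword_network(texts, top_n=1)
for (compound, head), n in sorted(edges.items()):
    print(f"{compound} -> {head} (seen {n} times)")
```

    Each edge can be read as "the compound is a kind of the keyword", which is one simple way to populate the nodes and links of such a network; a real corpus would need a fuller stopword list and longer multiword expressions.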

    Ontology of Strategic Information Systems Planning

    Strategically planning and aligning information systems is still one of the most challenging IT tasks for organizations. The literature describes and analyzes the phenomenon, labeling as Strategic Information Systems Planning (SISP) the process that pursues the alignment of IS/IT initiatives to achieve business goals. Statistics reveal, however, that those goals are to a significant extent not being achieved, which leaves open the question of whether the SISP models, frameworks and methods are correct, complete, applicable and feasible. In order to understand and visualize the potential gaps and biases in the SISP literature, the paper introduces an ontology of the SISP process that allows the study of the field to be expanded systematically and symmetrically, contributing to the maturation of the scientific field and identifying the critical omissions within it. Later, the ontological analysis will allow the visualization of bright, light, and blind/blank areas of knowledge documented on SISP.

    Space mission design ontology : extraction of domain-specific entities and concepts similarity analysis

    Expert Systems, computer programs able to capture human expertise and mimic experts’ reasoning, can support the design of future space missions by assimilating and facilitating access to accumulated knowledge. To organise these data, the virtual assistant needs to understand the concepts characterising space systems engineering. In other words, it needs an ontology of space systems. Unfortunately, there is currently no official European space systems ontology. Developing an ontology is a lengthy and tedious process that involves several human domain experts and is therefore prone to human error and subjectivity. Could the foundations of an ontology instead be semi-automatically extracted from unstructured data related to space systems engineering? This paper presents an implementation of the first layers of the Ontology Learning Layer Cake, an approach to semi-automatically generate an ontology. Candidate entities and synonyms are extracted from three corpora: a set of 56 feasibility reports provided by the European Space Agency, 40 publicly available books on space mission design and a collection of 273 Wikipedia pages. Lexica of relevant space systems entities are semi-automatically generated based on three different methods: a frequency analysis, a term frequency-inverse document frequency analysis, and a Weirdness Index filtering. The frequency-based lexicon of the combined corpora is then fed to a word embedding method, word2vec, to learn the context of each entity. With a cosine similarity analysis, concepts with similar contexts are matched.
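
    Two of the measures named above can be sketched in a few lines. This is a hedged illustration rather than the paper's pipeline: the Weirdness Index is taken here in its common form, the ratio of a word's relative frequency in the domain corpus to its relative frequency in a general corpus; the add-one smoothing of the general count, the function names and the toy token lists are assumptions; and the vectors passed to `cosine` are made-up numbers standing in for word2vec embeddings.

```python
import math
from collections import Counter

def weirdness(domain_tokens, general_tokens):
    """Ratio of relative frequencies: domain corpus over general corpus."""
    dom, gen = Counter(domain_tokens), Counter(general_tokens)
    n_dom, n_gen = len(domain_tokens), len(general_tokens)
    scores = {}
    for word, f_dom in dom.items():
        # Add-one smoothing (an assumption) avoids division by zero for
        # words that never occur in the general corpus.
        f_gen = gen[word] + 1
        scores[word] = (f_dom / n_dom) / (f_gen / n_gen)
    return scores

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy corpora, invented for the example.
domain = "the satellite orbit the payload satellite".split()
general = "the cat and the dog saw the satellite".split()
scores = weirdness(domain, general)

# Hypothetical 2-d vectors standing in for learned embeddings.
similarity = cosine([1.0, 2.0], [2.0, 4.0])
```

    Words that are frequent in the domain corpus but rare in general language score high, while function words such as "the" score low, which is what makes the index usable as a filter for candidate entities.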

    An Ontology of Megaprojects

    Megaprojects are symbolic milestones of human history. From the Great Pyramid of Giza and the Great Wall of China to the Hoover Dam and the Manhattan Project, history is marked by an array of megaprojects. Some megaprojects are born out of necessity while others showcase the power and status of individuals, groups, or countries. Most megaprojects are one-of-a-kind endeavors to which traditional project management principles are neither applicable nor suitable, rendering the holistic study of megaprojects especially difficult. Despite the recent uptick in research on megaprojects, there is no systemic framework that can help systematically assess and guide megaprojects and megaproject research. In the absence of such a framework there is a significant risk of bias in planning the projects and in the topics researched. In this paper, we present an ontology of megaprojects and discuss how it can help analyze individual megaprojects and synthesize the corpus of megaproject research.

    Ontological Meta-Analysis and Synthesis

    We present ontological meta-analysis and synthesis as a method for reviewing, mapping, and visualizing the research literature in a domain cumulatively, logically, systematically, and systemically. The method highlights a domain’s bright spots that have been heavily studied, the light spots that have been lightly studied, the blind spots that have been overlooked, and the blank spots that have not been studied. It highlights the biases in a domain’s research; the research can then be realigned to make it stronger and more effective. We illustrate the method using the emerging domain of public health informatics (PHI). We present an ontological framework for the domain, map the literature onto the framework, and highlight its bright, light, and blind/blank spots. We also present detailed analyses using the ontological maps of dyads and triads. We conclude by discussing how (a) the results can be used to realign PHI research, and (b) the method can be used in other information systems domains.
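
    The mapping step can be illustrated with a small frequency count. This is only a sketch under stated assumptions: the element names, the coded articles and the `bright_at` threshold are hypothetical, and the abstract gives no numeric cut-offs. Because a raw count cannot tell an overlooked topic from an unstudied one, elements with no articles are labelled "blind/blank" together.

```python
from collections import Counter

def classify_spots(ontology_elements, coded_articles, bright_at=3):
    """Label each ontology element by how many articles were mapped onto it."""
    # Count each element at most once per article.
    counts = Counter(e for article in coded_articles for e in set(article))
    spots = {}
    for element in ontology_elements:
        n = counts[element]
        if n >= bright_at:       # hypothetical threshold for "heavily studied"
            spots[element] = "bright"
        elif n > 0:
            spots[element] = "light"
        else:
            spots[element] = "blind/blank"
    return spots

# Hypothetical ontology elements and article codings.
elements = ["surveillance", "policy", "ethics"]
articles = [["surveillance", "policy"], ["surveillance"], ["surveillance"]]
spots = classify_spots(elements, articles)
```

    The same counting extends to dyads and triads by counting pairs or triples of elements that co-occur in one article instead of single elements.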

    Automatic ontology extraction from unstructured culinary texts (Extração automática de ontologias em textos de culinária não estruturados)

    Integrated master's dissertation in Informatics Engineering. Solving problems within a specific domain may involve distinct approaches. It is therefore vital to carry out a contextual analysis of every element in the web of relations between the concepts of the domain. To that end, an ontology allows the construction of a semantic network in which the most important factor is the correct identification of the concepts and their attributes. Automating the extraction process makes it possible to create more scalable and uniform ontologies, extracting knowledge based on the same premises and patterns. Automatic extraction eases the analysis and understanding of the information presented in a problem, which is usually written in natural language. This dissertation focuses on knowledge extraction from unstructured texts, specifically culinary texts, with the goal of generating an ontology that exposes the knowledge interlinked between recipes. The main challenge is to identify the relevant terms correctly, based on syntactic, semantic and general linguistic analysis, and to formalize the relations among them. Control and automation mechanisms enabled the extraction of the knowledge present in the unstructured texts; these mechanisms were applied according to the linguistic characteristics of the documents and the constraints of the domain. The generated ontology can be consulted through a web platform, where the user may search the documents imported into the system and analyse the connections between recipes by searching for terms and by following the hyperlinks found in the details of each recipe record.

    The role of terminology and local grammar in video annotation

    The linguistic annotation of video sequences is an intellectually challenging task involving the investigation of how images and words are linked together, a task that is ultimately financially rewarding in that the eventual automatic retrieval of video (sequences) can be much less time consuming, subjective and expensive than manual retrieval. Much effort has been focused on automatic or semi-automatic annotation. Computational linguistic methods of video annotation rely on collections of collateral text in the form of keywords and proper nouns. Keywords are often used in a particular order, indicating an identifiable pattern which is often limited and can subsequently be used to annotate the portion of a video where such a pattern occurred. Once the relevant keywords and patterns have been stored, they can then be used to annotate the remainder of the video, excluding all collateral text which does not match the keywords or patterns. A new method of video annotation is presented in this thesis. The method facilitates a) the extraction of specialist terms for annotation from a corpus of collateral text; b) the identification of frequently used linguistic patterns for annotating repeating key events within the data set. The use of the method has led to the development of a system that can automatically assign key words and key patterns to a number of frames, where these key words and patterns are found in the commentary text approximately contemporaneous with the selected frames. The system does not perform video analysis; it only analyses the collateral text. The method is based on corpus linguistics and is mainly frequency based: the frequency of occurrence of a key word or key pattern is taken as the basis of its representation. No assumptions are made about the grammatical structure of the language used in the collateral text, nor is a lexicon of key words refined.
Our system has been designed to annotate videos of football matches in English and Arabic, and also cricket videos in English. The system has also been designed to retrieve annotated clips. It provides not only a simple search method for retrieving annotated clips but also more complex, advanced search methods.
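
    The idea of annotating frames from approximately contemporaneous commentary can be sketched as below. This is an illustrative assumption-laden example, not the thesis system: the function name, the symmetric time window in seconds, the commentary lines and the key word set are all invented, and the sketch handles single key words in English only, not key patterns.

```python
import re

def annotate_clips(commentary, key_words, window=5.0):
    """Attach key words to a time span around each matching commentary line.

    commentary: list of (timestamp_in_seconds, text) pairs.
    key_words:  set of lowercase key words stored beforehand.
    window:     hypothetical half-width, in seconds, of the annotated span.
    """
    annotations = []
    for timestamp, text in commentary:
        words = re.findall(r"[a-z]+", text.lower())
        hits = sorted(set(words) & key_words)
        if hits:  # commentary without any key word is ignored
            start = max(0.0, timestamp - window)
            annotations.append((start, timestamp + window, hits))
    return annotations

# Made-up commentary and key words for a football clip.
commentary = [
    (12.0, "Great pass into the box"),
    (15.5, "Goal! What a goal by the striker"),
    (40.0, "The crowd is quiet"),
]
clips = annotate_clips(commentary, {"goal", "pass", "corner"})
```

    Only the collateral text is inspected, in keeping with the description above; retrieval then amounts to filtering the returned spans by key word.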

    Terminology and ontologies: methodologies for knowledge representation (Terminologia e ontologias : metodologias para representação do conhecimento)

    Doctorate in Linguistics. This dissertation discusses the knowledge representation methodologies that can be used in terminology for the construction of ontologies. The theoretical and practical analysis of two terminological approaches, semasiological and onomasiological, allows us to observe the role played by the specialised text, while questioning its importance and the contribution of the terminologist and the domain specialist to the capture of knowledge as an informal specification of a conceptualization.