    Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review

    Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional knowledge organization systems (KOS) have made their way into the Semantic Web mainstream. These include thesauri, classification schemes, name authorities, and lists of codes and terms produced before the arrival of the ontology wave. This paper uses "LOD KOS" as an umbrella term for all value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. The beneficiaries are not limited to the communities of value vocabulary constructors and providers, nor to the catalogers and indexers who have a long history of applying these vocabularies to their products. LOD dataset producers and LOD service providers, information architects and interface designers, and researchers in the sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases, both experimental and in real applications, to identify how LOD KOS are used, in order to share practices and ideas among communities and users. The functions of LOD KOS are examined from multiple dimensions through the viewpoints of different user groups. The paper focuses on LOD dataset producers, vocabulary producers, and researchers (as end users of KOS). Comment: 31 pages, 12 figures, accepted paper in International Journal on Digital Libraries
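To make the idea of a conventional KOS published as a SKOS value vocabulary concrete, here is a minimal Python sketch that serializes a two-concept thesaurus as N-Triples. The example.org URIs, the labels, and the `skos_triples`/`literal` helpers are hypothetical illustrations, not taken from the paper. Only the namespace and the terms `skos:Concept`, `skos:ConceptScheme`, `skos:inScheme`, `skos:prefLabel` and `skos:broader` come from the SKOS vocabulary itself.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text, lang):
    """Quote a string as an N-Triples language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def skos_triples(scheme, concepts):
    """Yield N-Triples lines for a SKOS concept scheme.

    `concepts` maps a concept URI to (preferred label, language tag, broader URI or None).
    """
    yield f"<{scheme}> <{RDF_TYPE}> <{SKOS}ConceptScheme> ."
    for uri, (label, lang, broader) in concepts.items():
        yield f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> ."
        yield f"<{uri}> <{SKOS}inScheme> <{scheme}> ."
        yield f"<{uri}> <{SKOS}prefLabel> {literal(label, lang)} ."
        if broader is not None:
            # skos:broader links a concept to a more general concept.
            yield f"<{uri}> <{SKOS}broader> <{broader}> ."


# Hypothetical two-level thesaurus.
SCHEME = "http://example.org/thesaurus"
CONCEPTS = {
    f"{SCHEME}/animals": ("Animals", "en", None),
    f"{SCHEME}/mammals": ("Mammals", "en", f"{SCHEME}/animals"),
}

if __name__ == "__main__":
    for line in skos_triples(SCHEME, CONCEPTS):
        print(line)
```

In practice an RDF library would normally handle serialization. Plain string output is used here only to keep the sketch dependency-free.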

    Linked Data Supported Information Retrieval

    Search engines have become indispensable for finding content on the World Wide Web. Semantic Web and Linked Data technologies allow content to be structured in a more detailed and unambiguous way and open up entirely new approaches to solving information retrieval problems. This thesis investigates how information retrieval applications can benefit from incorporating Linked Data. New methods for computer-assisted semantic text analysis, semantic search, information prioritization, and visualization are presented and comprehensively evaluated. Linked Data resources and their relationships are integrated into these methods to increase their effectiveness or their usability. First, an introduction to the fundamentals of information retrieval and Linked Data is given. Next, new manual and automated methods are presented for semantically annotating documents by linking them to Linked Data resources (entity linking). These methods are comprehensively evaluated, and the underlying evaluation system is substantially improved. Building on the annotation methods, two new retrieval models for semantic search are presented and evaluated. Both are based on the generalized vector space model. They incorporate the semantic similarity of the Linked Data resources in documents and queries, derived from taxonomy-based relationships, into the computation of the result ranking. To further refine the computation of semantic similarity, a method for prioritizing Linked Data resources is presented and evaluated. Building on this, visualization techniques are demonstrated that aim to improve explorability and navigability within a semantically annotated document corpus. Two applications are presented for this purpose. The first is a Linked Data-based exploratory extension that complements a traditional keyword-based search engine. The second is a Linked Data-based recommender system.
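The retrieval models above build on the generalized vector space model. It can be sketched as a normalized bilinear score, score(d, q) = d^T S q / sqrt(d^T S d * q^T S q). Here S is a resource-similarity matrix, so a query can match documents that mention related, rather than identical, Linked Data resources. The toy taxonomy, the resource names, and the use of Wu-Palmer similarity to fill S are assumptions for illustration. The thesis's actual similarity measure may differ.

```python
from math import sqrt

# Hypothetical toy taxonomy of Linked Data resource types: child -> parent.
PARENT = {
    "Physicist": "Scientist",
    "Chemist": "Scientist",
    "Scientist": "Person",
    "Politician": "Person",
}


def ancestors(node):
    """Return the path from `node` up to its root, `node` first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    # The root has depth 1, so similarities fall in (0, 1].
    return len(ancestors(node))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b)); 0 if unrelated."""
    ancestors_b = set(ancestors(b))
    lcs = next((n for n in ancestors(a) if n in ancestors_b), None)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))


def bilinear(x, y):
    """Compute x^T S y with S[i][j] = wu_palmer(i, j) over sparse weight dicts."""
    return sum(wx * wy * wu_palmer(i, j) for i, wx in x.items() for j, wy in y.items())


def gvsm_score(doc, query):
    """Generalized VSM cosine: d^T S q / sqrt(d^T S d * q^T S q)."""
    denom = sqrt(bilinear(doc, doc) * bilinear(query, query))
    return bilinear(doc, query) / denom if denom else 0.0


if __name__ == "__main__":
    query = {"Physicist": 1.0}
    docs = {"doc_chemist": {"Chemist": 1.0}, "doc_politician": {"Politician": 1.0}}
    for name in sorted(docs, key=lambda n: gvsm_score(docs[n], query), reverse=True):
        print(name, round(gvsm_score(docs[name], query), 3))
```

Neither document shares a resource with the query, so a plain vector space model would score both at 0. Here the Chemist document scores 2/3 and the Politician document scores 0.4. The difference arises because Chemist and Physicist share a deeper common ancestor (Scientist) in the taxonomy.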

    Semantic web and semantic technologies to enhance innovation and technology watch processes

    Innovation is a key process that enables Small and Medium Enterprises (SMEs) to survive and evolve in a competitive environment. Ideas and idea management are considered the basis of innovation. Gathering data on how current technologies and competitors evolve is another key factor in companies' innovation. This thesis therefore focuses on applying Information and Communication Technologies, specifically Semantic Web and Semantic Technologies, to Idea Management Systems and Technology Watch Systems. Innovation and Technology Watch platform managers usually face many problems related to the data they collect and manage. They have to deal with a large amount of information distributed across different platforms that are not always interoperable with one another. Sharing data between platforms is vital so that it can be converted into knowledge. Many of the tasks these managers perform are non-productive, and too much time and effort is spent on them. Moreover, innovation process managers have difficulty identifying why an idea contest has been successful. Our proposal is to analyze different Information and Communication Technologies that can assist companies with their Innovation and Technology Watch processes. To this end, we studied several Semantic and Web technologies, built some conceptual models, and tested them in different case studies to see the results achieved in real scenarios. The outcome of this thesis is a solution architecture that enables interoperability among platforms and eases the work of process managers. Within this framework, and to complement the architecture, two ontologies have been developed: (1) Gi2Mo Wave and (2) Mentions Ontology. Gi2Mo Wave focuses on annotating the context of idea contests, which assists in analyzing contests and eases their replication. Mentions Ontology focuses on annotating the elements mentioned in plain-text content, such as ideas or news items. In this way, Mentions Ontology provides a means of linking related content, enabling interoperability among content from different platforms. To test the architecture, a new web-based Idea Management System and a Technology Watch system have also been developed. The platforms incorporate semantic ontologies and tools to enable interoperability. We also demonstrate how Semantic Technologies reduce human workload by contributing to the automatic classification of content in the Technology Watch process. Finally, conclusions are drawn from the results of testing the technologies used, identifying those that achieved the best results.

    Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project

    Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Service

    Neural Networks for Building Semantic Models and Knowledge Graphs

    The abstract is in the attachment. Author: Futia, Giuseppe. Field: Computer Engineering.

    Deliverable D2.3 Specification of Web mining process for hypervideo concept identification

    This deliverable presents a state-of-the-art review and requirements analysis for the web mining process in WP2 of the LinkedTV project. The deliverable covers two subject areas: a) Named Entity Recognition (NER) and b) retrieval of additional content. The introduction outlines the workflow of the work package, with a subsection devoted to its relations with other work packages. The state-of-the-art review focuses on techniques that are promising for LinkedTV. In the NER domain, the main focus is on knowledge-based approaches, which facilitate the disambiguation of identified entities using linked open data. As part of the NER requirements analysis, the first tools developed (NERD, SemiTags and THD) are described and evaluated. The area of linked additional content is broader and requires a more thorough analysis. A balanced overview is presented of techniques for dealing with the various knowledge sources: semantic web resources, web APIs, and completely unstructured resources from a whitelist of websites. The requirements analysis is derived from the RBB and Sound and Vision LinkedTV scenarios.
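To illustrate what knowledge-based disambiguation with linked open data can look like, here is a simplified Lesk-style sketch. Each candidate entity carries a short description, such as might be taken from a LOD resource's abstract. The candidate whose description shares the most content words with the mention's context is chosen. The candidate URIs, descriptions, and stopword list are hypothetical. This overlap heuristic is a stand-in, not the method used by NERD, SemiTags or THD.

```python
import re

# Hypothetical candidate entities for one surface form, each with a short description
# such as might be taken from a Linked Open Data resource's abstract.
CANDIDATES = {
    "Paris": {
        "ex:Paris_France": "capital city of France on the Seine river",
        "ex:Paris_Hilton": "American media personality and businesswoman",
    }
}

# Small illustrative stopword list so that function words do not count as evidence.
STOPWORDS = {"a", "an", "and", "as", "in", "of", "on", "the"}


def content_words(text):
    """Lowercased alphabetic tokens minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(mention, context):
    """Return the candidate URI whose description overlaps most with the context."""
    ctx = content_words(context)
    best, best_overlap = None, -1
    for uri, description in CANDIDATES.get(mention, {}).items():
        overlap = len(ctx & content_words(description))
        if overlap > best_overlap:
            best, best_overlap = uri, overlap
    return best


if __name__ == "__main__":
    print(disambiguate("Paris", "The Seine flows through Paris, the capital of France."))
```

A mention with no known candidates returns `None`. A real system would also need candidate generation and a fallback for ties.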

    Information extraction and representation from free text reports (Isha Saxena)

    The need to extract specific information has increased drastically with the growth of born-digital documents. These documents consist mainly of free text from which structured information can be extracted. Sources include customer review reports, patient records, and financial and legal documents. The needs and applications for extracting specific information from free text are growing constantly. New research is emerging to mine contextual information in ways that are both highly efficient and convenient to use. This thesis addresses the problem of extracting specific information from free text, particularly for domains that lack labeled data. The first step in developing an advanced information extraction system is to extract and represent structured information from unstructured natural language text. To accomplish this, the thesis proposes a system for extracting and tagging domain-specific information, namely domain-related entities/concepts and relational phrases. The approach combines dictionary matching for domain-specific concept extraction with rule-based pattern matching for relation extraction, tagging the free text accordingly. The experiments were performed on customer reports from Altice Labs. The system achieved over 80% recall and 90% precision for both concept and relation extraction. The proposed domain-specific concept extraction module was compared with two existing concept extraction platforms, Microsoft Concept Graph and DBpedia Spotlight, and outperformed both.
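The two extraction steps described above are dictionary matching for concepts and rule-based patterns for relational phrases. Below is a minimal sketch of both, together with the set-based precision and recall used to report results. The dictionary entries, regular expressions, tag format, and example sentence are hypothetical and are not drawn from the Altice Labs data.

```python
import re

# Hypothetical domain dictionary: surface form -> concept type.
DICTIONARY = {
    "router": "DEVICE",
    "fibre connection": "SERVICE",
    "internet": "SERVICE",
}

# Hypothetical rule: a fixed set of relational phrases signalling a fault.
RELATION_PATTERNS = [
    re.compile(r"\b(?:is not working|stopped working|keeps dropping)\b", re.IGNORECASE),
]


def extract_concepts(text):
    """Dictionary matching, longest terms first; returns (start, end, surface, type)."""
    found, taken = [], [False] * len(text)
    for term in sorted(DICTIONARY, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            if not any(taken[m.start():m.end()]):  # skip overlaps with longer matches
                found.append((m.start(), m.end(), m.group(0), DICTIONARY[term]))
                taken[m.start():m.end()] = [True] * (m.end() - m.start())
    return sorted(found)


def extract_relations(text):
    """Rule-based pattern matching; returns (start, end, surface)."""
    return sorted((m.start(), m.end(), m.group(0))
                  for p in RELATION_PATTERNS for m in p.finditer(text))


def tag(text):
    """Rewrite text with inline tags such as [router/DEVICE] and [keeps dropping/REL]."""
    spans = [(s, e, f"[{w}/{t}]") for s, e, w, t in extract_concepts(text)]
    spans += [(s, e, f"[{w}/REL]") for s, e, w in extract_relations(text)]
    out, pos = [], 0
    for s, e, label in sorted(spans):
        if s < pos:  # ignore a span overlapping one already emitted
            continue
        out.extend([text[pos:s], label])
        pos = e
    out.append(text[pos:])
    return "".join(out)


def precision_recall(predicted, gold):
    """Set-based precision and recall of extracted items against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    print(tag("My router is not working and the internet keeps dropping."))
```

Running it prints `My [router/DEVICE] [is not working/REL] and the [internet/SERVICE] [keeps dropping/REL].` Matching longer dictionary terms first keeps a multi-word term such as "fibre connection" from being split by shorter entries.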

    Predictive Analysis on Twitter: Techniques and Applications

    Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experiments and analyses have been carried out on Twitter data to gain insights in domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches, and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analysis of aspects such as sentiment and emotion. We also cover the use of domain knowledge in the coarse-grained analysis of Twitter data for making decisions and taking actions, and we relate a few success stories.

    Taming web data: exploiting linked data for integrating medical educational content

    Open data play a vital role in many communities, including governments, businesses, and education, and this revolution has had a strong impact on the education field. Recently, new practices known as "Linked Data" have been adopted for publishing and connecting data on the web. They are used to expose and connect data that were not previously linked. In education, applying Linked Data practices to the growing amount of open data used for learning is potentially highly beneficial. The work presented in this thesis tackles the challenges of acquiring data from distributed web data sources and integrating it into one linked dataset. The application domain is medical education. The focus is on bridging the gap between articles published in online educational libraries and content published on Web 2.0 platforms that can be used for education. Integrating a collection of heterogeneous resources means creating links between data collected from distributed web data sources. To address these challenges, the thesis proposes a system that exploits Linked Data to build a metadata schema in RDF/XML format for describing resources. The system then enriches this metadata with external datasets that add semantics to it. It collects resources from distributed data sources on the web and enriches their metadata with concepts from biomedical ontologies, such as SNOMED CT, which enable the resources to be linked. The result is a linked dataset of more than 10,000 resources collected from the PubMed library, YouTube channels, and blogging platforms. The system's effectiveness is evaluated by validating the content of the linked dataset when it is accessed and retrieved. Ontology-based techniques have been developed for browsing and querying the resulting linked dataset. Experiments simulating users' access to the linked dataset were conducted to validate its content. The results were promising and showed the effectiveness of using SNOMED CT to integrate distributed resources from diverse web data sources.
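One way to picture how concept enrichment enables linking is sketched below. Once each resource's metadata carries ontology concept identifiers, resources from different sources can be connected when their concept sets overlap. This sketch uses Jaccard overlap with an arbitrary threshold, which is an assumption for illustration. The resource identifiers and `CONCEPT_*` strings are hypothetical placeholders, not real SNOMED CT codes, and the thesis may link resources differently.

```python
from itertools import combinations

# Hypothetical resources whose metadata has been enriched with ontology concepts.
# The CONCEPT_* strings are placeholders, not real SNOMED CT identifiers.
RESOURCES = {
    "pubmed:article-1": {"source": "PubMed", "concepts": {"CONCEPT_ASTHMA", "CONCEPT_INHALER"}},
    "youtube:video-7": {"source": "YouTube", "concepts": {"CONCEPT_INHALER"}},
    "blog:post-3": {"source": "Blog", "concepts": {"CONCEPT_DIABETES"}},
}


def link_by_shared_concepts(resources, min_jaccard=0.3):
    """Return (a, b, shared concepts, Jaccard score) for resource pairs above the threshold."""
    links = []
    for (a, ra), (b, rb) in combinations(sorted(resources.items()), 2):
        shared = ra["concepts"] & rb["concepts"]
        union = ra["concepts"] | rb["concepts"]
        score = len(shared) / len(union) if union else 0.0
        if shared and score >= min_jaccard:
            links.append((a, b, sorted(shared), score))
    return links


if __name__ == "__main__":
    for a, b, shared, score in link_by_shared_concepts(RESOURCES):
        print(f"{a} <-> {b} via {shared} (Jaccard {score:.2f})")
```

With these placeholder records, only the PubMed article and the YouTube video are linked, through their shared inhaler concept. A Jaccard threshold is one simple choice. A real system could instead emit RDF links directly from each shared concept.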