Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review
Since the Simple Knowledge Organization System (SKOS) specification and its
SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a
significant number of conventional knowledge organization systems (KOS)
(including thesauri, classification schemes, name authorities, and lists of
codes and terms, produced before the arrival of the ontology wave) have made
their journeys to join the Semantic Web mainstream. This paper uses "LOD KOS"
as an umbrella term to refer to all of the value vocabularies and lightweight
ontologies within the Semantic Web framework. The paper provides an overview of
what the LOD KOS movement has brought to various communities and users. These
benefits are not limited to the communities of value vocabulary constructors and
providers, nor to the catalogers and indexers who have a long history of applying
the vocabularies to their products. The LOD dataset producers and LOD service
providers, the information architects and interface designers, and researchers
in the sciences and humanities are also direct beneficiaries of LOD KOS. The paper
examines a set of collected cases (experimental or from real applications) to
identify how LOD KOS are used, in order to share practices and ideas among
communities and users. Through the viewpoints of a number of
different user groups, the functions of LOD KOS are examined from multiple
dimensions. This paper focuses on the LOD dataset producers, vocabulary
producers, and researchers (as end-users of KOS).
Comment: 31 pages, 12 figures, accepted paper in International Journal on Digital Libraries
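To make concrete what it means for a conventional thesaurus entry to join the Semantic Web, the following standard-library Python sketch expresses one hypothetical entry as a skos:Concept and serializes it as N-Triples. The SKOS and RDF namespace IRIs are the W3C ones; the example.org namespace, the concept names, and the helper functions are invented for illustration. It uses plain SKOS labels; SKOS-XL would instead model each label as a resource of its own, which is not shown here.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/thesaurus/"  # hypothetical namespace for the vocabulary


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    # Escape backslashes and double quotes as N-Triples requires.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(local_name, pref_label, alt_labels=(), broader=None, lang="en"):
    """Triples for one thesaurus entry expressed as a skos:Concept."""
    s = iri(EX + local_name)
    triples = [
        (s, iri(RDF_TYPE), iri(SKOS + "Concept")),
        (s, iri(SKOS + "inScheme"), iri(EX + "scheme")),
        (s, iri(SKOS + "prefLabel"), literal(pref_label, lang)),
    ]
    # Non-preferred terms ("use for" entries) become skos:altLabel.
    triples += [(s, iri(SKOS + "altLabel"), literal(a, lang)) for a in alt_labels]
    # A broader-term (BT) relation becomes skos:broader.
    if broader is not None:
        triples.append((s, iri(SKOS + "broader"), iri(EX + broader)))
    return triples


def to_ntriples(triples) -> str:
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


nt = to_ntriples(concept_triples("cats", "Cats", alt_labels=["Felis catus"], broader="mammals"))
print(nt)
```

Once published this way, the entry can be referenced by IRI from other datasets, which is the basis for the uses surveyed in the paper.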
Linked Data Supported Information Retrieval
Search engines have become indispensable for locating content on the World Wide Web. Semantic Web and Linked Data technologies enable a more detailed and unambiguous structuring of content and allow entirely new approaches to solving information retrieval problems. This thesis examines how information retrieval applications can benefit from incorporating Linked Data. New methods for computer-assisted semantic text analysis, semantic search, information prioritization, and information visualization are presented and comprehensively evaluated. Linked Data resources and their relationships are integrated into these methods to increase their effectiveness or their usability. First, an introduction to the foundations of information retrieval and Linked Data is given. Then, new manual and automated methods for semantically annotating documents by linking them to Linked Data resources (entity linking) are presented. A comprehensive evaluation of these methods is carried out, and the underlying evaluation system is substantially improved. Building on the annotation methods, two new retrieval models for semantic search are presented and evaluated. They are based on the generalized vector space model and incorporate the semantic similarity derived from taxonomy-based relationships between the Linked Data resources in documents and queries into the computation of the search result ranking. To further refine the computation of semantic similarity, a method for prioritizing Linked Data resources is presented and evaluated. Building on this, visualization techniques are presented that aim to improve the explorability and navigability of a semantically annotated document corpus.
Two applications are presented for this purpose: first, a Linked Data-based exploratory extension that complements a traditional keyword-based search engine, and second, a Linked Data-based recommender system.
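The retrieval models described above build on the generalized vector space model, in which a document and a query are scored through pairwise similarities between their entities rather than through exact matches alone. The sketch below illustrates that idea under stated assumptions: the toy taxonomy, the entity weights, and the path-based similarity 1 / (1 + path length) are hypothetical stand-ins, not the similarity measure used in the thesis.

```python
from itertools import product

# Hypothetical toy taxonomy (child -> parent), rooted at "Thing".
PARENT = {
    "Person": "Thing",
    "Scientist": "Person",
    "Physicist": "Scientist",
    "Chemist": "Scientist",
    "Place": "Thing",
    "City": "Place",
}


def ancestors(node):
    """Path from node up to the root, node first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def taxonomy_similarity(a, b):
    """Assumed path-based measure: 1 / (1 + edges via the lowest common ancestor)."""
    path_a, path_b = ancestors(a), ancestors(b)
    lca = next(n for n in path_a if n in path_b)  # single root, so always found
    return 1.0 / (1.0 + path_a.index(lca) + path_b.index(lca))


def gvsm_score(doc, query):
    """GVSM-style score: sum of w_doc * w_query * sim over all entity pairs."""
    return sum(
        wd * wq * taxonomy_similarity(ed, eq)
        for (ed, wd), (eq, wq) in product(doc.items(), query.items())
    )


# Documents and query as weighted bags of linked entities (weights invented).
docs = {"d1": {"Physicist": 1.0}, "d2": {"Chemist": 1.0}, "d3": {"City": 1.0}}
query = {"Physicist": 1.0}
ranking = sorted(docs, key=lambda d: gvsm_score(docs[d], query), reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']
```

With exact matching alone, d2 and d3 would both score zero against the query; the taxonomy lets d2 (a chemist, who is also a scientist) rank above d3 (a city).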
Semantic web and semantic technologies to enhance innovation and technology watch processes
Innovation is a key process that enables Small and Medium Enterprises to survive and evolve in a competitive environment. Ideas and idea management are considered the basis for innovation. Gathering data on how current technologies and competitors evolve is another key factor in companies' innovation. Therefore, this thesis focuses on the application of Information and Communication Technologies, and more specifically of Semantic Web and Semantic Technologies, to Idea Management Systems and Technology Watch Systems.
Innovation and Technology Watch platform managers usually face many problems related to the data they collect and manage. These managers have to deal with a large amount of information distributed across different platforms that are not always interoperable with one another. It is vital to share data between platforms so that it can be converted into knowledge. Many of the tasks they perform are non-productive, and too much time and effort is spent on them. Moreover, innovation process managers have difficulty identifying why an idea contest has been successful.
Our proposal is to analyze different Information and Communication Technologies that can assist companies with their Innovation and Technology Watch processes. To this end, we studied several Semantic and Web technologies, built some conceptual models, and tested them in different case studies to see the results achieved in real scenarios.
The outcome of this thesis is a solution architecture that enables interoperability among platforms and eases the work of process managers. Within this framework, and to complement the architecture, two ontologies have been developed: (1) Gi2Mo Wave and (2) Mentions Ontology. Gi2Mo Wave focuses on annotating the background of idea contests, assisting in the analysis of the contests and easing their replication. Mentions Ontology, on the other hand, focuses on annotating the elements mentioned in plain text content, such as ideas or news items. In this way, Mentions Ontology provides a means of linking related content, enabling interoperability among content from different platforms.
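As a rough illustration of how annotating mentions can link content across platforms, the sketch below records each mention as a character span pointing to an entity URI and then pairs items from different platforms that mention the same entity. The Mention fields, the platform names, the surface-form index, and the example.org URI are hypothetical; they are not the vocabulary actually defined by Mentions Ontology.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One annotated mention (hypothetical fields, not the ontology's own terms)."""
    content_id: str  # an idea, a news item, ...
    platform: str    # platform the content comes from
    start: int       # character offset where the mention begins
    end: int         # character offset just past the mention
    entity: str      # URI of the mentioned element


def annotate(content_id, platform, text, entity_index):
    """Create a Mention for every (case-sensitive) occurrence of a known surface form."""
    mentions = []
    for surface, entity in entity_index.items():
        start = text.find(surface)
        while start != -1:
            mentions.append(Mention(content_id, platform, start, start + len(surface), entity))
            start = text.find(surface, start + 1)
    return mentions


def cross_platform_links(mentions):
    """(content, content, entity) triples for items on different platforms sharing an entity."""
    by_entity = defaultdict(set)
    for m in mentions:
        by_entity[m.entity].add((m.platform, m.content_id))
    links = set()
    for entity, items in by_entity.items():
        for platform_a, content_a in items:
            for platform_b, content_b in items:
                if platform_a < platform_b:  # different platforms, each pair once
                    links.add((content_a, content_b, entity))
    return sorted(links)


GRAPHENE = "http://example.org/entity/graphene"  # hypothetical entity URI
index = {"graphene": GRAPHENE}
mentions = (
    annotate("idea-12", "IdeaManagement", "Use graphene coatings", index)
    + annotate("news-3", "TechWatch", "New graphene battery announced", index)
)
print(cross_platform_links(mentions))
```

Because the links are derived from shared entity URIs rather than from platform-specific identifiers, an idea and a news item can be related even though they live in separate systems.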
In order to test the architecture, a new web-based Idea Management System and a
Technology Watch system have also been developed. The platforms incorporate semantic
ontologies and tools to enable interoperability. We also demonstrate how Semantic Technologies reduce human workload by contributing to the automatic classification of content in the Technology Watch process. Finally, conclusions were drawn from the results of testing the technologies used, identifying those that achieved the best results.
Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project
Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Service
Neural Networks for Building Semantic Models and Knowledge Graphs
The abstract is in the attachment. Futia, Giusepp
Deliverable D2.3 Specification of Web mining process for hypervideo concept identification
This deliverable presents a state-of-the-art review and requirements analysis report for the web mining process as part of WP2 of the LinkedTV project. The deliverable is divided into two subject areas: a) Named Entity Recognition (NER) and b) retrieval of additional content. The introduction outlines the workflow of the work package, with a subsection devoted to its relations with other work packages. The state-of-the-art review focuses on prospective techniques for LinkedTV. In the NER domain, the main focus is on knowledge-based approaches, which facilitate disambiguation of identified entities using linked open data. As part of the NER requirements analysis, the first tools developed (NERD, SemiTags and THD) are described and evaluated. The area of linked additional content is broader and requires a more thorough analysis. A balanced overview of techniques for dealing with the various knowledge sources (semantic web resources, web APIs and completely unstructured resources from a whitelist of web sites) is presented. The requirements analysis is derived from the RBB and Sound and Vision LinkedTV scenarios.
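Knowledge-based disambiguation can be illustrated with a simple context-overlap heuristic: each candidate entity for an ambiguous mention carries a short description (of the kind Linked Open Data sources publish), and the candidate whose description shares the most content words with the mention's context is chosen. This is a generic, Lesk-style sketch with hypothetical ex: URIs, descriptions, and stopword list; it is not the method implemented by NERD, SemiTags or THD.

```python
import re

STOPWORDS = {"the", "of", "a", "an", "in", "on", "from", "and"}

# Hypothetical candidate entities (placeholder ex: URIs) for one ambiguous
# surface form, each with a short description such as an LOD source might provide.
CANDIDATES = {
    "Berlin": {
        "ex:Berlin_city": "capital city of Germany on the Spree river",
        "ex:Berlin_band": "American new wave band from Los Angeles",
    },
}


def content_words(text):
    """Lowercased alphabetic tokens minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(mention, context):
    """Pick the candidate whose description overlaps most with the context."""
    candidates = CANDIDATES.get(mention)
    if not candidates:
        return None  # unknown surface form: nothing to link
    ctx = content_words(context)
    return max(candidates, key=lambda uri: len(ctx & content_words(candidates[uri])))


print(disambiguate("Berlin", "Berlin played new wave hits in Los Angeles"))
print(disambiguate("Berlin", "Tourists visited the capital of Germany"))
```

Real systems typically weight the overlap (for example by term rarity or entity popularity); the raw count is kept here only to make the idea visible.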
Information extraction and representation from free text reports Isha Saxena
The need for extracting specific information has increased drastically with
the growth in digital-born documents. These documents consist mainly of
free text from which structured information can be extracted. Sources
include customer review reports, patient records, and financial and legal documents.
The needs and applications for extracting specific information
from free text are growing constantly, and new research is emerging
to mine contextual information in a way that is both highly efficient and
convenient to use.
This thesis addresses the problem of extracting specific information
from free text, particularly in domains that lack labeled data. The first
step in the development of an advanced information extraction system is
to extract and represent structured information from unstructured natural
language text. To accomplish this task, the thesis proposes a system for extracting
and tagging domain-specific information, such as domain-related entities/concepts
and relational phrases. The approach combines dictionary
matching for domain-specific concept extraction with rule-based pattern
matching for relation extraction, tagging the free text accordingly. The
experiments were performed on customer reports from Altice Labs. The system
achieved over 80% recall and 90% precision for both concept and relation
extraction.
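The two extraction steps described above, dictionary matching for concepts and rule-based pattern matching for relations, can be sketched in a few lines of Python, together with the precision and recall measures used to report the results. The dictionary entries, relation names, regular expressions, and sample sentence are hypothetical; neither the Altice Labs data nor the thesis's actual rules are reproduced here.

```python
import re

# Hypothetical domain dictionary (surface form -> concept type).
CONCEPTS = {
    "router": "Device",
    "fibre connection": "Service",
    "internet": "Service",
    "technician": "Role",
}

# Hypothetical relational-phrase rules, one regular expression per relation type.
RELATION_RULES = {
    "hasProblem": re.compile(r"\b(?:not working|stopped working|keeps failing)\b"),
    "requests": re.compile(r"\b(?:asked for|requested|needs)\b"),
}


def extract_concepts(text):
    """Dictionary matching, longest surface forms first, skipping overlapping matches.
    Offsets assume lower() keeps the string length (true for ASCII text)."""
    lowered, spans = text.lower(), []
    for form in sorted(CONCEPTS, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(form) + r"\b", lowered):
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), CONCEPTS[form]))
    return sorted(spans)


def extract_relations(text):
    """Rule-based pattern matching for relational phrases."""
    lowered = text.lower()
    return sorted(
        (m.start(), m.end(), label)
        for label, rule in RELATION_RULES.items()
        for m in rule.finditer(lowered)
    )


def tag(text):
    """Wrap every match in inline tags; assumes concept and relation spans do not overlap."""
    out, pos = [], 0
    for start, end, label in sorted(extract_concepts(text) + extract_relations(text)):
        out.append(text[pos:start] + f"[{label}]{text[start:end]}[/{label}]")
        pos = end
    return "".join(out) + text[pos:]


def precision_recall(predicted, gold):
    """Precision and recall of predicted annotations against a gold standard."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


sample = "Customer says the router stopped working and asked for a technician."
print(tag(sample))

predicted = {("router", "Device"), ("technician", "Role"), ("internet", "Service")}
gold = {("router", "Device"), ("technician", "Role")}
p, r = precision_recall(predicted, gold)
```

Dictionary and rule approaches need no labeled training data, which is why they suit the low-resource domains the thesis targets; their coverage depends entirely on how complete the dictionary and rules are.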
The proposed domain-specific concept extraction module was compared with
existing concept extraction platforms: Microsoft Concept Graph and DBpedia
Spotlight. The proposed model achieved higher performance than
both platforms.
Predictive Analysis on Twitter: Techniques and Applications
Predictive analysis of social media data has attracted considerable attention
from the research community as well as the business world because of the
essential and actionable information it can provide. Over the years, extensive
experimentation and analysis for insights have been carried out using Twitter
data in various domains such as healthcare, public health, politics, social
sciences, and demographics. In this chapter, we discuss techniques, approaches
and state-of-the-art applications of predictive analysis of Twitter data.
Specifically, we present fine-grained analysis involving aspects such as
sentiment and emotion, as well as the use of domain knowledge in the coarse-grained
analysis of Twitter data for making decisions and taking actions, and we relate a
few success stories.
Taming web data: exploiting linked data for integrating medical educational content
Open data are playing a vital role in different communities, including government, business, and education. This revolution has had a major impact on the education field. Recently, new practices known as "Linked Data" have been adopted for publishing and connecting data on the web; they are used to expose and connect data that were not previously linked. In the context of education, applying Linked Data practices to the growing amount of open data used for learning is potentially highly beneficial. The work presented in this thesis tackles the challenges of acquiring data from distributed web data sources and integrating it into one linked dataset. The application domain of this thesis is medical education, and the focus is on bridging the gap between articles published in online educational libraries and content published on Web 2.0 platforms that can be used for education. Integrating a collection of heterogeneous resources requires creating links between data collected from distributed web data sources. To address these challenges, a system is proposed that exploits Linked Data to build a metadata schema in XML/RDF format for describing resources and enriches it with external datasets that add semantics to its metadata. The proposed system collects resources from distributed data sources on the web and enriches their metadata with concepts from biomedical ontologies, such as SNOMED CT, that enable them to be linked. The final result of this system is a linked dataset of more than 10,000 resources collected from the PubMed Library, YouTube channels, and blogging platforms. The effectiveness of the proposed system is evaluated by validating the content of the linked dataset when it is accessed and retrieved. Ontology-based techniques have been developed for browsing and querying the linked dataset produced by the system. Experiments have been conducted to simulate users' access to the linked dataset and validate its content.
The results were promising and showed the effectiveness of using SNOMED CT for integrating distributed resources from diverse web data sources.
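The enrichment-and-linking idea can be sketched as follows: free-text metadata from each source is mapped to ontology concept identifiers, and an index from concepts to resources then links items from different platforms and supports simple concept-based queries. The lookup table, the resource ids, and the identifiers prefixed with concept: are placeholders, not real SNOMED CT codes, and matching single lowercase words is a deliberate simplification of real concept mapping.

```python
import re
from collections import defaultdict

# Hypothetical word -> concept mapping; placeholder identifiers, NOT SNOMED CT codes.
CONCEPT_LOOKUP = {
    "diabetes": "concept:diabetes-mellitus",
    "insulin": "concept:insulin",
    "asthma": "concept:asthma",
}


def enrich(resource):
    """Copy of the resource metadata with the concepts found in its title."""
    words = re.findall(r"[a-z]+", resource["title"].lower())
    concepts = sorted({CONCEPT_LOOKUP[w] for w in words if w in CONCEPT_LOOKUP})
    return {**resource, "concepts": concepts}


def build_index(resources):
    """Concept -> ids of resources annotated with it; shared keys link the sources."""
    index = defaultdict(list)
    for resource in resources:
        for concept in resource["concepts"]:
            index[concept].append(resource["id"])
    return index


raw = [
    {"id": "pubmed:1", "source": "PubMed", "title": "Insulin therapy in type 2 diabetes"},
    {"id": "youtube:7", "source": "YouTube", "title": "How insulin works"},
    {"id": "blog:3", "source": "Blog", "title": "Living with asthma"},
]
enriched = [enrich(r) for r in raw]
index = build_index(enriched)
print(index["concept:insulin"])  # ['pubmed:1', 'youtube:7']
```

Here an article and a video end up linked only because both were mapped to the same concept, which mirrors the role the abstract assigns to SNOMED CT in connecting otherwise unrelated sources.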