
    Consolidating Heterogeneous Enterprise Data for Named Entity Linking and Web Intelligence

    Linking named entities to structured knowledge sources paves the way for state-of-the-art Web intelligence applications which assign sentiment to the correct entities, identify trends, and reveal relations between organizations, persons, and products. For this purpose, this paper introduces Recognyze, a named entity linking component that uses background knowledge obtained from linked data repositories, and outlines the process of transforming heterogeneous data silos within an organization into a linked enterprise data repository which draws upon popular linked open data vocabularies to foster interoperability with public data sets. The presented examples use comprehensive real-world data sets from Orell Füssli Business Information, Switzerland's largest business information provider. The linked data repository created from these data sets comprises more than nine million triples on companies, the companies' contact information, key people, products, and brands. We identify the major challenges of tapping into such sources for named entity linking, and describe the data pre-processing techniques required to use and integrate such data sets, with a special focus on disambiguation and ranking algorithms. Finally, we conduct a comprehensive evaluation based on business news from the New Journal of Zurich and AWP Financial News to illustrate how these techniques improve the performance of the Recognyze named entity linking component.
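
    The transformation described above can be pictured with a minimal, purely illustrative sketch: one record from a hypothetical company table is re-expressed as RDF triples using popular linked open data vocabularies (schema.org and FOAF terms via rdflib). The namespaces, field names, and data are assumptions for illustration and are not the Orell Füssli schema or part of Recognyze itself.

```python
# Illustrative sketch only: one row from a hypothetical company table expressed as
# RDF triples with common linked open data vocabularies. Namespaces, field names,
# and data are assumptions, not the Orell Füssli schema or part of Recognyze.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

SCHEMA = Namespace("http://schema.org/")        # popular LOD vocabulary
EX = Namespace("http://example.org/company/")   # placeholder enterprise namespace

row = {"id": "ch-123456", "name": "Example AG", "city": "Zürich", "ceo": "Jane Doe"}

g = Graph()
company = EX[row["id"]]
g.add((company, RDF.type, SCHEMA.Organization))
g.add((company, SCHEMA.name, Literal(row["name"])))
g.add((company, SCHEMA.location, Literal(row["city"])))

ceo = EX[row["id"] + "/ceo"]
g.add((ceo, RDF.type, FOAF.Person))
g.add((ceo, FOAF.name, Literal(row["ceo"])))
g.add((company, SCHEMA.employee, ceo))

print(g.serialize(format="turtle"))  # interlinked, machine-readable output
```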

    A Preliminary Review on the Diffusion of Linked Data in the Enterprise

    Paper presented at the I Workshop de Informação, Dados e Tecnologia, held 4-6 September 2017 in Florianópolis (SC), in the Auditório do Espaço Físico Integrado (EFI) of the Universidade Federal de Santa Catarina (UFSC). Companies need to be aware of changes in the global landscape so that they can respond quickly to market changes. The Web has contributed to this dynamism, producing a significant increase in the volume of data that can be exploited for business benefit. Linked Enterprise Data is presented as a concept that can help managers in decision making by allowing the exploration of data both internal and external to the enterprise, yet the concept remains little explored in business settings. For this reason, the present study discusses the factors that prevent the diffusion of Linked Data in this context. Organizational culture and strategic alignment, among others, are found to be factors that hinder the dissemination of Linked Enterprise Data. This allows us to conclude that Linked Enterprise Data is beginning to attract attention from companies, but more work is needed on its real contributions to the business environment in order to disseminate Linked Data in this context.

    A Framework for the Analysis and Quality Assessment of Big and Linked Data

    Linking and publishing data in the Linked Open Data format increases the interoperability and discoverability of resources on the Web. The process comprises several design decisions based on the Linked Data principles, which recommend, on the one hand, using standards for representing and accessing data on the Web and, on the other hand, setting hyperlinks between data from different sources. Despite the efforts of the World Wide Web Consortium (W3C), the main international standards organization for the World Wide Web, there is no single tailored formula for publishing data as Linked Data. In addition, the quality of the published Linked Open Data (LOD) is a fundamental issue that has yet to be thoroughly managed and considered. The main objective of this doctoral thesis is to design and implement a novel framework for selecting, analyzing, converting, interlinking, and publishing data from diverse sources, while paying close attention to quality assessment throughout all steps and modules of the framework. The goal is to examine whether, and to what extent, Semantic Web technologies are applicable to merging data from different sources and enabling end users to obtain additional information that was not available in the individual datasets, in addition to integration into the Semantic Web community space. Additionally, the thesis validates the applicability of the process in a specific and demanding use case, namely the creation and publication of an Arabic Linked Drug Dataset based on open drug datasets from selected Arab countries, and discusses the quality issues observed throughout the linked data life-cycle. To that end, a Semantic Data Lake was established in the pharmaceutical domain, allowing further integration and the development of different business services on top of the integrated data sources. By representing data in an open, machine-readable format, the approach supports information and data dissemination, the building of domain-specific applications, and the enrichment of, and value creation from, the original datasets. The thesis showcases how the pharmaceutical domain can benefit from evolving research trends to build competitive advantages. However, as elaborated in the thesis, a better understanding of the specifics of the Arabic language is required to extend the use of linked data technologies in the targeted Arabic organizations.
The dissertation also examines in detail the question of quality in big and linked data ecosystems, taking into account the possibility of reusing open data. The work is motivated by the need to enable researchers from Arab countries to use Semantic Web technologies to link their data with open datasets such as DBpedia. To this end, the thesis proposes a methodology for developing Linked Data applications and implements a software solution that enables querying a consolidated dataset of drugs from the selected Arab countries, realized as a Semantic Data Lake.
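
    The kind of cross-dataset enrichment the thesis aims at can be illustrated with a small, hedged sketch: a drug label taken from a local dataset is looked up on the public DBpedia SPARQL endpoint to retrieve additional information not present locally. The drug label and the query shape are assumptions for illustration; they are not taken from the Arabic Linked Drug Dataset or the thesis implementation.

```python
# Illustrative sketch only: enrich a local drug record with information from DBpedia,
# the kind of cross-dataset lookup that interlinking enables. The drug label and the
# query are assumptions, not taken from the Arabic Linked Drug Dataset implementation.
from SPARQLWrapper import SPARQLWrapper, JSON

local_drug_label = "Paracetamol"  # e.g. a label found in the consolidated local dataset

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    SELECT ?drug ?abstract WHERE {
      ?drug rdfs:label "%s"@en ;
            dbo:abstract ?abstract .
      FILTER (lang(?abstract) = "en")
    } LIMIT 1
""" % local_drug_label)

for binding in sparql.queryAndConvert()["results"]["bindings"]:
    print(binding["drug"]["value"])                   # candidate resource to link to
    print(binding["abstract"]["value"][:200], "...")  # information not in the local data
```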

    On the Importance of Drill-Down Analysis for Assessing Gold Standards and Named Entity Linking Performance

    Rigorous evaluations and analyses of evaluation results are key to improving Named Entity Linking systems. Nevertheless, most current evaluation tools focus on benchmarking and comparative evaluations. They therefore only provide aggregated statistics such as precision, recall, and F1-measure to assess system performance, and no means for conducting detailed analyses down to the level of individual annotations. This paper addresses the need for transparent benchmarking and fine-grained error analysis by introducing Orbis, an extensible framework that supports drill-down analysis, multiple annotation tasks, and resource versioning. Orbis complements approaches like those deployed through the GERBIL and TAC KBP tools and helps developers to better understand and address shortcomings in their Named Entity Linking tools. We present three use cases to demonstrate the usefulness of Orbis for both research and production systems: (i) improving Named Entity Linking tools; (ii) detecting gold standard errors; and (iii) performing Named Entity Linking evaluations with multiple versions of the included resources.
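
    The difference between aggregated statistics and drill-down analysis can be sketched in a few lines; the data, entity identifiers, and code below are illustrative only and do not reflect the Orbis API. Precision, recall, and F1 summarize performance, while the per-annotation listing exposes the individual errors behind those numbers.

```python
# Sketch (not the Orbis API): aggregate NEL scores plus a per-annotation drill-down.
# Annotations are (surface_form, linked_entity) pairs for one document; data is made up.
gold   = {("Zurich", "dbpedia:Zürich"), ("UBS", "dbpedia:UBS"), ("Nestlé", "dbpedia:Nestlé")}
system = {("Zurich", "dbpedia:Zürich"), ("UBS", "dbpedia:UBS_Arena")}

tp = gold & system
fp = system - gold
fn = gold - system

precision = len(tp) / len(system) if system else 0.0
recall    = len(tp) / len(gold) if gold else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # the aggregated view

# Drill-down: the individual annotations behind the numbers, e.g. for spotting
# gold-standard errors or systematic linking mistakes.
for surface, entity in sorted(fp):
    print("false positive:", surface, "->", entity)
for surface, entity in sorted(fn):
    print("missed:        ", surface, "->", entity)
```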

    Identifying and Consolidating Knowledge Engineering Requirements

    Knowledge engineering is the process of creating and maintaining knowledge-producing systems. Throughout the history of computer science and AI, knowledge engineering workflows have been widely used because high-quality knowledge is assumed to be crucial for reliable intelligent agents. However, the landscape of knowledge engineering has changed, presenting four challenges: unaddressed stakeholder requirements, mismatched technologies, adoption barriers for new organizations, and misalignment with software engineering practices. In this paper, we propose to address these challenges by developing a reference architecture using a mainstream software methodology. By studying the requirements of different stakeholders and eras, we identify 23 essential quality attributes for evaluating reference architectures. We assess three candidate architectures from recent literature based on these attributes. Finally, we discuss the next steps towards a comprehensive reference architecture, including prioritizing quality attributes, integrating components with complementary strengths, and supporting missing socio-technical requirements. As this endeavor requires a collaborative effort, we invite all knowledge engineering researchers and practitioners to join us.

    Semantic Systems and Visual Tools to Support Environmental Communication

    Given the intense attention that environmental topics such as climate change attract in news and social media coverage, scientists and communication professionals want to know how different stakeholders perceive observable threats and policy options, how specific media channels react to new insights, and how journalists present scientific knowledge to the public. This paper investigates the potential of semantic technologies to address these questions. After summarizing methods to extract and disambiguate context information, we present visualization techniques to explore the lexical, geospatial, and relational context of topics and entities referenced in these repositories. The examples stem from the Media Watch on Climate Change, the Climate Resilience Toolkit and the NOAA Media Watch, three applications that aggregate environmental resources from a wide range of online sources. These systems not only show the value of providing comprehensive information to the public but have also helped to develop a novel communication success metric that goes beyond bipolar assessments of sentiment.
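
    One of the context views mentioned above, the relational context of entities, can be approximated with a simple co-occurrence structure. The following sketch is illustrative only and is not the implementation behind the Media Watch on Climate Change or the other systems: it counts how often pairs of entities appear in the same document, yielding weighted edges that a relational visualization could display.

```python
# Illustrative only (not the Media Watch on Climate Change code): build a simple
# entity co-occurrence graph as one possible "relational context" to visualize.
from collections import Counter
from itertools import combinations

# Each document is represented by the set of entities detected in it (made-up data).
docs = [
    {"IPCC", "Paris Agreement", "EU"},
    {"IPCC", "NOAA"},
    {"Paris Agreement", "EU", "NOAA"},
]

cooccurrence = Counter()
for entities in docs:
    for a, b in combinations(sorted(entities), 2):
        cooccurrence[(a, b)] += 1

# Edges weighted by how often two entities appear in the same document.
for (a, b), weight in cooccurrence.most_common():
    print(f"{a} -- {b}: {weight}")
```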

    Adapting data-driven research to the fields of social sciences and the humanities

    Recent developments in the field of computer science, such as advances in the areas of big data, knowledge extraction, and deep learning, have triggered the application of data-driven research methods to disciplines such as the social sciences and humanities. This article presents a collaborative, interdisciplinary process for adapting data-driven research to research questions within other disciplines, which considers the methodological background required to achieve a significant impact on the target discipline and guides the systematic collection and formalization of domain knowledge, as well as the selection of appropriate data sources and methods for analyzing, visualizing, and interpreting the results. Finally, we present a case study that applies the described process to the domain of communication science by creating approaches that aid domain experts in locating, tracking, analyzing, and, finally, better understanding the dynamics of media criticism. The study clearly demonstrates the potential of the presented method, but also shows that data-driven research approaches require tighter integration with the methodological framework of the target discipline in order to have a real impact on it.

    Semantic Model Alignment for Business Process Integration

    Business process models describe an enterprise's way of conducting business and in this way form the basis for shaping the organization and engineering the appropriate supporting, or even enabling, IT. A major task in working with such models is their analysis and comparison for the purpose of aligning them. Models can differ semantically not only in the modeling languages used but even more so in how natural language has been applied to label model elements, so correctly identifying the intended meaning of a legacy model is a non-trivial task that thus far has only been solved by humans. In particular during reorganizations, the set-up of B2B collaborations, or mergers and acquisitions, the semantic analysis of models of different origin that need to be consolidated is a manual effort that is tedious, error-prone, time-consuming, costly, and often repetitive. To facilitate the automation of this task by means of IT, this thesis presents the new method of Semantic Model Alignment. Its application makes it possible to extract and formalize the semantics of models, relating them based on the modeling language used and determining similarities based on the natural language used in model element labels. The resulting alignment supports model-based semantic business process integration. The research follows a design-science oriented approach, and the method was developed together with all of its enabling artifacts. These results were published as the research progressed and are presented in this thesis as a selection of peer-reviewed publications that comprehensively describe the various aspects.
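
    The label-based similarity idea can be illustrated with a deliberately simple sketch that is not the method developed in the thesis: element labels from two hypothetical process models are compared with token-overlap (Jaccard) similarity, and pairs above an arbitrary threshold are proposed as alignment candidates.

```python
# Sketch of one ingredient of such an alignment (not the thesis implementation):
# token-overlap (Jaccard) similarity between element labels of two process models.
def jaccard(label_a: str, label_b: str) -> float:
    tokens_a = set(label_a.lower().split())
    tokens_b = set(label_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Made-up element labels from two models to be aligned.
model_a = ["Check customer order", "Ship goods", "Send invoice"]
model_b = ["Verify order of customer", "Dispatch goods", "Issue invoice"]

THRESHOLD = 0.3  # illustrative cut-off for proposing a correspondence
for a in model_a:
    for b in model_b:
        score = jaccard(a, b)
        if score >= THRESHOLD:
            print(f"candidate alignment: '{a}' <-> '{b}' ({score:.2f})")
```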

    Propelling the Potential of Enterprise Linked Data in Austria. Roadmap and Report

    In times of digital transformation and considering the potential of the data-driven economy, it is crucial not only that data is made available and data sources can be trusted, but also that data integrity can be guaranteed, necessary privacy and security mechanisms are in place, and data and access comply with policies and legislation. In many cases, complex and interdisciplinary questions cannot be answered by a single dataset, and it is thus necessary to combine data from multiple disparate sources. However, because most data today is locked up in isolated silos, it cannot be used to its fullest potential. The core challenge for most organisations and enterprises with regard to data exchange and integration is to be able to combine data from internal and external data sources in a manner that supports both day-to-day operations and innovation. Linked Data is a promising data publishing and integration paradigm that builds upon standard web technologies. It supports the publishing of structured data in a semantically explicit and interlinked manner such that it can be easily connected, and consequently becomes more interoperable and useful. The PROPEL project - Propelling the Potential of Enterprise Linked Data in Austria - surveyed technological challenges, entrepreneurial opportunities, and open research questions on the use of Linked Data in a business context and developed a roadmap and a set of recommendations for policy makers, industry, and the research community. Shifting away from a predominantly academic perspective and an exclusive focus on open data, the project looked at Linked Data as an emerging disruptive technology that enables efficient enterprise data management in the rising data economy. Current market forces provide many opportunities but also present several data and information management challenges. Given that Linked Data enables advanced analytics and decision-making, it is particularly suitable for addressing today's data and information management challenges. In our research, we identified a variety of highly promising use cases for Linked Data in an enterprise context. Examples of promising application domains include "customization and customer relationship management", "automatic and dynamic content production, adaption and display", "data search, information retrieval and knowledge discovery", as well as "data and information exchange and integration". The analysis also revealed broad potential across a large spectrum of industries whose structural and technological characteristics align well with Linked Data characteristics and principles: energy, retail, finance and insurance, government, health, transport and logistics, telecommunications, media, tourism, engineering, and research and development rank among the most promising industries for the adoption of Linked Data principles. In addition to approaching the subject from an industry perspective, we also examined the topics and trends emerging from the research community in the fields of Linked Data and the Semantic Web. Although our analysis revolved around a vibrant and active community composed of academia and leading companies involved in semantic technologies, we found that industry needs and research discussions are somewhat misaligned.
Whereas some foundational technologies such as knowledge representation, data creation/publishing/sharing, data management, and system engineering are highly represented in scientific papers, specific topics such as recommendations, and cross-cutting topics such as machine learning or privacy and security, are only marginally present. Topics such as big/large data and the Internet of Things are (still) on an upward trajectory in terms of attention. In contrast, topics that are very relevant for industry, such as application-oriented topics or those relating to security, privacy, and robustness, are not attracting much attention. When it comes to standardisation efforts, we identified a clear need for a more in-depth analysis of the effectiveness of existing standards, the degree of coverage they provide with respect to the foundations they belong to, and the suitability of alternative standards that do not fall under the core Semantic Web umbrella. Taking into consideration market forces, the sector analysis of Linked Data potential, the demand-side analysis, and the current technological status, it is clear that Linked Data has a lot of potential for enterprises and can act as a key driver of technological, organizational, and economic change. However, in order to ensure a solid foundation for Enterprise Linked Data, there is a need for greater awareness surrounding the potential of Linked Data in enterprises, a lowering of entrance barriers via education and training, better alignment between industry demands and research activities, and greater support for technology transfer from universities to companies. The PROPEL roadmap recommends concrete measures to propel the adoption of Linked Data in Austrian enterprises. These measures are structured around the following fields of activity: "awareness and education", "technological innovation, research gaps, standardisation", "policy and legal", and "funding". Key short-term recommendations include the clustering of existing activities in order to raise visibility on an international level, the funding of key topics that are underrepresented in the community, and the setup of joint projects. In the medium term, we recommend strengthening existing academic and private education efforts via certification and establishing flagship projects based on national use cases that can serve as blueprints for transnational initiatives. This requires not only financial support but also infrastructure support, such as data and services to build solutions on top of. In the long term, we recommend cooperation with international funding schemes to establish and foster a European-level agenda, and the setup of centres of excellence.