903 research outputs found

    DARIAH and the Benelux

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied with different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web were designed to solve specific problems and operate in ad-hoc domains, while others heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the Web Data Extraction literature. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes: applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media, and Online Social Network users, offering unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for one domain in other domains. (Comment: Knowledge-Based Systems)
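
    As a concrete illustration of one family of techniques the survey covers, the sketch below implements a tiny hand-written "wrapper" in Python: a parser tuned to one assumed page layout that turns HTML into structured records. The page layout, CSS class names, and record fields are hypothetical examples, not taken from the survey itself.

```python
# Minimal wrapper-based extraction sketch using only the standard library.
# A wrapper encodes knowledge of one page layout (assumed here) and emits
# structured (title, price) records from matching HTML.
from html.parser import HTMLParser

class ProductWrapper(HTMLParser):
    """Collects text from <h2 class="title"> and <span class="price"> tags."""
    def __init__(self):
        super().__init__()
        self._field = None     # name of the field currently being read
        self._current = {}     # record under construction
        self.records = []      # completed records

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "h2" and cls == "title":
            self._field = "title"
        elif tag == "span" and cls == "price":
            self._field = "price"

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if len(self._current) == 2:   # both fields seen: record complete
                self.records.append(self._current)
                self._current = {}

page = '<h2 class="title">Widget</h2><span class="price">9.99</span>'
wrapper = ProductWrapper()
wrapper.feed(page)
print(wrapper.records)   # [{'title': 'Widget', 'price': '9.99'}]
```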

    GeoBIM for built environment condition assessment supporting asset management decision making

    The digital transformation in the management of the built environment is increasingly evident. While the benefits of location data, from Building Information Modelling or Geographical Information Systems, have been explored separately, their combination - GeoBIM - in asset management has never been explored. Data collection for condition assessment is challenging due to the quantity, types, frequency, and quality of the data. We first describe the opportunities and challenges of GeoBIM for condition assessment. The theoretical approach is then validated by developing an integrated GeoBIM model of the digital built environment for a neighbourhood in Milan, Italy. Data are collected, linked, processed, and analysed through multiple software platforms, providing relevant information for asset management decision making. Good results are achieved in rapid massive data collection, improved visualisation, and analysis. While further testing and development are required, the case study outcomes demonstrate the innovation and the mid-term service-oriented potential of the proposed approach.
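
    To make the linking step concrete, here is a minimal, purely illustrative Python sketch of the kind of join such an approach relies on: BIM element records and GIS features are matched on a shared asset identifier so that condition scores can be aggregated per zone. All field names (asset_id, zone, condition) are assumptions; a real pipeline would work with IFC and CityGML data through dedicated GeoBIM tooling.

```python
# Hypothetical, minimal stand-in for the BIM/GIS linking step: join two
# sources on a shared asset identifier, then aggregate condition per zone.
from collections import defaultdict

bim_elements = [                       # BIM side: per-element condition scores
    {"asset_id": "A1", "element": "facade", "condition": 2},
    {"asset_id": "A1", "element": "roof",   "condition": 4},
    {"asset_id": "B7", "element": "facade", "condition": 5},
]
gis_features = {"A1": {"zone": "north"},   # GIS side: where each asset sits
                "B7": {"zone": "south"}}

worst_by_zone = defaultdict(int)
for el in bim_elements:
    zone = gis_features[el["asset_id"]]["zone"]
    worst_by_zone[zone] = max(worst_by_zone[zone], el["condition"])

print(dict(worst_by_zone))             # {'north': 4, 'south': 5}
```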

    Le nuage de point intelligent (The Smart Point Cloud)

    Discrete spatial datasets known as point clouds often lay the groundwork for decision-making applications. For example, we can use such data as a reference for autonomous cars and robot navigation, as a layer for floor-plan creation and building construction, and as a digital asset for environment modelling and incident prediction. Applications are numerous, and potentially increasing if we consider point clouds as digital reality assets. Yet this expansion faces technical limitations, mainly from the lack of semantic information within point ensembles. Connecting knowledge sources is still a very manual and time-consuming process that suffers from error-prone human interpretation. This highlights a strong need for domain-related data analysis to create coherent and structured information. The thesis addresses automation problems in point cloud processing to create intelligent environments, i.e. virtual copies that can be used and integrated in fully autonomous reasoning services. We tackle point cloud questions associated with knowledge extraction – particularly segmentation and classification – as well as structuration, visualisation, and interaction with cognitive decision systems. We propose to connect both point cloud properties and formalized knowledge to rapidly extract pertinent information using domain-centred graphs. The dissertation delivers the concept of a Smart Point Cloud (SPC) Infrastructure, which serves as an interoperable and modular architecture for unified processing. It permits easy integration into existing workflows and multi-domain specialization through device knowledge, analytic knowledge, or domain knowledge. Concepts, algorithms, code, and materials are given to replicate findings and extend current applications.
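
    As a rough, hypothetical illustration of the enrichment idea, the Python sketch below attaches crude semantic labels to raw points and exposes them through a tiny concept-to-points mapping. The labels, height thresholds, and flat graph layout are assumptions for illustration only, not the thesis's actual SPC data model.

```python
# Toy semantic enrichment of a point cloud: label each (x, y, z) point by a
# height rule, then group points under domain concepts for later queries.
points = [(0.1, 0.2, 0.02), (1.0, 1.5, 0.01), (0.5, 0.5, 2.49), (2.0, 0.3, 1.2)]

def classify(z, floor_max=0.05, ceiling_min=2.4):
    """Crude rule-based labelling by height alone (illustrative thresholds)."""
    if z <= floor_max:
        return "floor"
    if z >= ceiling_min:
        return "ceiling"
    return "structure"

# "Graph" as an adjacency dict: domain concept -> points carrying that label.
graph = {}
for p in points:
    graph.setdefault(classify(p[2]), []).append(p)

print(sorted(graph))         # ['ceiling', 'floor', 'structure']
print(len(graph["floor"]))   # 2 points labelled as floor
```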

    SWKM 2008: Social Web and Knowledge Management, Proceedings (CEUR Workshop Proceedings)

    Digital Innovation: A Frugal Ecosystem Perspective

    In this conceptual paper, we attempt to answer the question: how do firms develop frugal IT capabilities in a resource-constrained ecosystem? Frugal firms tend to successfully overcome severe infrastructure, financial, social, and technological constraints. "Frugal IT innovation" is a special case of frugal innovation in which IT/IS play a pivotal, core role in enabling capabilities that overcome the challenges of resource-constrained business environments. It is centered on the development of products and services with a sharp focus on affordability, simplicity, and sustainability. Taking a digital ecodynamics perspective, we focus on the co-evolution of firm-level capabilities, the frugal ecosystem, and underlying IT systems to uncover how a dynamic, higher-order, frugal IT innovation capability (FITIC) drives firm performance. Due to unique ecosystem conditions, we measure firm performance by including social and environmental measures in addition to financial measures. The paper discusses ecosystem-wide implications and contributes to the advancement of both theoretical and practice-based knowledge in this domain.

    Report of the Stanford Linked Data Workshop

    The Stanford University Libraries and Academic Information Resources (SULAIR), with the Council on Library and Information Resources (CLIR), conducted a week-long workshop on the prospects for a large-scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. As preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR, that was published originally for workshop participants as background to the workshop and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the workshop participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand, and identifying opportunities and challenges to be confronted in the preparation of the intended prototype and its operation on the other. In pursuit of those objectives, the workshop participants produced:
    1. a value statement addressing the question of why a Linked Data approach is worth prototyping;
    2. a manifesto for Linked Libraries (and Museums and Archives and …);
    3. an outline of the phases in a life cycle of Linked Data approaches;
    4. a prioritized list of known issues in generating, harvesting & using Linked Data;
    5. a workflow with notes for converting library bibliographic records and other academic metadata to URIs;
    6. examples of potential “killer apps” using Linked Data; and
    7. a list of next steps and potential projects.
    This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants.
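
    The record-to-URI workflow of item 5 can be pictured with a small Python sketch: mint a stable URI for each bibliographic record and emit RDF-style triples. The http://example.org/bib/ namespace, the record fields, and the minting rule are assumptions for illustration; only the Dublin Core property names (dc:title, dc:creator) are standard vocabulary.

```python
# Hypothetical sketch: convert a flat bibliographic record into RDF-style
# triples by minting a URI from the record identifier.
from urllib.parse import quote

BASE = "http://example.org/bib/"   # assumed namespace for minted URIs

def record_to_triples(record):
    uri = BASE + quote(record["id"])          # stable, URL-safe identifier
    yield (uri, "dc:title", record["title"])
    for author in record["authors"]:
        yield (uri, "dc:creator", author)

rec = {"id": "stanford:12345",
       "title": "Linked Data and Libraries",
       "authors": ["J. Persons"]}
for s, p, o in record_to_triples(rec):
    print(s, p, o)
```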

    Supporting the workflow of archaeo-related sciences by providing storage, sharing, analysis, and retrieval methods

    The recovery and analysis of material culture is the main focus of archaeo-related work. The corpus of findings, such as remains of buildings, artifacts, human burial remains, or faunal remains, is excavated, described, categorized, and analyzed in projects all over the world. This huge amount of archaeo-related data is the basis for many analyses, and the results of analyzing the collected data teach us about the past. All disciplines of the archaeo-related sciences deal with similar challenges: their workflows are similar, yet there are still differences in the nature of the data. These circumstances raise questions of how to store, share, retrieve, and analyze such heterogeneous and distributed data. The contribution of this thesis is to support archaeologists and bioarchaeologists in their work by providing methods that follow the archaeo-related workflow, which is split into five main parts. The first part of this thesis describes the xBook framework, developed to gather and store archaeological data; it allows the creation of several database applications that provide the necessary features for the archaeo-related context. The second part deals with methods to share information, collaborate with colleagues, and retrieve distributed data of cohesive archaeological contexts in order to bring archaeo-related data together. The third part addresses a dynamic framework for data analyses, featuring a flexible and easy-to-use tool that supports archaeologists and bioarchaeologists in executing analyses on their data without any programming skills and without the need to become familiar with external technologies. The fourth part introduces an interactive tool to compare the temporal position of archaeological findings, in the form of a Harris Matrix, with their spatial position as 2D and 3D site plan sketches, using the introduced data retrieval methods. Finally, the fifth part specifies an architecture for an information system that allows distributed and interdisciplinary data to be searched using dynamic joins of results from heterogeneous data formats. This novel way of information retrieval enables scientists to cross-connect archaeological information with domain-extrinsic knowledge. The concept of this information system is not limited to the archaeo-related context, however; other sciences could also benefit from this architecture.
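
    The dynamic-join idea from the fifth part can be sketched in a few lines of Python: result sets from two heterogeneous sources, here a hypothetical finds table and an isotope dataset, are merged on a shared identifier at query time rather than through a fixed common schema. All identifiers and fields are invented for illustration.

```python
# Illustrative dynamic join: enrich archaeological finds with measurements
# from a domain-extrinsic source, matched on a shared find identifier.
finds = [
    {"find_id": "F-01", "category": "faunal remains", "site": "Trench 3"},
    {"find_id": "F-02", "category": "artifact",       "site": "Trench 1"},
]
isotopes = {"F-01": {"d15N": 9.2}}   # e.g. stable isotope measurements

def joined(rows, extra):
    """Merge extra attributes into each row when its key has a match."""
    for row in rows:
        merged = dict(row)
        merged.update(extra.get(row["find_id"], {}))
        yield merged

for row in joined(finds, isotopes):
    print(row)   # F-01 gains the d15N value; F-02 passes through unchanged
```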

    Entity-Oriented Search

    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms
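
    The core entity-ranking task described above has roughly the following shape, shown here as a toy Python sketch: score each entity's description against the query terms and return entities in decreasing score order. The tiny catalogue and the bag-of-words overlap scorer are stand-ins for the structured and unstructured retrieval models the book actually develops.

```python
# Toy entity ranking: given a textual query, return a ranked list of
# entities by counting shared terms between query and entity description.
entities = {
    "Ada Lovelace":    "english mathematician first computer programmer",
    "Charles Babbage": "english inventor analytical engine mathematician",
    "Alan Turing":     "computer scientist turing machine cryptanalysis",
}

def rank(query):
    q = set(query.lower().split())
    scores = {name: len(q & set(desc.split())) for name, desc in entities.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank("first computer programmer"))
# [('Ada Lovelace', 3), ('Alan Turing', 1), ('Charles Babbage', 0)]
```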

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.