
    Medical Big Data Analysis in Hospital Information System

    The rapidly increasing volume of medical data generated by hospital information systems (HIS) signals the arrival of the Big Data era in the healthcare domain. These data hold great value for workflow management, patient care and treatment, scientific research, and education in the healthcare industry. However, the complex, distributed, and highly interdisciplinary nature of medical data has exposed the limitations of traditional approaches to data access, storage, processing, analysis, distribution, and sharing. New and efficient technologies are needed to unlock the wealth of information and knowledge underlying medical Big Data. This chapter discusses medical Big Data analysis in HIS, including an introduction to the fundamental concepts, the related platforms and technologies for medical Big Data processing, and advanced Big Data processing technologies.

    Scalable and Declarative Information Extraction in a Parallel Data Analytics System

    Information extraction (IE) on very large data sets requires highly complex, scalable, and adaptive systems. Although numerous IE algorithms exist, their seamless and extensible combination in a scalable system remains a major challenge. This work presents a query-based IE system for a parallel data analysis platform, which is configurable for specific application domains and scales to terabyte-sized text collections. First, configurable operators are defined for basic IE and Web Analytics tasks, from which complex IE tasks can be expressed as declarative queries. All operators are characterized in terms of their properties to highlight the potential and importance of optimizing non-relational, user-defined operators (UDFs) in data flows. Second, we survey the state of the art in optimizing non-relational data flows and show that comprehensive optimization of UDFs is still a challenge. Third, based on this observation, an extensible logical optimizer (SOFA) is introduced, which incorporates the semantics of UDFs into the optimization process. SOFA analyzes a compact set of operator properties and combines automated analysis with manual UDF annotations to enable comprehensive optimization of data flows. SOFA can logically optimize arbitrary data flows from different application areas, yielding significant runtime improvements over other techniques. Fourth, the applicability of the presented system to terabyte-sized corpora is investigated, systematically evaluating the scalability and robustness of the employed methods and tools in order to pinpoint the most critical challenges in building an IE system for very large data sets.
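
    The core optimization idea can be illustrated with a small sketch: if every UDF declares which record fields it reads and writes, an optimizer can prove that two operators commute and pull a cheap, selective filter ahead of an expensive extractor. The operator names, properties, and cost model below are hypothetical stand-ins, not SOFA's actual interface.

```python
# Hypothetical sketch of semantics-aware UDF reordering; not SOFA's API.
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    reads: frozenset       # record fields the UDF reads
    writes: frozenset      # record fields the UDF creates or modifies
    selectivity: float     # estimated fraction of records kept
    cost: float            # estimated cost per input record

def commute(a: Operator, b: Operator) -> bool:
    """Safe to swap if neither operator touches a field the other writes."""
    return (not (a.writes & (b.reads | b.writes))
            and not (b.writes & (a.reads | a.writes)))

def optimize(plan):
    """Bubble cheap, selective operators ahead of expensive ones whenever
    the declared read/write sets prove the swap is safe."""
    plan, changed = list(plan), True
    while changed:
        changed = False
        for i in range(len(plan) - 1):
            up, down = plan[i], plan[i + 1]
            original = up.cost + up.selectivity * down.cost
            swapped = down.cost + down.selectivity * up.cost
            if commute(up, down) and swapped < original:
                plan[i], plan[i + 1] = down, up
                changed = True
    return plan

# An expensive entity extractor followed by a cheap language filter: the
# filter never touches the 'entities' field, so it may safely run first.
extract = Operator("extract_entities", frozenset({"text"}),
                   frozenset({"entities"}), selectivity=1.0, cost=100.0)
keep_english = Operator("filter_english", frozenset({"lang"}),
                        frozenset(), selectivity=0.3, cost=1.0)

print([op.name for op in optimize([extract, keep_english])])
# -> ['filter_english', 'extract_entities']
```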

    dstlr: Scalable Knowledge Graph Construction from Text Collections

    In recent years, the amount of data being generated for consumption by enterprises has increased exponentially. Enterprises typically work with structured data, but oftentimes the data being generated are semi-structured or unstructured in nature. In particular, there exists a wealth of unstructured text data (customer reviews, social media posts, news articles, etc.) containing information that could provide value to an organization. As data from different sources often reside in silos, a number of questions arise: How do we integrate the structured and unstructured data? How can we curate and refine the data? Can we do this at scale? In this thesis, I present dstlr -- a platform for scalable knowledge graph construction from text collections. I show how assertions extracted from a collection of unstructured text documents can be used to form a knowledge graph, enabling the integration of structured and unstructured data. Further, I show that linking to an existing knowledge graph enables rule-based data curation using the additional external information. I demonstrate this on a large collection of news articles, highlighting the horizontal scale-out of the system.
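
    The pipeline idea behind dstlr can be sketched in a few lines: extracted assertions become edges in a graph, entity linking attaches external facts, and declarative rules flag contradictions for curation. The extracted triples, external KG values, and conflict rule below are hardcoded stand-ins, not dstlr's actual components.

```python
# Minimal sketch of the dstlr idea: text-derived triples form a knowledge
# graph, linked external facts drive rule-based curation. All data below
# is invented for illustration.
import networkx as nx

# Assume an upstream IE stage produced these (subject, relation, object)
# assertions from news articles.
extracted = [
    ("Acme Corp", "headquartered_in", "Berlin"),
    ("Acme Corp", "founded_in", "1999"),
]

# Assume entity linking resolved 'Acme Corp' to an external KG record.
external_kg = {"Acme Corp": {"headquartered_in": "Munich"}}

graph = nx.MultiDiGraph()
for subj, rel, obj in extracted:
    graph.add_edge(subj, obj, relation=rel, source="text")

# Rule-based curation: flag extracted facts that contradict the linked KG.
for subj, obj, data in graph.edges(data=True):
    linked = external_kg.get(subj, {}).get(data["relation"])
    if linked is not None and linked != obj:
        print(f"CONFLICT: {subj} -{data['relation']}-> {obj} "
              f"(external KG says {linked})")
```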

    A Domain Specific Language for Digital Forensics and Incident Response Analysis

    One of the longstanding conceptual problems in digital forensics is the dichotomy between the need for verifiable and reproducible forensic investigations and the lack of practical mechanisms to accomplish them. After nearly four decades of professional digital forensic practice, investigator notes are still the primary source of reproducibility information, and much of it is tied to the functions of specific, often proprietary, tools. The lack of a formal means of specification for digital forensic operations results in three major problems. Specifically, there is a critical lack of: a) standardized and automated means to scientifically verify the accuracy of digital forensic tools; b) methods to reliably reproduce forensic computations and their results; and c) a framework for interoperability among forensic tools. Additionally, there is no standardized means for communicating software requirements between users, researchers, and developers, resulting in a mismatch of expectations. Combined with the exponential growth in data volume and the complexity of the applications and systems to be investigated, all of these concerns result in major case backlogs and inherently reduce the reliability of digital forensic analyses. This work proposes a new approach to the specification of forensic computations, such that the above concerns can be addressed on a scientific basis with a new domain-specific language (DSL) called nugget. DSLs are specialized languages that aim to address the concerns of particular domains by providing practical abstractions. Successful DSLs, such as SQL, can transform an application domain by providing a standardized way for users to communicate what they need without specifying how the computation should be performed. This is the first effort to build a DSL for (digital) forensic computations, with the following research goals: 1) provide an intuitive formal specification language that covers core types of forensic computations and common data types; 2) provide a mechanism to extend the language that can incorporate arbitrary computations; 3) provide a prototype execution environment that allows the fully automatic execution of the computation; 4) provide a complete, formal, and auditable log of computations that can be used to reproduce an investigation; and 5) demonstrate cloud-ready processing that can match the growth in data volumes and complexity.
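
    The reproducibility requirement at the heart of such a DSL can be illustrated with a minimal sketch: the computation is declared as data, a runtime executes it, and every step is appended to a hash-chained, auditable log. The operation names and the toy runtime below are hypothetical and do not reproduce nugget's actual syntax.

```python
# Sketch of a declarative, auditable forensic computation. The spec says
# WHAT to compute; a runtime decides HOW, and every step is logged in a
# hash chain so the whole investigation can be replayed and verified.
import hashlib, json

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Declarative specification of the investigation (hypothetical ops).
spec = [
    {"op": "hash_file", "args": {"path": "evidence.img"}},
    {"op": "carve", "args": {"path": "evidence.img", "type": "jpeg"}},
]

RUNTIME = {
    # Stand-in implementations; a real system dispatches to vetted tools.
    "hash_file": lambda args: sha256(args["path"].encode()),
    "carve": lambda args: f"carved 3 {args['type']} files",  # placeholder
}

log, prev = [], "0" * 64
for step in spec:
    result = RUNTIME[step["op"]](step["args"])
    entry = {"step": step, "result": result, "prev": prev}
    prev = sha256(json.dumps(entry, sort_keys=True).encode())
    log.append({**entry, "entry_hash": prev})

# Re-running the same spec over the same evidence yields the same chain,
# so an auditor can verify the computation end-to-end.
print(json.dumps(log, indent=2))
```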

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016. The PhD Symposium was a very good opportunity for young researchers to share information and knowledge, to present their current research, and to discuss topics with other students in order to look for synergies and common research topics. The idea was very successful, and the assessment made by the PhD students was very good. It also helped to achieve one of the major goals of the NESUS Action: to establish an open European research network targeting sustainable solutions for ultrascale computing, aiming at cross-fertilization among HPC, large-scale distributed systems, and Big Data management, and helping to bring together disparate researchers working across these different areas by providing a meeting ground where they can exchange ideas, identify synergies, and pursue common activities in research topics such as sustainable software solutions (applications and system software stack), data management, energy efficiency, and resilience. European Cooperation in Science and Technology (COST).

    Big Data in the construction industry: A review of present status, opportunities, and future trends

    The ability to process large amounts of data and to extract useful insights from them has revolutionised society. This phenomenon, dubbed Big Data, has applications for a wide assortment of industries, including the construction industry. The construction industry already deals with large volumes of heterogeneous data, which are expected to increase exponentially as technologies such as sensor networks and the Internet of Things are commoditised. In this paper, we present a detailed survey of the literature investigating the application of Big Data techniques in the construction industry. We reviewed related works published in the databases of the American Society of Civil Engineers (ASCE), the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computing Machinery (ACM), and the Elsevier ScienceDirect Digital Library. While the application of data analytics in the construction industry is not new, the adoption of Big Data technologies in this industry remains at a nascent stage and lags behind their broad uptake in other fields. To the best of our knowledge, there is currently no comprehensive survey of Big Data techniques in the context of the construction industry. This paper fills that void, presenting a wide-ranging interdisciplinary review of literature from fields such as statistics, data mining and warehousing, machine learning, and Big Data analytics in the context of the construction industry. We discuss the current state of Big Data adoption in the construction industry and the future potential of such technologies across the multiple domain-specific sub-areas of the construction industry. We also identify open issues and directions for future work, along with potential pitfalls associated with Big Data adoption in the industry.

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought forth challenges and opportunities for Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities to use big data in geospatial applications. Crucial to the advancements highlighted in this book are the integration of computational thinking and spatial thinking and the transformation of abstract ideas and models into concrete data structures and algorithms.
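
    A minimal example of turning spatial thinking into a concrete data structure is a uniform grid index, which partitions points by cell so that large collections can be queried, or processed in parallel, one cell at a time. The cell size and sample coordinates below are arbitrary illustrations, not taken from the book.

```python
# Uniform grid index: a simple spatial data structure that lets range
# queries touch only the cells overlapping the query window instead of
# scanning every point, and maps naturally to per-cell parallelism.
from collections import defaultdict

CELL = 1.0  # cell size in degrees (arbitrary choice)

def cell_of(lon: float, lat: float) -> tuple:
    """Map a coordinate to its grid cell."""
    return (int(lon // CELL), int(lat // CELL))

points = [(13.40, 52.52), (13.45, 52.50), (2.35, 48.86)]  # Berlin x2, Paris

grid = defaultdict(list)
for lon, lat in points:
    grid[cell_of(lon, lat)].append((lon, lat))

def range_query(min_lon, min_lat, max_lon, max_lat):
    """Return all indexed points inside the query window."""
    hits = []
    for cx in range(int(min_lon // CELL), int(max_lon // CELL) + 1):
        for cy in range(int(min_lat // CELL), int(max_lat // CELL) + 1):
            for lon, lat in grid.get((cx, cy), []):
                if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                    hits.append((lon, lat))
    return hits

print(range_query(13.0, 52.0, 14.0, 53.0))  # -> the two Berlin points
```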

    Cloud Computing Interoperability Using Application Programming Interfaces

    The cloud computing paradigm is being adopted by an increasing number of organizations due to the significant financial savings it offers. On the other hand, some issues hinder cloud adoption. One of the most important problems is vendor lock-in and the resulting lack of interoperability. The ability to move data and applications from one cloud offering to another and to use the resources of multiple clouds is very important for cloud consumers. The focus of this dissertation is the interoperability of commercial providers of platform as a service (PaaS). This cloud model was chosen due to the many incompatibilities among vendors and the lack of existing solutions. The main aim of the dissertation is to identify and address interoperability issues of platform as a service; automated data migration between different PaaS providers is also an objective of this study. The dissertation makes the following main contributions. First, a detailed ontology of the resources and remote API operations of PaaS providers was developed. This ontology is used to semantically annotate web services that connect to the providers' remote APIs, and it defines mappings between PaaS providers. A tool was developed that uses the defined semantic web services and AI planning techniques to detect and attempt to resolve the interoperability problems found. An architecture for the automated migration of data between PaaS providers is presented. Finally, a methodology for the detection of platform interoperability problems was proposed and evaluated in use cases.
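
    The planning idea can be illustrated with a toy sketch: each provider's API operations are described by preconditions and effects over a shared, ontology-style vocabulary, and a breadth-first planner searches for an operation sequence that achieves the migration goal. The providers, operations, and state facts below are invented for illustration.

```python
# Toy AI-planning sketch for PaaS data migration: operations carry
# preconditions and effects over abstract state facts, and BFS finds a
# sequence of API calls that reaches the goal state.
from collections import deque

OPS = [
    # (name, preconditions, effects)
    ("providerA.export_db", {"data@A"}, {"dump@local"}),
    ("providerB.create_db", set(), {"db@B"}),
    ("providerB.import_db", {"dump@local", "db@B"}, {"data@B"}),
]

def plan(initial: frozenset, goal: set):
    """Breadth-first search over states: apply any operation whose
    preconditions hold, until the goal facts are all satisfied."""
    queue, seen = deque([(initial, [])]), {initial}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for name, pre, eff in OPS:
            if pre <= state:
                nxt = frozenset(state | eff)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None  # no operation sequence achieves the goal

print(plan(frozenset({"data@A"}), {"data@B"}))
# -> ['providerA.export_db', 'providerB.create_db', 'providerB.import_db']
```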