Using Ontologies for Semantic Data Integration
While big data analytics is considered one of the most important paths to competitive advantage for today's enterprises, data scientists spend a comparatively large amount of time in the data preparation and data integration phase of a big data project. This shows that data integration is still a major challenge in IT applications. Over the past two decades, the idea of using semantics for data integration has become increasingly important, and has received much attention in the AI, database, web, and data mining communities. Here, we focus on a specific paradigm for semantic data integration, called Ontology-Based Data Access (OBDA). The goal of this paper is to provide an overview of OBDA, pointing out both the techniques that are at the basis of the paradigm and the main challenges that remain to be addressed.
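The core OBDA idea can be sketched with a toy example: each ontology concept is defined by mappings (views) over heterogeneous sources, and a query over the concept is rewritten into a query over the underlying data. All table names, mappings, and data below are illustrative assumptions, not from the paper; production OBDA systems compile queries through declarative mappings (e.g., SPARQL via R2RML into SQL), which this sketch only mimics.

```python
import sqlite3

# Two heterogeneous "sources" simulated as tables in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp(id INTEGER, name TEXT);        -- source 1: HR system
    CREATE TABLE staff(sid INTEGER, fullname TEXT); -- source 2: payroll system
    INSERT INTO emp VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO staff VALUES (7, 'Grace');
""")

# Mappings: each ontology concept is defined by one SQL view per source.
mappings = {
    "Person": [
        "SELECT name AS label FROM emp",
        "SELECT fullname AS label FROM staff",
    ],
}

def query_concept(concept):
    """Rewrite a query over an ontology concept into a UNION of the
    mapped source queries, then evaluate it on the original data."""
    sql = " UNION ALL ".join(mappings[concept])
    return [row[0] for row in conn.execute(sql)]

print(sorted(query_concept("Person")))  # ['Ada', 'Alan', 'Grace']
```

The data stay in the sources; only the mapping layer knows how the ontology vocabulary relates to each schema, which is what makes the approach attractive for integration.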
Challenges of Internet of Things and Big Data Integration
The Internet of Things anticipates the connection of physical devices to the
Internet and access to their wireless sensor data, which makes it possible to
monitor and control the physical world. Big Data convergence has put multifarious
new opportunities ahead of business ventures to enter a new market or enhance
their operations in the current market. Considering the existing techniques and
technologies, it is probably safe to say that the best solution is to use big
data tools to provide an analytical solution to the Internet of Things. Based
on current technology deployment and adoption trends, it is envisioned that
the Internet of Things is the technology of the future: today's real-world
devices can already provide real and valuable analytics, and people in the
real world use many IoT devices. Despite all the advertisements that companies
offer in connection with the Internet of Things, you, as a responsible consumer,
have the right to be skeptical about IoT advertisements. The primary question is:
what is the promise of the Internet of Things with respect to reality, and what
are the prospects for the future?
Comment: Proceedings of the International Conference on Emerging Technologies
in Computing 2018 (iCETiC '18), 23rd-24th August 2018, London Metropolitan
University, London, UK. Published by Springer-Verlag
Integration of survey data and big observational data for finite population inference using mass imputation
Multiple data sources are becoming increasingly available for statistical
analyses in the era of big data. As an important example in finite-population
inference, we consider an imputation approach to combining a probability sample
with big observational data. Unlike the usual imputation for missing data
analysis, we create imputed values for all elements in the probability
sample. Such mass imputation is attractive in the context of survey data
integration (Kim and Rao, 2012). We extend mass imputation as a tool for
integrating survey data with big non-survey data. The mass imputation methods
and their statistical properties are presented. The matching estimator of
Rivers (2007) is also covered as a special case. Variance estimation with
mass-imputed data is discussed. The simulation results demonstrate that the
proposed estimators outperform existing competitors in terms of robustness and
efficiency.
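The mass-imputation workflow described above can be sketched as follows. The data, the linear outcome model, and the design weights are illustrative assumptions, not the paper's: an outcome model is fitted on the big observational data, the outcome is then imputed for every unit of the probability sample, and a design-weighted estimator is computed from the imputed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a large non-probability "big data" source where the
# outcome y is observed, and a small probability sample where y is missing.
n_big, n_srs = 100_000, 500
x_big = rng.normal(size=n_big)
y_big = 2.0 + 1.5 * x_big + rng.normal(scale=0.5, size=n_big)

x_srs = rng.normal(size=n_srs)             # covariates in the probability sample
w_srs = np.full(n_srs, 1_000_000 / n_srs)  # design weights for a population of 1M

# Step 1: fit an outcome model on the big observational data (here, OLS).
beta, intercept = np.polyfit(x_big, y_big, 1)

# Step 2: mass-impute y for *every* unit in the probability sample.
y_imputed = intercept + beta * x_srs

# Step 3: design-weighted (Hajek-style) estimate of the population mean.
y_bar_hat = np.sum(w_srs * y_imputed) / np.sum(w_srs)
print(round(y_bar_hat, 2))
```

Because the outcome model is transported from the big data source to the probability sample, the validity of the estimate hinges on that model holding in both; the paper's variance estimation addresses the extra uncertainty this introduces.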
Distributed service orchestration: eventually consistent cloud operation and integration
Both researchers and industry players face the same obstacles when entering the big data field. Deploying and testing distributed data technologies requires a large up-front investment of both time and knowledge. Existing cloud automation solutions are not well suited to managing complex distributed data solutions. This paper proposes a distributed service orchestration architecture to better handle the complex orchestration logic needed in these cases. A novel service-engine-based approach is proposed to cope with the versatility of the individual components. A hybrid integration approach bridges the gap between cloud modeling languages, automation artifacts, image-based schedulers, and PaaS solutions. This approach is integrated into the distributed data experimentation platform Tengu, making it more flexible and robust.
Setting the stage for data science: integration of data management skills in introductory and second courses in statistics
Many have argued that statistics students need additional facility to express
statistical computations. By introducing students to commonplace tools for data
management, visualization, and reproducible analysis in data science, and by
applying these to real-world scenarios, we prepare them to think statistically.
In an era of increasingly big data, it is imperative that students develop
data-related capacities, beginning with the introductory course. We believe
that the integration of these precursors to data science into our curricula,
early and often, will help statisticians be part of the dialogue regarding
"Big Data" and "Big Questions".
