Profiling relational data: a survey
Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.
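As a rough illustration of the profiling tasks this abstract lists, the following minimal Python sketch computes single-column statistics (null count, distinct count, most frequent value pattern) and naively checks a functional dependency and a unique column combination. It is not taken from the survey: the function names (`profile_column`, `holds_fd`, `is_unique`), the pattern encoding (digits as `9`, letters as `A`), and the sample rows are all assumptions made for this example, and the checks are brute-force rather than the optimized algorithms a survey of this kind would review.

```python
from collections import Counter
import re

def profile_column(values):
    """Single-column statistics: null count, distinct count, top value pattern."""
    non_null = [v for v in values if v is not None]
    # Illustrative pattern encoding (an assumption): digits -> 9, letters -> A.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(v))) for v in non_null
    )
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_pattern": patterns.most_common(1)[0][0] if patterns else None,
    }

def holds_fd(rows, lhs, rhs):
    """Naive check that the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # The first value seen for a key is stored; any later mismatch violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_unique(rows, cols):
    """Naive check that cols form a unique column combination in rows."""
    keys = [tuple(row[c] for c in cols) for row in rows]
    return len(set(keys)) == len(keys)

# Hypothetical sample relation, invented for this sketch.
rows = [
    {"zip": "14482", "city": "Potsdam", "name": "Ann"},
    {"zip": "14482", "city": "Potsdam", "name": None},
    {"zip": "10115", "city": "Berlin", "name": "Bob"},
]

zip_profile = profile_column([r["zip"] for r in rows])
name_profile = profile_column([r["name"] for r in rows])
```

On this sample, `holds_fd(rows, ["zip"], ["city"])` checks whether every zip code maps to a single city, and `is_unique(rows, ["zip", "name"])` checks whether the pair identifies each row. Both checks scan all rows for one candidate only; discovering all such dependencies over many columns is the harder multi-column problem the abstract refers to.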
Quality of Service and Optimization in Data Integration Systems
This work presents techniques for the construction of a global data integration system. Similar to distributed databases, this system allows declarative queries in order to express user-specific information needs. Scalability towards global data integration systems and openness were major design goals for the architecture and techniques developed in this work. It is shown how service composition, extensibility and quality of service can be supported in an open system of providers for data, functionality for query processing operations, and computing power.
Cost-Based Optimization of Integration Flows
Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many different application areas such as real-time ETL and data synchronization between operational systems. Because of growing data volumes, highly distributed IT infrastructures, and strict requirements for data consistency and up-to-date query results, many instances of integration flows are executed over time. Due to this high load and blocking synchronous source systems, the performance of the central integration platform is crucial for an IT infrastructure. To tackle these high performance requirements, we introduce the concept of cost-based optimization of imperative integration flows that relies on incremental statistics maintenance and inter-instance plan re-optimization. As a foundation, we introduce the concept of periodical re-optimization including novel cost-based optimization techniques that are tailor-made for integration flows. Furthermore, we refine the periodical re-optimization to on-demand re-optimization in order to overcome the problems of many unnecessary re-optimization steps and of adaptation delays that cause missed optimization opportunities. This approach ensures low optimization overhead and fast workload adaptation.
Privacidade em comunicações de dados para ambientes contextualizados (Privacy in data communications for context-aware environments)
Doctorate in Informatics (Doutoramento em Informática). Internet users consume online targeted advertising based on information collected
about them and voluntarily share personal information in social networks.
Sensor information and data from smartphones are collected and used
by applications, sometimes in unclear ways. As it happens today with smartphones,
in the near future sensors will be shipped in all types of connected
devices, enabling ubiquitous information gathering from the physical environment,
enabling the vision of Ambient Intelligence. The value of gathered data,
if not obvious, can be harnessed through data mining techniques and put to
use by enabling personalized and tailored services as well as business intelligence
practices, fueling the digital economy.
However, the ever-expanding information gathering and use undermines the
privacy conceptions of the past. Natural social practices of managing privacy
in daily relations are overridden by socially-awkward communication tools, service
providers struggle with security issues resulting in harmful data leaks,
governments use mass surveillance techniques, the incentives of the digital
economy threaten consumer privacy, and the advancement of consumer-grade
data-gathering technology enables new inter-personal abuses.
A wide range of fields attempt to address technology-related privacy problems,
but they vary immensely in assumptions, scope and approach.
Privacy of future use cases is typically handled vertically, instead
of building upon previous work that can be re-contextualized, while current
privacy problems are typically addressed per type in a more focused way.
Because significant effort was required to make sense of the relations and
structure of privacy-related work, this thesis attempts to transmit a structured
view of it. It is multi-disciplinary - from cryptography to economics, including
distributed systems and information theory - and addresses privacy issues of
different natures.
As existing work is framed and discussed, the contributions to the state of the art
made within the scope of this thesis are presented. The contributions add to
five distinct areas: 1) identity in distributed systems; 2) future context-aware
services; 3) event-based context management; 4) low-latency information flow
control; 5) high-dimensional dataset anonymity. Finally, having laid out this
landscape of privacy-preserving work, the current and future privacy challenges
are discussed, considering not only technical but also socio-economic
perspectives.
Bio-signal data gathering, management and analysis within a patient-centred health care context
The healthcare service is under pressure to do more with less, and changing the way the service is modelled could be the key to saving resources and increasing efficacy. This change could be made possible by patient-centric care models. Such a model would include straightforward and easy-to-use telemonitoring devices and a flexible data management structure. The structure would maintain its state by ingesting many sources of data, then tracking this data through cleaning and processing into models and estimates, in order to obtain values from the data that could be used by the patient. By automating the data management, the system can become less disease-focused and more health-focused: it is preventative in nature and allows patients to be more proactive and involved in their care. This work presents the development of a new device and a data management and analysis system that utilises the data from this device and supports data processing, along with two examples of its use: signal quality estimation and blood pressure estimation. This system could aid in the creation of patient-centric telecare systems.