Quality-Aware Data Source Management
Data is becoming a commodity of tremendous value in many domains. The ease of collecting and publishing data has led to an upsurge in the number of available data sources, which are highly heterogeneous in the domains they cover, the quality of data they provide, and the fees they charge for access. However, most existing data integration approaches, which combine information from a collection of sources, focus on facilitating integration itself and are agnostic to the utility or quality of the integration result. They do not optimize the trade-off between the utility and the cost of integration to determine which sources are worth integrating.
In this dissertation, I introduce a framework for quality-aware data source management. I define a collection of formal quality metrics for different types of data sources, including sources that provide both structured and unstructured data. I develop techniques to efficiently detect the content focus of a large number of diverse sources, to reason about their content changes over time, and to formally compute the utility obtained when integrating subsets of them. I also design efficient algorithms with constant-factor approximation guarantees for finding a set of sources that maximizes the utility of the integration result given a cost budget. Finally, I develop a prototype quality-aware data source management system and demonstrate the effectiveness of the developed techniques on real-world applications.
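The budgeted source-selection problem described above can be illustrated with a minimal sketch. The abstract does not state which algorithm or utility function the dissertation uses, so everything here is an assumption: utility is taken to be the number of distinct items covered (a monotone submodular function), each source has a positive access cost, and the hypothetical function `select_sources` returns the better of a cost-benefit greedy solution and the best single affordable source, a standard heuristic for this kind of objective.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: source name -> (access cost, set of items it provides).
Sources = Dict[str, Tuple[float, Set[str]]]


def select_sources(sources: Sources, budget: float) -> List[str]:
    """Pick sources under a cost budget, assuming coverage-count utility.

    Illustrative only; costs are assumed to be strictly positive.
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = budget
    candidates = set(sources)

    # Cost-benefit greedy: repeatedly take the affordable source with the
    # highest marginal gain per unit cost.
    while True:
        best, best_ratio = None, 0.0
        for name in sorted(candidates):  # sorted for deterministic ties
            cost, items = sources[name]
            if cost > remaining:
                continue
            ratio = len(items - covered) / cost
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break
        chosen.append(best)
        covered |= sources[best][1]
        remaining -= sources[best][0]
        candidates.remove(best)

    # Guard against the greedy being misled by cheap, low-value sources:
    # compare with the single best affordable source.
    singles = [n for n in sorted(sources) if sources[n][0] <= budget]
    best_single = max(singles, key=lambda n: len(sources[n][1]), default=None)
    if best_single is not None and len(sources[best_single][1]) > len(covered):
        return [best_single]
    return chosen
```

For example, with a cheap source covering one item and an expensive source covering five, the ratio-based greedy alone would spend the budget on the cheap source first; the final comparison is what recovers the expensive one.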
EnergAt: Fine-Grained Energy Attribution for Multi-Tenancy
In the post-Moore's Law era, relying solely on hardware advancements for
automatic performance gains is no longer feasible without increased energy
consumption, due to the end of Dennard scaling. Consequently, computing
accounts for an increasing amount of global energy usage, contradicting the
objective of sustainable computing. The lack of hardware support and the
absence of a standardized, software-centric method for the precise tracing of
energy provenance exacerbate the issue. Aiming to overcome this challenge, we
argue that fine-grained software energy attribution is attainable, even with
limited hardware support. To support our position, we present a thread-level,
NUMA-aware energy attribution method for CPU and DRAM in multi-tenant
environments. The evaluation of our prototype implementation, EnergAt,
demonstrates the validity, effectiveness, and robustness of our theoretical
model, even in the presence of the noisy-neighbor effect. We envisage a
sustainable cloud environment and emphasize the importance of collective
efforts to improve software energy efficiency.
Comment: 8 pages, 4 figures; Published in HotCarbon 2023; Artifact available
at https://github.com/HongyuHe/energa
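To make the idea of thread-level, socket-aware attribution concrete, the following is a simplified sketch and not the EnergAt model, which the abstract does not spell out. It assumes that the energy consumed by each socket over an interval is already known (for instance from a hardware energy counter) and splits it among threads in proportion to the CPU time each thread spent on that socket. The function name `attribute_energy` and both input layouts are hypothetical.

```python
from typing import Dict


def attribute_energy(
    socket_energy_j: Dict[int, float],
    thread_cpu_s: Dict[int, Dict[int, float]],
) -> Dict[int, float]:
    """Split per-socket energy among threads by CPU-time share.

    socket_energy_j: socket id -> joules consumed during the interval.
    thread_cpu_s:    thread id -> {socket id -> CPU seconds on that socket}.
    Assumption: energy is proportional to CPU time, which ignores idle
    power, frequency scaling, and DRAM traffic.
    """
    # Total CPU time observed on each socket across all threads.
    totals: Dict[int, float] = {}
    for per_socket in thread_cpu_s.values():
        for socket, seconds in per_socket.items():
            totals[socket] = totals.get(socket, 0.0) + seconds

    attributed: Dict[int, float] = {}
    for tid, per_socket in thread_cpu_s.items():
        energy = 0.0
        for socket, seconds in per_socket.items():
            if totals[socket] > 0:
                share = seconds / totals[socket]
                energy += socket_energy_j.get(socket, 0.0) * share
        attributed[tid] = energy
    return attributed
```

A thread that migrates between sockets receives a share from each socket it ran on, so the per-thread values sum to the total energy of the sockets on which any CPU time was recorded.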