ARCHANGEL: Tamper-proofing Video Archives using Temporal Content Hashes on the Blockchain
We present ARCHANGEL, a novel distributed-ledger-based system for assuring
the long-term integrity of digital video archives. First, we describe a novel
deep network architecture for computing compact temporal content hashes (TCHs)
from audio-visual streams with durations of minutes or hours. Our TCHs are
sensitive to accidental or malicious content modification (tampering) but
invariant to the codec used to encode the video. This is necessary due to the
curatorial requirement for archives to format shift video over time to ensure
future accessibility. Second, we describe how the TCHs (and the models used to
derive them) are secured via a proof-of-authority blockchain distributed across
multiple independent archives. We report on the efficacy of ARCHANGEL within
the context of a trial deployment in which the national government archives of
the United Kingdom, Estonia and Norway participated.
Comment: Accepted to CVPR Blockchain Workshop 201
Incentivizing Private Data Sharing in Vehicular Networks: A Game-Theoretic Approach
In the context of evolving smart cities and autonomous transportation
systems, Vehicular Ad-hoc Networks (VANETs) and the Internet of Vehicles (IoV)
are growing in significance. Vehicles are becoming more than just a means of
transportation; they are collecting, processing, and transmitting massive
amounts of data to make driving safer and more convenient. However, this
advancement ushers in complex issues concerning the centralized structure of
traditional vehicular networks and the privacy and security concerns around
vehicular data. This paper offers a novel, game-theoretic network architecture
to address these challenges. Our approach decentralizes data collection through
distributed servers across the network, aggregating vehicular data into
spatio-temporal maps via secure multi-party computation (SMPC). This strategy
effectively reduces the chances of adversaries reconstructing a vehicle's
complete path, increasing privacy. We also introduce an economic model grounded
in game theory that incentivizes vehicle owners to participate in the network,
balancing the owners' privacy concerns with the monetary benefits of data
sharing. This model aims to maximize the data consumer's utility from the
gathered sensor data by determining the most suitable payment to participating
vehicles, the frequency with which these vehicles share their data, and the total
number of servers in the network. We explore the interdependencies among these
parameters and present our findings accordingly. To define meaningful utility
and loss functions for our study, we utilize a real dataset of vehicular
movement traces.
Comment: To appear in the Proceedings of the 2023 IEEE 98th Vehicular
Technology Conference (VTC2023-Fall), 6 pages, 5 figures
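The abstract does not say which SMPC protocol is used for the aggregation step. As a minimal sketch of the general idea, assuming plain additive secret sharing (a stand-in, not necessarily the paper's protocol), each vehicle can split a reading into random shares, one per server, so that no single server learns the reading while the servers together can still compute the total for a map cell. The names `PRIME`, `split_into_shares`, and `aggregate` are hypothetical.

```python
import secrets

# Modulus for share arithmetic; an assumed choice, any large prime works.
PRIME = 2**61 - 1


def split_into_shares(value, n_servers):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def aggregate(all_shares):
    """Sum shares per server, then combine the per-server totals."""
    n_servers = len(all_shares[0])
    server_totals = [
        sum(vehicle[i] for vehicle in all_shares) % PRIME
        for i in range(n_servers)
    ]
    return sum(server_totals) % PRIME


# Three hypothetical vehicles report a count for the same map cell.
readings = [4, 7, 1]
shared = [split_into_shares(r, n_servers=3) for r in readings]
total = aggregate(shared)
```

Each individual share is uniformly random, so a server holding one share per vehicle sees nothing about any single reading; only the combined total is recovered.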
SMAP: A Novel Heterogeneous Information Framework for Scenario-based Optimal Model Assignment
The increasing maturity of big data applications has led to a proliferation
of models targeting the same objectives within the same scenarios and datasets.
However, selecting the most suitable model, one that accounts for each model's
features as well as specific requirements and constraints, still poses a
significant challenge. Existing methods have focused on worker-task assignments
based on crowdsourcing and neglect the scenario-dataset-model assignment
problem. To address this challenge, a new problem named the Scenario-based
Optimal Model Assignment (SOMA) problem is introduced and a novel framework
entitled Scenario and Model Associative percepts (SMAP) is developed. SMAP is a
heterogeneous information framework that can integrate various types of
information to intelligently select a suitable dataset and allocate the optimal
model for a specific scenario. To comprehensively evaluate models, a new score
function that utilizes multi-head attention mechanisms is proposed. Moreover, a
novel memory mechanism named the mnemonic center is developed to store the
matched heterogeneous information and prevent duplicate matching. Six popular
traffic scenarios are selected as study cases and extensive experiments are
conducted on a dataset to verify the effectiveness and efficiency of SMAP and
the score function.
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies into an optimized
solution for a specific real-world problem, big data systems are no exception.
As far as storage is concerned, the primary facet is the storage infrastructure,
and NoSQL seems to be the technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents a
feature and use-case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
Intelligent Computing for Big Data
Recent advances in artificial intelligence have the potential to further develop current big data research. The Special Issue on ‘Intelligent Computing for Big Data’ highlighted a number of recent studies related to the use of intelligent computing techniques in the processing of big data for text mining, autism diagnosis, behaviour recognition, and blockchain-based storage.
Secure Data Hiding for Contact Tracing
Contact tracing is an effective tool in controlling the spread of infectious
diseases such as COVID-19. It involves digital monitoring and recording of
physical proximity between people over time with a central and trusted
authority, so that when one user reports infection, it is possible to identify
all other users who have been in close proximity to that person during a
relevant time period in the past and alert them. One way to achieve this is to
record on the server the locations of all users over time, e.g. by reading and
reporting the GPS coordinates of each user's smartphone. Despite its
simplicity, privacy concerns have prevented widespread adoption of this method.
Technology that would enable the "hiding" of data could go a long way towards
alleviating privacy concerns and enable contact tracing at a very large scale.
In this article we describe a general method to hide data. By hiding, we mean
that instead of disclosing a data value x, we would disclose an "encoded"
version of x, namely E(x), where E(x) is easy to compute but very difficult,
from a computational point of view, to invert. We propose a general
construction of such a function E and show that it guarantees perfect recall,
namely, all individuals who have potentially been exposed to infection are
alerted, at the price of an infinitesimal number of false alarms, namely, only
a negligible number of individuals who have not actually been exposed will be
wrongly informed that they have.
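The abstract does not give the construction of E. Purely to illustrate the idea of disclosing E(x) instead of x, the sketch below assumes a stand-in encoding: SHA-256 applied to a quantized grid cell and time slot. This is not the article's construction and does not share its guarantees: contacts that straddle a cell or slot boundary are missed, and a small input space can be enumerated by brute force. The function name `encode` and the parameters `cell_deg` and `slot_s` are hypothetical.

```python
import hashlib
import math


def encode(lat, lon, timestamp, cell_deg=0.001, slot_s=300):
    """Return a hex digest for the grid cell and time slot of one observation.

    Stand-in for E(x): cheap to compute, hard to invert only if the input
    space is large. Cell size and slot length are assumed values.
    """
    cell = (
        math.floor(lat / cell_deg),
        math.floor(lon / cell_deg),
        int(timestamp // slot_s),
    )
    return hashlib.sha256(repr(cell).encode("utf-8")).hexdigest()


# Two hypothetical users in the same cell within the same five-minute slot.
user_a = encode(51.50742, -0.12781, 1_600_000_000)
user_b = encode(51.50748, -0.12785, 1_600_000_060)
# A third hypothetical user far away at the same time.
user_c = encode(51.60000, -0.12781, 1_600_000_000)

same_place = user_a == user_b
different_place = user_a != user_c
```

A server that stores only such digests can match users who report the same digest without being handed raw coordinates.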
Distributed Spatial Data Sharing: a new era in sharing spatial data
Advances in information and communications technology, including the widespread adoption of GPS-based sensors, improvements in computational data processing, and satellite imagery, have resulted in new data sources, stakeholders, and methods of producing, using, and sharing spatial data. Daily, vast amounts of data are produced by individuals interacting with digital content and by automated and semi-automated sensors deployed across the environment. A growing portion of this information contains geographic information directly or indirectly embedded within it. The widespread use of automated smart sensors and an increased variety of georeferenced media have resulted in new individual data collectors. This raises a new set of social concerns around individual geoprivacy and data ownership. These changes require new approaches to managing, sharing, and processing geographic data. With the appearance of distributed data-sharing technologies, some of these challenges may be addressed. This can be achieved by moving from centralized control and ownership of the data to a more distributed system. In such a system, individuals are responsible for gathering and storing data and for controlling access to it. Stepping into the new area of distributed spatial data sharing requires preparation, including developing tools and algorithms to work efficiently with spatial data in this new environment. Peer-to-peer (P2P) networks have become very popular for storing and sharing information in a decentralized way. However, these networks lack methods to process spatio-temporal queries. In the first chapter of this research, we propose a new spatio-temporal multi-level tree structure, the Distributed Spatio-Temporal Tree (DSTree), which aims to address this problem. DSTree is capable of performing a range of spatio-temporal queries.
We also propose a framework that uses blockchain to share a DSTree on the distributed network, so that each user can replicate, query, or update it. Next, we propose a dynamic k-anonymity algorithm to address geoprivacy concerns in distributed platforms. Individual dynamic control of geoprivacy is one of the primary purposes of the framework introduced in this research. Sharing data within and between organizations can be enhanced by the greater trust and transparency offered by distributed or decentralized technologies. Rather than depending on a central authority to manage geographic data, a decentralized framework would provide a fine-grained and transparent sharing capability. Users can also control the precision of the spatial data they share with others. They are not limited to third-party algorithms to decide their privacy level, nor to binary levels of location sharing. As mentioned earlier, individuals and communities can benefit from distributed spatial data sharing. In the last chapter of this work, we develop an image-sharing platform, the harvester safety application, for the Kakisa indigenous community in northern Canada. In this project, we investigate the potential of using a Distributed Spatial Data Sharing (DSDS) infrastructure for small-scale data-sharing needs in indigenous communities. We explore the potential use cases and challenges and propose a DSDS architecture that allows users in small communities to share and query their data. Given the current availability of distributed tools, the sustainable development of such applications needs accessible technology: easy-to-use tools for applying distributed technologies to community-scale spatial data sharing. In conclusion, distributed technology is in its early stages and requires easy-to-use tools, methods, and algorithms to handle, share, and query geographic information.
Once these are developed, it will be possible to contrast DSDS against other data systems and thereby evaluate the practical benefit of such systems. A distributed data-sharing platform needs a standard framework to share data between different entities. As in the first decades of the web, these tools need regulations and standards. Such standards can benefit individuals and small communities in the current chaotic spatial data-sharing environment controlled by central bodies.
Differential Privacy for Industrial Internet of Things: Opportunities, Applications and Challenges
The development of the Internet of Things (IoT) is bringing changes to various fields. In particular, the industrial Internet of Things (IIoT) is promoting a new round of industrial revolution. As IIoT applications multiply, privacy protection issues are emerging. In particular, some common algorithms in IIoT technology, such as deep models, rely strongly on data collection, which leads to the risk of privacy disclosure. Recently, differential privacy has been used to protect user-terminal privacy in IIoT, so in-depth research on this topic is needed. In this paper, we conduct a comprehensive survey of the opportunities, applications and challenges of differential privacy in IIoT. We first review related papers on IIoT and on privacy protection. We then focus on the metrics of industrial data privacy and analyze the tension between data utilization for deep models and individual privacy protection. Several valuable problems are summarized and new research ideas are put forward. In conclusion, this survey aims to provide a comprehensive summary and lay a foundation for follow-up research on industrial differential privacy.
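For readers unfamiliar with the technique the survey covers, a minimal sketch of one standard differential privacy building block, the Laplace mechanism, is shown below. It releases the mean of bounded sensor readings with noise scaled to sensitivity divided by epsilon. The function names, bounds, and readings are hypothetical and are not taken from the survey.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng=None):
    """Release an epsilon-differentially-private mean of bounded values.

    Assumes the number of values is public. Changing one clipped value
    moves the mean by at most (upper - lower) / n, which is the sensitivity.
    """
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical temperature readings from four machines, bounded to [0, 100].
readings = [61.5, 64.0, 59.5, 63.0]
noisy = private_mean(readings, lower=0.0, upper=100.0, epsilon=0.5,
                     rng=random.Random(0))
```

A smaller `epsilon` gives stronger privacy and a noisier answer, which is the utility and privacy trade-off the abstract refers to.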