136 research outputs found
Trust-based algorithms for fusing crowdsourced estimates of continuous quantities
Crowdsourcing has provided a viable way of gathering information at unprecedented volumes and speed by engaging individuals to perform simple microātasks. In particular, the crowdsourcing paradigm has been successfully applied to participatory sensing, in which the users perform sensing tasks and provide data using their mobile devices. In this way, people can help solve complex environmental sensing tasks, such as weather monitoring, nuclear radiation monitoring and cell tower mapping, in a highly decentralised and parallelised fashion. Traditionally, crowdsourcing technologies were primarily used for gathering data for classifications and image labelling tasks. In contrast, such crowdābased participatory sensing poses new challenges that relate to (i) dealing with humanāreported sensor data that are available in the form of continuous estimates of an observed quantity such as a location, a temperature or a sound reading, (ii) dealing with possible spatial and temporal correlations within the data and (ii) issues of data trustworthiness due to the unknown capabilities and incentives of the participants and their devices. Solutions to these challenges need to be able to combine the data provided by multiple users to ensure the accuracy and the validity of the aggregated results. With this in mind, our goal is to provide methods to better aid the aggregation process of crowdāreported sensor estimates of continuous quantities when data are provided by individuals of varying trustworthiness. To achieve this, we develop a trustābased in- formation fusion framework that incorporates latent trustworthiness traits of the users within the data fusion process. Through this framework, we develop a set of four novel algorithms (MaxTrust, BACE, TrustGP and TrustLGCP) to compute reliable aggregations of the usersā reports in both the settings of observing a stationary quantity (Max- Trust and BACE) and a spatially distributed phenomenon (TrustGP and TrustLGCP). The key feature of all these algorithm is the ability of (i) learning the trustworthiness of each individual who provide the data and (ii) exploit this latent userās trustworthiness information to compute a more accurate fused estimate. In particular, this is achieved by using a probabilistic framework that allows our methods to simultaneously learn the fused estimate and the usersā trustworthiness from the crowd reports. We validate our algorithms in four key application areas (cell tower mapping, WiFi networks mapping, nuclear radiation monitoring and disaster response) that demonstrate the practical impact of our framework to achieve substantially more accurate and informative predictions compared to the existing fusion methods. We expect that results of this thesis will allow to build more reliable data fusion algorithms for the broad class of humanācentred information systems (e.g., recommendation systems, peer reviewing systems, student grading tools) that are based on making decisions upon subjective opinions provided by their users
Bayesian modelling of community-based multidimensional trust in participatory sensing under data sparsity
We propose a new Bayesian model for reliable aggregation of crowdsourced estimates of real-valued quantities in participatory sensing applications. Existing approaches focus on probabilistic modelling of userās reliability as the key to accurate aggregation. However, these are either limited to estimating discrete quantities, or require a significant number of reports from each user to accurately model their reliability. To mitigate these issues, we adopt a community-based approach, which reduces the data required to reliably aggregate real-valued estimates, by leveraging correlations between the re- porting behaviour of users belonging to different communities. As a result, our method is up to 16.6% more accurate than existing state-of-the-art methods and is up to 49% more effective under data sparsity when used to estimate Wi-Fi hotspot locations in a real-world crowdsourcing application
Earth Observation Open Science and Innovation
geospatial analytics; social observatory; big earth data; open data; citizen science; open innovation; earth system science; crowdsourced geospatial data; citizen science; science in society; data scienc
Intelligent Data Analytics using Deep Learning for Data Science
Nowadays, data science stimulates the interest of academics and practitioners because it can assist in the extraction of significant insights from massive amounts of data. From the years 2018 through 2025, the Global Datasphere is expected to rise from 33 Zettabytes to 175 Zettabytes, according to the International Data Corporation. This dissertation proposes an intelligent data analytics framework that uses deep learning to tackle several difficulties when implementing a data science application. These difficulties include dealing with high inter-class similarity, the availability and quality of hand-labeled data, and designing a feasible approach for modeling significant correlations in features gathered from various data sources. The proposed intelligent data analytics framework employs a novel strategy for improving data representation learning by incorporating supplemental data from various sources and structures. First, the research presents a multi-source fusion approach that utilizes confident learning techniques to improve the data quality from many noisy sources. Meta-learning methods based on advanced techniques such as the mixture of experts and differential evolution combine the predictive capacity of individual learners with a gating mechanism, ensuring that only the most trustworthy features or predictions are integrated to train the model. Then, a Multi-Level Convolutional Fusion is presented to train a model on the correspondence between local-global deep feature interactions to identify easily confused samples of different classes. The convolutional fusion is further enhanced with the power of Graph Transformers, aggregating the relevant neighboring features in graph-based input data structures and achieving state-of-the-art performance on a large-scale building damage dataset. Finally, weakly-supervised strategies, noise regularization, and label propagation are proposed to train a model on sparse input labeled data, ensuring the model\u27s robustness to errors and supporting the automatic expansion of the training set. The suggested approaches outperformed competing strategies in effectively training a model on a large-scale dataset of 500k photos, with just about 7% of the images annotated by a human. The proposed framework\u27s capabilities have benefited various data science applications, including fluid dynamics, geometric morphometrics, building damage classification from satellite pictures, disaster scene description, and storm-surge visualization
Big Earth Data and Machine Learning for Sustainable and Resilient Agriculture
Big streams of Earth images from satellites or other platforms (e.g., drones
and mobile phones) are becoming increasingly available at low or no cost and
with enhanced spatial and temporal resolution. This thesis recognizes the
unprecedented opportunities offered by the high quality and open access Earth
observation data of our times and introduces novel machine learning and big
data methods to properly exploit them towards developing applications for
sustainable and resilient agriculture. The thesis addresses three distinct
thematic areas, i.e., the monitoring of the Common Agricultural Policy (CAP),
the monitoring of food security and applications for smart and resilient
agriculture. The methodological innovations of the developments related to the
three thematic areas address the following issues: i) the processing of big
Earth Observation (EO) data, ii) the scarcity of annotated data for machine
learning model training and iii) the gap between machine learning outputs and
actionable advice.
This thesis demonstrated how big data technologies such as data cubes,
distributed learning, linked open data and semantic enrichment can be used to
exploit the data deluge and extract knowledge to address real user needs.
Furthermore, this thesis argues for the importance of semi-supervised and
unsupervised machine learning models that circumvent the ever-present challenge
of scarce annotations and thus allow for model generalization in space and
time. Specifically, it is shown how merely few ground truth data are needed to
generate high quality crop type maps and crop phenology estimations. Finally,
this thesis argues there is considerable distance in value between model
inferences and decision making in real-world scenarios and thereby showcases
the power of causal and interpretable machine learning in bridging this gap.Comment: Phd thesi
Modeling User Transportation Patterns Using Mobile Devices
Participatory sensing frameworks use humans and their computing devices as a large mobile sensing network. Dramatic accessibility and affordability have turned mobile devices (smartphone and tablet computers) into the most popular computational machines in the world, exceeding laptops. By the end of 2013, more than 1.5 billion people on earth will have a smartphone. Increased coverage and higher speeds of cellular networks have given these devices the power to constantly stream large amounts of data. Most mobile devices are equipped with advanced sensors such as GPS, cameras, and microphones. This expansion of smartphone numbers and power has created a sensing system capable of achieving tasks practically impossible for conventional sensing platforms. One of the advantages of participatory sensing platforms is their mobility, since human users are often in motion. This dissertation presents a set of techniques for modeling and predicting user transportation patterns from cell-phone and social media check-ins. To study large-scale transportation patterns, I created a mobile phone app, Kpark, for estimating parking lot occupancy on the UCF campus. Kpark aggregates individual user reports on parking space availability to produce a global picture across all the campus lots using crowdsourcing. An issue with crowdsourcing is the possibility of receiving inaccurate information from users, either through error or malicious motivations. One method of combating this problem is to model the trustworthiness of individual participants to use that information to selectively include or discard data. This dissertation presents a comprehensive study of the performance of different worker quality and data fusion models with plausible simulated user populations, as well as an evaluation of their performance on the real data obtained from a full release of the Kpark app on the UCF Orlando campus. To evaluate individual trust prediction methods, an algorithm selection portfolio was introduced to take advantage of the strengths of each method and maximize the overall prediction performance. Like many other crowdsourced applications, user incentivization is an important aspect of creating a successful crowdsourcing workflow. For this project a form of non-monetized incentivization called gamification was used in order to create competition among users with the aim of increasing the quantity and quality of data submitted to the project. This dissertation reports on the performance of Kpark at predicting parking occupancy, increasing user app usage, and predicting worker quality
Recommended from our members
Perceptual quality assessment of real-world images and videos
The development of online social-media venues and rapid advances in technology by camera and mobile device manufacturers have led to the creation and consumption of a seemingly limitless supply of visual content. However, a vast majority of these digital images and videos are often afflicted with annoying artifacts during acquisition, subsequent storage, and transmission over the network. All these factors impact the quality of the visual media as perceived by a human observer, thereby compromising their quality of experience (QoE).
This dissertation focuses on constructing datasets that are representative of real-world image and video distortions as well as on designing algorithms that accurately predict the perceptual quality of images and videos. The primary goal of this research is to design and demonstrate automatic image and continuous-time video quality predictors that can effectively tackle the widely diverse authentic spatial, temporal, and network-induced distortions -- contrary to all present-day algorithms that operate on single, synthetic visual distortions and predict a single overall quality score for a given video.
I introduce an image quality database which contains a large number of images captured using a representative variety of modern mobile devices and afflicted with a widely diverse authentic image distortions. I will also describe the design of an online crowdsourcing system which aided a very large-scale image quality assessment subjective study. This data collection facilitated the design of a new image quality predictor that is founded on the principles of natural scene statistics of images in different color spaces and transform domains. This new quality method is capable of assessing the quality of images with complex mixtures of distortions and yields high correlation with human perception.
Pertaining to videos, this dissertation describes a video quality database created to understand the impact of network-induced distortions on an end user's quality of experience. I present the details of a large-scale subjective study that I conducted to gather continuous-time ground truth QoE scores on a collection of 180 videos afflicted with diverse stalling events. I also present my analysis of the temporal variations in the perceived QoE due to the time-varying video quality and present insights on the impact of relevant human cognitive aspects such as long-term and short-term memory and recency on quality perception. Next, I present a continuous-time objective QoE predicting model that effectively captures the complex interactions between the aforementioned human cognitive elements, spatial and temporal distortions, properties of stalling events, and models the state of any given client-side network buffer. I also show how the proposed framework can be extended by further supplementing with any number of additional inputs (or by eliminating any ineffective ones), based on the information available at the content providers during the design of adaptive stream-switching algorithms. This QoE predictor supports future research in the design of quality-aware stream-switching algorithms which could control the position, location, and length of stalls, given a network bandwidth budget and the end user's device information, such that the end user's QoE is maximized.Computer Science
- ā¦