Privacy-preserving model learning on a blockchain network-of-networks.
Objective: To facilitate clinical, genomic, and biomedical research, it is imperative to construct generalizable predictive models using cross-institutional methods while protecting privacy. However, state-of-the-art methods assume a "flattened" topology, whereas real-world research networks may form a "network-of-networks", which raises practical issues such as training on small datasets for rare diseases or conditions, prioritizing locally trained models, and maintaining a model for each level of the hierarchy. In this study, we develop a hierarchical approach that inherits the benefits of privacy-preserving methods, retains the advantages of adopting blockchain, and addresses these practical concerns in a research network-of-networks.
Materials and methods: We propose a framework that combines level-wise model learning, blockchain-based model dissemination, and a novel hierarchical consensus algorithm for model ensembling. We developed an example implementation, HierarchicalChain (hierarchical privacy-preserving modeling on blockchain), evaluated it on 3 healthcare/genomic datasets, and compared its predictive correctness, number of learning iterations, and execution time with those of a state-of-the-art method designed for a flattened network topology.
Results: HierarchicalChain improves predictive correctness for small training datasets and provides correctness comparable to that of the competing method, with more learning iterations but similar per-iteration execution time. It inherits the benefits of privacy-preserving learning and the advantages of blockchain technology, and it immutably records the models for each level.
Discussion: HierarchicalChain is independent of both the core privacy-preserving learning method and the underlying blockchain platform. Further studies are warranted for various types of network topology, complex data, and privacy concerns.
Conclusion: We demonstrated the potential of utilizing the information from the hierarchical network-of-networks topology to improve prediction
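To make the level-wise idea concrete, here is a minimal Python sketch of hierarchical model learning with a tamper-evident record of every level's model. It is not HierarchicalChain's consensus algorithm: it assumes each leaf site trains a logistic-regression model on its own data, each parent combines its children's weights by a sample-size-weighted average (a swapped-in ensembling rule), and a local SHA-256 hash chain stands in for the blockchain. The names `Node`, `HashChain`, `learn`, and the toy data are hypothetical.

```python
import hashlib
import json
import math
from dataclasses import dataclass, field
from typing import List, Tuple


def train_logistic(xs: List[List[float]], ys: List[int],
                   epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Fit logistic-regression weights (bias last) by batch gradient descent."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            xb = x + [1.0]  # append the bias feature
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, xb))))
            for j, xj in enumerate(xb):
                grad[j] += (p - y) * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


@dataclass
class Node:
    """A site (leaf, holds local data) or a network (internal node) in the hierarchy."""
    name: str
    xs: List[List[float]] = field(default_factory=list)
    ys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


class HashChain:
    """Append-only log in which each record commits to the previous record's hash.
    A stand-in for a blockchain ledger; it has no network or consensus protocol."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.blocks: List[dict] = []

    @staticmethod
    def _digest(prev: str, payload: dict) -> str:
        body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

    def append(self, payload: dict) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        self.blocks.append({"prev": prev, "payload": payload,
                            "hash": self._digest(prev, payload)})

    def verify(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        prev = self.GENESIS
        for block in self.blocks:
            if block["prev"] != prev or self._digest(prev, block["payload"]) != block["hash"]:
                return False
            prev = block["hash"]
        return True


def learn(node: Node, ledger: HashChain, level: int = 0) -> Tuple[List[float], int]:
    """Train leaves locally, combine children level by level, and record every model."""
    if not node.children:
        weights, n = train_logistic(node.xs, node.ys), len(node.ys)
    else:
        results = [learn(child, ledger, level + 1) for child in node.children]
        n = sum(count for _, count in results)
        # Ensemble step: sample-size-weighted average of the children's weights.
        weights = [sum(w[j] * count for w, count in results) / n
                   for j in range(len(results[0][0]))]
    ledger.append({"node": node.name, "level": level, "n": n,
                   "weights": [round(v, 6) for v in weights]})
    return weights, n


# Two sites under one network, under a root: one model per level ends up on the ledger.
site_a = Node("site_a", xs=[[0.0], [1.0], [2.0], [3.0]], ys=[0, 0, 1, 1])
site_b = Node("site_b", xs=[[0.5], [2.5]], ys=[0, 1])
root = Node("root", children=[Node("network_1", children=[site_a, site_b])])
ledger = HashChain()
root_weights, total = learn(root, ledger)
print([b["payload"]["node"] for b in ledger.blocks], total, ledger.verify())
```

Editing any recorded model afterwards makes `verify()` return False, which is the tamper-evidence a per-level model record needs; a real deployment would use an actual blockchain platform and the paper's consensus step instead of the averaging rule assumed here.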
On-Demand Big Data Integration: A Hybrid ETL Approach for Reproducible Scientific Research
Scientific research requires access, analysis, and sharing of data that is
distributed across various heterogeneous data sources at the scale of the
Internet. An eager ETL process constructs an integrated data repository as its
first step, integrating and loading data in its entirety from the data sources.
Bootstrapping this process is inefficient for scientific research that
requires access to data from very large and typically numerous distributed data
sources. A lazy ETL process loads only the metadata, but still eagerly, so lazy
ETL bootstraps faster. However, queries on the integrated data repository of
eager ETL perform faster, because the entire data is available beforehand.
In this paper, we propose a novel ETL approach for scientific data
integration: a hybrid of the eager and lazy ETL approaches, applied to both
data and metadata. This way, Hybrid ETL supports incremental integration
and loading of metadata and data from the data sources. We enhance Hybrid ETL
with a human-in-the-loop approach: selective data integration driven by user
queries, and sharing of integrated data between users. We implement our hybrid
ETL approach in a prototype platform, Obidos, and evaluate it in the context of
data sharing for medical research. For scientific research data integration and
sharing, Obidos outperforms both the eager and lazy ETL approaches through its
selective loading of data and metadata, while storing the integrated data in a
scalable integrated data repository.
Comment: Pre-print submitted to the DMAH Special Issue of the Springer DAPD
Journal
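The hybrid loading strategy can be sketched in a few lines of Python. This is an illustrative sketch, not Obidos's implementation: `DataSource`, `HybridRepository`, the metadata-as-dictionary model, and the imaging records are all hypothetical. Metadata is loaded lazily per source on first use, data records are loaded only when a query selects them, and everything loaded is kept in a repository shared by all users.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DataSource:
    """Hypothetical remote source: listing metadata is cheap, fetching data is costly."""
    name: str
    meta: Dict[str, dict]
    payload: Dict[str, bytes]
    metadata_loads: int = 0
    fetch_count: int = 0

    def list_metadata(self) -> Dict[str, dict]:
        self.metadata_loads += 1
        return dict(self.meta)

    def fetch(self, record_id: str) -> bytes:
        self.fetch_count += 1
        return self.payload[record_id]


@dataclass
class HybridRepository:
    """Integrated repository shared by all users.

    Metadata for a source is loaded the first time a query touches it, and data
    records are loaded only when a query selects them. Everything loaded stays in
    the repository, so later queries, from any user, are answered locally.
    """
    sources: Dict[str, DataSource]
    metadata: Dict[str, Dict[str, dict]] = field(default_factory=dict)
    data: Dict[Tuple[str, str], bytes] = field(default_factory=dict)

    def _metadata_for(self, source: str) -> Dict[str, dict]:
        if source not in self.metadata:  # lazy: nothing is loaded at bootstrap
            self.metadata[source] = self.sources[source].list_metadata()
        return self.metadata[source]

    def query(self, source: str, predicate: Callable[[dict], bool]) -> List[bytes]:
        results = []
        for record_id, meta in self._metadata_for(source).items():
            if not predicate(meta):
                continue  # selective: records the query does not need are never fetched
            key = (source, record_id)
            if key not in self.data:  # once fetched, a record is kept for everyone
                self.data[key] = self.sources[source].fetch(record_id)
            results.append(self.data[key])
        return results


pacs = DataSource(
    "pacs",
    meta={"s1": {"modality": "CT"}, "s2": {"modality": "MR"}, "s3": {"modality": "CT"}},
    payload={"s1": b"ct-1", "s2": b"mr-1", "s3": b"ct-2"},
)
repo = HybridRepository({"pacs": pacs})
first = repo.query("pacs", lambda m: m["modality"] == "CT")   # user A: fetches s1 and s3
second = repo.query("pacs", lambda m: m["modality"] == "CT")  # user B: served from the repository
print(first == second, pacs.fetch_count, pacs.metadata_loads)
```

In this sketch the second query triggers no new fetches, and the MR record is never loaded because no query has asked for it; this is the trade-off between eager query speed and lazy bootstrap cost that the hybrid approach aims for.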
CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms
How to optimally dispatch orders to vehicles and how to trade off between
immediate and future returns are fundamental questions for a typical
ride-hailing platform. We model ride-hailing as a large-scale parallel ranking
problem and study the joint decision-making task of order dispatching and fleet
management in online ride-hailing platforms. This task poses unique challenges,
which we address in four ways. First, to enable a huge number of vehicles
to act and learn efficiently and robustly, we treat each region cell as an
agent and build a multi-agent reinforcement learning framework. Second, to
coordinate the agents from different regions to achieve long-term benefits, we
leverage the geographical hierarchy of the region grids to perform hierarchical
reinforcement learning. Third, to handle the heterogeneous and variable
action space for joint order dispatching and fleet management, we design the
action as a ranking weight vector that ranks and selects the specific order or
fleet management destination in a unified formulation. Fourth, to support
multi-scale ride-hailing platforms, we conduct the decision-making process
hierarchically, using a multi-head attention mechanism to incorporate the
impacts of neighbor agents and capture the key agent at each scale. We name
the resulting framework CoRide. Extensive experiments on real-world data from
multiple cities, as well as on analytic synthetic data, demonstrate that CoRide
outperforms strong baselines in terms of platform revenue and user experience
in the task of city-wide hybrid order dispatching and fleet management.
Comment: CIKM 201
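The unified action formulation, in which one ranking weight vector scores both orders and repositioning destinations, can be illustrated with a small Python sketch. It assumes hand-picked 3-dimensional feature vectors (hypothetically: fare, negative pickup distance, destination demand) and uses single-head scaled dot-product attention with identity projections in place of CoRide's learned multi-head attention. Nothing here is trained, whereas CoRide learns these components with hierarchical multi-agent reinforcement learning. `attend`, `ranking_weights`, `select`, and the candidate names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, neighbors: List[Vector]) -> Vector:
    """Scaled dot-product attention (single head, identity projections):
    a softmax-weighted mix of the neighbor embeddings."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in neighbors]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, neighbors)) for i in range(d)]


def ranking_weights(own: Vector, neighbors: List[Vector]) -> Vector:
    """Hypothetical policy output: the agent's own state plus attended neighbor context."""
    context = attend(own, neighbors) if neighbors else [0.0] * len(own)
    return [o + c for o, c in zip(own, context)]


def select(weights: Vector, candidates: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Score every candidate (order or repositioning destination) with the same
    weight vector and return them ranked, best first."""
    scored = [(name, dot(weights, feats)) for name, feats in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


own = [1.0, 0.5, 0.2]                            # this region cell's state embedding
neighbors = [[0.2, 0.1, 1.0], [0.9, 0.4, 0.1]]   # adjacent cells' embeddings
candidates = {
    "order_17": [0.8, -0.3, 0.1],
    "order_42": [0.5, -0.1, 0.4],
    "reposition_to_cell_9": [0.0, -0.5, 0.9],
}
ranked = select(ranking_weights(own, neighbors), candidates)
print(ranked[0][0])
```

Because orders and destinations share one feature space, adding or removing candidates changes only the list being ranked, not the shape of the action, which is how a fixed-size weight vector can cover a variable action space.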