Multimodal Federated Learning via Contrastive Representation Ensemble
With the increasing amount of multimedia data on modern mobile systems and
IoT infrastructures, harnessing these rich multimodal data without breaching
user privacy becomes a critical issue. Federated learning (FL) serves as a
privacy-conscious alternative to centralized machine learning. However,
existing FL methods extended to multimodal data all rely on model aggregation
at the single-modality level, which restricts the server and clients to
identical model architectures for each modality. This limits the global model in
terms of both model complexity and data capacity, not to mention task
diversity. In this work, we propose Contrastive Representation Ensemble and
Aggregation for Multimodal FL (CreamFL), a multimodal federated learning
framework that enables training larger server models from clients with
heterogeneous model architectures and data modalities, while only communicating
knowledge on a public dataset. To achieve better multimodal representation
fusion, we design a global-local cross-modal ensemble strategy to aggregate
client representations. To mitigate local model drift caused by two
unprecedented heterogeneous factors stemming from multimodal discrepancy
(modality gap and task gap), we further propose two contrasts, inter-modal and
intra-modal, to regularize local training: they supply information about the
absent modality for uni-modal clients and steer local clients toward
global consensus. Thorough evaluations and ablation studies on
image-text retrieval and visual question answering tasks showcase the
superiority of CreamFL over state-of-the-art FL methods and its practical
value. Comment: ICLR 2023. Code is available at https://github.com/FLAIR-THU/CreamF
Benefit-Cost Analysis for Transportation Planning and Public Policy: Towards Multimodal Demand Modeling
This report examines existing methods of benefit-cost analysis (BCA) in two areas, transportation policy and transportation planning, and suggests ways of modifying these methods to account for travel within a multimodal system. Although the planning and policy contexts differ substantially, this report shows how important multimodal impacts can be incorporated into both by using basic econometric techniques and even simpler rule-of-thumb methods. Case studies in transportation planning focus on the California Department of Transportation (Caltrans), but benchmark California's competencies by exploring methods used by other states and local governments. The report concludes with a list and discussion of recommendations for improving transportation planning models and methods. These will be of immediate use to decision makers at Caltrans and other state DOTs as they consider directions for developing new planning capabilities. This project also identifies areas, and lays groundwork, for future research. Finally, by fitting the planning models into the broader context of transportation policy, this report will serve as a resource for students and others who wish to better understand BCA and its use in practice.
GTmoPass: Two-factor Authentication on Public Displays Using Gaze-touch Passwords and Personal Mobile Devices
As public displays continue to deliver increasingly private and personalized content, there is a need to ensure that only legitimate users can access private information in sensitive contexts. While public displays can adopt authentication concepts similar to those used on public terminals (e.g., ATMs), authentication in public is subject to a number of risks. Namely, adversaries can uncover a user's password through (1) shoulder surfing, (2) thermal attacks, or (3) smudge attacks. To address this problem we propose GTmoPass, an authentication architecture that enables multi-factor user authentication on public displays. The first factor is a knowledge factor: we employ a shoulder-surfing-resilient multimodal scheme that combines gaze and touch input for password entry. The second factor is a possession factor: users utilize their personal mobile devices, on which they enter the password. Credentials are securely transmitted to a server via Bluetooth beacons. We describe the implementation of GTmoPass and report on an evaluation of its usability and security, which shows that although authentication using GTmoPass is slightly slower than traditional methods, it protects against the three aforementioned threats.
Conceptual Frameworks for Multimodal Social Signal Processing
This special issue is about a research area which is developing rapidly. Pentland gave it a name which has become widely used, “Social Signal Processing” (SSP for short), and his phrase provides the title of a European project, SSPnet, which has a brief to consolidate the area. The challenge that Pentland highlighted was understanding the nonlinguistic signals that serve as the basis for “subconscious discussions between humans about relationships, resources, risks, and rewards”. He identified it as an area where computational research had made interesting progress, and could usefully make more.