
    Multimodal Federated Learning via Contrastive Representation Ensemble

    With the increasing amount of multimedia data on modern mobile systems and IoT infrastructures, harnessing these rich multimodal data without breaching user privacy becomes a critical issue. Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning. However, existing FL methods extended to multimodal data all rely on model aggregation at the single-modality level, which restricts the server and clients to identical model architectures for each modality. This limits the global model in terms of both model complexity and data capacity, not to mention task diversity. In this work, we propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL), a multimodal federated learning framework that enables training larger server models from clients with heterogeneous model architectures and data modalities, while only communicating knowledge on a public dataset. To achieve better multimodal representation fusion, we design a global-local cross-modal ensemble strategy to aggregate client representations. To mitigate local model drift caused by two unprecedented heterogeneous factors stemming from multimodal discrepancy (modality gap and task gap), we further propose two inter-modal and intra-modal contrasts to regularize local training, which complement the information of the absent modality for uni-modal clients and regularize local clients to head towards global consensus. Thorough evaluations and ablation studies on image-text retrieval and visual question answering tasks showcase the superiority of CreamFL over state-of-the-art FL methods and its practical value. Comment: ICLR 2023. Code is available at https://github.com/FLAIR-THU/CreamF
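
The inter-modal contrast mentioned in this abstract can be pictured with a generic InfoNCE-style loss, in which each image representation is pulled towards its paired text representation and pushed away from the other texts in the batch. The sketch below is only an illustration under that assumption: it is not CreamFL's actual objective, and the function names (`info_nce`, `normalize`, `dot`) and the `temperature` value are hypothetical choices, not taken from the paper or its code.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm (assumes the vector is non-zero)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (generic form, assumed for illustration).

    anchors[i] (e.g. an image representation) should match positives[i]
    (e.g. its paired text representation) and not positives[j] for j != i.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        # Cosine similarities to every candidate, scaled by the temperature.
        logits = [dot(a, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true pair.
        total += log_denom - logits[i]
    return total / len(anchors)


# Hypothetical toy batch: two image vectors and two text vectors.
images = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[1.0, 0.0], [0.0, 1.0]]
texts_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(images, texts_aligned)
loss_swapped = info_nce(images, texts_swapped)
```

With these toy vectors the loss is close to zero when each image is paired with its matching text and much larger when the pairs are swapped, which is the behaviour a contrastive regularizer relies on.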

    Benefit-Cost Analysis for Transportation Planning and Public Policy: Towards Multimodal Demand Modeling

    This report examines existing methods of benefit-cost analysis (BCA) in two areas, transportation policy and transportation planning, and suggests ways of modifying these methods to account for travel within a multimodal system. Although the planning and policy contexts differ substantially, this report shows how important multimodal impacts can be incorporated into both by using basic econometric techniques and even simpler rule-of-thumb methods. Case studies in transportation planning focus on the California Department of Transportation (Caltrans), but benchmark California’s competencies by exploring methods used by other states and local governments. The report concludes with a list and discussion of recommendations for improving transportation planning models and methods. These will be of immediate use to decision makers at Caltrans and other state DOTs as they consider directions for developing new planning capabilities. This project also identifies areas, and lays groundwork, for future research. Finally, by fitting the planning models into the broader context of transportation policy, this report will serve as a resource for students and others who wish to better understand BCA and its use in practice.

    GTmoPass: Two-factor Authentication on Public Displays Using Gaze-touch Passwords and Personal Mobile Devices

    As public displays continue to deliver increasingly private and personalized content, there is a need to ensure that only legitimate users can access private information in sensitive contexts. While public displays can adopt authentication concepts similar to those used on public terminals (e.g., ATMs), authentication in public is subject to a number of risks. Namely, adversaries can uncover a user's password through (1) shoulder surfing, (2) thermal attacks, or (3) smudge attacks. To address this problem, we propose GTmoPass, an authentication architecture that enables multi-factor user authentication on public displays. The first factor is a knowledge factor: we employ a shoulder-surfing-resilient multimodal scheme that combines gaze and touch input for password entry. The second factor is a possession factor: users utilize their personal mobile devices, on which they enter the password. Credentials are securely transmitted to a server via Bluetooth beacons. We describe the implementation of GTmoPass and report on an evaluation of its usability and security, which shows that although authentication using GTmoPass is slightly slower than traditional methods, it protects against the three aforementioned threats.

    Conceptual Frameworks for Multimodal Social Signal Processing

    This special issue is about a research area which is developing rapidly. Pentland gave it a name which has become widely used, ‘Social Signal Processing’ (SSP for short), and his phrase provides the title of a European project, SSPnet, which has a brief to consolidate the area. The challenge that Pentland highlighted was understanding the nonlinguistic signals that serve as the basis for “subconscious discussions between humans about relationships, resources, risks, and rewards”. He identified it as an area where computational research had made interesting progress, and could usefully make more.