48 research outputs found

    Accessible Robot Control in Mixed Reality

    A novel method for controlling Boston Dynamics' Spot robot with the HoloLens 2 is proposed. The method is designed primarily for people with physical disabilities: users can control the robot's movement and its arm without using their hands. The eye-gaze tracking and head-motion tracking capabilities of the HoloLens 2 are used to send control commands. The robot's movement follows the user's eye gaze, and the robot arm mimics the pose of the user's head. Our experiments show that the method is comparable to traditional joystick control in both time efficiency and user experience. A demo can be found on our project webpage: https://zhangganlin.github.io/Holo-Spot-Page/index.html
    Comment: Course project of Mixed Reality at ETH Zurich
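The abstract does not say how gaze and head pose are translated into robot commands, so the following is only a minimal sketch of the general idea. All gains, thresholds, and function names here are hypothetical assumptions, not taken from the project.

```python
# Hypothetical sketch: mapping eye gaze and head pose to robot commands.
# Gains, thresholds, and coordinate conventions are illustrative assumptions,
# not the project's actual implementation.
import numpy as np

MAX_SPEED = 0.5  # m/s, assumed safety cap for the mobile base

def gaze_to_base_velocity(gaze_point, robot_pos, gain=0.3):
    """Drive the base toward the gaze point projected on the ground plane."""
    direction = np.asarray(gaze_point[:2]) - np.asarray(robot_pos[:2])
    dist = np.linalg.norm(direction)
    if dist < 0.2:               # dead zone so the robot settles near the target
        return np.zeros(2)
    v = gain * direction / dist * min(dist, 1.0)
    return np.clip(v, -MAX_SPEED, MAX_SPEED)

def head_pose_to_arm_pose(head_pitch, head_yaw, arm_reach=0.9):
    """Mimic the head orientation with the arm end-effector (spherical coords)."""
    x = arm_reach * np.cos(head_pitch) * np.cos(head_yaw)
    y = arm_reach * np.cos(head_pitch) * np.sin(head_yaw)
    z = arm_reach * np.sin(-head_pitch)  # looking down moves the gripper down
    return np.array([x, y, z])

# Example: gaze point 2 m ahead and slightly left; head tilted 10 degrees down.
print(gaze_to_base_velocity(gaze_point=(2.0, 0.5), robot_pos=(0.0, 0.0)))
print(head_pose_to_arm_pose(head_pitch=np.radians(10), head_yaw=0.0))
```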

    Semantic-Enhanced Image Clustering

    Image clustering is an important and still-open task in computer vision. Although many methods have been proposed for it, they explore only the images themselves and uncover clusters from image features alone, and are therefore unable to distinguish visually similar but semantically different images. In this paper, we investigate image clustering with the help of a visual-language pre-training model. Unlike the zero-shot setting, in which the class names are known, here only the number of clusters is known. Two key problems therefore arise: how to map images to a proper semantic space, and how to cluster images from both the image and semantic spaces. To solve them, we propose a novel image clustering method guided by the visual-language pre-training model CLIP, named Semantic-Enhanced Image Clustering (SIC). The new method first maps the given images to a proper semantic space, then provides efficient ways to generate pseudo-labels from the relationships between images and semantics. Finally, we perform clustering with consistency learning in both the image space and the semantic space, in a self-supervised fashion. A convergence analysis shows that the proposed method converges at a sublinear rate. A theoretical analysis of the expected risk further shows that the expected risk can be reduced by improving neighborhood consistency, increasing prediction confidence, or reducing neighborhood imbalance. Experimental results on five benchmark datasets clearly show the superiority of the new method.
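As a rough illustration of the pseudo-labeling idea, the sketch below assigns each image to its nearest semantic center in a shared embedding space and keeps only confident assignments. Random vectors stand in for real CLIP features, and the threshold is an arbitrary choice; this is not SIC's actual procedure.

```python
# Illustrative sketch of semantic pseudo-labeling: each image embedding is
# matched to its nearest semantic-center embedding, and only assignments above
# a confidence threshold become pseudo-labels. Random vectors stand in for
# real CLIP features; the threshold is an arbitrary assumption.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_images, num_clusters, dim = 128, 10, 512

image_emb = F.normalize(torch.randn(num_images, dim), dim=-1)       # CLIP image features
semantic_emb = F.normalize(torch.randn(num_clusters, dim), dim=-1)  # semantic centers

sim = image_emb @ semantic_emb.T               # cosine similarity, shape (N, K)
conf, pseudo = sim.softmax(dim=-1).max(dim=-1) # soft assignment and its confidence

keep = conf > 0.12                             # only confident assignments survive
print(f"kept {keep.sum().item()}/{num_images} pseudo-labels")
print(pseudo[keep][:10])
```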

    EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

    Building scalable vision-language models that learn from diverse, multimodal data remains an open challenge. In this paper, we introduce an Efficient Vision-languagE foundation model, EVE: a unified multimodal Transformer pre-trained solely with a single unified pre-training task. Specifically, EVE encodes both vision and language within a shared Transformer network augmented with modality-aware sparse Mixture-of-Experts (MoE) modules, which capture modality-specific information by selectively routing to different experts. To unify the pre-training of vision and language, EVE performs masked signal modeling on image-text pairs, reconstructing masked signals, i.e., image pixels and text tokens, from the visible ones. This simple yet effective objective accelerates training by 3.5x compared to pre-training with Image-Text Contrastive and Image-Text Matching losses. Owing to the combination of a unified architecture and a unified pre-training task, EVE is easy to scale up, enabling better downstream performance with fewer resources and faster training. Despite its simplicity, EVE achieves state-of-the-art performance on a range of vision-language downstream tasks, including visual question answering, visual reasoning, and image-text retrieval.
    Comment: Accepted by AAAI 2024
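A toy sketch of the modality-aware routing idea: each token carries a modality flag and is dispatched to an expert drawn from a pool reserved for that modality. This is a simplified stand-in, not EVE's architecture; the dimensions, expert counts, and top-1 routing are arbitrary assumptions.

```python
# Toy sketch of modality-aware MoE routing: each token is dispatched to an
# expert chosen from a pool reserved for its modality. Dimensions and expert
# counts are arbitrary; this is not EVE's actual architecture.
import torch
import torch.nn as nn

class ModalityAwareMoE(nn.Module):
    def __init__(self, dim=64, experts_per_modality=2):
        super().__init__()
        # Separate expert pools and routers for vision (0) and text (1).
        self.experts = nn.ModuleList([
            nn.ModuleList([nn.Linear(dim, dim) for _ in range(experts_per_modality)])
            for _ in range(2)
        ])
        self.routers = nn.ModuleList([nn.Linear(dim, experts_per_modality) for _ in range(2)])

    def forward(self, x, modality):  # x: (N, dim), modality: (N,) in {0, 1}
        out = torch.zeros_like(x)
        for m in (0, 1):
            idx = (modality == m).nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            choice = self.routers[m](x[idx]).argmax(dim=-1)  # top-1 routing
            for e, expert in enumerate(self.experts[m]):
                sel = idx[choice == e]
                if sel.numel():
                    out[sel] = expert(x[sel])
        return out

moe = ModalityAwareMoE()
x = torch.randn(6, 64)
modality = torch.tensor([0, 0, 0, 1, 1, 1])  # 3 image tokens, 3 text tokens
print(moe(x, modality).shape)                # torch.Size([6, 64])
```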

    FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

    The rapid growth of the memory and computation requirements of large language models (LLMs) has outpaced hardware development, preventing people who lack large-scale high-end GPUs from training or deploying LLMs. However, consumer-level GPUs, which constitute a larger market share, are typically overlooked for LLM workloads due to their weaker compute performance, smaller storage capacity, and lower communication bandwidth. Additionally, users may have privacy concerns when interacting with remote LLMs. In this paper, we envision a decentralized system that unlocks the vast untapped potential of consumer-level GPUs for pre-training, inference, and fine-tuning of LLMs with privacy protection. Such a system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, peer variability, and device heterogeneity. To address these challenges, our system design incorporates: 1) a broker with a backup pool to support the dynamic joining and leaving of computing providers; 2) hardware-performance-aware task scheduling to improve system efficiency; 3) abstracting ML procedures into directed acyclic graphs (DAGs) to achieve model and task universality, as sketched below; and 4) abstracting the intermediate representation and execution planes to ensure compatibility across devices and deep learning (DL) frameworks. Our performance analysis demonstrates that 50 RTX 3080 GPUs can achieve throughput comparable to that of 4 H100 GPUs, which are significantly more expensive.
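To make items 2) and 3) concrete, here is a minimal sketch of representing a training pipeline as a DAG and greedily scheduling its stages onto heterogeneous workers ranked by hardware speed. The task graph, worker speeds, and greedy earliest-finish policy are invented for illustration and are not the paper's scheduler.

```python
# Minimal sketch: an ML pipeline as a DAG, greedily scheduled onto
# heterogeneous workers weighted by hardware speed. Task graph, speeds, and
# the greedy earliest-finish policy are illustrative, not the paper's design.
from graphlib import TopologicalSorter

# Each key maps a stage to its prerequisite stages.
dag = {
    "load_data": [],
    "forward": ["load_data"],
    "backward": ["forward"],
    "all_reduce": ["backward"],
    "optimizer_step": ["all_reduce"],
}
cost = {"load_data": 1.0, "forward": 4.0, "backward": 8.0,
        "all_reduce": 2.0, "optimizer_step": 1.0}            # abstract work units
worker_speed = {"rtx3080_a": 1.0, "rtx3080_b": 1.0, "rtx4090": 2.2}

free_at = {w: 0.0 for w in worker_speed}                     # when each worker is idle
for task in TopologicalSorter(dag).static_order():           # respects dependencies
    # Pick the worker that would finish this task earliest.
    best = min(worker_speed, key=lambda w: free_at[w] + cost[task] / worker_speed[w])
    free_at[best] += cost[task] / worker_speed[best]
    print(f"{task:15s} -> {best} (done at t={free_at[best]:.2f})")
```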

    Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

    Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has spawned numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs is expensive, requiring considerable computing resources and memory, so many efficient approaches have been developed to improve system pipelines as well as individual operators. However, runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we benchmark performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs of different sizes, i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B), on three 8-GPU platforms, with and without individual optimization techniques, including ZeRO, quantization, recomputation, and FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including the computing and communication operators in LLMs. For end users, our benchmark and findings help in understanding different optimization techniques, training and inference frameworks, and hardware platforms when choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses uncover potential opportunities for future work to further optimize the runtime performance of LLMs.
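A small sketch of the micro-level measurement this kind of operator analysis relies on: timing a single operator with warm-up iterations and explicit device synchronization so queued GPU kernels are fully counted. The shapes and iteration counts are arbitrary choices, not the paper's setup.

```python
# Sketch of a micro-benchmark for a single operator (matmul), with warm-up
# runs and explicit synchronization so GPU kernels are fully timed. Sizes and
# iteration counts are arbitrary, not the paper's configuration.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(2048, 2048, device=device)
b = torch.randn(2048, 2048, device=device)

for _ in range(5):                       # warm-up: triggers kernel selection/caching
    torch.matmul(a, b)
if device == "cuda":
    torch.cuda.synchronize()

iters = 50
start = time.perf_counter()
for _ in range(iters):
    torch.matmul(a, b)
if device == "cuda":
    torch.cuda.synchronize()             # wait for all queued kernels to finish
elapsed = time.perf_counter() - start

flops = 2 * 2048**3 * iters              # a square matmul does ~2*n^3 FLOPs
print(f"{elapsed / iters * 1e3:.3f} ms/iter, {flops / elapsed / 1e12:.2f} TFLOP/s")
```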

    CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

    With the emergence of Large Language Models (LLMs), the programming capabilities of models have improved significantly, attracting growing attention from researchers. We propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension and code generation abilities of LLMs. CodeApex comprises three types of multiple-choice questions, covering conceptual understanding, commonsense reasoning, and multi-hop reasoning, designed to evaluate LLMs on programming comprehension tasks. Additionally, CodeApex uses algorithmic questions and corresponding test cases to assess the quality of the code generated by LLMs. We evaluate 14 state-of-the-art LLMs, including both general-purpose and specialized models. GPT exhibits the best programming capabilities, achieving approximate accuracies of 50% and 56% on the two tasks, respectively, so there is still significant room for improvement on programming tasks. We hope that CodeApex can serve as a reference for evaluating the coding capabilities of LLMs, further promoting their development. The datasets are released at https://github.com/APEXLAB/CodeApex.git, and the CodeApex submission website is https://apex.sjtu.edu.cn/codeapex/.
    Comment: 21 pages
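A toy sketch of the functional-correctness check behind test-case-based code evaluation: run the generated program in a subprocess against (stdin, expected stdout) pairs with a timeout. The sample task and candidate solution are invented; a real harness would also sandbox execution.

```python
# Toy harness for judging generated code against test cases, in the spirit of
# test-case-based code evaluation. The sample task and candidate solution are
# invented; a real harness would also sandbox execution.
import subprocess
import sys

candidate = """
def add(a, b):
    return a + b
print(add(int(input()), int(input())))
"""

test_cases = [("1\n2\n", "3"), ("10\n-4\n", "6")]  # (stdin, expected stdout)

def run_tests(src, cases, timeout=5.0):
    passed = 0
    for stdin, expected in cases:
        try:
            out = subprocess.run([sys.executable, "-c", src], input=stdin,
                                 capture_output=True, text=True, timeout=timeout)
            passed += out.stdout.strip() == expected
        except subprocess.TimeoutExpired:
            pass                                   # treat hangs as failures
    return passed

print(f"passed {run_tests(candidate, test_cases)}/{len(test_cases)} tests")
```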

    GE11-antigen-loaded hepatitis B virus core antigen virus-like particles efficiently bind to TNBC tumor

    Purpose: This study aimed to explore the possibility of using hepatitis B core protein (HBc) virus-like particles (VLPs) to encapsulate doxorubicin (Dox), reducing the adverse effects caused by its off-target delivery and toxic side effects. Methods: A triple-negative breast cancer (TNBC) tumor-targeting GE11-HBc VLP was constructed through genetic engineering. The GE11 peptide, a 12-amino-acid peptide targeting the epidermal growth factor receptor (EGFR), was inserted into the surface protein loops of the VLPs. Dox was loaded into the HBc VLPs by a thermally triggered encapsulation strategy. The in vitro release, cytotoxicity, and cellular uptake of the TNBC tumor-targeting GE11-HBc VLPs were then evaluated. Results: The VLPs possessed excellent stability and Dox loading efficiency, and preferentially released their drug payload at high GSH levels. Insertion of the GE11 targeting peptide improved cellular uptake and enhanced the inhibition of cell viability in EGFR-high TNBC cells. Conclusion: Together, these results highlight Dox-loaded, EGFR-targeted VLPs as a potentially useful therapeutic option for EGFR-overexpressing TNBC.

    Contribution of Hepatitis B Virus Infection to the Aggressiveness of Primary Liver Cancer: A Clinical Epidemiological Study in Eastern China

    Background and aims: The contribution of hepatitis B virus (HBV) infection to the aggressiveness of primary liver cancer (PLC) remains controversial. We aimed to characterize this contribution in eastern China. Methods: We enrolled 8,515 PLC patients whose specimens were preserved at the BioBank of the hepatobiliary hospital (Shanghai, China) during 2007–2016. Of those, 3,124 who received primary radical resection were included in the survival analysis. A nomogram was constructed to predict survival using preoperative parameters. Results: Hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (ICC), and combined hepatocellular cholangiocarcinoma (CHC) accounted for 94.6%, 3.7%, and 1.7% of cases, respectively, with HBV infection rates of 87.5%, 49.2%, and 80.6%. In HCC, HBV infection was significantly associated with a 10-year-earlier onset, more cirrhosis, higher α-fetoprotein, higher carbohydrate antigen 19-9 (CA19-9), more microvascular invasion (MVI), a lower neutrophil-to-lymphocyte ratio (NLR), and a lower platelet-to-lymphocyte ratio (PLR). In ICC, HBV infection was associated with a 7-year-earlier onset, more cirrhosis, higher α-fetoprotein, more MVI, and a lower PLR. In multivariate Cox analysis, high circulating HBV DNA, α-fetoprotein, CA19-9, NLR, tumor size, tumor number, encapsulation, Barcelona Clinic Liver Cancer (BCLC) stage, and MVI predicted an unfavorable prognosis in HCC; in ICC, only CA19-9 and BCLC stage, rather than HBV-related parameters, had prognostic value. A nomogram constructed with preoperative HBV-related parameters, including HBV load, ultrasonic cirrhosis, and α-fetoprotein, performed better than the current staging systems in predicting postoperative survival in HCC. Conclusion: HBV promotes the aggressiveness of HCC in the Chinese population. The contributions of HBV to ICC, and of other etiological factors to HCC, might be indirect, via the arousal of non-resolving inflammation.
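For readers unfamiliar with the multivariate Cox analysis mentioned above, here is a minimal sketch using the lifelines library on synthetic data. The column names and effect sizes are illustrative stand-ins, not the study's actual variables or results.

```python
# Minimal, hypothetical sketch of a multivariate Cox survival analysis using
# the lifelines library on synthetic data. Column names and effect sizes are
# illustrative stand-ins, not the study's actual variables or findings.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "hbv_dna_high": rng.integers(0, 2, n),   # high circulating HBV DNA (0/1)
    "afp_log": rng.normal(3.0, 1.0, n),      # log alpha-fetoprotein
    "ca19_9_log": rng.normal(2.5, 0.8, n),   # log CA19-9
    "tumor_size_cm": rng.gamma(2.0, 2.0, n), # tumor size
    "mvi": rng.integers(0, 2, n),            # microvascular invasion (0/1)
})
# Synthetic survival times influenced by the covariates (illustration only).
risk = (0.5 * df["hbv_dna_high"] + 0.2 * df["afp_log"] + 0.3 * df["mvi"]).to_numpy()
df["months"] = rng.exponential(60.0 * np.exp(-risk))
df["event"] = rng.integers(0, 2, n)          # 1 = death observed, 0 = censored

cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="event")
cph.print_summary()  # hazard ratios indicate each covariate's prognostic weight
```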