Accessible Robot Control in Mixed Reality
A novel method for controlling the Boston Dynamics Spot robot with the HoloLens 2 is
proposed. The method is designed primarily for people with physical disabilities:
users can control the robot's movement and its arm without using their hands.
The eye-gaze tracking and head-motion tracking capabilities of the HoloLens 2 are
used to issue control commands: the robot's movement follows the user's eye gaze,
and the robot arm mimics the pose of the user's head. In our experiments, the method
is comparable to traditional joystick control in both time efficiency and user
experience. A demo can be found on our project webpage:
https://zhangganlin.github.io/Holo-Spot-Page/index.html
Comment: Course Project of Mixed Reality at ETH Zurich
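As a rough illustration of this control mapping (not code from the project), the following sketch shows how a normalized gaze point and a head orientation could be translated into base-velocity and arm-pose commands; all function names, thresholds, and coordinate conventions are assumptions.

```python
# Hypothetical sketch: translating gaze and head pose into robot commands.
# These names, thresholds, and conventions are assumptions, not project code.
import numpy as np

def gaze_to_body_velocity(gaze_point_xy, deadzone=0.1, max_speed=0.5):
    """Map a normalized gaze point (x, y in [-1, 1]) to (forward, turn) velocities."""
    x, y = gaze_point_xy
    forward = 0.0 if abs(y) < deadzone else max_speed * y   # gaze up/down -> forward/back
    turn = 0.0 if abs(x) < deadzone else max_speed * x      # gaze left/right -> yaw rate
    return forward, turn

def head_pose_to_arm_pose(head_rotation_rpy, scale=1.0):
    """Mirror the user's head orientation (roll, pitch, yaw in radians) onto the arm."""
    return scale * np.asarray(head_rotation_rpy, dtype=float)

# Example: gaze slightly right and up, head pitched down by 0.2 rad.
print(gaze_to_body_velocity((0.4, 0.6)), head_pose_to_arm_pose((0.0, -0.2, 0.1)))
```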
Semantic-Enhanced Image Clustering
Image clustering is an important and open challenge in computer vision. Although
many methods have been proposed for the image clustering task, they explore only
the images themselves and uncover clusters from image features alone, and thus
cannot distinguish visually similar but semantically different images. In this
paper, we investigate image clustering with the help of a visual-language
pre-training model. Unlike the zero-shot setting, in which the class names are
known, in our setting only the number of clusters is known. Therefore, how to map
images to a proper semantic space, and how to cluster images from both the image
and semantic spaces, are two key problems. To address them, we propose a novel
image clustering method guided by the visual-language pre-training model CLIP,
named Semantic-Enhanced Image Clustering (SIC). The new method first maps the
given images to a proper semantic space and then efficiently generates
pseudo-labels from the relationships between images and semantics. Finally, we
perform clustering with consistency learning in both the image space and the
semantic space, in a self-supervised fashion. Our convergence analysis shows that
the proposed method converges at a sublinear rate. A theoretical analysis of the
expected risk further shows that the expected risk can be reduced by improving
neighborhood consistency, increasing prediction confidence, or reducing
neighborhood imbalance. Experimental results on five benchmark datasets clearly
show the superiority of our new method.
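For illustration only, the sketch below shows the simplest baseline in this setting (embedding images with CLIP and clustering them when only the number of clusters is known); it does not implement SIC's pseudo-labeling or consistency learning, and the model choice and inputs are placeholders.

```python
# Baseline sketch only (not SIC): cluster CLIP image embeddings with k-means
# when just the number of clusters is known. Model choice and inputs are placeholders.
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from sklearn.cluster import KMeans

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Dummy images stand in for a real dataset.
images = [Image.new("RGB", (224, 224), color=(60 * i, 40, 120)) for i in range(4)]
with torch.no_grad():
    feats = torch.cat([model.encode_image(preprocess(im).unsqueeze(0).to(device))
                       for im in images])
feats = torch.nn.functional.normalize(feats.float(), dim=-1).cpu().numpy()

labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
print(labels)
```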
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Building scalable vision-language models to learn from diverse, multimodal
data remains an open challenge. In this paper, we introduce an Efficient
Vision-languagE foundation model, namely EVE, which is a unified multimodal
Transformer pre-trained solely with one unified pre-training task. Specifically,
EVE encodes both vision and language within a shared Transformer network
integrated with modality-aware sparse Mixture-of-Experts (MoE) modules, which
capture modality-specific information by selectively switching to different
experts. To unify pre-training tasks of vision and language, EVE performs
masked signal modeling on image-text pairs to reconstruct masked signals, i.e.,
image pixels and text tokens, given visible signals. This simple yet effective
pre-training objective accelerates training by 3.5x compared to the model
pre-trained with Image-Text Contrastive and Image-Text Matching losses. Owing
to the combination of the unified architecture and pre-training task, EVE is
easy to scale up, enabling better downstream performance with fewer resources
and faster training speed. Despite its simplicity, EVE achieves
state-of-the-art performance on various vision-language downstream tasks,
including visual question answering, visual reasoning, and image-text
retrieval.
Comment: Accepted by AAAI 2024
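The following toy sketch illustrates the idea of modality-aware expert selection (routing image and text tokens to modality-specific expert FFNs inside a shared block); it is a hypothetical illustration, not EVE's actual architecture or code.

```python
# Toy illustration of modality-aware expert selection in a shared Transformer
# block: image and text tokens are routed to modality-specific expert FFNs.
# This is a hypothetical sketch, not EVE's implementation.
import torch
import torch.nn as nn

class ModalityAwareFFN(nn.Module):
    def __init__(self, dim=256, hidden=1024):
        super().__init__()
        self.experts = nn.ModuleDict({
            "image": nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)),
            "text":  nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)),
        })

    def forward(self, tokens, modality):
        # tokens: (batch, seq, dim); modality selects which expert processes them.
        return self.experts[modality](tokens)

layer = ModalityAwareFFN()
img_tokens, txt_tokens = torch.randn(2, 196, 256), torch.randn(2, 32, 256)
print(layer(img_tokens, "image").shape, layer(txt_tokens, "text").shape)
```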
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs
The rapid growth of the memory and computation requirements of large language
models (LLMs) has outpaced hardware development, hindering people who lack
large-scale high-end GPUs from training or deploying LLMs. However,
consumer-level GPUs, which constitute a larger market share, are typically
overlooked for LLM workloads because of their weaker computing performance,
smaller storage capacity, and lower communication bandwidth. Additionally, users
may have privacy concerns when interacting with remote LLMs. In this paper, we
envision a decentralized system that unlocks the vast untapped potential of
consumer-level GPUs for pre-training, inference, and fine-tuning of LLMs with
privacy protection. This system, however, faces critical challenges, including
limited CPU and GPU memory, low network bandwidth, peer variability, and device
heterogeneity. To address these challenges, our system design incorporates: 1) a
broker with a backup pool to support computing providers dynamically joining and
leaving; 2) hardware-performance-aware task scheduling to improve system
efficiency; 3) abstraction of ML procedures into directed acyclic graphs (DAGs)
to achieve model and task universality; and 4) abstraction of the intermediate
representation and execution planes to ensure compatibility across devices and
deep learning (DL) frameworks. Our performance analysis demonstrates that 50 RTX
3080 GPUs can achieve throughput comparable to that of 4 H100 GPUs, which are
significantly more expensive.
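The snippet below is a hypothetical illustration of point 3), representing an ML procedure as a DAG whose nodes can be scheduled onto heterogeneous workers; the node names, cost model, and greedy scheduling rule are assumptions, not FusionAI's actual design.

```python
# Hypothetical sketch: an ML procedure expressed as a DAG whose nodes are
# assigned to heterogeneous workers. Node names, costs, and the greedy rule
# are illustrative assumptions, not FusionAI's design.
import networkx as nx

dag = nx.DiGraph()
dag.add_edges_from([
    ("load_batch", "embed"),
    ("embed", "blocks_0_15"),
    ("blocks_0_15", "blocks_16_31"),
    ("blocks_16_31", "lm_head"),
    ("lm_head", "loss"),
])

workers = {"rtx3080_a": 0.0, "rtx3080_b": 0.0}    # accumulated load per worker
cost = {node: 1.0 for node in dag.nodes}          # placeholder per-node costs

assignment = {}
for node in nx.topological_sort(dag):             # respect data dependencies
    w = min(workers, key=workers.get)             # least-loaded worker first
    assignment[node] = w
    workers[w] += cost[node]
print(assignment)
```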
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Large Language Models (LLMs) have seen great advances in both academia and
industry, and their popularity has produced numerous open-source frameworks and
techniques for accelerating LLM pre-training, fine-tuning, and inference.
Training and deploying LLMs is expensive because it requires considerable
computing resources and memory, hence many efficient approaches have been
developed for improving system pipelines as well as operators. However, runtime
performance can vary significantly across hardware and software stacks, which
makes it difficult to choose the best configuration. In this work, we aim to
benchmark performance from both macro and micro perspectives. First, we
benchmark the end-to-end performance of pre-training, fine-tuning, and serving
LLMs of different sizes, i.e., 7, 13, and 70 billion parameters (7B, 13B, and
70B), on three 8-GPU platforms, with and without individual optimization
techniques, including ZeRO, quantization, recomputation, and FlashAttention. We
then dive deeper to provide a detailed runtime analysis of the sub-modules,
including the computing and communication operators in LLMs. For end users, our
benchmark and findings help them better understand different optimization
techniques, training and inference frameworks, and hardware platforms when
choosing configurations for deploying LLMs. For researchers, our in-depth
module-wise analyses reveal potential opportunities for future work to further
optimize the runtime performance of LLMs.
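As a minimal example of the micro-perspective measurement style (not the authors' benchmarking harness), the following PyTorch snippet times a single operator with warmup and synchronization; the operator and tensor sizes are placeholders.

```python
# Minimal operator-timing pattern in PyTorch (not the authors' harness):
# warm up, synchronize around the timed region, and report time per iteration.
import time
import torch

def benchmark(fn, warmup=10, iters=50):
    for _ in range(warmup):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 1024, 1024, device=device)     # placeholder operator inputs
w = torch.randn(1024, 1024, device=device)
print(f"matmul: {benchmark(lambda: x @ w) * 1e3:.3f} ms/iter")
```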
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
With the emergence of Large Language Models (LLMs), there has been a
significant improvement in the programming capabilities of models, attracting
growing attention from researchers. We propose CodeApex, a bilingual benchmark
dataset focusing on the programming comprehension and code generation abilities
of LLMs. CodeApex comprises three types of multiple-choice questions:
conceptual understanding, commonsense reasoning, and multi-hop reasoning,
designed to evaluate LLMs on programming comprehension tasks. Additionally,
CodeApex uses algorithmic questions and corresponding test cases to assess
the quality of code generated by LLMs. We evaluate 14 state-of-the-art LLMs,
including both general-purpose and specialized models. GPT exhibits the best
programming capabilities, achieving approximate accuracies of 50% and 56% on
the two tasks, respectively. There is still significant room for improvement in
programming tasks. We hope that CodeApex can serve as a reference for
evaluating the coding capabilities of LLMs, further promoting their development
and growth. Datasets are released at https://github.com/APEXLAB/CodeApex.git.
The CodeApex submission website is https://apex.sjtu.edu.cn/codeapex/.
Comment: 21 pages
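For illustration, the sketch below shows one common way to judge generated code against test cases, in the spirit of CodeApex's code-generation track; the candidate solution, the `solve` entry point, and the tests are placeholders, not benchmark items.

```python
# Illustrative grader sketch: execute a generated solution and check it against
# test cases. The candidate source, `solve` entry point, and tests are placeholders.
def passes_tests(candidate_src: str, test_cases) -> bool:
    namespace = {}
    try:
        exec(candidate_src, namespace)                 # define the candidate function
        solve = namespace["solve"]
        return all(solve(*args) == expected for args, expected in test_cases)
    except Exception:
        return False

candidate = "def solve(a, b):\n    return a + b\n"     # stand-in for model output
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print("passed" if passes_tests(candidate, tests) else "failed")
```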
GE11-antigen-loaded hepatitis B virus core antigen virus-like particles efficiently bind to TNBC tumor
Purpose: This study aimed to explore the possibility of utilizing hepatitis B core protein (HBc) virus-like particles (VLPs) to encapsulate doxorubicin (Dox) and thereby reduce the adverse effects caused by its off-target delivery and toxic side effects.
Methods: A triple-negative breast cancer (TNBC) tumor-targeting GE11-HBc VLP was constructed through genetic engineering. The GE11 peptide, a 12-amino-acid peptide targeting the epidermal growth factor receptor (EGFR), was inserted into the surface protein loops of the VLPs. Dox was loaded into the HBc VLPs by a thermally triggered encapsulation strategy. The in vitro release, cytotoxicity, and cellular uptake of the TNBC tumor-targeting GE11-HBc VLPs were then evaluated.
Results: These VLPs possessed excellent stability and Dox loading efficiency, and preferentially released their drug payload at high GSH levels. Insertion of the GE11 targeting peptide improved cellular uptake and enhanced the inhibition of cell viability in EGFR-overexpressing TNBC cells.
Conclusion: Together, these results highlight Dox-loaded, EGFR-targeted VLPs as a potentially useful therapeutic choice for EGFR-overexpressing TNBC.
Contribution of Hepatitis B Virus Infection to the Aggressiveness of Primary Liver Cancer: A Clinical Epidemiological Study in Eastern China
Background and aims: The contribution of hepatitis B virus (HBV) infection to the aggressiveness of primary liver cancer (PLC) remains controversial. We aimed to characterize this in eastern China.
Methods: We enrolled 8,515 PLC patients whose specimens were reserved at the BioBank of the hepatobiliary hospital (Shanghai, China) during 2007–2016. Of those, 3,124 who received primary radical resection were included in the survival analysis. A nomogram was constructed to predict survival using preoperative parameters.
Results: Hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (ICC), and combined hepatocellular cholangiocarcinoma (CHC) accounted for 94.6%, 3.7%, and 1.7% of cases, respectively; the corresponding rates of HBV infection were 87.5%, 49.2%, and 80.6%. In HCC, HBV infection was significantly associated with a 10-year earlier onset, more cirrhosis, higher α-fetoprotein, higher carbohydrate antigen 19-9 (CA19-9), more microvascular invasion (MVI), a lower neutrophil-to-lymphocyte ratio (NLR), and a lower platelet-to-lymphocyte ratio (PLR). In ICC, HBV infection was associated with a 7-year earlier onset, more cirrhosis, higher α-fetoprotein, more MVI, and a lower PLR. In the multivariate Cox analysis, high circulating HBV DNA, α-fetoprotein, CA19-9, NLR, tumor size, tumor number, encapsulation, Barcelona Clinic Liver Cancer (BCLC) stage, and MVI predicted an unfavorable prognosis in HCC; in ICC, only CA19-9 and BCLC stage, rather than HBV-related parameters, had prognostic value. A nomogram constructed with preoperative HBV-related parameters, including HBV load, ultrasonic cirrhosis, and α-fetoprotein, performed better than the current staging systems in predicting postoperative survival in HCC.
Conclusion: HBV promotes the aggressiveness of HCC in the Chinese population. The contributions of HBV to ICC, and of other etiological factors to HCC, might be indirect, acting through non-resolving inflammation.
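As a hedged illustration of the multivariate Cox analysis mentioned above (using the lifelines library, not the study's actual code or data), the sketch below fits a proportional-hazards model on a toy data frame; all covariate names and values are placeholders.

```python
# Hedged illustration of a multivariate Cox proportional-hazards fit using the
# lifelines library; the toy data frame and covariate names are placeholders,
# not the study's cohort.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "months":        [12, 40, 8, 60, 25, 33, 18, 52],   # follow-up time
    "event":         [1, 0, 1, 0, 1, 1, 0, 0],           # 1 = event observed
    "hbv_dna_high":  [1, 0, 1, 0, 0, 1, 1, 0],
    "afp_high":      [1, 1, 0, 0, 1, 0, 1, 0],
    "tumor_size_cm": [5.0, 2.1, 7.3, 3.0, 4.2, 6.8, 3.5, 2.5],
})

cph = CoxPHFitter(penalizer=0.1)                          # small penalty for a tiny sample
cph.fit(df, duration_col="months", event_col="event")
cph.print_summary()
```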