CopyScope: Model-level Copyright Infringement Quantification in the Diffusion Workflow
Web-based AI image generation has become an innovative art form that can
generate novel artworks with the rapid development of the diffusion model.
However, this new technique brings potential copyright infringement risks as it
may incorporate the existing artworks without the owners' consent. Copyright
infringement quantification is the primary and challenging step towards
AI-generated image copyright traceability. Previous work only focused on data
attribution from the training data perspective, which is unsuitable for tracing
and quantifying copyright infringement in practice because of the following
reasons: (1) the training datasets are not always available in public; (2) the
model provider is the responsible party, not the image. Motivated by this, in
this paper, we propose CopyScope, a new framework to quantify the infringement
of AI-generated images from the model level. We first rigorously identify
pivotal components within the AI image generation pipeline. Then, we propose to
take advantage of Fréchet Inception Distance (FID) to effectively capture the
image similarity that fits human perception naturally. We further propose the
FID-based Shapley algorithm to evaluate the infringement contribution among
models. Extensive experiments demonstrate that our work not only reveals the
intricacies of infringement quantification but also effectively depicts the
infringing models quantitatively, thus promoting accountability in AI
image-generation tasks.
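The Shapley attribution mentioned in the abstract above can be illustrated with a minimal sketch. This is not CopyScope's implementation: the three model names and the payoff table are hypothetical, and the payoff stands in for whatever FID-derived similarity score the framework actually uses. The sketch only shows how exact Shapley values split a coalition payoff among contributing models.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a payoff."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for size in range(n):
            # Probability weight of a coalition of this size that excludes p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of p when it joins the coalition.
                phi[p] += weight * (value(coalition | {p}) - value(coalition))
    return phi


# Hypothetical payoff table (an assumption, not data from the paper).
# model_c is a "null" component: it never changes the payoff.
TOY_PAYOFF = {
    frozenset(): 0.0,
    frozenset({"model_a"}): 10.0,
    frozenset({"model_b"}): 20.0,
    frozenset({"model_a", "model_b"}): 50.0,
}


def toy_value(coalition):
    """Look up the payoff, ignoring the null component model_c."""
    return TOY_PAYOFF[coalition & frozenset({"model_a", "model_b"})]


phi = shapley_values(["model_a", "model_b", "model_c"], toy_value)
```

In this toy game the attributions sum to the payoff of the full coalition (50), and the null component receives zero, which are the two properties that make Shapley values attractive for dividing responsibility among models.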
EFFL: Egalitarian Fairness in Federated Learning for Mitigating Matthew Effect
Recent advances in federated learning (FL) enable collaborative training of
machine learning (ML) models from large-scale and widely dispersed clients
while protecting their privacy. However, when different clients' datasets are
heterogeneous, traditional FL mechanisms produce a global model that does not
adequately represent the poorer clients with limited data resources, resulting
in lower accuracy and higher bias on their local data. According to the Matthew
effect, which describes how the advantaged gain more advantage and the
disadvantaged lose more over time, deploying such a global model in client
applications may worsen the resource disparity among the clients and harm the
principles of social welfare and fairness. To mitigate the Matthew effect, we
propose Egalitarian Fairness Federated Learning (EFFL), where egalitarian
fairness means that the global model learned from FL has: (1) equal accuracy
among clients; (2) equal decision bias among clients. Besides achieving
egalitarian fairness among the clients, EFFL also aims for performance
optimality, minimizing the empirical risk loss and the bias for each client;
both are essential for any ML model training, whether centralized or
decentralized. We formulate EFFL as a multi-constrained
multi-objective optimization (MCMOO) problem, with the decision bias and
egalitarian fairness as constraints and the minimization of the empirical risk
losses on all clients as multiple objectives to be optimized. We propose a
gradient-based three-stage algorithm to obtain the Pareto optimal solutions
within the constraint space. Extensive experiments demonstrate that EFFL
outperforms other state-of-the-art FL algorithms in achieving a
high-performance global model with enhanced egalitarian fairness among all
clients.
ChatAnything: Facetime Chat with LLM-Enhanced Personas
In this technical report, we target generating anthropomorphized personas for
LLM-based characters in an online manner, including visual appearance,
personality and tones, with only text descriptions. To achieve this, we first
leverage the in-context learning capability of LLMs for personality generation
by carefully designing a set of system prompts. We then propose two novel
concepts: the mixture of voices (MoV) and the mixture of diffusers (MoD) for
diverse voice and appearance generation. For MoV, we utilize the text-to-speech
(TTS) algorithms with a variety of pre-defined tones and select the most
matching one based on the user-provided text description automatically. For
MoD, we combine the recent popular text-to-image generation techniques and
talking head algorithms to streamline the process of generating talking
objects. We term the whole framework ChatAnything. With it, users can
animate anything with any anthropomorphic persona using just
a few text inputs. However, we have observed that the anthropomorphic objects
produced by current generative models are often undetectable by pre-trained
face landmark detectors, leading to failure of the face motion generation, even
if these faces possess human-like appearances, because such images are rarely
seen during training (i.e., they are OOD samples). To address this issue, we
incorporate pixel-level guidance to infuse human face landmarks during the
image generation phase. To benchmark these metrics, we have built an evaluation
dataset. Based on it, we verify that the detection rate of the face landmark is
significantly increased from 57.0% to 92.5%, thus allowing automatic face
animation based on generated speech content. The code and more results can be
found at https://chatanything.github.io/
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
In the era of large language models, Mixture-of-Experts (MoE) is a promising
architecture for managing computational costs when scaling up model parameters.
However, conventional MoE architectures like GShard, which activate the top-K
out of N experts, face challenges in ensuring expert specialization, i.e.,
each expert acquires non-overlapping and focused knowledge. In response, we
propose the DeepSeekMoE architecture towards ultimate expert specialization. It
involves two principal strategies: (1) finely segmenting the experts into mN
ones and activating mK from them, allowing for a more flexible combination of
activated experts; (2) isolating K_s experts as shared ones, aiming at
capturing common knowledge and mitigating redundancy in routed experts.
Starting from a modest scale with 2B parameters, we demonstrate that
DeepSeekMoE 2B achieves comparable performance with GShard 2.9B, which has 1.5
times the expert parameters and computation. In addition, DeepSeekMoE 2B nearly
approaches the performance of its dense counterpart with the same number of
total parameters, which sets the upper bound of MoE models. Subsequently, we
scale up DeepSeekMoE to 16B parameters and show that it achieves comparable
performance with LLaMA2 7B, with only about 40% of computations. Further, our
preliminary efforts to scale up DeepSeekMoE to 145B parameters consistently
validate its substantial advantages over the GShard architecture, and show its
performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%)
of computations.
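The two strategies described in the abstract above (always-active shared experts plus top-k selection among routed experts) can be sketched as follows. This is a simplified illustration, not the DeepSeekMoE code: the softmax gate, the residual connection, the scalar "experts", and all names are assumptions made for the example.

```python
import numpy as np


def moe_forward(x, shared_experts, routed_experts, gate_weights, top_k):
    """One token through a toy MoE layer with shared and routed experts."""
    # Assumed gating: softmax over one logit per routed expert.
    logits = gate_weights @ x
    scores = np.exp(logits - logits.max())
    scores = scores / scores.sum()
    # Keep only the top_k routed experts with the highest gate scores.
    chosen = np.argsort(scores)[-top_k:]
    out = x.copy()  # assumed residual connection
    for expert in shared_experts:
        out = out + expert(x)  # shared experts are always active
    for i in chosen:
        out = out + scores[i] * routed_experts[i](x)
    return out, sorted(int(i) for i in chosen)


# Hypothetical toy setup: each "expert" just scales its input.
x = np.array([1.0, 0.0])
shared = [lambda v: 0.5 * v]
routed = [lambda v, c=c: c * v for c in (1.0, 2.0, 3.0, 4.0)]
gate = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
y, chosen = moe_forward(x, shared, routed, gate, top_k=2)
```

Only the chosen routed experts are evaluated, so per-token computation depends on top_k and the number of shared experts rather than on the total expert count.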
Comprehensive Analysis of the Relationship Between RAS and RAF Mutations and MSI Status of Colorectal Cancer in Northeastern China
Background/Aims: Colorectal cancer (CRC) is mainly caused by chromosomal instability (CIN) and microsatellite instability (MSI). The RAS and RAF genes are essential components of the CIN pathway, and several studies have found that RAS and RAF mutations are associated with MSI status in CRC. Here, we examined these three factors in CRC in Northeast China and aimed to reveal new details of the relationship between these mutations and MSI status. Methods: This study involved 290 patients with CRC who had RAS or RAF gene mutation detected using fluorescence-based allele-specific polymerase chain reaction or Sanger sequencing. The majority of the identified patients were found to harbor MSI (MSI status). Accurate molecular detection was carried out using formalin-fixed paraffin-embedded tissue or blood samples. Results: The rates of RAS and RAF mutations were 58.5% and 4.1%, respectively. The prevalence of RAS mutation in CRC was clearly higher and that of RAF mutation was lower in Northeast China compared with previously reported cohorts in other locations. High MSI level (MSI-H status) was more complex, at around 10%. This was consistent with previous data from China. However, compared with data reported from other countries, the MSI-H rate was higher than in Japan or South Korea within Asia, and lower than in Europe or the United States. Conclusion: RAS/RAF mutations and MSI status in CRC are closely associated with tumor location and ethnicity. Further studies investigating the relationship between these three factors can help in the development of treatment strategies for patients with CRC.
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
The rapid development of open-source large language models (LLMs) has been
truly remarkable. However, the scaling law described in previous literature
presents varying conclusions, which casts a dark cloud over scaling LLMs. We
delve into the study of scaling laws and present our distinctive findings that
facilitate the scaling of large-scale models in two commonly used open-source
configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek
LLM, a project dedicated to advancing open-source language models with a
long-term perspective. To support the pre-training phase, we have developed a
dataset that currently consists of 2 trillion tokens and is continuously
expanding. We further conduct supervised fine-tuning (SFT) and Direct
Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the
creation of DeepSeek Chat models. Our evaluation results demonstrate that
DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in
the domains of code, mathematics, and reasoning. Furthermore, open-ended
evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance
compared to GPT-3.5.
Legendre Cooperative PSO Strategies for Trajectory Optimization
Particle swarm optimization (PSO) is a population-based stochastic optimization technique for smooth search spaces. However, in a category of trajectory optimization problems with arbitrary final time and multiple control variables, the smoothness of the variables cannot be guaranteed because linear interpolation is widely used. In this paper, a novel Legendre cooperative PSO (LCPSO) is proposed that uses Legendre orthogonal polynomials instead of linear interpolation. An additional control variable is introduced to transcribe the original optimization problem with arbitrary final time into one with fixed final time. Then, a practical and fast one-dimensional interval search algorithm is designed to optimize the additional control variable. Furthermore, to improve the convergence and prevent explosion of the LCPSO, a theorem on how to determine the bounds of the polynomial coefficients is given and proven. Finally, in numerical simulations, compared with ordinary PSO and other typical intelligent optimization algorithms (GA and DE), the proposed LCPSO has lower dimension, faster convergence, and higher accuracy, while providing smoother control variables.
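The idea of representing a control profile with Legendre polynomials instead of linear interpolation can be sketched as below. This is only an illustration under stated assumptions, not the LCPSO algorithm: the function name, the mapping of time onto [-1, 1], and the treatment of the final time as an ordinary argument are choices made for the example. The evaluation itself uses NumPy's `numpy.polynomial.legendre.legval`.

```python
import numpy as np
from numpy.polynomial import legendre


def control_from_coeffs(coeffs, t, t_final):
    """Evaluate a control profile u(t) given Legendre coefficients.

    Time is mapped from [0, t_final] to [-1, 1], the interval on which
    Legendre polynomials are orthogonal (an assumed normalization).
    """
    tau = 2.0 * np.asarray(t, dtype=float) / t_final - 1.0
    return legendre.legval(tau, coeffs)


# Hypothetical coefficients: u(tau) = 1*P0(tau) + 2*P1(tau) + 0.5*P2(tau).
coeffs = [1.0, 2.0, 0.5]
times = np.linspace(0.0, 10.0, 5)
u = control_from_coeffs(coeffs, times, t_final=10.0)
```

With this parameterization an optimizer searches over a handful of coefficients per control variable rather than over many interpolation nodes, and the resulting control is smooth by construction.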