63 research outputs found
Convergence of the Ginzburg-Landau approximation for the Ericksen-Leslie system
We establish the local well-posedness of the general Ericksen-Leslie system
in liquid crystals with the initial velocity and director field in . In particular, we prove that the solutions of the Ginzburg-Landau
approximation system converge smoothly to the solution of the Ericksen-Leslie
system for any with a maximal existence time of the
Ericksen-Leslie system.
Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition
Neural Network designs are quite diverse, from VGG-style to ResNet-style, and
from Convolutional Neural Networks to Transformers. Towards the design of
efficient accelerators, many works have adopted a dataflow-based, inter-layer
pipelined architecture, with customised hardware for each layer,
achieving ultra-high throughput and low latency. The deployment of neural
networks to such dataflow architecture accelerators is usually hindered by the
available on-chip memory as it is desirable to preload the weights of neural
networks on-chip to maximise the system performance. To address this, networks
are usually compressed before the deployment through methods such as pruning,
quantization and tensor decomposition. In this paper, a framework for mapping
CNNs onto FPGAs based on a novel tensor decomposition method called Mixed-TD is
proposed. The proposed method applies layer-specific Singular Value
Decomposition (SVD) and Canonical Polyadic Decomposition (CPD) in a mixed
manner, achieving 1.73x to 10.29x throughput per DSP to state-of-the-art CNNs.
Our work is open-sourced: https://github.com/Yu-Zhewen/Mixed-TD
Comment: accepted by FPL202
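The SVD half of the decomposition mentioned in the abstract above can be illustrated with a minimal sketch. It shows only a generic truncated SVD of one weight matrix using NumPy, not the Mixed-TD framework itself (which also uses CPD and selects a method per layer). The function name `svd_compress`, the matrix shape, and the rank are assumptions chosen for illustration.

```python
import numpy as np


def svd_compress(weight, rank):
    """Factor `weight` (out x in) into two thinner matrices via truncated SVD.

    Hypothetical helper for illustration; not taken from the Mixed-TD code base.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # (out, rank), singular values folded in
    b = vt[:rank, :]            # (rank, in)
    return a, b


rng = np.random.default_rng(0)
# Assumed toy layer: a 64 x 128 weight matrix that is exactly rank 4.
weight = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 128))

a, b = svd_compress(weight, rank=4)

params_before = weight.size      # 64 * 128 stored values
params_after = a.size + b.size   # 64 * 4 + 4 * 128 stored values
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)

print(params_before, params_after, rel_error)
```

Storing the two factors instead of the full matrix is what reduces the on-chip memory footprint; for real layers the weights are not exactly low-rank, so the chosen rank trades accuracy against size.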
Heavy quark fragmentation function in 't Hooft Model
We carry out a comprehensive study of the quark-to-meson fragmentation
function in the 't Hooft model, i.e., the two-dimensional Quantum
Chromodynamics (QCD) in limit, following the operator
definition pioneered by Collins and Soper. We apply the Hamiltonian approach as
well as the diagrammatic approach to construct the functional form of the
quark-to-meson fragmentation function in terms of the meson's light-cone wave
function. For the sake of comparison, we also investigate the heavy quark
fragmentation into quarkonium in two-dimensional QCD within the framework of
the nonrelativistic QCD (NRQCD) factorization, at the lowest order in quark
velocity. In the heavy quark limit, the quark fragmentation function obtained
from the ab initio method agrees well, both analytically and numerically,
with that obtained from the NRQCD approach. This agreement might be regarded as
a nontrivial justification for the validity of both field-theoretical
approaches to compute the heavy quark fragmentation function.
Comment: 23 pages, 4 figures, 1 table
Photoproduction of C-even quarkonia at EIC and EicC
The photoproduction in collision has long been proposed as an
ideal process to probe the existence of odderon. In the current work, we
systematically investigate the photoproduction of various C-even heavy
quarkonia (exemplified by , and with ) via
one-photon exchange channel, at the lowest order in and heavy quark
velocity in the context of NRQCD factorization. We find that the
photoproduction rates of the C-even quarkonia through this mechanism are
comparable in magnitude with those through the odderon-initiated mechanism, even
in the Regge limit (), though the latter type of prediction suffers
from considerable theoretical uncertainties. Future measurements of these
types of quarkonium photoproduction processes at EIC and EicC
are crucial to ascertain which mechanism plays the dominant role.
Comment: 16 pages, 9 figures
SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices
AI has led to significant advancements in computer vision and image
processing tasks, enabling a wide range of applications in real-life scenarios,
from autonomous vehicles to medical imaging. Many of those applications require
efficient object detection algorithms and complementary real-time, low-latency
hardware to perform inference of these algorithms. The YOLO family of models is
considered the most efficient for object detection, having only a single model
pass. Despite this, the complexity and size of YOLO models can be too
computationally demanding for current edge-based platforms. To address this, we
present SATAY: a Streaming Architecture Toolflow for Accelerating YOLO. This
work tackles the challenges of deploying state-of-the-art object detection
models onto FPGA devices for ultra-low-latency applications, enabling real-time,
edge-based object detection. We employ a streaming architecture design for our
YOLO accelerators, implementing the complete model on-chip in a deeply
pipelined fashion. These accelerators are generated using an automated
toolflow, and can target a range of suitable FPGA devices. We introduce novel
hardware components to support the operations of YOLO models in a dataflow
manner, and off-chip memory buffering to address the limited on-chip memory
resources. Our toolflow is able to generate accelerator designs which
demonstrate competitive performance and energy characteristics to GPU devices,
and which outperform current state-of-the-art FPGA accelerators.
Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration
Machine learning (ML) accelerators have been studied and used extensively to
compute ML models with high performance and low power. However, designing such
accelerators normally takes a long time and requires significant effort.
Unfortunately, the pace of development of ML software models is much faster
than the accelerator design cycle, leading to frequent and drastic
modifications in the model architecture, thus rendering many accelerators
obsolete. Existing design tools and frameworks can provide quick accelerator
prototyping, but only for a limited range of models that can fit into a single
hardware device, such as an FPGA. Furthermore, with the emergence of large
language models, such as GPT-3, there is an increased need for hardware
prototyping of these large models within a many-accelerator system to ensure
the hardware can scale with the ever-growing model sizes. In this paper, we
propose an efficient and scalable approach for exploring accelerator systems to
compute large ML models. We developed a tool named MASE that can directly map
large ML models onto an efficient streaming accelerator system. Over a set of
ML models, we show that MASE can achieve better energy efficiency than GPUs when
computing inference for recent transformer models. Our tool will be open-sourced
upon publication.
Cancer-associated fibroblast related gene signature in Helicobacter pylori-based subtypes of gastric carcinoma for prognosis and tumor microenvironment estimation in silico analysis
Introduction: Gastric cancer (GC) remains a major cause of cancer-related deaths and a global public health challenge with a high incidence rate. Helicobacter pylori (HP) plays an essential role in promoting the occurrence and progression of GC. Cancer-associated fibroblasts (CAFs) are regarded as a significant component of the tumor microenvironment (TME), which is related to the metastasis of GC. However, the regulatory mechanisms of CAFs in HP-related GC have not been thoroughly elucidated. Methods: HP-related genes (HRGs) were downloaded from the GSE84437 and TCGA-GC databases. The two databases were combined into one cohort for training. Consensus unsupervised clustering analysis was then performed to sort the training cohort into groups for the identification of differentially expressed genes (DEGs). Weighted correlation network analysis (WGCNA) was performed to verify the correlation between the DEGs and cancer-associated fibroblasts, which are key components of the tumor microenvironment. The least absolute shrinkage and selection operator (LASSO) was applied to find cancer-associated fibroblast-related differentially expressed genes (CDEGs) for the subsequent establishment of a prognostic model. Results and discussion: In this study, 52 HP-related genes (HRGs) were screened out based on the GSE84437 and TCGA-GC databases. A total of 804 GC samples were analyzed and clustered into two HP-related subtypes. The DEGs identified from the two subtypes were shown to be related to the TME. After WGCNA and LASSO, the CAFs-related module was identified, from which 21 gene signatures were confirmed. Then, a CDEGs-Score was constructed and its prediction efficiency in GC patients was validated. Overall, a highly precise nomogram was established to enhance the adaptability of the CDEGs-Score. Furthermore, our findings revealed the applicability of the CDEGs-Score in assessing sensitivity to chemotherapeutic drugs.
In general, our research provided brand-new possibilities for comprehending HP-related GC, evaluating survival, and developing more efficient therapeutic strategies.
PGAweb: A Web Server for Bacterial Pan-Genome Analysis
An astronomical increase in microbial genome data in recent years has led to strong demand for bioinformatic tools for pan-genome analysis within and across species. Here, we present PGAweb, a user-friendly, web-based tool for bacterial pan-genome analysis, which is composed of two main pan-genome analysis modules, PGAP and PGAP-X. PGAweb provides key interactive and customizable functions that include orthologous clustering, pan-genome profiling, sequence variation and evolution analysis, and functional classification. PGAweb presents features of genomic structural dynamics and sequence diversity with different visualization methods that are helpful for intuitively understanding the dynamics and evolution of bacterial genomes. PGAweb has an intuitive interface with one-click setting of parameters and is freely available at http://PGAweb.vlcc.cn/