Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration
Machine learning (ML) accelerators have been studied and used extensively to
compute ML models with high performance and low power. However, designing such
accelerators normally takes a long time and requires significant effort.
Unfortunately, the pace of development of ML software models is much faster
than the accelerator design cycle, leading to frequent and drastic
modifications in the model architecture, thus rendering many accelerators
obsolete. Existing design tools and frameworks can provide quick accelerator
prototyping, but only for a limited range of models that can fit into a single
hardware device, such as an FPGA. Furthermore, with the emergence of large
language models, such as GPT-3, there is an increased need for hardware
prototyping of these large models within a many-accelerator system to ensure
the hardware can scale with the ever-growing model sizes. In this paper, we
propose an efficient and scalable approach for exploring accelerator systems to
compute large ML models. We developed a tool named MASE that can directly map
large ML models onto an efficient streaming accelerator system. Over a set of
ML models, we show that MASE can achieve better energy efficiency than GPUs when
computing inference for recent transformer models. Our tool will be open-sourced
upon publication
Cancer-associated fibroblast related gene signature in Helicobacter pylori-based subtypes of gastric carcinoma for prognosis and tumor microenvironment estimation in silico analysis
Introduction: Gastric cancer (GC) remains the major constituent of cancer-related deaths and a global public health challenge with a high incidence rate. Helicobacter pylori (HP) plays an essential role in promoting the occurrence and progression of GC. Cancer-associated fibroblasts (CAFs) are regarded as a significant component in the tumor microenvironment (TME), which is related to the metastasis of GC. However, the regulation mechanisms of CAFs in HP-related GC are not elucidated thoroughly. Methods: HP-related genes (HRGs) were downloaded from the GSE84437 and TCGA-GC databases. The two databases were combined into one cohort for training. Furthermore, the consensus unsupervised clustering analysis was obtained to sort the training cohort into different groups for the identification of differential expression genes (DEGs). Weighted correlation network analysis (WGCNA) was performed to verify the correlation between the DEGs and cancer-associated fibroblasts which were key components in the tumor microenvironment. The least absolute shrinkage and selection operator (LASSO) was executed to find cancer-associated fibroblast-related differential expression genes (CDEGs) for the further establishment of a prognostic model. Results and discussion: In this study, 52 HP-related genes (HRGs) were screened out based on the GSE84437 and TCGA-GC databases. A total of 804 GC samples were analyzed, respectively, and clustered into two HP-related subtypes. The DEGs identified from the two subtypes were proved to have a relationship with TME. After WGCNA and LASSO, the CAFs-related module was identified, from which 21 gene signatures were confirmed. Then, a CDEGs-Score was constructed and its prediction efficiency in GC patients was conducted for validation. Overall, a highly precise nomogram was established for enhancing the adaptability of the CDEGs-Score. Furthermore, our findings revealed the applicability of CDEGs-Score in the sensitivity of chemotherapeutic drugs.
In general, our research provided brand-new possibilities for comprehending HP-related GC, evaluating survival, and more efficient therapeutic strategies
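As an illustration only, a gene-signature score of the kind described above is commonly computed as a weighted sum of expression values, with the weights taken from the LASSO fit. The abstract does not state the exact formula used for the CDEGs-Score, so the linear form, the function name, and the gene names and coefficients below are all hypothetical placeholders, not values from the study.

```python
def risk_score(expression, coefficients):
    """Weighted sum of expression values over the signature genes.

    expression:   dict mapping gene name -> expression value for one patient
    coefficients: dict mapping gene name -> model coefficient (assumed to come
                  from a LASSO fit; the values used below are made up)
    Genes without a coefficient do not contribute to the score.
    """
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


# Hypothetical two-gene signature; the real signature has 21 genes.
example_coefficients = {"GENE_A": 0.5, "GENE_B": -1.0}
example_patient = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 5.0}

score = risk_score(example_patient, example_coefficients)
```

Genes that LASSO shrinks to a zero coefficient simply drop out of the sum, which is how the method selects a compact signature.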
PGAweb: A Web Server for Bacterial Pan-Genome Analysis
An astronomical increase in microbial genome data in recent years has led to strong demand for bioinformatic tools for pan-genome analysis within and across species. Here, we present PGAweb, a user-friendly, web-based tool for bacterial pan-genome analysis, which is composed of two main pan-genome analysis modules, PGAP and PGAP-X. PGAweb provides key interactive and customizable functions that include orthologous clustering, pan-genome profiling, sequence variation and evolution analysis, and functional classification. PGAweb presents features of genomic structural dynamics and sequence diversity with different visualization methods that are helpful for intuitively understanding the dynamics and evolution of bacterial genomes. PGAweb has an intuitive interface with one-click setting of parameters and is freely available at http://PGAweb.vlcc.cn/
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
The rapid development of open-source large language models (LLMs) has been
truly remarkable. However, the scaling law described in previous literature
presents varying conclusions, which casts a dark cloud over scaling LLMs. We
delve into the study of scaling laws and present our distinctive findings that
facilitate scaling of large-scale models in two commonly used open-source
configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek
LLM, a project dedicated to advancing open-source language models with a
long-term perspective. To support the pre-training phase, we have developed a
dataset that currently consists of 2 trillion tokens and is continuously
expanding. We further conduct supervised fine-tuning (SFT) and Direct
Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the
creation of DeepSeek Chat models. Our evaluation results demonstrate that
DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in
the domains of code, mathematics, and reasoning. Furthermore, open-ended
evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance
compared to GPT-3.5
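For readers unfamiliar with Direct Preference Optimization, the sketch below shows the standard per-pair DPO loss as it is usually stated in the literature. It is not taken from the DeepSeek implementation; the function name, argument names, and the default `beta` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    beta=0.1 is an arbitrary illustrative default, not a value from the paper.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference on both responses the margin is zero and the loss equals log 2; it falls as the policy raises the chosen response's likelihood relative to the rejected one.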
Recent Advances in the Distribution, Chemical Composition, Health Benefits, and Application of the Fruit of Siraitia grosvenorii
The fruits of Siraitia grosvenorii (S. grosvenorii) have attracted a lot of scientific interest as part of the current healthy diet. S. grosvenorii has diverse health-promoting effects, including antioxidant, anti-inflammatory, antimicrobial, respiratory modulation, metabolic modulation, antitumor, and neuroprotective effects, as well as gastrointestinal function modulation. As a plant resource, S. grosvenorii has broad application prospects, which promotes the development of the horticultural industry. Moreover, Mogroside has attracted much attention as an important active ingredient of S. grosvenorii. This review provides an in-depth exploration of the distribution, chemical composition, health benefits, and application of S. grosvenorii, particularly Mogroside. This comprehensive exploration highlights the important therapeutic potential of S. grosvenorii, prompting further research into its applications. As value-added functional ingredients, S. grosvenorii and its constituents have significant potential for disease prevention and are widely used in the development of food and health supplements
Optimization of OpenCV based spot identification method for surface plasmon resonance imaging
In this work, we focus on the OpenCV based microarray recognition method for Surface Plasmon Resonance Imaging (SPRi), proposing the hit-ratio of global light pixels and coverage of the potential spots in a microarray as the criteria for identification evaluation in SPRi data. We optimized the design of the ellipse fitting strategy by analyzing the impact of different parameters in the method. After optimization of the parameters, the accuracy of microarray recognition was successfully increased to over 90%. This work not only contributes to reducing errors in microarray signal extraction and improving signal processing quality but also has significant implications for applying computer graphic technology in high-throughput biochemical analysis
Combining aggregate and individual-level data to estimate individual-level associations between air pollution and COVID-19 mortality in the United States.
Imposing stricter regulations for PM2.5 has the potential to mitigate damaging health and climate change effects. Recent evidence establishing a link between exposure to air pollution and COVID-19 outcomes is one of many arguments for the need to reduce the National Ambient Air Quality Standards (NAAQS) for PM2.5. However, many studies reporting a relationship between COVID-19 outcomes and PM2.5 have been criticized because they are based on ecological regression analyses, where area-level counts of COVID-19 outcomes are regressed on area-level exposure to air pollution and other covariates. It is well known that regression models solely based on area-level data are subject to ecological bias, i.e., they may provide a biased estimate of the association at the individual-level, due to within-area variability of the data. In this paper, we augment county-level COVID-19 mortality data with a nationally representative sample of individual-level covariate information from the American Community Survey along with high-resolution estimates of PM2.5 concentrations obtained from a validated model and aggregated to the census tract for the contiguous United States. We apply a Bayesian hierarchical modeling approach to combine county-, census tract-, and individual-level data to ultimately draw inference about individual-level associations between long-term exposure to PM2.5 and mortality for COVID-19. By analyzing data prior to the Emergency Use Authorization for the COVID-19 vaccines we found that an increase of 1 μg/m3 in long-term PM2.5 exposure, averaged over the 17-year period 2000-2016, is associated with a 3.3% (95% credible interval, 2.8 to 3.8%) increase in an individual's odds of COVID-19 mortality. Code to reproduce our study is publicly available at https://github.com/NSAPH/PM_COVID_ecoinference. 
The results confirm previous evidence of an association between long-term exposure to PM2.5 and COVID-19 mortality and strengthen the case for tighter regulations on harmful air pollution and greenhouse gas emissions
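To make the reported effect size concrete, the sketch below scales the 3.3% per 1 μg/m3 increase in odds to other exposure differences. It assumes the association is multiplicative on the odds scale (log-linear in exposure), which the abstract does not state explicitly; the function names and any baseline probability passed in are hypothetical and are not taken from the study or its code repository.

```python
def odds_ratio(delta_pm25, pct_increase_per_unit=3.3):
    """Odds ratio for a difference of delta_pm25 (in ug/m3) in long-term exposure.

    Assumes each additional 1 ug/m3 multiplies the odds by the same factor.
    """
    return (1.0 + pct_increase_per_unit / 100.0) ** delta_pm25


def apply_odds_ratio(baseline_prob, ratio):
    """Convert a baseline probability to odds, scale by ratio, convert back."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * ratio
    return new_odds / (1.0 + new_odds)
```

Because the reported quantity is a change in odds rather than in probability, the second helper is needed before the result can be read as a change in an individual's mortality probability.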
The C-Terminal Repeat Units of SpaA Mediate Adhesion of Erysipelothrix rhusiopathiae to Host Cells and Regulate Its Virulence
Erysipelothrix rhusiopathiae is a causative agent of erysipelas in animals and erysipeloid in humans. However, current information regarding E. rhusiopathiae pathogenesis remains limited. Previously, we identified two E. rhusiopathiae strains, SE38 and G4T10, which were virulent and avirulent in pigs, respectively. Here, to further study the pathogenic mechanism of E. rhusiopathiae, we sequenced and assembled the genomes of strains SE38 and G4T10, and performed a comparative genomic analysis to identify differences or mutations in virulence-associated genes. Next, we comparatively analyzed 25 E. rhusiopathiae virulence-associated genes in SE38 and G4T10. Compared with that of SE38, the spaA gene of the G4T10 strain lacked 120 bp, which encode repeat units at the C-terminus of SpaA. To examine whether these deletions or splits influence E. rhusiopathiae virulence, these 120 bp were successfully deleted from the spaA gene in strain SE38 by homologous recombination. The mutant strain ΔspaA displayed attenuated virulence in mice and decreased adhesion to porcine iliac artery endothelial cells, which was also observed using the corresponding mutant protein SpaA’. Our results demonstrate that SpaA-mediated adhesion between E. rhusiopathiae and host cells is dependent on its C-terminal repeat units