Riemannian Adaptive Regularized Newton Methods with Hölder Continuous Hessians
This paper presents strong worst-case iteration and operation complexity
guarantees for Riemannian adaptive regularized Newton methods, a unified
framework encompassing both Riemannian adaptive regularization (RAR) methods
and Riemannian trust region (RTR) methods. We comprehensively characterize the
sources of approximation in second-order manifold optimization methods: the
objective function's smoothness, retraction's smoothness, and subproblem
solver's inexactness. Specifically, for a function with a -Hölder
continuous Hessian, when equipped with a retraction featuring a -Hölder
continuous differential and a -inexact subproblem solver, both RTR and
RAR with regularization (where )
locate an -approximate second-order
stationary point within at most
iterations and at most
Hessian-vector products. These complexity results are novel and sharp, and
reduce to an iteration complexity of and an operation
complexity of when
DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
Multi-task model training has been adopted to enable a single deep neural
network model (often a large language model) to handle multiple tasks (e.g.,
question answering and text summarization). Multi-task training commonly
receives input sequences of highly different lengths due to the diverse
contexts of different tasks. Padding (to the same sequence length) or packing
(short examples into long sequences of the same length) is usually adopted to
prepare input samples for model training, which is nonetheless neither space-
nor computation-efficient. This paper proposes a dynamic micro-batching approach to
tackle sequence length variation and enable efficient multi-task model
training. We advocate pipeline-parallel training of the large model with
variable-length micro-batches, each of which potentially comprises a different
number of samples. We optimize micro-batch construction using a dynamic
programming-based approach, and handle micro-batch execution time variation
through dynamic pipeline and communication scheduling, enabling highly
efficient pipeline training. Extensive evaluation on the FLANv2 dataset
demonstrates up to 4.39x higher training throughput when training T5, and 3.25x
when training GPT, as compared with packing-based baselines. DynaPipe's source
code is publicly available at
https://github.com/awslabs/optimizing-multitask-training-through-dynamic-pipelines.
Comment: 18 pages, 18 figures
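The dynamic-programming idea behind micro-batch construction can be illustrated with a minimal sketch. This is not DynaPipe's algorithm or code: the function name `split_micro_batches`, the cost model (a fixed per-micro-batch overhead plus padded tokens), and the `overhead` and `max_tokens` parameters are all assumptions made for illustration. Sequences are sorted by length and split into contiguous groups so that the total assumed cost is minimal.

```python
from typing import List, Tuple


def split_micro_batches(
    lengths: List[int], overhead: int = 64, max_tokens: int = 4096
) -> Tuple[int, List[List[int]]]:
    """Partition sequence lengths into variable-size micro-batches.

    Hypothetical cost model (not from the paper): one micro-batch costs
    `overhead + longest_sequence * number_of_samples`, i.e. a fixed
    launch cost plus the tokens processed after padding.
    """
    seqs = sorted(lengths)
    n = len(seqs)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: minimal cost for the first i sequences
    cut = [0] * (n + 1)     # cut[i]: start index of the last micro-batch
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(i):
            # Candidate micro-batch seqs[j:i]; its longest member is seqs[i - 1].
            padded = seqs[i - 1] * (i - j)
            if padded > max_tokens:
                continue  # would exceed the assumed per-micro-batch token budget
            cost = best[j] + overhead + padded
            if cost < best[i]:
                best[i], cut[i] = cost, j
    if best[n] == inf:
        raise ValueError("a single sequence exceeds max_tokens")
    # Walk the cut points backwards to recover the chosen micro-batches.
    batches, i = [], n
    while i > 0:
        batches.append(seqs[cut[i]:i])
        i = cut[i]
    batches.reverse()
    return best[n], batches


if __name__ == "__main__":
    total, groups = split_micro_batches([10, 12, 500, 510, 11])
    print(total, groups)
```

Under this assumed cost model, the five example lengths are grouped into one micro-batch of three short sequences and one of two long sequences, rather than being padded to a common length of 510.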
Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification
Retrieval augmentation, which enhances downstream models by a knowledge
retriever and an external corpus instead of by merely increasing the number of
model parameters, has been successfully applied to many natural language
processing (NLP) tasks such as text classification, question answering and so
on. However, existing methods train the retriever and downstream model
separately or asynchronously, mainly because of the non-differentiability between
the two parts, which usually leads to degraded performance compared to end-to-end
joint training. In this paper, we propose Differentiable Retrieval Augmentation
via Generative lANguage modeling (Dragan) to address this problem through a novel
differentiable reformulation. We demonstrate the effectiveness of our proposed
method on a challenging NLP task in e-commerce search, namely query intent
classification. Both the experimental results and the ablation study show that the
proposed method significantly and reasonably improves on the state-of-the-art
baselines in both offline evaluation and online A/B testing.
Comment: 5 pages, 2 figures; accepted by CIKM202
A Deep Learning Framework for Traffic Data Imputation Considering Spatiotemporal Dependencies
Spatiotemporal (ST) data collected by sensors can be represented as
multi-variate time series, which is a sequence of data points listed in an
order of time. Despite the vast amount of useful information, the ST data
usually suffer from the issue of missing or incomplete data, which also limits
its applications. Imputation is one viable solution and is often used to
preprocess the data for further applications. However, in practice,
spatiotemporal data imputation is quite difficult due to the complexity of
spatiotemporal dependencies with dynamic changes in the traffic network, and it is
a crucial preprocessing task for further applications. Existing approaches
mostly only capture the temporal dependencies in time series or static spatial
dependencies. They fail to directly model the spatiotemporal dependencies, and
the representation ability of the models is relatively limited.
Comment: accepted at ICITE 202
Application of deep eutectic solvents in protein extraction and purification
Deep eutectic solvents (DESs) are mixtures of hydrogen bond donor (HBD) and hydrogen bond acceptor (HBA) molecules that can consist, respectively, of natural plant metabolites such as sugars, carboxylic acids, and amino acids, and of ionic molecules, the vast majority of which are ammonium salts. Media such as DESs are modular tools of sustainability that can be directed toward the extraction of bioactive molecules owing to their excellent physicochemical properties, relatively low price, and accessibility. The present review focuses on the application of DESs for protein extraction and purification. The effects and principles that apply to DES-mediated extraction using various renewable biomasses are discussed in depth as well. One of the most important observations is that DESs have a clear ability to maintain the biological and/or functional activity of the extracted proteins, as well as to increase their stability compared to traditional solvents. They demonstrate true potential for reproducible and, more importantly, scalable protein extraction and purification compared to traditional methods, while enabling waste valorization in some particular cases.
Market Stakeholder Analysis of the Practical Implementation of Carbonation Curing on Steel Slag for Urban Sustainable Governance
Carbonation curing on steel slag is one of the most promising technologies for the iron and steel industry to manage its solid waste and carbon emissions. However, the technology is still in its demonstration stage. Taking China as a case study, this paper investigates the market stakeholders of carbonation curing on steel slag for construction materials, with a view to its effective application. A holistic analysis of the competition, market size, and stakeholders of carbonation curing on steel slag was carried out through a literature review, a survey, a questionnaire, and interviews. The results showed that carbonation curing on steel slag had the advantages of high quality, high efficiency, low cost, and carbon reduction compared with other technologies. Shandong province was the most suitable province for the large-scale primary application of the technology. Stakeholder involvement in establishing information platforms, enhancing economic incentives, and promoting adequate R&D activities would help bring carbonation curing of steel slag into practice. This paper provides a reference for the commercialization of carbonation curing on similar calcium- and magnesium-based solid waste materials.
Exploring a Multi-Layer Coupled Network Propagation Model Based on Information Diffusion and Bounded Trust
Objective: To explore the law of opinion dissemination and individual opinion evolution at the micro level, this paper analyzes the influence of variation and oyster on communication from the perspective of network structure. Methods: In this paper, we introduce the concepts of “variation” and “oyster”, build a multi-layer coupled network environment combined with the ISOVR model, and conduct simulation experiments of network information dissemination based on the bounded trust model. Results: The experimental results reveal that the extent and scope of variation’s spread in the network are more dependent on the trust of nodes themselves, and decreasing the trust of nodes significantly reduces the rate and peak value of variation. Changing the silence coefficient of variation does not effectively change the direction of rumor propagation, which indicates that rumor has a strong propagation ability after mutation. Conclusion: The insights of this paper on the dissemination of public opinions include: 1) pay attention to people with high trust levels, such as opinion leaders; 2) clarify the misinformation in time to prevent further spread of rumors.
MiR-181d-5p Targets KLF6 to Improve Ischemia/Reperfusion-Induced AKI Through Effects on Renal Function, Apoptosis, and Inflammation
Renal tubular epithelial cell (RTEC) death and renal interstitial inflammation are the most crucial pathophysiological changes in acute kidney ischemia/reperfusion injury (IRI). The microRNA (miR)-181d family plays diverse roles in cell proliferation, apoptosis and inflammation, but its renal target and potential role in IRI are unknown. Here, we showed that the expression of miR-181d-5p decreased and that of Krueppel-like factor 6 (KLF6) increased in a renal cell (HK-2) model of hypoxia/reoxygenation (H/R) injury and a mouse model of renal IRI. They were mainly distributed in the renal tubules. After renal IRI, miR-181d-5p overexpression significantly inhibited inflammatory mediators, reduced apoptosis and further improved renal function. KLF6 exacerbated RTEC damage and acted as an NF-κB co-activator to aggravate the renal IRI inflammatory response. Mechanistically, KLF6 was predicted by bioinformatic analysis, and verified by luciferase reporter assay, to be a new potential target gene of miR-181d-5p. After overexpressing miR-181d-5p and inhibiting KLF6, the ameliorating effect of miR-181d-5p on renal damage was weakened. In conclusion, miR-181d-5p upregulation produced protective antiapoptotic and anti-inflammatory effects against IRI in kidneys in vivo and H/R injury in HK-2 cells in vitro, and these effects were achieved by targeted inhibition of KLF6. Thus, our results provide novel insights into the molecular mechanisms associated with IRI and a potential novel therapeutic target.
Avoiding the Great Filter: A Simulation of Important Factors for Human Survival
Humanity's path to avoiding extinction is a daunting and inevitable challenge
which proves difficult to solve, partially due to the lack of data and evidence
surrounding the concept. We aim to address this confusion by examining the
most dangerous threats to humanity, in hopes of providing a direction for
approaching this problem. Using a probabilistic model, we observed the effects of
nuclear war, climate change, asteroid impacts, artificial intelligence and
pandemics, which are the most harmful disasters in terms of their extent of
destruction, on the length of human survival. We consider the starting point of
the predicted average number of survival years as the present calendar year.
Nuclear war, when sampling from an artificial normal distribution, results in
an average human survival time of 60 years into the future starting from the
present, before a civilization-ending disaster. While climate change results in
an average human survival time of 193 years, the simulation based on impact
from asteroids results in an average of 1754 years. Since the risks from
asteroid impacts could be considered to reside mostly in the far future, it can
be concluded that nuclear war, climate change, and pandemics are presently the
most prominent threats to humanity. Additionally, the danger from superiority
of artificial intelligence over humans, although still somewhat abstract, is
worthy of further study as its potential for impeding humankind's progress
towards becoming a more advanced civilization cannot be confidently dismissed.
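The kind of sampling described in the abstract above can be sketched as a small Monte Carlo simulation. This is only an illustration under stated assumptions, not the authors' model: the means reuse the averages quoted in the abstract (60, 193, and 1754 years), while the standard deviations, the independence of the threats, the `THREATS` table, and the `simulate_survival` helper are all hypothetical. The combined estimate (the earliest disaster per trial) is likewise an addition for illustration and is not a result reported in the abstract.

```python
import random
from statistics import mean

# Hypothetical parameters: (mean, standard deviation) in years from the present.
# Means echo the averages quoted in the abstract; standard deviations are invented.
THREATS = {
    "nuclear_war": (60.0, 20.0),
    "climate_change": (193.0, 50.0),
    "asteroid_impact": (1754.0, 400.0),
}


def simulate_survival(threats, trials=10_000, seed=0):
    """Return per-threat average survival years and the average of the earliest disaster."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    per_threat = {name: [] for name in threats}
    earliest = []
    for _ in range(trials):
        # Draw a time-to-disaster for each threat, clipped at zero years.
        draws = {
            name: max(0.0, rng.gauss(mu, sigma))
            for name, (mu, sigma) in threats.items()
        }
        for name, years in draws.items():
            per_threat[name].append(years)
        # Assumed combination rule: civilization ends at the first disaster.
        earliest.append(min(draws.values()))
    averages = {name: mean(values) for name, values in per_threat.items()}
    return averages, mean(earliest)


if __name__ == "__main__":
    averages, combined = simulate_survival(THREATS)
    for name, years in averages.items():
        print(f"{name}: {years:.0f} years")
    print(f"earliest disaster: {combined:.0f} years")
```

Because each trial takes the minimum across threats, the combined average can never exceed the average for the nearest-term threat, which is one way to see why near-term risks dominate such an estimate.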