140 research outputs found

    Riemannian Adaptive Regularized Newton Methods with Hölder Continuous Hessians

    This paper presents strong worst-case iteration and operation complexity guarantees for Riemannian adaptive regularized Newton methods, a unified framework encompassing both Riemannian adaptive regularization (RAR) methods and Riemannian trust region (RTR) methods. We comprehensively characterize the sources of approximation in second-order manifold optimization methods: the objective function's smoothness, retraction's smoothness, and subproblem solver's inexactness. Specifically, for a function with a $\mu$-Hölder continuous Hessian, when equipped with a retraction featuring a $\nu$-Hölder continuous differential and a $\theta$-inexact subproblem solver, both RTR and RAR with $2+\alpha$ regularization (where $\alpha=\min\{\mu,\nu,\theta\}$) locate an $(\epsilon,\epsilon^{\alpha/(1+\alpha)})$-approximate second-order stationary point within at most $O(\epsilon^{-(2+\alpha)/(1+\alpha)})$ iterations and at most $\tilde{O}(\epsilon^{-(4+3\alpha)/(2(1+\alpha))})$ Hessian-vector products. These complexity results are novel and sharp, and reduce to an iteration complexity of $O(\epsilon^{-3/2})$ and an operation complexity of $\tilde{O}(\epsilon^{-7/4})$ when $\alpha=1$.
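
    As a quick check of the reduction stated at the end of this abstract, substituting $\alpha=1$ into the general exponents gives the quoted special case. Only the formulas from the abstract are used; nothing else is assumed.

```latex
\[
\left.\frac{2+\alpha}{1+\alpha}\right|_{\alpha=1} = \frac{3}{2},
\qquad
\left.\frac{4+3\alpha}{2(1+\alpha)}\right|_{\alpha=1} = \frac{7}{4},
\qquad
\left.\frac{\alpha}{1+\alpha}\right|_{\alpha=1} = \frac{1}{2}.
\]
```

    So for $\alpha=1$ the methods reach an $(\epsilon,\epsilon^{1/2})$-approximate second-order stationary point in $O(\epsilon^{-3/2})$ iterations and $\tilde{O}(\epsilon^{-7/4})$ Hessian-vector products.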

    DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines

    Multi-task model training has been adopted to enable a single deep neural network model (often a large language model) to handle multiple tasks (e.g., question answering and text summarization). Multi-task training commonly receives input sequences of highly different lengths due to the diverse contexts of different tasks. Padding (to the same sequence length) or packing (short examples into long sequences of the same length) is usually adopted to prepare input samples for model training, but neither is space- or computation-efficient. This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training. We advocate pipeline-parallel training of the large model with variable-length micro-batches, each of which potentially comprises a different number of samples. We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling, enabling highly efficient pipeline training. Extensive evaluation on the FLANv2 dataset demonstrates up to 4.39x higher training throughput when training T5, and 3.25x when training GPT, as compared with packing-based baselines. DynaPipe's source code is publicly available at https://github.com/awslabs/optimizing-multitask-training-through-dynamic-pipelines. Comment: 18 pages, 18 figures
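
    The idea of building variable-length micro-batches with dynamic programming can be illustrated with a minimal sketch. This is not DynaPipe's actual algorithm or cost model. It assumes a simplified cost (padded tokens per micro-batch plus a fixed per-micro-batch overhead) and a token cap, and the names `partition_micro_batches`, `max_tokens`, and `overhead` are hypothetical.

```python
from typing import List, Tuple


def partition_micro_batches(
    lengths: List[int], max_tokens: int, overhead: int
) -> Tuple[List[List[int]], int]:
    """Split samples into micro-batches with a simple dynamic program.

    Assumed cost model (illustrative only): each micro-batch is padded to
    its longest sample, so it costs size * longest tokens, plus a fixed
    `overhead` per micro-batch. A padded micro-batch may not exceed
    `max_tokens`.
    """
    ordered = sorted(lengths)  # similar lengths become neighbours
    n = len(ordered)
    if n and ordered[-1] > max_tokens:
        raise ValueError("a single sample exceeds max_tokens")

    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: minimal cost for the first i samples
    best[0] = 0
    cut = [0] * (n + 1)     # cut[i]: start index of the last micro-batch

    for end in range(1, n + 1):
        longest = ordered[end - 1]  # the list is sorted ascending
        for start in range(end - 1, -1, -1):
            padded = (end - start) * longest
            if padded > max_tokens:
                break  # extending the batch further only adds tokens
            candidate = best[start] + padded + overhead
            if candidate < best[end]:
                best[end] = candidate
                cut[end] = start

    # Walk the cut points backwards to recover the micro-batches.
    batches: List[List[int]] = []
    end = n
    while end > 0:
        start = cut[end]
        batches.append(ordered[start:end])
        end = start
    batches.reverse()
    return batches, int(best[n])


if __name__ == "__main__":
    batches, cost = partition_micro_batches(
        [10, 12, 100, 110, 11], max_tokens=256, overhead=20
    )
    print(batches, cost)
```

    Under this toy cost, short and long samples end up in separate micro-batches with different sample counts, which is the effect the abstract describes. The fixed overhead keeps the program from trivially choosing one sample per micro-batch.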

    Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification

    Retrieval augmentation, which enhances downstream models with a knowledge retriever and an external corpus instead of merely increasing the number of model parameters, has been successfully applied to many natural language processing (NLP) tasks such as text classification and question answering. However, existing methods train the retriever and the downstream model separately or asynchronously, mainly because of the non-differentiability between the two parts, and this usually leads to degraded performance compared to end-to-end joint training. In this paper, we propose Differentiable Retrieval Augmentation via Generative lANguage modeling (Dragan) to address this problem through a novel differentiable reformulation. We demonstrate the effectiveness of our proposed method on a challenging NLP task in e-commerce search, namely query intent classification. Both the experimental results and the ablation study show that the proposed method significantly and reasonably improves on the state-of-the-art baselines in both offline evaluation and an online A/B test. Comment: 5 pages, 2 figures; accepted by CIKM202

    A Deep Learning Framework for Traffic Data Imputation Considering Spatiotemporal Dependencies

    Spatiotemporal (ST) data collected by sensors can be represented as multivariate time series, that is, sequences of data points listed in time order. Despite the vast amount of useful information they contain, ST data usually suffer from missing or incomplete values, which limits their applications. Imputation is one viable solution and is often used to preprocess the data for further applications. However, in practice, spatiotemporal data imputation is quite difficult due to the complexity of spatiotemporal dependencies with dynamic changes in the traffic network, and it is a crucial preprocessing task for further applications. Existing approaches mostly capture only the temporal dependencies in time series or static spatial dependencies. They fail to directly model the spatiotemporal dependencies, and the representation ability of the models is relatively limited. Comment: accepted at ICITE 202

    Application of deep eutectic solvents in protein extraction and purification

    Deep eutectic solvents (DESs) are mixtures of hydrogen bond donor (HBD) and hydrogen bond acceptor (HBA) molecules that can consist, respectively, of natural plant metabolites such as sugars, carboxylic acids, and amino acids, and of ionic molecules, the vast majority of which are ammonium salts. Media such as DESs are modular tools of sustainability that can be pointed toward the extraction of bioactive molecules due to their excellent physicochemical properties, their relatively low price, and accessibility. The present review focuses on the application of DESs for protein extraction and purification. The in-depth effects and principles that apply to DES-mediated extraction using various renewable biomasses will be discussed as well. One of the most important observations is that DESs have a clear ability to maintain the biological and/or functional activity of the extracted proteins, as well as to increase their stability compared to traditional solvents. They demonstrate true potential for reproducible and, more importantly, scalable protein extraction and purification compared to traditional methods, while enabling waste valorization in some particular cases.

    Market Stakeholder Analysis of the Practical Implementation of Carbonation Curing on Steel Slag for Urban Sustainable Governance

    Carbonation curing on steel slag is one of the most promising technologies for the iron and steel industry to manage its solid waste and carbon emissions. However, the technology is still in its demonstration stage. Taking China as a case study, this paper investigates the market stakeholders of carbonation curing on steel slag for construction materials, with a view to its effective application. A holistic analysis of the competition, market size, and stakeholders of carbonation curing on steel slag was carried out through a literature review, a survey, a questionnaire, and interviews. The results showed that carbonation curing on steel slag had the advantages of high quality, high efficiency, low cost, and carbon reduction compared with other technologies. Shandong province was the most suitable province for the large-scale primary application of the technology. Stakeholder involvement to establish information platforms, enhance economic incentives, and promote adequate R&D activities would help bring carbonation curing of steel slag into practice. This paper provides a reference for the commercialization of carbonation curing on similar calcium- and magnesium-based solid waste materials.

    Exploring a Multi-Layer Coupled Network Propagation Model Based on Information Diffusion and Bounded Trust

    Objective: To explore the law of opinion dissemination and individual opinion evolution at the micro level, this paper analyzes the influence of variation and oyster on communication from the perspective of network structure. Methods: In this paper, we introduce the concepts of “variation” and “oyster”, build a multi-layer coupled network environment combined with the ISOVR model, and conduct simulation experiments of network information dissemination based on the bounded trust model. Results: The experimental results reveal that the extent and scope of variation’s spread in the network depend more on the trust of the nodes themselves, and decreasing the trust of nodes significantly reduces the rate and peak value of variation. Changing the silence coefficient of variation does not effectively change the direction of rumor propagation, which indicates that a rumor has a strong propagation ability after mutation. Conclusion: The insights of this paper on the dissemination of public opinions include: 1) pay attention to people with high trust levels, such as opinion leaders; 2) clarify misinformation in time to prevent the further spread of rumors.
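
    To illustrate the kind of simulation the Methods mention, here is a minimal sketch. It assumes that the "bounded trust model" belongs to the bounded-confidence family of opinion-dynamics models, and it uses a pairwise averaging rule in a fully mixed population. The abstract does not specify the update rule, and the multi-layer coupled network and the ISOVR states are not modelled here. The names `bounded_confidence_update`, `simulate`, `threshold`, and `rate` are hypothetical.

```python
import random
from typing import List, Tuple


def bounded_confidence_update(
    a: float, b: float, threshold: float, rate: float = 0.5
) -> Tuple[float, float]:
    """Two agents move toward each other only if their opinions are close.

    Assumed rule (illustrative): when |a - b| is below `threshold`, each
    opinion shifts by rate * (difference); otherwise both stay unchanged.
    """
    if abs(a - b) >= threshold:
        return a, b  # too far apart: no trust, no influence
    shift = rate * (b - a)
    return a + shift, b - shift


def simulate(
    opinions: List[float], threshold: float, steps: int, seed: int = 0
) -> List[float]:
    """Repeatedly pick two distinct random agents and apply the update."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    state = list(opinions)
    for _ in range(steps):
        i, j = rng.sample(range(len(state)), 2)
        state[i], state[j] = bounded_confidence_update(
            state[i], state[j], threshold
        )
    return state


if __name__ == "__main__":
    start = [0.05, 0.1, 0.15, 0.8, 0.85, 0.9]
    print(simulate(start, threshold=0.2, steps=200))
```

    In this toy rule, lowering `threshold` means fewer pairs influence each other, which is loosely analogous to the reported effect of decreasing node trust. The symmetric update leaves the sum of the two opinions unchanged.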

    MiR-181d-5p Targets KLF6 to Improve Ischemia/Reperfusion-Induced AKI Through Effects on Renal Function, Apoptosis, and Inflammation

    Renal tubular epithelial cell (RTEC) death and renal interstitial inflammation are the most crucial pathophysiological changes in acute kidney ischemia/reperfusion injury (IRI). The microRNA (miR)-181d family plays diverse roles in cell proliferation, apoptosis and inflammation, but its renal target and potential role in IRI are unknown. Here, we showed that the expression of miR-181d-5p decreased and that of Krueppel-like factor 6 (KLF6) increased in a renal cell (HK-2) model of hypoxia/reoxygenation (H/R) injury and in a mouse model of renal IRI. Both were mainly distributed in the renal tubules. After renal IRI, miR-181d-5p overexpression significantly inhibited inflammatory mediators, reduced apoptosis and further improved renal function. KLF6 exacerbated RTEC damage and acted as an NF-κB co-activator to aggravate the renal IRI inflammatory response. Mechanistically, KLF6 was predicted as a new potential target gene of miR-181d-5p through bioinformatic analysis and verified by luciferase reporter assay. After overexpressing miR-181d-5p and inhibiting KLF6, the effect of miR-181d-5p on improving renal damage was weakened. In conclusion, miR-181d-5p upregulation produced protective antiapoptotic and anti-inflammatory effects against IRI in kidneys in vivo and H/R injury in HK-2 cells in vitro, and these effects were achieved by targeted inhibition of KLF6. Thus, our results provide novel insights into the molecular mechanisms associated with IRI and a potential novel therapeutic target.

    Avoiding the Great Filter: A Simulation of Important Factors for Human Survival

    Humanity's path to avoiding extinction is a daunting and inevitable challenge which proves difficult to solve, partially due to the lack of data and evidence surrounding the concept. We aim to address this confusion by examining the most dangerous threats to humanity, in hopes of providing a direction to approach this problem. Using a probabilistic model, we observed the effects on the length of human survival of nuclear war, climate change, asteroid impacts, artificial intelligence and pandemics, which are the most harmful disasters in terms of their extent of destruction. We take the present calendar year as the starting point for the predicted average number of survival years. Nuclear war, when sampling from an artificial normal distribution, results in an average human survival time of 60 years into the future starting from the present, before a civilization-ending disaster. While climate change results in an average human survival time of 193 years, the simulation based on impact from asteroids results in an average of 1754 years. Since the risks from asteroid impacts could be considered to reside mostly in the far future, it can be concluded that nuclear war, climate change, and pandemics are presently the most prominent threats to humanity. Additionally, the danger from superiority of artificial intelligence over humans, although still somewhat abstract, is worthy of further study as its potential for impeding humankind's progress towards becoming a more advanced civilization cannot be confidently dismissed.