
    Solving Regularized Exp, Cosh and Sinh Regression Problems

    In modern machine learning, attention computation is a fundamental task for training large language models such as Transformer, GPT-4 and ChatGPT. In this work, we study the exponential regression problem, which is inspired by the softmax/exp unit in the attention mechanism of large language models. The standard exponential regression is non-convex. We study the regularized version of the exponential regression problem, which is convex, and use an approximate Newton method to solve it in input-sparsity time. Formally, in this problem one is given a matrix A \in \mathbb{R}^{n \times d}, vectors b \in \mathbb{R}^n and w \in \mathbb{R}^n, and any of the functions \exp, \cosh and \sinh, denoted f. The goal is to find the x that minimizes 0.5 \| f(Ax) - b \|_2^2 + 0.5 \| \mathrm{diag}(w) A x \|_2^2. The straightforward method is the naive Newton's method. Let \mathrm{nnz}(A) denote the number of non-zero entries in the matrix A. Let \omega denote the exponent of matrix multiplication; currently, \omega \approx 2.373. Let \epsilon denote the accuracy error. In this paper, we make use of the input sparsity and propose an algorithm that uses \log(\|x_0 - x^*\|_2 / \epsilon) iterations and \widetilde{O}(\mathrm{nnz}(A) + d^{\omega}) time per iteration to solve the problem.
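As a concrete illustration of the objective above, here is a minimal pure-Python sketch of the regularized exp-regression loss and an exact Newton iteration for the special case d = 1 (so x is a scalar). The data values are made up for the demo, and the paper's actual algorithm replaces the exact Hessian with an input-sparsity-time approximation:

```python
import math

def loss(A, b, w, x):
    # 0.5 * ||exp(A x) - b||^2 + 0.5 * ||diag(w) A x||^2, with d = 1 so x is a scalar
    u = [a * x for a in A]
    fit = sum((math.exp(ui) - bi) ** 2 for ui, bi in zip(u, b))
    reg = sum((wi * ui) ** 2 for wi, ui in zip(w, u))
    return 0.5 * (fit + reg)

def newton_step(A, b, w, x):
    # exact first and second derivatives of the scalar objective
    g = h = 0.0
    for a, bi, wi in zip(A, b, w):
        e = math.exp(a * x)
        g += (e - bi) * e * a + (wi * a) ** 2 * x
        h += (e * a) ** 2 + (e - bi) * e * a * a + (wi * a) ** 2
    return x - g / h

# toy data, chosen only for the demo
A, b, w = [0.5, -0.3, 0.8], [1.2, 0.9, 1.5], [0.1, 0.1, 0.1]
x = 0.0
for _ in range(20):
    x = newton_step(A, b, w, x)
```

With this data the iterates settle near x ≈ 0.47, where the gradient vanishes and the loss is far below its value at x = 0.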

    Attention Scheme Inspired Softmax Regression

    Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible next words or phrases, given a sequence of input words. This distribution is then used to select the most likely next word or phrase, based on the probabilities assigned by the model. The softmax unit plays a crucial role in training LLMs, as it allows the model to learn from the data by adjusting the weights and biases of the neural network. In the area of convex optimization, such as when using the central path method to solve linear programming, the softmax function has been a crucial tool for controlling the progress and stability of the potential function [Cohen, Lee and Song STOC 2019; Brand SODA 2020]. In this work, inspired by the softmax unit, we define a softmax regression problem. Formally speaking, given a matrix A \in \mathbb{R}^{n \times d} and a vector b \in \mathbb{R}^n, the goal is to use a greedy-type algorithm to solve \begin{align*} \min_{x} \| \langle \exp(Ax), {\bf 1}_n \rangle^{-1} \exp(Ax) - b \|_2^2. \end{align*} In a certain sense, our provable convergence result provides theoretical support for why greedy algorithms can be used to train the softmax function in practice.
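The objective above can be sketched in a few lines of pure Python. The sketch below evaluates the softmax-regression loss and minimizes it with plain gradient descent using a numerical gradient; the data, the step size, and the iteration count are illustrative stand-ins, not the paper's greedy algorithm:

```python
import math

def softmax_loss(A, b, x):
    # || <exp(Ax), 1>^{-1} exp(Ax) - b ||_2^2 for dense A (list of rows) and vector x
    u = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]
    m = max(u)                      # subtract the max for numerical stability
    e = [math.exp(ui - m) for ui in u]
    s = sum(e)
    return sum((ei / s - bi) ** 2 for ei, bi in zip(e, b))

def numeric_grad(A, b, x, h=1e-6):
    # central-difference gradient, enough for this small demo
    g = []
    for j in range(len(x)):
        xp = x[:]; xp[j] += h
        xm = x[:]; xm[j] -= h
        g.append((softmax_loss(A, b, xp) - softmax_loss(A, b, xm)) / (2 * h))
    return g

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.2, 0.3, 0.5]                 # a target probability distribution
x = [0.0, 0.0]
for _ in range(3000):
    g = numeric_grad(A, b, x)
    x = [xj - 0.5 * gj for xj, gj in zip(x, g)]
```

For this A and b an exact fit exists (near x ≈ (0.51, 0.92)), so the iterates drive the loss essentially to zero.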

    Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression

    There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit, and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem. In contrast to previous works, our first layer is activated by a softmax unit. This sets the stage for future analyses of creating more activation functions based on the softmax function. Rearranging the softmax function leads to significantly different analyses. Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the Hessian of the loss function is positive definite and Lipschitz continuous under certain assumptions. This enables us to establish local convergence guarantees for the proposed training algorithm. Specifically, with an appropriate initialization and after O(\log(1/\epsilon)) iterations, our algorithm can find an \epsilon-approximate minimizer of the training loss with high probability. Each iteration requires approximately O(\mathrm{nnz}(C) + d^\omega) time, where d is the model size, C is the input matrix, and \omega < 2.374 is the matrix multiplication exponent.
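The flavor of the approximate-Newton analysis can be seen in a one-dimensional toy: when each step uses an approximate Hessian within a constant factor of the true one, the error contracts by a constant factor per iteration, which is what yields an O(log(1/ε)) iteration count. A hedged sketch follows; the objective, the 20% Hessian inflation standing in for a sketched Hessian, and the starting point are all illustrative, not from the paper:

```python
import math

# toy objective: f(x) = 0.5*(exp(x) - 2)^2 + 0.5*0.01*x^2

def f_grad(x):
    return (math.exp(x) - 2.0) * math.exp(x) + 0.01 * x

def f_hess(x):
    return math.exp(x) * (2.0 * math.exp(x) - 2.0) + 0.01

x = 1.0                             # "appropriate initialization": start near the minimizer
for _ in range(50):
    h_approx = 1.2 * f_hess(x)      # 20% over-estimate mimics an approximate Hessian
    x -= f_grad(x) / h_approx       # approximate Newton step
```

The fixed point is unchanged by the Hessian scaling (the gradient still has to vanish), so the iterates converge linearly to the minimizer near x ≈ ln 2.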

    Plant Phenotyping on Mobile Devices

    Plant phenotyping is a fast and non-destructive method to obtain the physiological features of plants, compared with expensive and time-consuming chemical analysis with plant sampling. Through plant phenotyping, scientists and farmers can assess plant health status more accurately than by visual inspection, thus avoiding waste of time and resources and even predicting productivity. However, the size and price of current plant phenotyping equipment restrict it from being widely applied at the farmer's household level. Everyday field operation is barely achievable because easy-to-carry and cost-effective equipment such as hyperspectral cameras, infrared cameras and thermal cameras is not readily available. A plant phenotyping tool on mobile devices will make plant phenotyping technology more accessible to ordinary farmers and researchers. This application incorporates physical optics, plant science models, and the image processing capability of smartphones. With our special optical design, multispectral instead of RGB (red, green and blue) images can be obtained from smartphones at fairly low cost. Through quick image processing on the smartphone, the app will provide accurate predictions of plant physiological features such as water, chlorophyll, and nitrogen content. Sophisticated prediction models provided by Purdue's plant phenotyping team are applied. Once widely adopted, the information collected by smartphones running the developed app will be sent back to Purdue's plant health big-data database. The feedback will not only allow us to improve our models, but also provide farmers and agricultural researchers easy access to real-time crop plant health data.

    The Next-Gen Crop Nutrient Stress Identification with High-Precision Sensing Technology in Digital Agriculture

    Crop yields are facing significant losses from nutrient deficiencies. Over-fertilizing also has negative economic and environmental impacts. It is challenging to optimize fertilizing without an accurate diagnosis. Recently, plant phenotyping has demonstrated outstanding capabilities in estimating crop traits. As one of the leading technologies, LeafSpec provides high-quality crop image data for improving phenotyping quality. In this study, novel algorithms are developed for LeafSpec to identify crop nutrient deficiencies more accurately. Combined with a UAV system, this technology will give growers a robust solution for fertilizing diagnosis and scientific crop management.

    Latency-aware Unified Dynamic Networks for Efficient Image Recognition

    Dynamic computation has emerged as a promising avenue to enhance the inference efficiency of deep networks. It allows selective activation of computational units, leading to a reduction in unnecessary computations for each input sample. However, the actual efficiency of these dynamic models can deviate from theoretical predictions. This mismatch arises from: 1) the lack of a unified approach due to fragmented research; 2) the focus on algorithm design over critical scheduling strategies, especially in CUDA-enabled GPU contexts; and 3) challenges in measuring practical latency, given that most libraries cater to static operations. Addressing these issues, we unveil the Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. To bridge the theoretical and practical efficiency gap, LAUDNet merges algorithmic design with scheduling optimization, guided by a latency predictor that accurately gauges dynamic operator latency. We have tested LAUDNet across multiple vision tasks, demonstrating its capacity to notably reduce the latency of models like ResNet-101 by over 50% on platforms such as V100, RTX3090, and TX2 GPUs. Notably, LAUDNet stands out in balancing accuracy and efficiency. Code is available at: https://www.github.com/LeapLabTHU/LAUDNet
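One of the paradigms above, dynamic layer skipping, can be sketched conceptually in a few lines. The residual block, gate, and threshold below are hypothetical illustrations; LAUDNet's actual gating, scheduling optimization, and latency predictor are far more involved:

```python
def gated_block(x, transform, gate, threshold=0.5):
    # dynamic layer skipping: spend compute on `transform` only when the
    # gate deems the input worth it; otherwise take the identity shortcut
    if gate(x) < threshold:
        return x                                           # skipped: no extra compute
    return [xi + ti for xi, ti in zip(x, transform(x))]    # residual update

# hypothetical gate: mean absolute "energy" of the input decides whether the block runs
gate = lambda x: sum(abs(v) for v in x) / len(x)
double = lambda x: [2.0 * v for v in x]

easy = gated_block([0.1, 0.1], double, gate)   # gate value 0.1 < 0.5, block skipped
hard = gated_block([1.0, 1.0], double, gate)   # gate value 1.0 >= 0.5, block runs
```

The practical-latency gap the abstract describes arises exactly here: the skipped branch saves theoretical FLOPs, but realizing that saving on a GPU requires the scheduling and latency-prediction machinery the paper contributes.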

    Progress and Opportunities of Foundation Models in Bioinformatics

    Bioinformatics has witnessed a paradigm shift with the increasing integration of artificial intelligence (AI), particularly through the adoption of foundation models (FMs). These AI techniques have rapidly advanced, addressing historical challenges in bioinformatics such as the scarcity of annotated data and the presence of data noise. FMs are particularly adept at handling large-scale, unlabeled data, a common scenario in biological contexts due to the time-consuming and costly nature of experimentally determining labeled data. This characteristic has allowed FMs to excel and achieve notable results in various downstream validation tasks, demonstrating their ability to represent diverse biological entities effectively. Undoubtedly, FMs have ushered in a new era in computational biology, especially in the realm of deep learning. The primary goal of this survey is to conduct a systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed. Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs. We delve into the specifics of the problems at hand, including sequence analysis, structure prediction, function annotation, and multimodal integration, comparing the structures and advancements against traditional methods. Furthermore, the review analyses challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases. Finally, we outline potential development paths and strategies for FMs in future biological research, setting the stage for continued innovation and application in this rapidly evolving field. This comprehensive review serves not only as an academic resource but also as a roadmap for future explorations and applications of FMs in biology. Comment: 27 pages, 3 figures, 2 tables.

    Synthetic Datasets for Autonomous Driving: A Survey

    Autonomous driving techniques have been flourishing in recent years while thirsting for huge amounts of high-quality data. However, it is difficult for real-world datasets to keep up with the pace of changing requirements due to their expensive and time-consuming experimental and labeling costs. Therefore, more and more researchers are turning to synthetic datasets to easily generate rich and changeable data as an effective complement to the real world and to improve the performance of algorithms. In this paper, we summarize the evolution of synthetic dataset generation methods and review the work to date on synthetic datasets in single- and multi-task categories for autonomous driving research. We also discuss the role that synthetic datasets play in evaluation and gap testing, and their positive effect on autonomous-driving-related algorithm testing, especially on trustworthiness and safety aspects. Finally, we discuss general trends and possible development directions. To the best of our knowledge, this is the first survey focusing on the application of synthetic datasets in autonomous driving. This survey also raises awareness of the problems of real-world deployment of autonomous driving technology and provides researchers with a possible solution. Comment: 19 pages, 5 figures.

    Prevalence and trend of hepatitis C virus infection among blood donors in Chinese mainland: a systematic review and meta-analysis

    Background: Blood transfusion is one of the most common transmission pathways of hepatitis C virus (HCV). This paper aims to provide a comprehensive and reliable tabulation of available data on the epidemiological characteristics and risk factors for HCV infection among blood donors in the Chinese mainland, so as to help devise prevention strategies and guide further research.
    Methods: A systematic review was constructed based on computerized literature databases. Infection rates and 95% confidence intervals (95% CI) were calculated using the approximate normal distribution model. Odds ratios and 95% CI were calculated by fixed- or random-effects models. Data manipulation and statistical analyses were performed using STATA 10.0, and ArcGIS 9.3 was used for map construction.
    Results: Two hundred and sixty-five studies met our inclusion criteria. The pooled prevalence of HCV infection among blood donors in the Chinese mainland was 8.68% (95% CI: 8.01%-9.39%), and the epidemic was more severe in North and Central China, especially in Henan and Hebei, while a significantly lower rate was found in Yunnan. Notably, before 1998 the pooled prevalence of HCV infection among blood donors was 12.87% (95% CI: 11.25%-14.56%), but it decreased to 1.71% (95% CI: 1.43%-1.99%) after 1998. No significant difference was found in HCV infection rates between male and female blood donors, or among donors of different blood types. The prevalence of HCV infection was found to increase with age. During 1994-1995, the prevalence rate reached its highest level at 15.78% (95% CI: 12.21%-19.75%), and showed a decreasing trend in the following years. A significant difference was found among groups with different blood donation types: plasma donors had a relatively higher prevalence of HCV infection than whole blood donors (33.95% vs. 7.9%).
    Conclusions: The prevalence of HCV infection has decreased rapidly since 1998 and has remained at a low level in recent years, but some provinces showed relatively higher prevalence than the general population. It is urgent to take effective measures to prevent secondary HCV transmission and control chronic progression, and the key to reducing HCV incidence among blood donors is to encourage truly voluntary blood donation, strictly implement the blood donation law, and avoid cross-infection.
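The per-study interval computation described in the Methods (the approximate normal distribution model) can be sketched as below. `prevalence_ci` is a hypothetical helper for illustration; the actual meta-analysis additionally pools studies with fixed- or random-effects models:

```python
import math

def prevalence_ci(cases, n, z=1.96):
    # 95% CI for a prevalence using the approximate normal (Wald) model,
    # clipped to [0, 1]; z = 1.96 corresponds to the 95% confidence level
    p = cases / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, max(p - z * se, 0.0), min(p + z * se, 1.0)

# e.g., 868 HCV-positive donors out of 10,000 gives a prevalence of 8.68%
p, lo, hi = prevalence_ci(868, 10000)
```

For small prevalences or small n, the normal approximation becomes unreliable and a Wilson or exact interval would be preferred, which is why the counts here are kept large.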

    Misiroot: A Robotic Minimum Invasion in Situ Imaging System for Plant Root Phenotyping

    Plant root phenotyping technologies play an important role in breeding, plant protection, and other plant science research projects. Users of root phenotyping urgently need technologies that are low-cost, in situ, non-destructive to the roots, and suitable for the natural soil environment. Many recently developed root phenotyping methods such as minirhizotrons, CT, and MRI scanners have their unique advantages in observing plant roots, but they also have disadvantages and cannot meet all the critical requirements simultaneously. The study in this paper focuses on the development of a new plant root phenotyping robot, called "MISIRoot", that is minimally invasive to plants and works in situ in natural soil. The MISIRoot system (patent pending) mainly consists of an industrial-level robotic arm, a mini-size camera with a lighting set, a plant pot holding platform, and image processing software for root recognition and feature extraction. MISIRoot can take high-resolution color images of roots in soil with minimal disturbance to the root and reconstruct the plant roots' three-dimensional (3D) structure at an accuracy of 0.1 mm. In a test assay, well-watered and drought-stressed groups of corn plants were measured by MISIRoot at the V3, V4, and V5 stages. The system successfully acquired RGB color images of the roots and extracted 3D point cloud data showing the locations of the detected roots in the soil. The plants measured by MISIRoot and plants not measured (controls) were carefully compared with Purdue's Lilly 13-4 Hyperspectral Imaging Facility (reference). No significant differences were found between the two groups of plants at different growth stages. Therefore, it was concluded that MISIRoot measurements caused no significant disturbance to the corn plants' growth.