232 research outputs found

    The Research on Consumer Preferences of Dairy Products in China -The comparison between inside and outside Guangdong Province-

    Get PDF
    Master's thesis summary, academic year 2017 (Heisei 29)

    Solving Regularized Exp, Cosh and Sinh Regression Problems

    Full text link
    In modern machine learning, attention computation is a fundamental task for training large language models such as Transformer, GPT-4 and ChatGPT. In this work, we study an exponential regression problem inspired by the softmax/exp unit in the attention mechanism of large language models. The standard exponential regression problem is non-convex; we study a regularized version, which is convex, and use an approximate Newton method to solve it in input-sparsity time. Formally, one is given a matrix $A \in \mathbb{R}^{n \times d}$, vectors $b \in \mathbb{R}^n$ and $w \in \mathbb{R}^n$, and any of the functions $\exp$, $\cosh$ and $\sinh$, denoted $f$. The goal is to find the optimal $x$ that minimizes $0.5 \| f(Ax) - b \|_2^2 + 0.5 \| \mathrm{diag}(w) A x \|_2^2$. The straightforward approach is the naive Newton's method. Let $\mathrm{nnz}(A)$ denote the number of non-zero entries in the matrix $A$, let $\omega$ denote the exponent of matrix multiplication (currently $\omega \approx 2.373$), and let $\epsilon$ denote the accuracy error. In this paper, we make use of the input sparsity and propose an algorithm that uses $\log(\|x_0 - x^*\|_2 / \epsilon)$ iterations and $\widetilde{O}(\mathrm{nnz}(A) + d^{\omega})$ time per iteration to solve the problem.
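
    For illustration, here is a minimal sketch, in Python with NumPy, of the plain (non-fast) Newton iteration for this regularized loss with $f = \exp$; the function name and parameters are illustrative, and the paper's speedup comes from replacing the exact Hessian solve below with a sketched approximation, which is omitted here.

    import numpy as np

    def newton_exp_regression(A, b, w, x0, tol=1e-8, max_iter=50):
        # Minimize 0.5*||exp(Ax) - b||_2^2 + 0.5*||diag(w) A x||_2^2
        x = x0.copy()
        for _ in range(max_iter):
            u = np.exp(A @ x)                        # f(Ax) with f = exp
            grad = A.T @ (u * (u - b)) + A.T @ (w**2 * (A @ x))
            # Hessian = A^T diag(u^2 + u*(u - b) + w^2) A, since f' = f'' = exp
            D = u**2 + u * (u - b) + w**2
            H = A.T @ (D[:, None] * A)
            step = np.linalg.solve(H, grad)          # exact solve; the fast algorithm sketches this
            x -= step
            if np.linalg.norm(step) < tol:
                break
        return x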

    Domain-decomposed Bayesian inversion based on local Karhunen-Loève expansions

    Get PDF
    In many Bayesian inverse problems the goal is to recover a spatially varying random field. Such problems are often computationally challenging, especially when the forward model is governed by complex partial differential equations (PDEs). The challenge is particularly severe when the spatial domain is large and the unknown random field must be represented by a high-dimensional parameter. In this paper, we present a domain-decomposed method that attacks the dimensionality issue by decomposing the spatial domain and the parameter domain simultaneously. On each subdomain, a local Karhunen-Loève (KL) expansion is constructed, and a local inversion problem is solved independently in a parallel manner and, more importantly, in a lower-dimensional space. After local posterior samples are generated by running Markov chain Monte Carlo (MCMC) simulations on the subdomains, a novel projection procedure is developed to effectively reconstruct the global field. In addition, the domain decomposition interface conditions are handled with an adaptive Gaussian process-based fitting strategy. Numerical examples are provided to demonstrate the performance of the proposed method.
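
    For intuition, here is a minimal sketch (assumed setup, not the paper's implementation) of building a local KL expansion on one subdomain: discretize a covariance kernel on the local grid, eigendecompose it, and keep the leading modes as the lower-dimensional local parameterization. The exponential kernel and all parameter names below are illustrative assumptions.

    import numpy as np

    def local_kl_modes(points, corr_len=0.2, n_modes=10):
        # points: (m, 2) coordinates of the subdomain grid nodes
        dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
        C = np.exp(-dist / corr_len)            # assumed exponential covariance kernel
        vals, vecs = np.linalg.eigh(C)          # eigenvalues in ascending order
        idx = np.argsort(vals)[::-1][:n_modes]  # keep the n_modes largest
        return np.maximum(vals[idx], 0.0), vecs[:, idx]

    def sample_local_field(vals, vecs, rng):
        # KL sample: sum_k sqrt(lambda_k) * xi_k * phi_k with xi_k ~ N(0, 1)
        xi = rng.standard_normal(len(vals))
        return vecs @ (np.sqrt(vals) * xi)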

    Attention Scheme Inspired Softmax Regression

    Full text link
    Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible next words or phrases given a sequence of input words; this distribution is then used to select the most likely next word or phrase, based on the probabilities assigned by the model. The softmax unit plays a crucial role in training LLMs, as it allows the model to learn from the data by adjusting the weights and biases of the neural network. In convex optimization, for example when using the central path method to solve linear programming, the softmax function has been a crucial tool for controlling the progress and stability of the potential function [Cohen, Lee and Song STOC 2019; Brand SODA 2020]. In this work, inspired by the softmax unit, we define a softmax regression problem. Formally speaking, given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, the goal is to use a greedy-type algorithm to solve \begin{align*} \min_{x} \| \langle \exp(Ax), {\bf 1}_n \rangle^{-1} \exp(Ax) - b \|_2^2. \end{align*} In a certain sense, our provable convergence result provides theoretical support for why greedy algorithms can be used to train the softmax function in practice.
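
    As a minimal sketch (illustrative only; the paper's greedy algorithm and step-size analysis are more refined), plain gradient descent on this objective looks as follows; the gradient uses the softmax Jacobian $\mathrm{diag}(p) - pp^\top$.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())     # shift for numerical stability
        return e / e.sum()

    def softmax_regression_gd(A, b, x0, lr=0.1, steps=500):
        # Minimize || softmax(Ax) - b ||_2^2 by plain gradient descent
        x = x0.copy()
        for _ in range(steps):
            p = softmax(A @ x)
            r = p - b
            # chain rule with Jacobian J = diag(p) - p p^T (symmetric):
            # grad = 2 A^T J r = 2 A^T (p*r - p*(p . r))
            x -= lr * 2 * (A.T @ (p * r - p * (p @ r)))
        return x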

    Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression

    Full text link
    There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem whose first layer, in contrast to previous works, is activated by a softmax unit. This sets the stage for future analyses of creating more activation functions based on the softmax function. Rearranging the softmax function leads to significantly different analyses. Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the Hessian of the loss function is positive definite and Lipschitz continuous under certain assumptions. This enables us to establish local convergence guarantees for the proposed training algorithm. Specifically, with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, our algorithm can find an $\epsilon$-approximate minimizer of the training loss with high probability. Each iteration requires approximately $O(\mathrm{nnz}(C) + d^\omega)$ time, where $d$ is the model size, $C$ is the input matrix, and $\omega < 2.374$ is the matrix multiplication exponent.
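
    Schematically, the training loop analyzed here is a locally convergent approximate Newton iteration. The sketch below is illustrative, with the Hessian approximation abstracted away as a callback; it shows the structure that yields the $O(\log(1/\epsilon))$ iteration count once the Hessian is positive definite and Lipschitz near the optimum.

    import numpy as np

    def approx_newton(grad_fn, hess_fn, x0, eps=1e-8, max_iter=100):
        # hess_fn may return an approximate Hessian (e.g. via sketching),
        # which is how the ~O(nnz(C) + d^omega) per-iteration cost arises.
        x = x0.copy()
        for _ in range(max_iter):
            g = grad_fn(x)
            if np.linalg.norm(g) < eps:   # local convergence: ~log(1/eps) steps
                break
            x -= np.linalg.solve(hess_fn(x), g)
        return x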

    Serial Dependence in Dermatological Judgments

    Get PDF
    This research was funded by the National Institutes of Health (NIH) grant number R01CA236793.

    Fooling Polarization-based Vision using Locally Controllable Polarizing Projection

    Full text link
    Polarization is a fundamental property of light that encodes abundant information regarding surface shape, material, illumination and viewing geometry. The computer vision community has witnessed a blossom of polarization-based vision applications, such as reflection removal, shape-from-polarization, transparent object segmentation and color constancy, partially due to the emergence of single-chip mono/color polarization sensors that make polarization data acquisition easier than ever. However, is polarization-based vision vulnerable to adversarial attacks? If so, is it possible to realize these attacks in the physical world without being perceived by human eyes? In this paper, we warn the community of the vulnerability of polarization-based vision, which can be more serious than that of RGB-based vision. By adapting a commercial LCD projector, we achieve locally controllable polarizing projection, which we successfully use to fool state-of-the-art polarization-based vision algorithms for glass segmentation and color constancy. Compared with existing physical attacks on RGB-based vision, which always suffer from a trade-off between attack efficacy and visual perceptibility, adversarial attacks based on polarizing projection are contact-free and visually imperceptible, since the naked human eye can rarely perceive the difference between maliciously manipulated polarized light and ordinary illumination. This poses unprecedented risks to polarization-based vision, in both the monochromatic and trichromatic domains, to which due attention should be paid and for which countermeasures should be considered.
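
    For context, here is a minimal sketch of the standard Stokes-parameter computation (textbook formulas, not taken from the paper) that turns a quad-angle polarization sensor's raw intensities into the degree and angle of linear polarization consumed by such vision algorithms; these are exactly the quantities a polarizing projection can silently manipulate.

    import numpy as np

    def linear_stokes(I0, I45, I90, I135):
        # Standard reconstruction from the four polarizer orientations
        s0 = 0.5 * (I0 + I45 + I90 + I135)       # total intensity
        s1 = I0 - I90
        s2 = I45 - I135
        dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-12)  # degree of linear polarization
        aolp = 0.5 * np.arctan2(s2, s1)                        # angle of linear polarization
        return s0, dolp, aolp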