Runtime Analysis for the NSGA-II: Proving, Quantifying, and Explaining the Inefficiency For Many Objectives

Authors: Doerr Benjamin, Zheng Weijie
Publication date: 19/06/2023

The NSGA-II is one of the most prominent algorithms to solve multi-objective optimization problems. Despite numerous successful applications, several studies have shown that the NSGA-II is less effective for larger numbers of objectives. In this work, we use mathematical runtime analyses to rigorously demonstrate and quantify this phenomenon. We show that even on the simple $m$-objective generalization of the discrete OneMinMax benchmark, where every solution is Pareto optimal, the NSGA-II also with large population sizes cannot compute the full Pareto front (objective vectors of all Pareto optima) in sub-exponential time when the number of objectives is at least three. The reason for this unexpected behavior lies in the fact that in the computation of the crowding distance, the different objectives are regarded independently. This is not a problem for two objectives, where any sorting of a pair-wise incomparable set of solutions according to one objective is also such a sorting according to the other objective (in the inverse order)
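
The abstract's point that the crowding distance treats each objective independently can be made concrete with a minimal sketch of the standard crowding-distance computation. This is an illustration under assumptions, not the authors' code: the function name `crowding_distance` and the per-objective normalization by the objective's range are choices made here.

```python
import math

def crowding_distance(points):
    """Crowding distance for a list of objective vectors (equal-length tuples).

    Illustrative sketch: each objective is sorted and processed on its own,
    and the per-objective contributions are simply added up.
    """
    n = len(points)
    if n == 0:
        return []
    m = len(points[0])
    dist = [0.0] * n
    for k in range(m):
        # Sort indices by the k-th objective only; other objectives are ignored here.
        order = sorted(range(n), key=lambda i: points[i][k])
        lo, hi = points[order[0]][k], points[order[-1]][k]
        # Boundary solutions of each objective get infinite distance.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            i = order[pos]
            gap = points[order[pos + 1]][k] - points[order[pos - 1]][k]
            dist[i] += gap / (hi - lo)
    return dist

# Four pairwise incomparable bi-objective points.
front = [(0, 4), (1, 3), (2, 2), (4, 0)]
print(crowding_distance(front))
```

For two objectives and pairwise incomparable points, the order under the second objective is the reverse of the order under the first, so both passes see the same neighbors. With three or more objectives the per-objective orders need not be related in this way.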

arXiv.org e-Print Archive

Sharp Bounds for Genetic Drift in EDAs

Authors: Doerr Benjamin, Zheng Weijie
Publication date: 31/10/2019

Estimation of Distribution Algorithms (EDAs) are one branch of Evolutionary Algorithms (EAs) in the broad sense that they evolve a probabilistic model instead of a population. Many existing algorithms fall into this category. Analogous to genetic drift in EAs, EDAs also encounter the phenomenon that updates of the probabilistic model not justified by the fitness move the sampling frequencies to the boundary values. This can result in a considerable performance loss. This paper proves the first sharp estimates of the boundary hitting time of the sampling frequency of a neutral bit for several univariate EDAs. For the UMDA that selects $\mu$ best individuals from $\lambda$ offspring each generation, we prove that the expected first iteration when the frequency of the neutral bit leaves the middle range $[\tfrac{1}{4}, \tfrac{3}{4}]$ and the expected first time it is absorbed in 0 or 1 are both $\Theta(\mu)$. The corresponding hitting times are $\Theta(K^2)$ for the cGA with hypothetical population size $K$. This paper further proves that for PBIL with parameters $\mu$, $\lambda$, and $\rho$, in an expected number of $\Theta(\mu/\rho^2)$ iterations the sampling frequency of a neutral bit leaves the interval $[\Theta(\rho/\mu), 1-\Theta(\rho/\mu)]$ and then always the same value is sampled for this bit, that is, the frequency approaches the corresponding boundary value with maximum speed. For the lower bounds implicit in these statements, we also show exponential tail bounds. If a bit is not necessarily neutral, but is either neutral or has a preference for ones, then the lower bounds on the times to reach a low frequency value still hold. An analogous statement holds for bits that are neutral or prefer the value zero
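
The genetic-drift effect described above can be observed in a small simulation. The following is a minimal sketch, assuming the usual cGA update in which two samples are drawn per iteration and the frequency moves by $1/K$ toward the winner; for a neutral bit the fitness does not depend on the bit, so the first sample is arbitrarily treated as the winner. The function name `neutral_bit_exit_time` and the `max_iters` cap are assumptions made for this illustration.

```python
import random

def neutral_bit_exit_time(K, rng, max_iters=10**7):
    """First iteration at which the sampling frequency of a neutral bit
    leaves [1/4, 3/4] in a cGA with hypothetical population size K.

    Sketch only: the bit is neutral, so which sample "wins" is independent
    of the bit value; the first sample is treated as the winner.
    """
    p = 0.5
    for t in range(1, max_iters + 1):
        x = rng.random() < p  # bit value in the winning sample
        y = rng.random() < p  # bit value in the losing sample
        # The frequency moves by 1/K only when the two samples differ.
        p += (int(x) - int(y)) / K
        if p < 0.25 or p > 0.75:
            return t
    return max_iters

rng = random.Random(1)
for K in (10, 20, 40):
    runs = [neutral_bit_exit_time(K, rng) for _ in range(200)]
    print(K, sum(runs) / len(runs))
```

Under these assumptions, the printed averages can be compared against the $\Theta(K^2)$ hitting time stated in the abstract; the sketch makes no claim about the constants involved.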

arXiv.org e-Print Archive

From Understanding Genetic Drift to a Smart-Restart Mechanism for Estimation-of-Distribution Algorithms

Authors: Doerr Benjamin, Zheng Weijie
Publication date: 22/09/2023

Estimation-of-distribution algorithms (EDAs) are optimization algorithms that learn a distribution on the search space from which good solutions can be sampled easily. A key parameter of most EDAs is the sample size (population size). If the population size is too small, the update of the probabilistic model builds on few samples, leading to the undesired effect of genetic drift. Too large population sizes avoid genetic drift, but slow down the process. Building on a recent quantitative analysis of how the population size leads to genetic drift, we design a smart-restart mechanism for EDAs. By stopping runs when the risk for genetic drift is high, it automatically runs the EDA in good parameter regimes. Via a mathematical runtime analysis, we prove a general performance guarantee for this smart-restart scheme. This in particular shows that in many situations where the optimal (problem-specific) parameter values are known, the restart scheme automatically finds these, leading to the asymptotically optimal performance. We also conduct an extensive experimental analysis. On four classic benchmark problems, we clearly observe the critical influence of the population size on the performance, and we find that the smart-restart scheme leads to a performance close to the one obtainable with optimal parameter values. Our results also show that previous theory-based suggestions for the optimal population size can be far from the optimal ones, leading to a performance clearly inferior to the one obtained via the smart-restart scheme. We also conduct experiments with PBIL (cross-entropy algorithm) on two combinatorial optimization problems from the literature, the max-cut problem and the bipartition problem. Again, we observe that the smart-restart mechanism finds much better values for the population size than those suggested in the literature, leading to a much better performance.

Comment: Accepted for publication in "Journal of Machine Learning Research". Extended version of our GECCO 2020 paper. This article supersedes arXiv:2004.0714

arXiv.org e-Print Archive

Building K-Anonymous User Cohorts with Consecutive Consistent Weighted Sampling (CCWS)

Authors: Li Ping, Li Xiaoyun, Zhao Weijie, Zheng Xinyi
Publication date: 26/04/2023

To retrieve personalized campaigns and creatives while protecting user privacy, digital advertising is shifting from member-based identity to cohort-based identity. Under such an identity regime, an accurate and efficient cohort building algorithm is desired to group users with similar characteristics. In this paper, we propose a scalable $K$-anonymous cohort building algorithm called consecutive consistent weighted sampling (CCWS). The proposed method combines the spirit of the ($p$-powered) consistent weighted sampling and hierarchical clustering, so that the $K$-anonymity is ensured by enforcing a lower bound on the size of cohorts. Evaluations on a LinkedIn dataset consisting of $>70$M users and ads campaigns demonstrate that CCWS achieves substantial improvements over several hashing-based methods including sign random projections (SignRP), minwise hashing (MinHash), as well as the vanilla CWS

arXiv.org e-Print Archive

Lifelong Sequential Modeling with Personalized Memorization for User Response Prediction

Authors: Bian Weijie, Fang Yuchen, Gai Kun, Qin Jiarui, Ren Kan, Xu Jian, Yu Yong, Zhang Weinan, Zheng Lei, Zhou Guorui, Zhu Xiaoqiang
Publication venue: Association for Computing Machinery (ACM)
Publication date: 12/05/2019

User response prediction, which models the user preference w.r.t. the presented items, plays a key role in online services. After two decades of rapid development, the accumulated user behavior sequences on mature Internet service platforms have become extremely long, reaching back to the user's first registration. Each user not only has intrinsic tastes, but also keeps changing her personal interests during her lifetime. Hence, it is challenging to handle such lifelong sequential modeling for each individual user. Existing methodologies for sequential modeling are only capable of dealing with relatively recent user behaviors, which leaves much room for modeling long-term, especially lifelong, sequential patterns to facilitate user modeling. Moreover, one user's behavior may be accounted for by various previous behaviors within her whole online activity history, i.e., long-term dependency with multi-scale sequential patterns. To tackle these challenges, we propose a Hierarchical Periodic Memory Network for lifelong sequential modeling with personalized memorization of sequential patterns for each user. The model also adopts a hierarchical and periodical updating mechanism to capture multi-scale sequential patterns of user interests while supporting the evolving user behavior logs. The experimental results over three large-scale real-world datasets have demonstrated the advantages of our proposed model, with significant improvement in user response prediction performance against the state of the art.

Comment: SIGIR 2019. Reproducible codes and datasets: https://github.com/alimamarankgroup/HPM

arXiv.org e-Print Archive

Crossref

A novel deflection shape function for rectangular capacitive micromachined ultrasonic transducer diaphragms

Authors: Sun Weijie, Sun Zhendong, Suo Xudong, Wong Lawrence, Yeow John, Zheng Zhou
Publication venue: Elsevier BV
Publication date: 15/07/2015

The final publication is available at Elsevier via http://dx.doi.org/10.2174/1874347101206010001. © 2015. This version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

A highly accurate analytical deflection shape function that describes the deflection profiles of capacitive micromachined ultrasonic transducers (CMUTs) with rectangular membranes under electrostatic pressure has been formulated. The rectangular diaphragms have a thickness range of 0.6–1.5 μm and a side length range of 100–1000 μm. The new deflection shape function generates deflection profiles that are in excellent agreement with finite element analysis (FEA) results for a wide range of geometry dimensions and loading conditions. The deflection shape function is used to analyze membrane deformations and to calculate the capacitances between the deformed membranes and the fixed back plates. In 50 groups of random tests, compared with FEA results, the calculated capacitance values have a maximum deviation of 1.486% for rectangular membranes. The new analytical deflection function can provide designers with a simple way of gaining insight into the effects of designed parameters for CMUTs and other MEMS-based capacitive type sensors.

National Basic Research Program of China under Grant 2014CB845302 and by National Natural Science Foundation (NNSF) of China under Grants 61374036, 61273121, and Natural Science Foundation of Guangdong Province under Grant 2014A030313237, and by Natural Science and Engineering Research Council of Canada

University of Waterloo's Institutional Repository

Crossref

Directory of Open Access Journals