344 research outputs found

Smart Grid Relay Protection and Network Resource Management for Real-Time Communications.

Author: Zhang Jiapeng
Publication venue: University of Hawaiʻi at Mānoa
Publication date: 01/12/2017

Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017

ScholarSpace at University of Hawai'i at Manoa

Active classification with comparison queries

Author: Kane Daniel M.
Lovett Shachar
Moran Shay
Zhang Jiapeng
Publication date: 01/06/2017

We study an extension of active learning in which the learning algorithm may ask the annotator to compare the distances of two examples from the boundary of their label-class. For example, in a recommendation system application (say for restaurants), the annotator may be asked whether she liked or disliked a specific restaurant (a label query), or which one of two restaurants she liked more (a comparison query). We focus on the class of half spaces, and show that under natural assumptions, such as large margin or bounded bit-description of the input examples, it is possible to reveal all the labels of a sample of size $n$ using approximately $O(\log n)$ queries. This implies an exponential improvement over classical active learning, where only label queries are allowed. We complement these results by showing that if any of these assumptions is removed then, in the worst case, $\Omega(n)$ queries are required. Our results follow from a new general framework of active learning with additional queries. We identify a combinatorial dimension, called the \emph{inference dimension}, that captures the query complexity when each additional query is determined by $O(1)$ examples (such as comparison queries, each of which is determined by the two compared examples). Our results for half spaces follow by bounding the inference dimension in the cases discussed above.

Comment: 23 pages (not including references), 1 figure. The new version contains a minor fix in the proof of Lemma 4.
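To make the two query types concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it assumes the comparison query returns the sign of the difference of the signed margins of two examples, and it simply sorts the sample with comparison queries before binary-searching for the label boundary. That costs $O(n \log n)$ comparison queries but only $O(\log n)$ label queries, so it illustrates how comparisons can replace labels, not the paper's overall query bound. The `Oracle` class and `reveal_labels` function are hypothetical names introduced for illustration.

```python
import functools


class Oracle:
    """Hypothetical annotator for a hidden half space w.x - b >= 0 (illustration only)."""

    def __init__(self, w, b):
        self.w, self.b = w, b
        self.label_queries = 0
        self.comparison_queries = 0

    def _margin(self, x):
        # Signed margin of x with respect to the hidden half space.
        return sum(wi * xi for wi, xi in zip(self.w, x)) - self.b

    def label(self, x):
        # Label query: which side of the boundary is x on?
        self.label_queries += 1
        return 1 if self._margin(x) >= 0 else -1

    def compare(self, x, y):
        # Comparison query (assumed form): sign of margin(x) - margin(y).
        self.comparison_queries += 1
        mx, my = self._margin(x), self._margin(y)
        return (mx > my) - (mx < my)


def reveal_labels(points, oracle):
    """Recover every label using a comparison sort plus a binary search over labels."""
    # Sort indices by signed margin, using only comparison queries.
    order = sorted(
        range(len(points)),
        key=functools.cmp_to_key(lambda i, j: oracle.compare(points[i], points[j])),
    )
    # Labels are monotone along the sorted order, so binary-search the first positive.
    lo, hi = 0, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle.label(points[order[mid]]) == 1:
            hi = mid
        else:
            lo = mid + 1
    labels = [0] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = 1 if rank >= lo else -1
    return labels
```

For a sample of 200 points this sketch issues at most 8 label queries; all remaining information comes from comparisons.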

arXiv.org e-Print Archive

Crossref

Fractional Certificates for Bounded Functions

Author: Lovett Shachar
Zhang Jiapeng
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server

Lifting Theorems Meet Information Complexity: Known and New Lower Bounds of Set-disjointness

Author: Yang Guangxu
Zhang Jiapeng
Publication date: 23/09/2023

Set-disjointness problems are one of the most fundamental problems in communication complexity and have been extensively studied in past decades. Given its importance, many lower bound techniques were introduced to prove communication lower bounds of set-disjointness. Combining ideas from information complexity and query-to-communication lifting theorems, we introduce a density increment argument to prove communication lower bounds for set-disjointness: We give a simple proof showing that a large rectangle cannot be $0$-monochromatic for multi-party unique-disjointness. We interpret the direct-sum argument as a density increment process and give an alternative proof of randomized communication lower bounds for multi-party unique-disjointness. Avoiding full simulations in lifting theorems, we simplify and improve communication lower bounds for sparse unique-disjointness. Potential applications to be unified and improved by our density increment argument are also discussed.

Comment: Working Pape

arXiv.org e-Print Archive

Detecting Hidden Communities by Power Iterations with Connections to Vanilla Spectral Algorithms

Author: Mukherjee Chandra Sekhar
Zhang Jiapeng
Publication date: 07/11/2022

Community detection in the stochastic block model is one of the central problems of graph clustering. Since its introduction, many subsequent papers have made great strides in solving and understanding this model. In this setup, spectral algorithms have been one of the most widely used frameworks. However, despite the long history of study, there are still unsolved challenges. One of the main open problems is the design and analysis of "simple" (vanilla) spectral algorithms, especially when the number of communities is large. In this paper, we provide two algorithms. The first one is based on the power-iteration method. It is a simple algorithm which only compares the rows of the powered adjacency matrix. Our algorithm performs optimally (up to logarithmic factors) compared to the best known bounds in the dense graph regime by Van Vu (Combinatorics Probability and Computing, 2018). Furthermore, our algorithm is also robust to the "small cluster barrier", recovering large clusters in the presence of an arbitrary number of small clusters. Then, based on a connection between the powered adjacency matrix and eigenvectors, we provide a vanilla spectral algorithm for a large number of communities in the balanced case. This answers an open question by Van Vu (Combinatorics Probability and Computing, 2018) in the balanced case. Our methods also partially solve technical barriers discussed by Abbe, Fan, Wang and Zhong (Annals of Statistics, 2020). On the technical side, we introduce a random partition method to analyze each entry of a powered random matrix. This method can be viewed as an eigenvector version of Wigner's trace method. Recall that Wigner's trace method links the trace of a powered matrix to eigenvalues. Our method links the whole powered matrix to the span of eigenvectors. We expect our method to have more applications in random matrix theory.
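The idea of comparing rows of a powered adjacency matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm analyzed in the paper: it uses a fixed power of 2, a dense two-community model, and a largest-gap rule on row distances, all chosen here for simplicity. The function names (`sample_sbm`, `mat_power`, `cluster_of`) are hypothetical.

```python
import random


def sample_sbm(sizes, p, q, rng):
    """Sample a symmetric 0/1 adjacency matrix: edge prob p inside a community, q across."""
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                adj[i][j] = adj[j][i] = 1
    return adj, labels


def mat_mul(a, b):
    """Plain matrix product for small dense matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def mat_power(a, k):
    """Return a^k for an integer k >= 1 by repeated multiplication."""
    result = a
    for _ in range(k - 1):
        result = mat_mul(result, a)
    return result


def row_distance(u, v):
    """Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def cluster_of(powered, seed):
    """Vertices whose powered-matrix row is close to the seed's row (largest-gap cut)."""
    dists = [row_distance(powered[seed], row) for row in powered]
    order = sorted(range(len(dists)), key=dists.__getitem__)
    # Skip position 0 (the seed itself, at distance 0) when looking for the gap.
    gaps = [(dists[order[i + 1]] - dists[order[i]], i) for i in range(1, len(order) - 1)]
    _, cut = max(gaps)
    return set(order[: cut + 1])
```

A typical use would be `adj, labels = sample_sbm([40, 40], 0.8, 0.1, random.Random(0))` followed by `cluster_of(mat_power(adj, 2), 0)`. In this dense, well-separated regime the rows of vertices in the same community are much closer to each other than to rows from the other community, which is why a simple gap rule suffices here.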

arXiv.org e-Print Archive

Sunflowers and Quasi-Sunflowers from Randomness Extractors

Author: Li Xin
Lovett Shachar
Zhang Jiapeng
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)
Publication date: 01/01/2018

Dagstuhl Research Online Publication Server

G2T: A simple but versatile framework for topic modeling based on pretrained language model and community detection

Author: Liu Jiapeng
Yan Qiang
Zhang Leihang
Publication date: 13/04/2023

It has been reported that clustering-based topic models, which cluster high-quality sentence embeddings with an appropriate word selection method, can generate better topics than generative probabilistic topic models. However, these approaches suffer from the inability to select appropriate parameters and from incomplete models that overlook the quantitative relation between words and topics and between topics and text. To solve these issues, we propose graph to topic (G2T), a simple but effective framework for topic modelling. The framework is composed of four modules. First, document representation is acquired using pretrained language models. Second, a semantic graph is constructed according to the similarity between document representations. Third, communities in document semantic graphs are identified, and the relationship between topics and documents is quantified accordingly. Fourth, the word-topic distribution is computed based on a variant of TF-IDF. Automatic evaluation suggests that G2T achieved state-of-the-art performance on both English and Chinese documents with different lengths. Human judgements demonstrate that G2T can produce topics with better interpretability and coverage than baselines. In addition, G2T can not only determine the topic number automatically but also give the probabilistic distribution of words in topics and topics in documents. Finally, G2T is publicly available, and the distillation experiments provide instruction on how it works.

arXiv.org e-Print Archive

Discrete Fourier Analysis and Its Applications

Author: Zhang Jiapeng
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

The topic of discrete Fourier analysis has been extensively studied in recent decades. It plays an important role in theoretical computer science and discrete mathematics. On the one hand, it is interesting to study the structure of boolean functions via discrete Fourier analysis. On the other hand, these structural results also provide a huge number of applications in theoretical computer science, including computational complexity, pseudorandomness, cryptography, and learning theory. In this dissertation, we establish further connections between discrete Fourier analysis and theoretical computer science. In particular, we study the following questions.
\begin{itemize}
\item Robust sensitivity of boolean functions. In this part, we study the connection between the Fourier tail bound and the sensitivity tail bound of boolean functions, which is an analogue of the sensitivity conjecture proposed by Nisan \cite{nisan1991crew}.
\item DNF sparsification. The disjunctive normal form (or DNF) is a widely used representation of boolean functions. It is very interesting to study the structure of DNFs. There are two natural ways to measure the complexity of DNFs, the width and the size. In this thesis, we study a connection between these two measures. We propose a new approach by combining the switching lemma (a combinatorial tool) and the hypercontractivity inequality (an analytic inequality). This framework also suggests a new approach to the famous sunflower conjecture.
\item Applications in learning theory. In 1989, the first Fourier-based learning algorithm was introduced in a seminal paper of Linial, Mansour and Nisan \cite{linial1989constant}. In a series of subsequent works, discrete Fourier analysis was found to be a powerful tool for designing learning algorithms. On the one hand, sparse Fourier functions are strong enough to approximate a lot of functions; on the other hand, sparse Fourier functions are relatively easy to learn. Building on this framework, we give a more efficient algorithm to solve the \emph{population recovery} problem, that is, how to recover an unknown distribution from noisy samples.
\end{itemize}
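For readers unfamiliar with the object of study, the following Python sketch computes the Fourier expansion of a boolean function by brute force, using the standard definition $\hat{f}(S) = \mathbb{E}_x[f(x) \prod_{i \in S} x_i]$ over $x \in \{-1, 1\}^n$. It is an illustration of the basic definition only, not of any technique from the dissertation; the function names and the choice of 3-bit majority as the example are assumptions made here.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {-1, 1}^n -> R, enumerating all 2^n inputs."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            # f_hat(S) is the average of f(x) times the parity of x on S.
            total = sum(f(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs


def majority(x):
    """Majority vote of an odd number of +/-1 bits."""
    return 1 if sum(x) > 0 else -1


if __name__ == "__main__":
    for subset, value in fourier_coefficients(majority, 3).items():
        print(subset, value)
```

For 3-bit majority, the only nonzero coefficients are those of the three singletons and of the full set, and their squares sum to 1 as Parseval's identity requires for a $\pm 1$-valued function.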

eScholarship - University of California