477 research outputs found

An Efficient Sum Query Algorithm for Distance-based Locally Dominating Functions

Authors: Huang Ziyun, Xu Jinhui
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th International Symposium on Algorithms and Computation (ISAAC 2017)
Publication date: 01/01/2017

In this paper, we consider the following sum query problem: Given a point set P in R^d and a distance-based function f(p,q) (i.e., a function of the distance between p and q) satisfying some general properties, the goal is to develop a data structure and a query algorithm for efficiently computing a (1+epsilon)-approximate solution to the sum sum_{p in P} f(p,q) for any query point q in R^d and any small constant epsilon>0. Existing techniques for this problem are mainly based on core-set techniques, which often have difficulty dealing with functions that have the local domination property. Based on several new insights into this problem, we develop a novel technique to overcome these difficulties. Our algorithm is capable of answering queries with high success probability in time no more than ~O_{epsilon,d}(n^{0.5 + c}), and the underlying data structure can be constructed in ~O_{epsilon,d}(n^{1+c}) time for any c>0, where the hidden constant has only polynomial dependence on 1/epsilon and d. Our technique is simple and can be easily implemented for practical purposes.
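To make the problem statement concrete, the following is a minimal sketch of the exact linear-time baseline for the sum query, not the data structure from the paper. The names `exact_sum_query` and `is_one_plus_eps_approx` are hypothetical, the example function f is chosen purely for illustration, and the two-sided reading of "(1+epsilon)-approximate" is an assumption, since the abstract does not spell out the definition.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def exact_sum_query(points: Sequence[Point], q: Point,
                    f: Callable[[float], float]) -> float:
    """Exact baseline: evaluate f on every distance |p - q| and add them up.

    This costs O(n * d) per query; the abstract's goal is to beat this
    with an approximate answer.
    """
    return sum(f(math.dist(p, q)) for p in points)


def is_one_plus_eps_approx(estimate: float, exact: float, eps: float) -> bool:
    """Assumed reading of a (1+eps)-approximation for a positive sum:
    the estimate is within a multiplicative factor (1+eps) of the exact value.
    """
    return exact / (1.0 + eps) <= estimate <= (1.0 + eps) * exact


if __name__ == "__main__":
    # Hypothetical distance-based function, used only for illustration.
    f = lambda r: 1.0 / (1.0 + r)
    P = [(0.0, 0.0), (3.0, 4.0)]
    print(exact_sum_query(P, (0.0, 0.0), f))
```

Any approximate query structure can be checked against this baseline on small inputs by comparing its answer with `is_one_plus_eps_approx`.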

Dagstuhl Research Online Publication Server

Separation Of Soil Evaporation And Vegetation Transpiration By MODIS Data For Central And Northern China

Authors: Huang Jinhui Jeanne, Li Tingting
Publication venue: CUNY Academic Works
Publication date: 01/08/2014

Evapotranspiration (ET) plays a crucial role in the hydrologic system. To estimate evapotranspiration quantitatively at a large scale, remote sensing data have been used in a number of models and have proven applicable to the estimation of evapotranspiration. In this paper, evapotranspiration for central and northern China was derived from MODIS data. In arid and semi-arid regions, soil evaporation can be considered the minimum water requirement for bare areas, while evapotranspiration can be considered the minimum water demand for areas covered by vegetation. Hence, the separation of soil evaporation and vegetation transpiration is valuable for efficient water resources management. In this study, the land surface temperature-fractional vegetation coverage (Ts-f) trapezoid method was applied in conjunction with an operational two-layer model. A modified algorithm for the determination of actual dry/wet edges (MADE) of the Ts-f trapezoid was proposed, which improves on the original method based on the Ts-VI (vegetation index) triangle developed by Ronglin Tang (2010). The MADE algorithm was then integrated with the two-layer model to estimate the latent heat flux (evaporation and transpiration). The retrieved latent heat flux is shown to be in good agreement with FLUXNET data obtained from the Department of Biogeochemical Integration. The root mean square error of monthly ET is below 25 W/m^2. The results demonstrate that the accuracy of the modified algorithm for determining dry/wet edges in the Ts-f trapezoid was satisfactory. Finally, the spatial and temporal distribution of soil evaporation and vegetation transpiration in central and northern China was further investigated.

City University of New York

Implementation of a transformation from BPEL4Chor to BPEL

Author: Huang Jinhui
Publication date: 01/01/2014

This thesis implements the conceptual approach for transforming BPEL4Chor to BPEL. The transformation process takes the topology, grounding, and PBDs defined in BPEL4Chor as input, and outputs abstract BPEL processes and a WSDL file. The transformation process is implemented using JAXB.

Small Candidate Set for Translational Pattern Search

Authors: Feng Qilong, Huang Ziyun, Wang Jianxin, Xu Jinhui
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th International Symposium on Algorithms and Computation (ISAAC 2019)
Publication date: 01/01/2019

In this paper, we study the following pattern search problem: Given a pair of point sets A and B in fixed dimensional space R^d, with |B| = n, |A| = m and n >= m, the pattern search problem is to find the translations T of A such that each of the identified translations induces a matching between T(A) and a subset B' of B with cost no more than some given threshold, where the cost is defined as the minimum bipartite matching cost of T(A) and B'. We present a novel algorithm to produce a small set of candidate translations for the pattern search problem. For any B' subseteq B with |B'| = |A|, there exists at least one translation T in the candidate set such that the minimum bipartite matching cost between T(A) and B' is no larger than (1+epsilon) times the minimum bipartite matching cost between A and B' under any translation (i.e., the optimal translational matching cost). We also show that there exists an alternative solution to this problem, which constructs a candidate set of size O(n log^2 n) in O(n log^2 n) time with high probability of success. As a by-product of our construction, we obtain a weak epsilon-net for hypercube ranges, which significantly improves the construction time and the size of the candidate set. Our technique can be applied to a number of applications, including the translational pattern matching problem.

Dagstuhl Research Online Publication Server

Distributed and Robust Support Vector Machine

Authors: Ding Hu, Huang Ziyun, Liu Yangwei, Xu Jinhui
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th International Symposium on Algorithms and Computation (ISAAC 2016)
Publication date: 01/01/2016

In this paper, we consider the distributed version of Support Vector Machine (SVM) under the coordinator model, where all input data (i.e., points in R^d space) of SVM are arbitrarily distributed among k nodes in some network with a coordinator which can communicate with all nodes. We investigate two variants of this problem, with and without outliers. For distributed SVM without outliers, we prove a lower bound on the communication complexity and give a distributed (1-epsilon)-approximation algorithm that reaches this lower bound, where epsilon is a user-specified small constant. For distributed SVM with outliers, we present a (1-epsilon)-approximation algorithm to explicitly remove the influence of outliers. Our algorithm is based on a deterministic distributed top-t selection algorithm with communication complexity of O(k log t) in the coordinator model. Experimental results on benchmark datasets confirm the theoretical guarantees of our algorithms.

Dagstuhl Research Online Publication Server

Improved Algorithms for Clustering with Outliers

Authors: Feng Qilong, Huang Ziyun, Wang Jianxin, Xu Jinhui, Zhang Zhen
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th International Symposium on Algorithms and Computation (ISAAC 2019)
Publication date: 01/01/2019

Clustering is a fundamental problem in unsupervised learning. In many real-world applications, the to-be-clustered data often contains various types of noise, which needs to be removed from the learning process. To address this issue, we consider in this paper two variants of such clustering problems, called k-median with m outliers and k-means with m outliers. Existing techniques for both problems either incur relatively large approximation ratios or can only efficiently deal with a small number of outliers. In this paper, we present improved solutions to each of them for the case where k is a fixed number and m could be quite large. In particular, we give the first PTAS for the k-median problem with outliers in Euclidean space R^d for possibly high m and d. Our algorithm runs in O(nd((1/epsilon)(k+m))^{(k/epsilon)^{O(1)}}) time, which considerably improves the previous result (with running time O(nd(m+k)^{O(m+k)} + ((1/epsilon)k log n)^{O(1)})) given by [Feldman and Schulman, SODA 2012]. For the k-means with outliers problem, we introduce a (6+epsilon)-approximation algorithm for general metric space with running time O(n(beta (1/epsilon)(k+m))^k) for some constant beta>1. Our algorithm first uses the k-means++ technique to sample O((1/epsilon)(k+m)) points from the input and then selects the k centers from them. Compared to the more involved existing techniques, our algorithms are much simpler, using only random sampling, and achieve better performance ratios.
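The sampling step mentioned in the abstract can be illustrated with standard k-means++ (D^2) seeding. The following is a minimal sketch under stated assumptions, not the authors' implementation: `d2_sample` and `sample_size` are hypothetical names, the constant `c` inside the O((1/epsilon)(k+m)) bound is a placeholder, and the later step of selecting k centers from the sample is not shown.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_size(k: int, m: int, eps: float, c: float = 1.0) -> int:
    """Number of points to sample, c * (k + m) / eps.

    The constant c is an assumption; the abstract only gives the O(.) bound.
    """
    return math.ceil(c * (k + m) / eps)


def d2_sample(points: Sequence[Point], num_samples: int,
              rng: random.Random) -> List[Point]:
    """Standard k-means++ seeding: the first point is uniform, and each later
    point is drawn with probability proportional to its squared distance
    to the nearest point chosen so far.
    """
    if not points or num_samples <= 0:
        return []
    chosen = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen point.
    d2 = [sq_dist(p, chosen[0]) for p in points]
    while len(chosen) < num_samples:
        if sum(d2) == 0.0:
            break  # every remaining point coincides with a chosen one
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        chosen.append(points[idx])
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return chosen


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (20.0, 20.0)]
    print(d2_sample(pts, 3, random.Random(0)))
```

Because points far from the current sample receive larger weight, isolated points (potential outliers) are likely to be sampled too, which is consistent with the sample size growing with k+m rather than with k alone.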

Dagstuhl Research Online Publication Server

Simulation Of Non-Point Source Pollution In Zhang River Basin

Authors: Huang Jinhui, Lin Xiaojuan
Publication venue: CUNY Academic Works
Publication date: 01/08/2014

In 1987, the Chinese government started the “Water Resource Protection and Management of Haihe River Basin” program, aimed at tackling the problem of water pollution and realizing comprehensive control of the ecosystem and water resources in the whole basin. This study aims to model one of the program’s catchments, the Zhang River Basin (ZRB), which is a typical area of striking water scarcity and water quality deterioration in the Haihe River Basin (HRB). The distributed hydrological model SWAT (Soil and Water Assessment Tool) was used to simulate all related processes affecting water quantity, sediment, and nutrient loads in the catchment using historical flow and meteorological data for 5 years (January 2005-December 2009). Data on management practices (crop rotation, planting date, fertilizer quantity and irrigation) were included in the model during the 5-year simulation period. The main objectives of this study were to evaluate the long-term impact of point source (PS) and non-point source (NPS) pollution on water quality loadings and to determine the contribution of point and non-point sources in the entire catchment. Based on the simulated results, the spatio-temporal distributions of flow, sediment and nutrient loads were analyzed, and the critical areas of soil erosion and nutrient loss were identified, so as to provide a scientific basis for water resources allocation and pollutant control in the basin.

City University of New York

Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension

Authors: Cao Tingfeng, Huang Jun, Tan Chuanqi, Wang Chengyu, Zhu Jinhui
Publication date: 12/11/2023

In cross-lingual language understanding, machine translation is often utilized to enhance the transferability of models across languages, either by translating the training data from the source language to the target, or from the target to the source to aid inference. However, in cross-lingual machine reading comprehension (MRC), it is difficult to perform a deep level of assistance to enhance cross-lingual transfer because of the variation of answer span positions in different languages. In this paper, we propose X-STA, a new approach for cross-lingual MRC. Specifically, we leverage an attentive teacher to subtly transfer the answer spans of the source language to the answer output space of the target. A Gradient-Disentangled Knowledge Sharing technique is proposed as an improved cross-attention block. In addition, we force the model to learn semantic alignments from multiple granularities and calibrate the model outputs with teacher guidance to enhance cross-lingual transferability. Experiments on three multi-lingual MRC datasets show the effectiveness of our method, outperforming state-of-the-art approaches. Comment: EMNLP 202

arXiv.org e-Print Archive