A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound
In this work, we develop a simple algorithm for semi-supervised regression.
The key idea is to use the top eigenfunctions of the integral operator derived
from both labeled and unlabeled examples as basis functions and to learn the
prediction function by simple linear regression. We show that, under
appropriate assumptions about the integral operator, this approach achieves a
regression error bound better than the existing bounds of supervised learning.
We also verify the effectiveness of the proposed algorithm by an empirical study.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
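The algorithm described above is simple enough to sketch end-to-end. The following toy implementation (assumptions: a Gaussian kernel as the integral operator's kernel and numpy's dense eigensolver; the paper's exact kernel and solver may differ) builds the basis from labeled plus unlabeled points and fits it by ordinary least squares:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def semisup_regression(X_lab, y_lab, X_unlab, k=5, gamma=1.0):
    """Linear regression in the span of the top-k kernel eigenvectors
    computed from labeled AND unlabeled points (toy sketch)."""
    X = np.vstack([X_lab, X_unlab])
    K = rbf_kernel(X, gamma)
    # Top-k eigenvectors of the empirical integral operator.
    vals, vecs = np.linalg.eigh(K)
    basis = vecs[:, -k:]                  # columns: leading eigenvectors
    n_lab = len(X_lab)
    Phi = basis[:n_lab]                   # basis features of the labeled rows
    w, *_ = np.linalg.lstsq(Phi, y_lab, rcond=None)
    return basis @ w                      # predictions for all points

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(10, 2)); y_lab = X_lab[:, 0]
X_unlab = rng.normal(size=(30, 2))
preds = semisup_regression(X_lab, y_lab, X_unlab)
```

Note that the unlabeled points influence the learned function only through the basis, which is exactly where the semi-supervised gain enters.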
Global dynamics in a chemotaxis model describing tumor angiogenesis with/without mitosis in any dimensions
In this work, we study the Neumann initial boundary value problem for a
three-component chemotaxis model in any dimensional bounded and smooth domains;
this model is used to describe the branching of capillary sprouts during
angiogenesis. First, we find three simple sufficient conditions for global
boundedness; then, we establish two types of global stability for bounded
solutions in a quantitative way. As a consequence of our findings, the
underlying system without chemotaxis and without the effect of EC mitosis
cannot give rise to pattern formation. Our findings quantify and significantly
extend previous studies, which were set in lower-dimensional convex domains
and provided no such quantitative information.
Comment: 43 pages, under review in a journal
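For concreteness, a prototypical three-component chemotaxis system for tumor angiogenesis has the following shape, with u the endothelial cell (EC) density, v a chemoattractant, and w a consumed substrate; the couplings and coefficients shown here are illustrative, not necessarily the exact system studied in the paper:

```latex
\begin{aligned}
u_t &= \Delta u - \chi \nabla\!\cdot(u \nabla v) + \mu u(1-u), && \text{(EC diffusion, chemotaxis, mitosis)}\\
v_t &= \Delta v - v + w, && \text{(chemoattractant dynamics)}\\
w_t &= -\,u w, && \text{(substrate consumption)}
\end{aligned}
\qquad \text{in } \Omega,\quad \partial_\nu u = \partial_\nu v = 0 \ \text{on } \partial\Omega .
```

Setting $\chi = 0$ (no chemotaxis) and $\mu = 0$ (no mitosis) yields the regime in which, per the result above, no patterns can form.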
Configuration Tool and Experimental Platform for Pointing Devices
In user studies of human-computer interaction, experiments on new devices and techniques are often conducted with experiment software developed separately for each device and technique. A systematic experimental platform, capable of running experiments on a number of technologies, would facilitate the design and implementation of such experiments. To this end, a configurable framework was created that allows relative and absolute pointing input to be enhanced with the adaptive pointing and smoothed pointing techniques. This thesis discusses both the internals of the framework and how a platform is developed on top of it. Additionally, two calibration modules were designed to transform relative pointing input into absolute pointing and to obtain the parameters required by smoothed pointing. As part of the deployment, an experiment module was built to provide a platform on which the enhanced pointing experience can be evaluated, generating appropriate output from the results of each experiment task.
One key achievement presented in this thesis is that relative pointing devices can be integrated with the adaptive pointing and smoothed pointing techniques, which were originally designed for absolute pointing devices. Another key result is that the experimental platform built on the configurable framework provides the functionality required for professional pointing evaluation.
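As an illustration of the kind of processing involved, one common ingredient of pointer smoothing is an exponential moving average over incoming samples. This is a minimal sketch only; the thesis's smoothed pointing technique is more elaborate and is parameterized via its calibration modules:

```python
def smooth_pointer(samples, alpha=0.3):
    """Exponentially smooth a stream of (x, y) pointer samples.
    alpha in (0, 1]: higher = more responsive, lower = smoother."""
    smoothed = []
    sx, sy = samples[0]  # seed the filter with the first sample
    for x, y in samples:
        sx = alpha * x + (1 - alpha) * sx
        sy = alpha * y + (1 - alpha) * sy
        smoothed.append((sx, sy))
    return smoothed

# A jittery rightward drag, smoothed.
path = [(0, 0), (10, 0), (20, 5), (30, 5)]
out = smooth_pointer(path)
```

The trade-off controlled by alpha (lag versus jitter) is precisely what such an experimental platform lets one evaluate systematically.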
ACM Computing Classification System (CCS):
I.4.1 [Digitization and Image Capture]: Camera calibration,
I.4.3 [Enhancement]: Smoothing,
I.4.8 [Scene Analysis]: Tracking
How Can Product Text Snippets Benefit from Online Customer Reviews?
Product text snippets should highlight the product features that appeal to customers. Nevertheless, the features in current product snippets are often chosen based on the understanding of vendors or advertisers, and may fail to include the features that actually appeal to customers. This paper investigates how product text snippet generation can benefit from online customer reviews. To this end, an automated method is designed in which features and opinions are extracted from online reviews and then used for product text snippet generation. To verify the effectiveness of the proposed method, we conduct two experiments; the results show that, compared with the baselines, 1) the extracted features are more appealing to customers, and 2) the snippets generated from the extracted features are more likely to be clicked.
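The extraction step can be caricatured in a few lines: count candidate feature words across reviews, filter out stopwords and opinion words, and compose a snippet from the top features. The lexicons and the snippet template below are illustrative assumptions, not the paper's actual method:

```python
from collections import Counter

# Tiny illustrative lexicons -- assumptions for this sketch, not the paper's.
STOPWORDS = {"the", "a", "is", "and", "it", "this", "very", "of"}
OPINIONS = {"great", "good", "amazing", "terrible"}

def top_features(reviews, k=2):
    """Return the k most frequently mentioned candidate feature words."""
    counts = Counter(
        w for r in reviews for w in r.lower().split()
        if w not in STOPWORDS and w not in OPINIONS
    )
    return [w for w, _ in counts.most_common(k)]

def make_snippet(product, reviews):
    """Compose a snippet from the features customers actually talk about."""
    return f"{product}: praised for {' and '.join(top_features(reviews))}"

reviews = ["the battery is great", "great battery and screen", "screen is great"]
snippet = make_snippet("Phone X", reviews)
```

Even this crude frequency view surfaces review-driven features ("battery", "screen") that a vendor-written snippet might omit.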
Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks
Representation learning on networks aims to derive a meaningful vector
representation for each node, thereby facilitating downstream tasks such as
link prediction, node classification, and node clustering. In heterogeneous
text-rich networks, this task is more challenging due to (1) the presence or
absence of text: some nodes are associated with rich textual information while
others are not; and (2) the diversity of types: nodes and edges of multiple types form
a heterogeneous network structure. As pretrained language models (PLMs) have
demonstrated their effectiveness in obtaining widely generalizable text
representations, a substantial amount of effort has been made to incorporate
PLMs into representation learning on text-rich networks. However, few of them
can jointly consider heterogeneous structure (network) information as well as
rich textual semantic information of each node effectively. In this paper, we
propose Heterformer, a Heterogeneous Network-Empowered Transformer that
performs contextualized text encoding and heterogeneous structure encoding in a
unified model. Specifically, we inject heterogeneous structure information into
each Transformer layer when encoding node texts. Meanwhile, Heterformer is
capable of characterizing node/edge type heterogeneity and encoding nodes with
or without texts. We conduct comprehensive experiments on three tasks (i.e.,
link prediction, node classification, and node clustering) on three large-scale
datasets from different domains, where Heterformer outperforms competitive
baselines significantly and consistently.
Comment: KDD 2023. (Code: https://github.com/PeterGriffinJin/Heterformer)
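In spirit, injecting structure into text encoding can be sketched as prepending the node's aggregated neighbor embedding as a virtual token before self-attention. This is a deliberately simplified single-head sketch without learned projections, not the actual Heterformer layer:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hetero_attention_layer(text_tokens, neighbor_emb):
    """One toy self-attention pass: the node's aggregated neighbor
    embedding is prepended as a virtual token, so every text token can
    attend to network-structure information while being encoded."""
    seq = np.vstack([neighbor_emb[None, :], text_tokens])   # [1+T, d]
    scores = seq @ seq.T / np.sqrt(seq.shape[1])            # scaled dot-product
    out = softmax(scores) @ seq
    return out[1:]                                          # updated text states

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))   # 4 text tokens, dimension 8
neigh = rng.normal(size=8)         # aggregated neighbor embedding
updated = hetero_attention_layer(tokens, neigh)
```

A node without text would simply contribute no `text_tokens`, which hints at how a unified model can encode nodes with or without text.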
Using simulated Tianqin gravitational wave data and electromagnetic wave data to study the coincidence problem and Hubble tension problem
In this paper, we use electromagnetic wave data (H0LiCOW, SNe) and simulated
gravitational wave data (Tianqin) to constrain the interacting dark energy
(IDE) model and to investigate the Hubble tension and coincidence problems. By
combining these data sets (Tianqin+H0LiCOW+SNe), we obtain best-fit values and
confidence intervals for the model parameters. According to our results, the
best-fit value of the Hubble constant H0 shows that the Hubble tension can be
alleviated to some extent. In addition, the central value of the interaction
parameter indicates that the coincidence problem is slightly alleviated.
However, zero interaction is still within the error range, which indicates
that the ΛCDM model remains the model in best agreement with the observational
data at present. Finally, we compare the constraints placed on the model
parameters by the electromagnetic wave and gravitational wave data, and find
that the electromagnetic wave data constrain the parameters more tightly than
the simulated Tianqin gravitational wave data.
Comment: The article has been accepted by Chinese Physics
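For reference, IDE models of this type are commonly specified through an energy-exchange term $Q$ in the background continuity equations. The coupling shown below, $Q = 3\beta H \rho_{de}$, is one widespread choice and not necessarily the exact parameterization constrained in this paper:

```latex
\dot{\rho}_{c} + 3H\rho_{c} = Q , \qquad
\dot{\rho}_{de} + 3H(1+w)\rho_{de} = -Q , \qquad
Q = 3\beta H \rho_{de} .
```

In such a parameterization, $\beta = 0$ together with $w = -1$ recovers ΛCDM, which is why a best fit consistent with zero interaction signals that ΛCDM still agrees best with the data.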
"Why Should I Review This Paper?" Unifying Semantic, Topic, and Citation Factors for Paper-Reviewer Matching
As many academic conferences are overwhelmed by a rapidly increasing number
of paper submissions, automatically finding appropriate reviewers for each
submission becomes a more urgent need than ever. Various factors have been
considered by previous attempts on this task to measure the expertise relevance
between a paper and a reviewer, including whether the paper is semantically
close to, shares topics with, and cites previous papers of the reviewer.
However, the majority of previous studies take only one of these factors into
account, leading to an incomplete evaluation of paper-reviewer relevance.
To bridge this gap, in this paper, we propose a unified model for
paper-reviewer matching that jointly captures semantic, topic, and citation
factors. In the unified model, a contextualized language model backbone is
shared by all factors to learn common knowledge, while instruction tuning is
introduced to characterize the uniqueness of each factor by producing
factor-aware paper embeddings. Experiments on four datasets (one of which is
newly contributed by us) across different fields, including machine learning,
computer vision, information retrieval, and data mining, consistently validate
the effectiveness of our proposed UniPR model in comparison with
state-of-the-art paper-reviewer matching methods and scientific pre-trained
language models.
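The factor-aware embedding idea can be illustrated with a toy shared backbone: the same encoder is applied to the paper text prefixed by a factor-specific instruction. Here the "backbone" is a deterministic hashed bag-of-words, and the instruction strings and dimensionality are assumptions of this sketch, not UniPR's actual components:

```python
import hashlib
import numpy as np

DIM = 16  # toy embedding dimension

def backbone(text):
    """Toy shared 'language model': a deterministic hashed bag-of-words,
    L2-normalized. Stands in for the contextualized LM backbone."""
    v = np.zeros(DIM)
    for w in text.lower().split():
        h = int(hashlib.md5(w.encode()).hexdigest(), 16)
        v[h % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def factor_embedding(paper_text, factor):
    """Factor-aware embedding: prepend an instruction naming the factor
    (semantic / topic / citation), mimicking instruction tuning."""
    instruction = f"represent this paper for {factor} matching:"
    return backbone(instruction + " " + paper_text)

e_sem = factor_embedding("graph neural networks for molecules", "semantic")
e_top = factor_embedding("graph neural networks for molecules", "topic")
```

The point of the design is that all factors share one backbone (common knowledge) while the instruction alone differentiates the factor-specific views of the same paper.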
Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift
A key challenge of modern machine learning systems is to achieve
Out-of-Distribution (OOD) generalization -- generalizing to target data whose
distribution differs from that of source data. Despite its significant
importance, the fundamental question of ``what are the most effective
algorithms for OOD generalization'' remains open even under the standard
setting of covariate shift. This paper addresses this fundamental question by
proving that, surprisingly, classical Maximum Likelihood Estimation (MLE)
purely using source data (without any modification) achieves the minimax
optimality for covariate shift under the well-specified setting. That is, no
algorithm performs better than MLE in this setting (up to a constant factor),
justifying the claim that MLE is all you need. Our result holds for a very rich class of
parametric models, and does not require any boundedness condition on the
density ratio. We illustrate the wide applicability of our framework by
instantiating it to three concrete examples -- linear regression, logistic
regression, and phase retrieval. This paper further complements the study by
proving that, under the misspecified setting, MLE is no longer the optimal
choice, whereas the Maximum Weighted Likelihood Estimator (MWLE) emerges as
minimax optimal in certain scenarios.
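The well-specified covariate-shift setting is easy to simulate. The sketch below (assumptions: a two-dimensional logistic model, Gaussian covariates whose mean shifts between source and target, and plain gradient ascent for the MLE) trains purely on source data, with no reweighting by the density ratio, and still classifies the shifted target data well:

```python
import numpy as np

def fit_logreg_mle(X, y, lr=0.5, steps=500):
    """Plain MLE for logistic regression on source data only --
    gradient ascent on the log-likelihood, no importance weighting."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)   # average log-likelihood gradient
    return w

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])             # well-specified: labels follow this model

def sample(mean, n):
    X = rng.normal(loc=mean, size=(n, 2))
    y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)
    return X, y

X_src, y_src = sample(0.0, 2000)           # source distribution
X_tgt, y_tgt = sample(1.5, 2000)           # covariate-shifted target
w_hat = fit_logreg_mle(X_src, y_src)       # trained purely on source
acc = np.mean((X_tgt @ w_hat > 0) == (y_tgt > 0.5))
```

Because the model is well specified, the source-only MLE converges toward the shared true parameter, which is the intuition behind its minimax optimality here; under misspecification this transfer breaks down, motivating MWLE.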
The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study
Due to the exponential growth of scientific publications on the Web, there is
a pressing need to tag each paper with fine-grained topics so that researchers
can track their interested fields of study rather than drowning in the whole
literature. Scientific literature tagging is beyond a pure multi-label text
classification task because papers on the Web are prevalently accompanied by
metadata information such as venues, authors, and references, which may serve
as additional signals to infer relevant tags. Although there have been studies
making use of metadata in academic paper classification, their focus is often
restricted to one or two scientific fields (e.g., computer science and
biomedicine) and to one specific model. In this work, we systematically study
the effect of metadata on scientific literature tagging across 19 fields. We
select three representative multi-label classifiers (i.e., a bag-of-words
model, a sequence-based model, and a pre-trained language model) and explore
their performance change in scientific literature tagging when metadata are fed
to the classifiers as additional features. We observe some ubiquitous patterns
of metadata's effects across all fields (e.g., venues are consistently
beneficial to paper tagging in almost all cases), as well as some unique
patterns in fields other than computer science and biomedicine, which are not
explored in previous studies.
Comment: 11 pages; Accepted to WWW 202
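The "metadata as additional features" recipe for a bag-of-words model can be sketched simply: each metadata field becomes a dedicated token appended to the word features. A toy nearest-centroid classifier stands in for the paper's three classifiers, and all field names and labels below are illustrative:

```python
from collections import Counter, defaultdict

def featurize(text, metadata):
    """Bag-of-words plus metadata tokens: each metadata field is mapped
    to a dedicated token (e.g. 'venue=NeurIPS') alongside the words."""
    tokens = text.lower().split()
    tokens += [f"{k}={v}" for k, v in metadata.items()]
    return Counter(tokens)

def train_centroids(examples):
    """examples: list of (text, metadata, label); sum features per label."""
    centroids = defaultdict(Counter)
    for text, meta, label in examples:
        centroids[label].update(featurize(text, meta))
    return centroids

def predict(text, meta, centroids):
    f = featurize(text, meta)
    def overlap(c):
        return sum(min(f[t], c[t]) for t in f)
    return max(centroids, key=lambda lab: overlap(centroids[lab]))

train = [
    ("deep networks", {"venue": "NeurIPS"}, "ML"),
    ("gene expression", {"venue": "Bioinformatics"}, "Bio"),
]
cents = train_centroids(train)
label = predict("networks study", {"venue": "NeurIPS"}, cents)
```

Here the venue token contributes evidence exactly like a word would, which mirrors the paper's finding that venues are consistently informative signals for tagging.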