105,050 research outputs found

    An Instance Transfer based Approach Using Enhanced Recurrent Neural Network for Domain Named Entity Recognition

    Recently, neural networks have shown promising results for named entity recognition (NER), but they require a substantial amount of labeled data for model training. When NER meets a new domain (the target domain), there is often little or no labeled data, which makes domain NER much more difficult. Because NER has been studied for a long time, some similar domain (the source domain) often already has well-labelled data. In this paper, we therefore focus on domain NER by studying how to utilize the labelled data from such a similar source domain for the new target domain. We design a kernel-function-based instance transfer strategy that retrieves similar labelled sentences from the source domain. Moreover, we propose an enhanced recurrent neural network (ERNN) that adds to the traditional RNN structure an additional layer incorporating the source-domain labelled data. Comprehensive experiments are conducted on two datasets. A comparison among HMM, CRF and RNN shows that RNN performs better than the others. When there is no labelled data in the target domain, compared to directly using the source-domain labelled data without selecting transferred instances, our enhanced RNN approach improves the F1 measure from 0.8052 to 0.9328.
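    The instance-transfer idea above can be sketched as follows. The abstract does not specify the kernel function, so this minimal sketch assumes a cosine kernel over bag-of-words features and selects the source sentences most similar to the target corpus; the function names and data are hypothetical.

```python
import numpy as np

def sentence_vec(tokens, vocab):
    """Bag-of-words vector for one sentence (illustrative feature map)."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            v[vocab[t]] += 1.0
    return v

def select_instances(source, target, k=2):
    """Rank source sentences by cosine kernel similarity to the target
    corpus centroid and keep the top-k as transfer instances."""
    vocab = {w: i for i, w in
             enumerate(sorted({w for s in source + target for w in s}))}
    S = np.array([sentence_vec(s, vocab) for s in source])
    t = np.mean([sentence_vec(s, vocab) for s in target], axis=0)
    sims = S @ t / (np.linalg.norm(S, axis=1) * np.linalg.norm(t) + 1e-12)
    return [source[i] for i in np.argsort(-sims)[:k]]

source = [["the", "drug", "treats", "fever"],
          ["stocks", "fell", "sharply"],
          ["aspirin", "reduces", "fever"]]
target = [["the", "drug", "reduces", "pain"], ["fever", "medication"]]
picked = select_instances(source, target, k=2)  # the two medical sentences
```

    The out-of-domain sentence is filtered out, which is the point of selecting transferred instances before feeding them to the target model.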

    Set-Based Tests for Genetic Association Using the Generalized Berk-Jones Statistic

    Studying the effects of groups of Single Nucleotide Polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases beyond what can be gleaned from studying SNPs individually. Common challenges in set-based genetic association testing include weak effect sizes, correlation between SNPs in a SNP-set, and scarcity of signals, with single-SNP effects often ranging from extremely sparse to moderately sparse in number. Motivated by these challenges, we propose the Generalized Berk-Jones (GBJ) test for the association between a SNP-set and an outcome. GBJ extends the Berk-Jones (BJ) statistic by accounting for correlation among SNPs, and it provides advantages over the Generalized Higher Criticism (GHC) test when signals in a SNP-set are moderately sparse. We also provide an analytic p-value calculation procedure for SNP-sets of any finite size. Using this p-value calculation, we illustrate that the rejection region for GBJ can be described as a compromise between those for BJ and GHC. We develop an omnibus statistic as well, and we show that this omnibus test is robust to the degree of signal sparsity. An additional advantage of our method is the ability to conduct inference using individual SNP summary statistics from a genome-wide association study. We evaluate the finite-sample performance of GBJ through simulation studies and application to gene-level association analysis of breast cancer risk.
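    For context, the classical (independence-case) Berk-Jones statistic that GBJ generalizes can be computed as below. This is a sketch of the standard BJ formula on ordered p-values, not the paper's correlation-adjusted GBJ; the example p-values are made up.

```python
import math

def kl_bern(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b)."""
    eps = 1e-12
    a = min(max(a, eps), 1 - eps)
    b = min(max(b, eps), 1 - eps)
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def berk_jones(pvals):
    """Classical Berk-Jones statistic on independent p-values:
    max over k of n * KL(k/n || p_(k)), counting only ranks with
    p_(k) < k/n (the one-sided, signal-detecting direction)."""
    p = sorted(pvals)
    n = len(p)
    stat = 0.0
    for k in range(1, n + 1):
        if p[k - 1] < k / n:
            stat = max(stat, n * kl_bern(k / n, p[k - 1]))
    return stat

# One very sparse strong signal inflates BJ relative to a null-like set.
strong = berk_jones([1e-6, 0.3, 0.5, 0.7, 0.9])
null = berk_jones([0.15, 0.35, 0.55, 0.75, 0.95])
```

    The sensitivity of this maximum to a single tiny p-value is why BJ-type statistics suit the sparse-signal regime the abstract describes.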

    Turán's problem and Ramsey numbers for trees

    Let $T_n^1=(V,E_1)$ and $T_n^2=(V,E_2)$ be the trees on $n$ vertices with $V=\{v_0,v_1,\ldots,v_{n-1}\}$, $E_1=\{v_0v_1,\ldots,v_0v_{n-3},v_{n-4}v_{n-2},v_{n-3}v_{n-1}\}$, and $E_2=\{v_0v_1,\ldots,v_0v_{n-3},v_{n-3}v_{n-2},v_{n-3}v_{n-1}\}$. In this paper, for $p\ge n\ge 5$ we obtain explicit formulas for $\mathrm{ex}(p;T_n^1)$ and $\mathrm{ex}(p;T_n^2)$, where $\mathrm{ex}(p;L)$ denotes the maximal number of edges in a graph of order $p$ not containing $L$ as a subgraph. Let $r(G_1,G_2)$ be the Ramsey number of the two graphs $G_1$ and $G_2$. In this paper we also obtain explicit formulas for $r(T_m,T_n^i)$, where $i\in\{1,2\}$ and $T_m$ is a tree on $m$ vertices with $\Delta(T_m)\le m-3$. (21 pages.)

    On the Efficiency of Solving Boolean Polynomial Systems with the Characteristic Set Method

    An improved characteristic set algorithm for solving Boolean polynomial systems is proposed. The algorithm is based on converting all polynomials into monic ones by zero decomposition and using additions to obtain pseudo-remainders. Three important techniques are applied. The first is eliminating variables using newly generated linear polynomials. The second is optimizing the strategy for choosing the polynomial used in zero decomposition. The third is computing add-remainders to eliminate the leading variable of newly generated monic polynomials. By analyzing the depth of the zero decomposition tree, we present complexity bounds for this algorithm that are lower than those of previous characteristic set algorithms. Extensive experimental results show that the new algorithm is more efficient than previous characteristic set algorithms for solving Boolean polynomial systems.
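    The add-remainder step can be illustrated concretely. This sketch assumes "monic" means a polynomial of the form $x + U$ with $U$ free of the leading variable $x$; it represents a Boolean polynomial over GF(2) as a set of monomials, each monomial a frozenset of variables, so that addition is symmetric difference. The representation is an illustration, not the paper's data structure.

```python
# A Boolean polynomial over GF(2) as a set of monomials; each monomial
# is a frozenset of variable names (frozenset() is the constant 1).
def add(p, q):
    """Sum over GF(2): symmetric difference of monomial sets,
    since equal monomials cancel (1 + 1 = 0)."""
    return p ^ q

def add_remainder(p, q):
    """For two polynomials monic in the same leading variable x
    (p = x + U, q = x + V with U, V free of x), the sum p + q = U + V
    eliminates x using only an addition, with no multiplication."""
    return add(p, q)

# p = x + y*z + 1,  q = x + y  (both monic in x)
p = {frozenset(["x"]), frozenset(["y", "z"]), frozenset()}
q = {frozenset(["x"]), frozenset(["y"])}
r = add_remainder(p, q)  # y*z + y + 1: leading variable x is gone
```

    Avoiding multiplication keeps intermediate polynomials small, which is the efficiency argument behind add-remainders.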

    Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels

    Graph Convolutional Networks (GCNs) play a crucial role in graph learning tasks; however, learning graph embeddings with few supervised signals remains a difficult problem. In this paper, we propose a novel training algorithm for Graph Convolutional Networks, called the Multi-Stage Self-Supervised (M3S) Training Algorithm, which incorporates a self-supervised learning approach and focuses on improving the generalization performance of GCNs on graphs with few labeled nodes. First, a Multi-Stage Training Framework is provided as the basis of the M3S training method. We then leverage DeepCluster, a popular form of self-supervised learning, and design a corresponding aligning mechanism on the embedding space to refine the Multi-Stage Training Framework, resulting in the M3S Training Algorithm. Finally, extensive experimental results verify the superior performance of our algorithm on graphs with few labeled nodes under different label rates compared with other state-of-the-art approaches. (AAAI Conference on Artificial Intelligence, AAAI 2020.)
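    One stage of a multi-stage self-training loop can be sketched as follows. This is a generic self-training step, not the paper's exact M3S procedure (which additionally uses DeepCluster and an aligning mechanism): per class, the most confident unlabeled nodes are promoted to pseudo-labeled training nodes for the next stage.

```python
import numpy as np

def add_pseudo_labels(probs, labeled_mask, per_class=1):
    """One stage of multi-stage self-training (illustrative sketch):
    for each class, promote the `per_class` most confident unlabeled
    nodes to the labeled set with that class as pseudo-label."""
    n, c = probs.shape
    new_labels = -np.ones(n, dtype=int)  # -1 = not promoted this stage
    mask = labeled_mask.copy()
    for cls in range(c):
        conf = np.where(mask, -np.inf, probs[:, cls])  # skip labeled nodes
        for i in np.argsort(-conf)[:per_class]:
            if np.isfinite(conf[i]):
                new_labels[i] = cls
                mask[i] = True
    return new_labels, mask

probs = np.array([[0.9, 0.1],   # node 0: already labeled
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.1, 0.9]])
labeled = np.array([True, False, False, False])
labels, mask = add_pseudo_labels(probs, labeled, per_class=1)
```

    Repeating this stage with a retrained GCN gradually enlarges the supervised set, which is why the approach helps under low label rates.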

    Verification of mixing properties in two-dimensional shifts of finite type

    The degree of mixing is a fundamental property of a dynamical system, but for general multi-dimensional shifts it cannot be determined systematically. This work introduces constructive and systematic methods for verifying the degree of mixing, from topological mixing to strong specification (or strong irreducibility), for two-dimensional shifts of finite type. First, transition matrices on infinite strips of width $n$ are introduced for all $n\ge 2$. To determine the primitivity of these transition matrices, connecting operators are introduced to reduce high-order transition matrices to lower-order ones. Two sufficient conditions for primitivity are provided: invariant diagonal cycles and primitive commutative cycles of connecting operators. After primitivity is established, corner-extendability and crisscross-extendability are used to demonstrate topological mixing. In addition, the hole-filling condition yields strong specification. All of these conditions can be verified in a finite number of steps.
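    The strip transition matrices can be made concrete on a standard example. This sketch (my own illustration, using the two-dimensional golden mean shift where no two adjacent cells both hold 1) builds $T_n$ on columns of height $n$ and checks primitivity directly via matrix powers, rather than via the paper's connecting operators.

```python
import numpy as np
from itertools import product

def strip_matrix(n):
    """Transition matrix T_n on strips of width n for the 2D golden
    mean shift (no two adjacent 1s horizontally or vertically):
    states are height-n columns with no vertical 11, and two columns
    may be horizontally adjacent iff they share no 1 in the same row."""
    cols = [c for c in product([0, 1], repeat=n)
            if all(not (c[i] and c[i + 1]) for i in range(n - 1))]
    return np.array([[int(all(not (a[i] and b[i]) for i in range(n)))
                      for b in cols] for a in cols])

def is_primitive(T):
    """T is primitive iff some power is entrywise positive; by
    Wielandt's bound it suffices to check powers up to (d-1)^2 + 1."""
    d = len(T)
    P = np.array(T)
    for _ in range((d - 1) ** 2 + 1):
        if (P > 0).all():
            return True
        P = np.minimum(P @ T, 1)  # keep 0/1 reachability, avoid overflow
    return False

T = strip_matrix(2)       # 3 admissible columns: 00, 01, 10
primitive = is_primitive(T)
```

    Primitivity of every $T_n$ is exactly the kind of condition the connecting-operator machinery certifies without enumerating powers for each width separately.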

    The natural measure of a symbolic dynamical system

    This study investigates the natural, or intrinsic, measure of a symbolic dynamical system $\Sigma$. The measure $\mu([i_1,i_2,\ldots,i_n])$ of a pattern $[i_1,i_2,\ldots,i_n]$ in $\Sigma$ is the asymptotic frequency with which $[i_1,i_2,\ldots,i_n]$ occurs among all patterns of length $n$ within very long patterns; in a typical long pattern, $[i_1,i_2,\ldots,i_n]$ appears with frequency $\mu([i_1,i_2,\ldots,i_n])$. When $\Sigma=\Sigma(A)$ is a shift of finite type and $A$ is an irreducible $N\times N$ non-negative matrix, the measure $\mu$ is the Parry measure, which is ergodic with maximal entropy. The result holds for an irreducible sofic shift $\mathcal{G}=(G,\mathcal{L})$, and it extends to $\Sigma(A)$ where $A$ is a countably infinite matrix that is irreducible, aperiodic and positive recurrent. Using the Krieger cover, the natural measure of a general shift space is studied as a countable-state sofic shift, including the context-free shift. The Perron-Frobenius theorem for non-negative matrices plays an essential role in this study.
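    The Parry measure itself is computable directly from the Perron data of $A$: with Perron eigenvalue $\lambda$, right eigenvector $r$ and left eigenvector $l$, it is the stationary Markov measure with transitions $P_{ij}=A_{ij}r_j/(\lambda r_i)$ and stationary distribution $\pi_i \propto l_i r_i$. A minimal numerical sketch on the golden mean shift:

```python
import numpy as np
from itertools import product

def parry_measure(A):
    """Parry measure of Sigma(A) for an irreducible 0-1 matrix A:
    the stationary Markov measure with P[i,j] = A[i,j]*r[j]/(lam*r[i]),
    lam the Perron eigenvalue, r/l the right/left Perron eigenvectors."""
    ev, vecs = np.linalg.eig(A.astype(float))
    k = np.argmax(ev.real)
    lam = ev[k].real
    r = np.abs(vecs[:, k].real)
    P = A * r[None, :] / (lam * r[:, None])
    lv, lvecs = np.linalg.eig(A.T.astype(float))
    l = np.abs(lvecs[:, np.argmax(lv.real)].real)
    pi = l * r / (l @ r)           # stationary distribution
    return pi, P

def mu(pattern, pi, P):
    """mu([i_1,...,i_n]) = pi[i_1] * P[i_1,i_2] * ... * P[i_{n-1},i_n]."""
    m = pi[pattern[0]]
    for a, b in zip(pattern, pattern[1:]):
        m *= P[a, b]
    return m

# Golden mean shift: symbol 1 cannot follow 1.
A = np.array([[1, 1], [1, 0]])
pi, P = parry_measure(A)
total = sum(mu(w, pi, P) for w in product([0, 1], repeat=3)
            if all(A[a, b] for a, b in zip(w, w[1:])))  # should be 1
```

    The admissible length-3 cylinders carry total measure 1, as a probability measure must, and forbidden transitions get measure 0 automatically.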

    MVW-extensions of real quaternionic classical groups

    Let $G$ be a real quaternionic classical group $\mathrm{GL}_n(\mathbb{H})$, $\mathrm{Sp}(p,q)$ or $\mathrm{O}^*(2n)$. We define an extension $\breve{G}$ of $G$ with the following property: it contains $G$ as a subgroup of index two, and for every $x\in G$, there is an element $\breve{g}\in\breve{G}\setminus G$ such that $\breve{g}x\breve{g}^{-1}=x^{-1}$. This is similar to Moeglin-Vigneras-Waldspurger's extensions of non-quaternionic classical groups.

    Towards Understanding Adversarial Examples Systematically: Exploring Data Size, Task and Model Factors

    Most previous work explains adversarial examples from a few specific perspectives, lacking an integrated understanding of the problem. In this paper, we present a systematic study of adversarial examples from three aspects: the amount of training data, task-dependent factors, and model-specific factors. In particular, we show that adversarial generalization (i.e., test accuracy on adversarial examples) under standard training requires more data than standard generalization (i.e., test accuracy on clean examples), and we uncover the global relationship between generalization and robustness with respect to data size, especially when data is augmented by generative models. This reveals a trade-off between standard generalization and robustness in the limited-training-data regime, and their consistency when the data size is large enough. Furthermore, we explore through extensive empirical analysis how different task-dependent and model-specific factors influence the vulnerability of deep neural networks. Relevant recommendations on defending against adversarial attacks are provided as well. Our results outline a potential path towards a thorough and systematic understanding of adversarial examples.
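    For readers unfamiliar with how adversarial examples are generated, here is a minimal sketch of the well-known Fast Gradient Sign Method (FGSM) on a logistic-regression model. The attack and toy data are illustrative and not taken from this paper, which studies adversarial examples for deep networks.

```python
import numpy as np

def fgsm(x, w, b, y, eps):
    """Fast Gradient Sign Method on a logistic model sigmoid(w.x + b):
    perturb x by eps in the sign of the cross-entropy loss gradient
    with respect to the input."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted P(class 1)
    grad_x = (p - y) * w                    # d(loss)/dx for label y
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, 0.2])       # w.x + b = 0.8 > 0: classified as class 1
y = 1.0                        # true label
x_adv = fgsm(x, w, b, y, eps=0.6)
score = w @ x_adv + b          # pushed across the decision boundary
```

    A small, bounded perturbation flips the model's decision, which is exactly the vulnerability whose dependence on data size and model factors the paper analyzes.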

    ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning

    A machine learning pipeline typically consists of several stages of operations, such as data preprocessing, feature engineering and machine learning model training. Each operation has a set of hyper-parameters, which become irrelevant for the pipeline when the operation is not selected. This gives rise to a hierarchical conditional hyper-parameter space. To optimize this mixed continuous and discrete, conditional, hierarchical hyper-parameter space, we propose an efficient pipeline search and configuration algorithm that combines the power of Reinforcement Learning and Bayesian Optimization. Empirical results show that our method performs favorably compared to state-of-the-art methods like Auto-sklearn, TPOT, Tree Parzen Window, and Random Search.
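    The hierarchical conditional space can be sketched with a toy example. The stages, operations and hyper-parameter grids below are hypothetical, not ReinBo's actual search space; the point is that a hyper-parameter is sampled only when its parent operation is selected.

```python
import random

# Toy hierarchical conditional search space (illustrative): each stage
# offers alternative operations, and an operation's hyper-parameters
# exist only when that operation is chosen.
SPACE = {
    "preprocess": {
        "none": {},
        "pca": {"n_components": [2, 5, 10]},
    },
    "model": {
        "svm": {"C": [0.1, 1.0, 10.0]},
        "tree": {"max_depth": [3, 5, 7]},
    },
}

def sample_pipeline(space, rng):
    """Sample one configuration: first pick an operation per stage,
    then sample only that operation's hyper-parameters."""
    config = {}
    for stage, ops in space.items():
        op = rng.choice(sorted(ops))
        params = {k: rng.choice(v) for k, v in ops[op].items()}
        config[stage] = (op, params)
    return config

cfg = sample_pipeline(SPACE, random.Random(42))
```

    The discrete stage choices are what the reinforcement-learning agent navigates, while the per-operation hyper-parameters are the part Bayesian optimization tunes.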