105,050 research outputs found

    An Instance Transfer based Approach Using Enhanced Recurrent Neural Network for Domain Named Entity Recognition

    Recently, neural networks have shown promising results for named entity recognition (NER), but they require a substantial amount of labeled data for model training. When NER meets a new domain (the target domain), there is often little or no labeled data, which makes domain NER much more difficult. Because NER has been studied for a long time, some similar domain (the source domain) often already has well-labelled data. In this paper, we therefore focus on domain NER by studying how to utilize the labelled data from such a similar source domain for the new target domain. We design a kernel-function-based instance transfer strategy that retrieves similar labelled sentences from the source domain. Moreover, we propose an enhanced recurrent neural network (ERNN) that adds to the traditional RNN structure an additional layer incorporating the source-domain labelled data. Comprehensive experiments are conducted on two datasets. A comparison among HMM, CRF and RNN shows that RNN performs better than the others. When there is no labelled data in the target domain, compared to directly using the source-domain labelled data without selecting transferred instances, our enhanced RNN approach improves the F1 measure from 0.8052 to 0.9328.
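    The instance-transfer idea above can be sketched as follows. The abstract does not specify the kernel function, so this minimal sketch assumes a cosine kernel over bag-of-words features and selects the source sentences most similar to the target corpus; the function names and data are hypothetical.

```python
import numpy as np

def sentence_vec(tokens, vocab):
    """Bag-of-words vector for one sentence (illustrative feature map)."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            v[vocab[t]] += 1.0
    return v

def select_instances(source, target, k=2):
    """Rank source sentences by cosine kernel similarity to the target
    corpus centroid and keep the top-k as transfer instances."""
    vocab = {w: i for i, w in
             enumerate(sorted({w for s in source + target for w in s}))}
    S = np.array([sentence_vec(s, vocab) for s in source])
    t = np.mean([sentence_vec(s, vocab) for s in target], axis=0)
    sims = S @ t / (np.linalg.norm(S, axis=1) * np.linalg.norm(t) + 1e-12)
    return [source[i] for i in np.argsort(-sims)[:k]]

source = [["the", "drug", "treats", "fever"],
          ["stocks", "fell", "sharply"],
          ["aspirin", "reduces", "fever"]]
target = [["the", "drug", "reduces", "pain"], ["fever", "medication"]]
picked = select_instances(source, target, k=2)  # the two medical sentences
```

    The out-of-domain sentence is filtered out, which is the point of selecting transferred instances before feeding them to the target model.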

    Set-Based Tests for Genetic Association Using the Generalized Berk-Jones Statistic

    Studying the effects of groups of Single Nucleotide Polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases beyond what can be gleaned from studying SNPs individually. Common challenges in set-based genetic association testing include weak effect sizes, correlation between SNPs in a SNP-set, and scarcity of signals, with single-SNP effects often ranging from extremely sparse to moderately sparse in number. Motivated by these challenges, we propose the Generalized Berk-Jones (GBJ) test for the association between a SNP-set and an outcome. GBJ extends the Berk-Jones (BJ) statistic by accounting for correlation among SNPs, and it provides advantages over the Generalized Higher Criticism (GHC) test when signals in a SNP-set are moderately sparse. We also provide an analytic p-value calculation procedure for SNP-sets of any finite size. Using this p-value calculation, we illustrate that the rejection region for GBJ can be described as a compromise between those for BJ and GHC. We develop an omnibus statistic as well, and we show that this omnibus test is robust to the degree of signal sparsity. An additional advantage of our method is the ability to conduct inference using individual SNP summary statistics from a genome-wide association study. We evaluate the finite-sample performance of GBJ through simulation studies and application to gene-level association analysis of breast cancer risk.
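    For context, the classical (independence-case) Berk-Jones statistic that GBJ generalizes can be computed as below. This is a sketch of the standard BJ formula on ordered p-values, not the paper's correlation-adjusted GBJ; the example p-values are made up.

```python
import math

def kl_bern(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b)."""
    eps = 1e-12
    a = min(max(a, eps), 1 - eps)
    b = min(max(b, eps), 1 - eps)
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def berk_jones(pvals):
    """Classical Berk-Jones statistic on independent p-values:
    max over k of n * KL(k/n || p_(k)), counting only ranks with
    p_(k) < k/n (the one-sided, signal-detecting direction)."""
    p = sorted(pvals)
    n = len(p)
    stat = 0.0
    for k in range(1, n + 1):
        if p[k - 1] < k / n:
            stat = max(stat, n * kl_bern(k / n, p[k - 1]))
    return stat

# One very sparse strong signal inflates BJ relative to a null-like set.
strong = berk_jones([1e-6, 0.3, 0.5, 0.7, 0.9])
null = berk_jones([0.15, 0.35, 0.55, 0.75, 0.95])
```

    The sensitivity of this maximum to a single tiny p-value is why BJ-type statistics suit the sparse-signal regime the abstract describes.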

    Turán's problem and Ramsey numbers for trees

    Let $T_n^1=(V,E_1)$ and $T_n^2=(V,E_2)$ be the trees on $n$ vertices with $V=\{v_0,v_1,\ldots,v_{n-1}\}$, $E_1=\{v_0v_1,\ldots,v_0v_{n-3},v_{n-4}v_{n-2},v_{n-3}v_{n-1}\}$, and $E_2=\{v_0v_1,\ldots,v_0v_{n-3},v_{n-3}v_{n-2},v_{n-3}v_{n-1}\}$. In this paper, for $p\ge n\ge 5$ we obtain explicit formulas for $\mathrm{ex}(p;T_n^1)$ and $\mathrm{ex}(p;T_n^2)$, where $\mathrm{ex}(p;L)$ denotes the maximal number of edges in a graph of order $p$ not containing $L$ as a subgraph. Let $r(G_1,G_2)$ be the Ramsey number of the two graphs $G_1$ and $G_2$. In this paper we also obtain explicit formulas for $r(T_m,T_n^i)$, where $i\in\{1,2\}$ and $T_m$ is a tree on $m$ vertices with $\Delta(T_m)\le m-3$. (21 pages.)

    On the Efficiency of Solving Boolean Polynomial Systems with the Characteristic Set Method

    An improved characteristic set algorithm for solving Boolean polynomial systems is proposed. The algorithm is based on converting all polynomials into monic ones by zero decomposition and using additions to obtain pseudo-remainders. Three important techniques are applied. The first is eliminating variables using newly generated linear polynomials. The second is optimizing the strategy for choosing the polynomial used in zero decomposition. The third is computing add-remainders to eliminate the leading variable of newly generated monic polynomials. By analyzing the depth of the zero decomposition tree, we present complexity bounds for this algorithm that are lower than those of previous characteristic set algorithms. Extensive experimental results show that the new algorithm is more efficient than previous characteristic set algorithms for solving Boolean polynomial systems.
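    The add-remainder step can be illustrated concretely. This sketch assumes "monic" means a polynomial of the form $x + U$ with $U$ free of the leading variable $x$; it represents a Boolean polynomial over GF(2) as a set of monomials, each monomial a frozenset of variables, so that addition is symmetric difference. The representation is an illustration, not the paper's data structure.

```python
# A Boolean polynomial over GF(2) as a set of monomials; each monomial
# is a frozenset of variable names (frozenset() is the constant 1).
def add(p, q):
    """Sum over GF(2): symmetric difference of monomial sets,
    since equal monomials cancel (1 + 1 = 0)."""
    return p ^ q

def add_remainder(p, q):
    """For two polynomials monic in the same leading variable x
    (p = x + U, q = x + V with U, V free of x), the sum p + q = U + V
    eliminates x using only an addition, with no multiplication."""
    return add(p, q)

# p = x + y*z + 1,  q = x + y  (both monic in x)
p = {frozenset(["x"]), frozenset(["y", "z"]), frozenset()}
q = {frozenset(["x"]), frozenset(["y"])}
r = add_remainder(p, q)  # y*z + y + 1: leading variable x is gone
```

    Avoiding multiplication keeps intermediate polynomials small, which is the efficiency argument behind add-remainders.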

    Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels

    Graph Convolutional Networks (GCNs) play a crucial role in graph learning tasks; however, learning graph embeddings with few supervised signals remains a difficult problem. In this paper, we propose a novel training algorithm for Graph Convolutional Networks, called the Multi-Stage Self-Supervised (M3S) Training Algorithm, which incorporates a self-supervised learning approach and focuses on improving the generalization performance of GCNs on graphs with few labeled nodes. First, a Multi-Stage Training Framework is provided as the basis of the M3S training method. We then leverage DeepCluster, a popular form of self-supervised learning, and design a corresponding aligning mechanism on the embedding space to refine the Multi-Stage Training Framework, resulting in the M3S Training Algorithm. Finally, extensive experimental results verify the superior performance of our algorithm on graphs with few labeled nodes under different label rates compared with other state-of-the-art approaches. (AAAI Conference on Artificial Intelligence, AAAI 2020.)
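    One stage of a multi-stage self-training loop can be sketched as follows. This is a generic self-training step, not the paper's exact M3S procedure (which additionally uses DeepCluster and an aligning mechanism): per class, the most confident unlabeled nodes are promoted to pseudo-labeled training nodes for the next stage.

```python
import numpy as np

def add_pseudo_labels(probs, labeled_mask, per_class=1):
    """One stage of multi-stage self-training (illustrative sketch):
    for each class, promote the `per_class` most confident unlabeled
    nodes to the labeled set with that class as pseudo-label."""
    n, c = probs.shape
    new_labels = -np.ones(n, dtype=int)  # -1 = not promoted this stage
    mask = labeled_mask.copy()
    for cls in range(c):
        conf = np.where(mask, -np.inf, probs[:, cls])  # skip labeled nodes
        for i in np.argsort(-conf)[:per_class]:
            if np.isfinite(conf[i]):
                new_labels[i] = cls
                mask[i] = True
    return new_labels, mask

probs = np.array([[0.9, 0.1],   # node 0: already labeled
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.1, 0.9]])
labeled = np.array([True, False, False, False])
labels, mask = add_pseudo_labels(probs, labeled, per_class=1)
```

    Repeating this stage with a retrained GCN gradually enlarges the supervised set, which is why the approach helps under low label rates.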

    Verification of mixing properties in two-dimensional shifts of finite type

    The degree of mixing is a fundamental property of a dynamical system, but for general multi-dimensional shifts it cannot be determined systematically. This work introduces constructive and systematic methods for verifying the degree of mixing, from topological mixing to strong specification (or strong irreducibility), for two-dimensional shifts of finite type. First, transition matrices on infinite strips of width $n$ are introduced for all $n\ge 2$. To determine the primitivity of these transition matrices, connecting operators are introduced to reduce high-order transition matrices to lower-order ones. Two sufficient conditions for primitivity are provided: invariant diagonal cycles and primitive commutative cycles of connecting operators. After primitivity is established, corner-extendability and crisscross-extendability are used to demonstrate topological mixing. In addition, the hole-filling condition yields strong specification. All of these conditions can be verified in a finite number of steps.
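    The strip transition matrices can be made concrete on a standard example. This sketch (my own illustration, using the two-dimensional golden mean shift where no two adjacent cells both hold 1) builds $T_n$ on columns of height $n$ and checks primitivity directly via matrix powers, rather than via the paper's connecting operators.

```python
import numpy as np
from itertools import product

def strip_matrix(n):
    """Transition matrix T_n on strips of width n for the 2D golden
    mean shift (no two adjacent 1s horizontally or vertically):
    states are height-n columns with no vertical 11, and two columns
    may be horizontally adjacent iff they share no 1 in the same row."""
    cols = [c for c in product([0, 1], repeat=n)
            if all(not (c[i] and c[i + 1]) for i in range(n - 1))]
    return np.array([[int(all(not (a[i] and b[i]) for i in range(n)))
                      for b in cols] for a in cols])

def is_primitive(T):
    """T is primitive iff some power is entrywise positive; by
    Wielandt's bound it suffices to check powers up to (d-1)^2 + 1."""
    d = len(T)
    P = np.array(T)
    for _ in range((d - 1) ** 2 + 1):
        if (P > 0).all():
            return True
        P = np.minimum(P @ T, 1)  # keep 0/1 reachability, avoid overflow
    return False

T = strip_matrix(2)       # 3 admissible columns: 00, 01, 10
primitive = is_primitive(T)
```

    Primitivity of every $T_n$ is exactly the kind of condition the connecting-operator machinery certifies without enumerating powers for each width separately.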

    The natural measure of a symbolic dynamical system

    This study investigates the natural, or intrinsic, measure of a symbolic dynamical system $\Sigma$. The measure $\mu([i_1,i_2,\ldots,i_n])$ of a pattern $[i_1,i_2,\ldots,i_n]$ in $\Sigma$ is the asymptotic frequency with which $[i_1,i_2,\ldots,i_n]$ occurs among all patterns of length $n$ within very long patterns; in a typical long pattern, $[i_1,i_2,\ldots,i_n]$ appears with frequency $\mu([i_1,i_2,\ldots,i_n])$. When $\Sigma=\Sigma(A)$ is a shift of finite type and $A$ is an irreducible $N\times N$ non-negative matrix, the measure $\mu$ is the Parry measure, which is ergodic with maximal entropy. The result holds for an irreducible sofic shift $\mathcal{G}=(G,\mathcal{L})$, and it extends to $\Sigma(A)$ where $A$ is a countably infinite matrix that is irreducible, aperiodic and positive recurrent. Using the Krieger cover, the natural measure of a general shift space is studied as a countable-state sofic shift, including the context-free shift. The Perron-Frobenius theorem for non-negative matrices plays an essential role in this study.
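    The Parry measure itself is computable directly from the Perron data of $A$: with Perron eigenvalue $\lambda$, right eigenvector $r$ and left eigenvector $l$, it is the stationary Markov measure with transitions $P_{ij}=A_{ij}r_j/(\lambda r_i)$ and stationary distribution $\pi_i \propto l_i r_i$. A minimal numerical sketch on the golden mean shift:

```python
import numpy as np
from itertools import product

def parry_measure(A):
    """Parry measure of Sigma(A) for an irreducible 0-1 matrix A:
    the stationary Markov measure with P[i,j] = A[i,j]*r[j]/(lam*r[i]),
    lam the Perron eigenvalue, r/l the right/left Perron eigenvectors."""
    ev, vecs = np.linalg.eig(A.astype(float))
    k = np.argmax(ev.real)
    lam = ev[k].real
    r = np.abs(vecs[:, k].real)
    P = A * r[None, :] / (lam * r[:, None])
    lv, lvecs = np.linalg.eig(A.T.astype(float))
    l = np.abs(lvecs[:, np.argmax(lv.real)].real)
    pi = l * r / (l @ r)           # stationary distribution
    return pi, P

def mu(pattern, pi, P):
    """mu([i_1,...,i_n]) = pi[i_1] * P[i_1,i_2] * ... * P[i_{n-1},i_n]."""
    m = pi[pattern[0]]
    for a, b in zip(pattern, pattern[1:]):
        m *= P[a, b]
    return m

# Golden mean shift: symbol 1 cannot follow 1.
A = np.array([[1, 1], [1, 0]])
pi, P = parry_measure(A)
total = sum(mu(w, pi, P) for w in product([0, 1], repeat=3)
            if all(A[a, b] for a, b in zip(w, w[1:])))  # should be 1
```

    The admissible length-3 cylinders carry total measure 1, as a probability measure must, and forbidden transitions get measure 0 automatically.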

    MVW-extensions of real quaternionic classical groups

    Let $G$ be a real quaternionic classical group $\mathrm{GL}_n(\mathbb{H})$, $\mathrm{Sp}(p,q)$ or $\mathrm{O}^*(2n)$. We define an extension $\breve{G}$ of $G$ with the following property: it contains $G$ as a subgroup of index two, and for every $x\in G$, there is an element $\breve{g}\in\breve{G}\setminus G$ such that $\breve{g}x\breve{g}^{-1}=x^{-1}$. This is similar to Moeglin-Vigneras-Waldspurger's extensions of non-quaternionic classical groups.

    Towards Understanding Adversarial Examples Systematically: Exploring Data Size, Task and Model Factors

    Most previous work explains adversarial examples from a few specific perspectives, lacking an integrated understanding of the problem. In this paper, we present a systematic study of adversarial examples from three aspects: the amount of training data, task-dependent factors, and model-specific factors. In particular, we show that adversarial generalization (i.e., test accuracy on adversarial examples) under standard training requires more data than standard generalization (i.e., test accuracy on clean examples), and we uncover the global relationship between generalization and robustness with respect to data size, especially when data is augmented by generative models. This reveals a trade-off between standard generalization and robustness in the limited-training-data regime, and their consistency when the data size is large enough. Furthermore, we explore through extensive empirical analysis how different task-dependent and model-specific factors influence the vulnerability of deep neural networks. Relevant recommendations on defending against adversarial attacks are provided as well. Our results outline a potential path towards a thorough and systematic understanding of adversarial examples.
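    For readers unfamiliar with how adversarial examples are generated, here is a minimal sketch of the well-known Fast Gradient Sign Method (FGSM) on a logistic-regression model. The attack and toy data are illustrative and not taken from this paper, which studies adversarial examples for deep networks.

```python
import numpy as np

def fgsm(x, w, b, y, eps):
    """Fast Gradient Sign Method on a logistic model sigmoid(w.x + b):
    perturb x by eps in the sign of the cross-entropy loss gradient
    with respect to the input."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted P(class 1)
    grad_x = (p - y) * w                    # d(loss)/dx for label y
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, 0.2])       # w.x + b = 0.8 > 0: classified as class 1
y = 1.0                        # true label
x_adv = fgsm(x, w, b, y, eps=0.6)
score = w @ x_adv + b          # pushed across the decision boundary
```

    A small, bounded perturbation flips the model's decision, which is exactly the vulnerability whose dependence on data size and model factors the paper analyzes.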

    ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning

    A machine learning pipeline typically consists of several stages of operations, such as data preprocessing, feature engineering and machine learning model training. Each operation has a set of hyper-parameters, which become irrelevant for the pipeline when the operation is not selected. This gives rise to a hierarchical conditional hyper-parameter space. To optimize this mixed continuous and discrete, conditional, hierarchical hyper-parameter space, we propose an efficient pipeline search and configuration algorithm that combines the power of Reinforcement Learning and Bayesian Optimization. Empirical results show that our method performs favorably compared to state-of-the-art methods like Auto-sklearn, TPOT, Tree Parzen Window, and Random Search.
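    The hierarchical conditional space can be sketched with a toy example. The stages, operations and hyper-parameter grids below are hypothetical, not ReinBo's actual search space; the point is that a hyper-parameter is sampled only when its parent operation is selected.

```python
import random

# Toy hierarchical conditional search space (illustrative): each stage
# offers alternative operations, and an operation's hyper-parameters
# exist only when that operation is chosen.
SPACE = {
    "preprocess": {
        "none": {},
        "pca": {"n_components": [2, 5, 10]},
    },
    "model": {
        "svm": {"C": [0.1, 1.0, 10.0]},
        "tree": {"max_depth": [3, 5, 7]},
    },
}

def sample_pipeline(space, rng):
    """Sample one configuration: first pick an operation per stage,
    then sample only that operation's hyper-parameters."""
    config = {}
    for stage, ops in space.items():
        op = rng.choice(sorted(ops))
        params = {k: rng.choice(v) for k, v in ops[op].items()}
        config[stage] = (op, params)
    return config

cfg = sample_pipeline(SPACE, random.Random(42))
```

    The discrete stage choices are what the reinforcement-learning agent navigates, while the per-operation hyper-parameters are the part Bayesian optimization tunes.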