A hybrid generative/discriminative framework to train a semantic parser from an un-annotated corpus
We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of hidden Markov models and support vector machines. By employing a modified K-means clustering method, a small set of the most representative sentences can be automatically selected from an un-annotated corpus. These sentences, together with their abstract annotations, are used to train an HVS model which is subsequently applied to the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus, which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that the hybrid framework improves on the baseline HVS parser. When compared with the HM-SVMs trained from the fully annotated corpus, the hybrid framework achieves comparable performance with only a small set of lightly annotated sentences.
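As a rough illustration of the sentence-selection step, the sketch below clusters (hypothetical) sentence vectors with plain K-means and returns the index of the sentence nearest each centroid as a "most representative" sentence to annotate. The paper's modified K-means and its sentence vectorization are not reproduced here; this only shows the representative-selection principle.

```python
import numpy as np

def select_representatives(vectors, k, iters=20, seed=0):
    """Cluster sentence vectors with K-means and return the index of
    the sentence closest to each centroid (toy sketch; vectorization
    and the paper's K-means modification are assumed away)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each sentence to its nearest centroid
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster empties
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = d.argmin(axis=1)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            reps.append(int(members[d[members, j].argmin()]))
    return sorted(reps)
```

Only the selected sentences then need (light) annotation, which is the framework's labor-saving point.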
Recommended from our members
Extracting protein-protein interaction based on discriminative training of the Hidden Vector State model
The knowledge about gene clusters and protein interactions is important for biological researchers to unveil the mechanism of life. However, a large quantity of this knowledge is hidden in the literature, such as journal articles, reports, and books. Many approaches to extracting information from unstructured text, such as pattern matching and shallow and deep parsing, have been proposed, especially for extracting protein-protein interactions (Zhou and He, 2008). A semantic parser based on the Hidden Vector State (HVS) model for extracting protein-protein interactions is presented in (Zhou et al., 2008). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. Maximum likelihood estimation (MLE) is used to derive the parameters of the HVS model. In this paper, we propose a discriminative approach based on a parse error measure to train the HVS model. To adjust the HVS model to achieve the minimum parse error rate, the generalized probabilistic descent (GPD) algorithm (Kuo et al., 2002) is used. Experiments have been conducted on the GENIA corpus. The results demonstrate a modest improvement: the discriminatively trained HVS model outperforms its MLE-trained counterpart by 2.5% in F-measure on the GENIA corpus.
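The GPD principle — replacing the non-differentiable error count with a sigmoid of a misclassification measure and descending its gradient — can be sketched on a toy two-class linear discriminant. This illustrates only the training idea, not the paper's HVS parse-error measure; all names and constants below are illustrative.

```python
import math

def sigmoid(d, gamma=1.0):
    """Smooth 0/1 error: approaches 1 when d > 0 (an error)."""
    return 1.0 / (1.0 + math.exp(-gamma * d))

def gpd_step(w, x, y, eps=0.5, gamma=2.0):
    """One GPD-style update for a 2-class linear discriminant.
    Score g(x; w) = w.x; label y is +1 or -1; misclassification
    measure d = -y * g is positive exactly when x is misclassified.
    The sigmoid of d gives a differentiable surrogate loss."""
    g = sum(wi * xi for wi, xi in zip(w, x))
    d = -y * g
    l = sigmoid(d, gamma)
    # d(loss)/dw = gamma * l * (1 - l) * (-y * x); descend it
    coef = eps * gamma * l * (1.0 - l) * y
    return [wi + coef * xi for wi, xi in zip(w, x)]
```

Iterating this update over the training samples drives the smoothed error rate down, which is the sense in which GPD targets minimum parse error.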
EAST: An Efficient and Accurate Scene Text Detector
Previous approaches for scene text detection have already achieved promising
performance across various benchmarks. However, they usually fall short when
dealing with challenging scenarios, even when equipped with deep neural network
models, because the overall performance is determined by the interplay of
multiple stages and components in the pipelines. In this work, we propose a
simple yet powerful pipeline that yields fast and accurate text detection in
natural scenes. The pipeline directly predicts words or text lines of arbitrary
orientations and quadrilateral shapes in full images, eliminating unnecessary
intermediate steps (e.g., candidate aggregation and word partitioning), with a
single neural network. The simplicity of our pipeline allows concentrating
efforts on designing loss functions and neural network architecture.
Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500
demonstrate that the proposed algorithm significantly outperforms
state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR
2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2fps
at 720p resolution.
Comment: Accepted to CVPR 2017; fixed equation (3)
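The reported F-score is the usual harmonic mean of detection precision and recall; a minimal sketch, assuming the benchmark's matching protocol (e.g. IoU thresholding between predicted and ground-truth boxes) has already been applied:

```python
def detection_fscore(num_matches, num_detections, num_ground_truth):
    """F-score from counts of matched detections, as used to rank
    detectors on benchmarks such as ICDAR 2015. The box-matching
    step itself is assumed done upstream and omitted here."""
    if num_detections == 0 or num_ground_truth == 0:
        return 0.0
    precision = num_matches / num_detections
    recall = num_matches / num_ground_truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```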
On monotonicity of regression quantile functions
In the linear regression quantile model, the conditional quantile of the response, Y, given x is QY|x(τ) ≡ x′β(τ). Though QY|x(τ) must be monotonically increasing in τ, the Koenker–Bassett regression quantile estimator, x′β̂(τ), is not monotonic outside a vanishingly small neighborhood of x̄. Given a grid of mesh δn, let Q̃Y|x(τ) be the linear interpolation of the values of x′β̂(τ) along the grid. We show here that for a range of rates δn, Q̃Y|x(τ) will be strictly monotonic (with probability tending to one) and will be asymptotically equivalent to x′β̂(τ) in the sense that n^{1/2} times the difference tends to zero at a rate depending on δn.
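The grid-interpolation construction can be illustrated directly: given estimates of x′β̂(τ) on a grid of mesh δn, the interpolated curve is strictly monotone whenever the grid values are increasing. The values below are toy numbers — no actual regression-quantile fit is performed.

```python
import numpy as np

def interpolate_quantile_curve(tau_grid, qhat_grid, tau_fine):
    """Linear interpolation of (toy) estimated conditional quantiles
    x'betahat(tau) given on a grid, evaluated at finer tau values."""
    return np.interp(tau_fine, tau_grid, qhat_grid)

def is_strictly_monotone(values):
    """True if the sequence of values is strictly increasing."""
    v = np.asarray(values, dtype=float)
    return bool(np.all(np.diff(v) > 0))
```

Monotonicity of the interpolant reduces to monotonicity of the finitely many grid values, which is what makes the grid-based construction tractable.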
BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures
We introduce BriskStream, an in-memory data stream processing system (DSPS)
specifically designed for modern shared-memory multicore architectures.
BriskStream's key contribution is an execution plan optimization paradigm,
namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair
of producer-consumer operators into consideration. We propose a branch and
bound based approach with three heuristics to resolve the resulting nontrivial
optimization problem. The experimental evaluations demonstrate that BriskStream
yields much higher throughput and better scalability than existing DSPSs on
multi-core architectures when processing different types of workloads.
Comment: To appear in SIGMOD'19
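The NUMA-aware placement idea can be sketched as a small search: assign each operator to a socket so that total traffic weighted by NUMA distance over producer-consumer edges is minimal. This toy exhaustive search with a running-best bound stands in for the paper's branch-and-bound with heuristics; all names are illustrative.

```python
from itertools import product

def place_operators(edges, numa_dist, n_ops, n_sockets):
    """Find a socket assignment for operators 0..n_ops-1 minimizing
    sum(traffic * numa_dist[socket(p)][socket(c)]) over edges
    (p, c, traffic). Partial costs are pruned against the best plan
    found so far (a crude stand-in for branch-and-bound)."""
    best_cost, best_plan = float('inf'), None
    for plan in product(range(n_sockets), repeat=n_ops):
        cost = 0
        for (p, c, traffic) in edges:
            cost += traffic * numa_dist[plan[p]][plan[c]]
            if cost >= best_cost:  # cannot beat the best plan; prune
                break
        else:
            best_cost, best_plan = cost, plan
    return best_plan, best_cost
```

With a zero-distance diagonal, co-locating a chain of operators on one socket is optimal, matching the intuition that remote-memory traffic dominates the cost.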