A hybrid generative/discriminative framework to train a semantic parser from an un-annotated corpus
We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of hidden Markov models and support vector machines. By employing a modified K-means clustering method, a small set of the most representative sentences can be automatically selected from an un-annotated corpus. These sentences, together with their abstract annotations, are used to train an HVS model which is subsequently applied to the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus, which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that the hybrid framework improves on the baseline HVS parser. When compared with the HM-SVMs trained from the fully annotated corpus, the hybrid framework achieves comparable performance with only a small set of lightly annotated sentences.
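As a rough illustration of the sentence-selection step, the sketch below clusters (hypothetical) sentence vectors with plain K-means and returns the index of the sentence nearest each centroid as a "most representative" sentence to annotate. The paper's modified K-means and its sentence vectorization are not reproduced here; this only shows the representative-selection principle.

```python
import numpy as np

def select_representatives(vectors, k, iters=20, seed=0):
    """Cluster sentence vectors with K-means and return the index of
    the sentence closest to each centroid (toy sketch; vectorization
    and the paper's K-means modification are assumed away)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each sentence to its nearest centroid
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster empties
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = d.argmin(axis=1)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            reps.append(int(members[d[members, j].argmin()]))
    return sorted(reps)
```

Only the selected sentences then need (light) annotation, which is the framework's labor-saving point.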
Recommended from our members
Extracting protein-protein interaction based on discriminative training of the Hidden Vector State model
The knowledge about gene clusters and protein interactions is important for biological researchers to unveil the mechanism of life. However, a large quantity of this knowledge is hidden in the literature, such as journal articles, reports, and books. Many approaches to extracting information from unstructured text, such as pattern matching and shallow and deep parsing, have been proposed, especially for extracting protein-protein interactions (Zhou and He, 2008). A semantic parser based on the Hidden Vector State (HVS) model for extracting protein-protein interactions is presented in (Zhou et al., 2008). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. Maximum likelihood estimation (MLE) is used to derive the parameters of the HVS model. In this paper, we propose a discriminative approach based on a parse error measure to train the HVS model. To adjust the HVS model to achieve the minimum parse error rate, the generalized probabilistic descent (GPD) algorithm (Kuo et al., 2002) is used. Experiments have been conducted on the GENIA corpus. The results demonstrate a modest improvement: the discriminatively trained HVS model outperforms its MLE-trained counterpart by 2.5% in F-measure on the GENIA corpus.
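The GPD principle — replacing the non-differentiable error count with a sigmoid of a misclassification measure and descending its gradient — can be sketched on a toy two-class linear discriminant. This illustrates only the training idea, not the paper's HVS parse-error measure; all names and constants below are illustrative.

```python
import math

def sigmoid(d, gamma=1.0):
    """Smooth 0/1 error: approaches 1 when d > 0 (an error)."""
    return 1.0 / (1.0 + math.exp(-gamma * d))

def gpd_step(w, x, y, eps=0.5, gamma=2.0):
    """One GPD-style update for a 2-class linear discriminant.
    Score g(x; w) = w.x; label y is +1 or -1; misclassification
    measure d = -y * g is positive exactly when x is misclassified.
    The sigmoid of d gives a differentiable surrogate loss."""
    g = sum(wi * xi for wi, xi in zip(w, x))
    d = -y * g
    l = sigmoid(d, gamma)
    # d(loss)/dw = gamma * l * (1 - l) * (-y * x); descend it
    coef = eps * gamma * l * (1.0 - l) * y
    return [wi + coef * xi for wi, xi in zip(w, x)]
```

Iterating this update over the training samples drives the smoothed error rate down, which is the sense in which GPD targets minimum parse error.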
EAST: An Efficient and Accurate Scene Text Detector
Previous approaches for scene text detection have already achieved promising
performance across various benchmarks. However, they usually fall short when
dealing with challenging scenarios, even when equipped with deep neural network
models, because the overall performance is determined by the interplay of
multiple stages and components in the pipelines. In this work, we propose a
simple yet powerful pipeline that yields fast and accurate text detection in
natural scenes. The pipeline directly predicts words or text lines of arbitrary
orientations and quadrilateral shapes in full images, eliminating unnecessary
intermediate steps (e.g., candidate aggregation and word partitioning), with a
single neural network. The simplicity of our pipeline allows concentrating
efforts on designing loss functions and neural network architecture.
Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500
demonstrate that the proposed algorithm significantly outperforms
state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR
2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2fps
at 720p resolution.
Comment: Accepted to CVPR 2017; fixed equation (3)
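The reported F-score is the usual harmonic mean of detection precision and recall; a minimal sketch, assuming the benchmark's matching protocol (e.g. IoU thresholding between predicted and ground-truth boxes) has already been applied:

```python
def detection_fscore(num_matches, num_detections, num_ground_truth):
    """F-score from counts of matched detections, as used to rank
    detectors on benchmarks such as ICDAR 2015. The box-matching
    step itself is assumed done upstream and omitted here."""
    if num_detections == 0 or num_ground_truth == 0:
        return 0.0
    precision = num_matches / num_detections
    recall = num_matches / num_ground_truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```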
On monotonicity of regression quantile functions
In the linear regression quantile model, the conditional quantile of the response, Y, given x is QY|x(τ) ≡ x′β(τ). Though QY|x(τ) must be monotonically increasing in τ, the Koenker–Bassett regression quantile estimator, x′β̂(τ), is not monotonic outside a vanishingly small neighborhood of x̄. Given a grid of mesh δn, let Q̃Y|x(τ) be the linear interpolation of the values of x′β̂(τ) along the grid. We show here that for a range of rates δn, Q̃Y|x(τ) will be strictly monotonic (with probability tending to one) and will be asymptotically equivalent to x′β̂(τ) in the sense that n^{1/2} times the difference tends to zero at a rate depending on δn.
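The grid-interpolation construction can be illustrated directly: given estimates of x′β̂(τ) on a grid of mesh δn, the interpolated curve is strictly monotone whenever the grid values are increasing. The values below are toy numbers — no actual regression-quantile fit is performed.

```python
import numpy as np

def interpolate_quantile_curve(tau_grid, qhat_grid, tau_fine):
    """Linear interpolation of (toy) estimated conditional quantiles
    x'betahat(tau) given on a grid, evaluated at finer tau values."""
    return np.interp(tau_fine, tau_grid, qhat_grid)

def is_strictly_monotone(values):
    """True if the sequence of values is strictly increasing."""
    v = np.asarray(values, dtype=float)
    return bool(np.all(np.diff(v) > 0))
```

Monotonicity of the interpolant reduces to monotonicity of the finitely many grid values, which is what makes the grid-based construction tractable.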
BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures
We introduce BriskStream, an in-memory data stream processing system (DSPS)
specifically designed for modern shared-memory multicore architectures.
BriskStream's key contribution is an execution plan optimization paradigm,
namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair
of producer-consumer operators into consideration. We propose a branch and
bound based approach with three heuristics to resolve the resulting nontrivial
optimization problem. The experimental evaluations demonstrate that BriskStream
yields much higher throughput and better scalability than existing DSPSs on
multi-core architectures when processing different types of workloads.
Comment: To appear in SIGMOD'19
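The NUMA-aware placement idea can be sketched as a small search: assign each operator to a socket so that total traffic weighted by NUMA distance over producer-consumer edges is minimal. This toy exhaustive search with a running-best bound stands in for the paper's branch-and-bound with heuristics; all names are illustrative.

```python
from itertools import product

def place_operators(edges, numa_dist, n_ops, n_sockets):
    """Find a socket assignment for operators 0..n_ops-1 minimizing
    sum(traffic * numa_dist[socket(p)][socket(c)]) over edges
    (p, c, traffic). Partial costs are pruned against the best plan
    found so far (a crude stand-in for branch-and-bound)."""
    best_cost, best_plan = float('inf'), None
    for plan in product(range(n_sockets), repeat=n_ops):
        cost = 0
        for (p, c, traffic) in edges:
            cost += traffic * numa_dist[plan[p]][plan[c]]
            if cost >= best_cost:  # cannot beat the best plan; prune
                break
        else:
            best_cost, best_plan = cost, plan
    return best_plan, best_cost
```

With a zero-distance diagonal, co-locating a chain of operators on one socket is optimal, matching the intuition that remote-memory traffic dominates the cost.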