38 research outputs found

    Identifying high-impact sub-structures for convolution kernels in document-level sentiment classification

    Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees guided by a polarity lexicon yield a 1.45-point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.

    Adult image detection combining BoVW based on region of interest and color moments

    To effectively prevent pornography from spreading on the Internet, we propose a novel method of adult image detection that combines bag-of-visual-words (BoVW) based on region of interest (ROI) with color moments (CM). The goal of BoVW is to automatically mine the local patterns of adult content, called visual words. The usual BoVW method clusters visual words from patches across the whole image and weights them by hard assignment. However, the whole image contains much background noise, and a soft-weighting scheme is better than hard assignment. We therefore propose BoVW based on ROI, which differs in two respects. First, we create visual words within the ROI for adult image detection. This improves the representative power of the visual words, because patches in the ROI are more indicative of adult content than patches in the whole image. Second, we adopt a soft-weighting scheme to detect adult images. Moreover, CM is selected, after evaluating several commonly used global features, to be combined with BoVW based on ROI. Experiments and a comparison with state-of-the-art methods show that our method remarkably improves the performance of adult image detection.

    Insight into the partial solutionisation of a high pressure die-cast Al-Mg-Zn-Si alloy for mechanical property enhancement

    Engineering and Physical Sciences Research Council, United Kingdom; National Natural Science Foundation of China; Natural Science Basic Research Plan in Shaanxi Province of Chin

    Loglinear models for word alignment

    We present a framework for word alignment based on log-linear models. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence, and possible additional variables. Log-linear models allow statistical alignment models to be easily extended by incorporating syntactic information. In this paper, we use IBM Model 3 alignment probabilities, POS correspondence, and bilingual dictionary coverage as features. Our experiments show that log-linear models significantly outperform IBM translation models.

    Maximum entropy based phrase reordering model for statistical machine translation

    We propose a novel reordering model for phrase-based statistical machine translation (SMT) that uses a maximum entropy (MaxEnt) model to predict reorderings of neighbor blocks (phrase pairs). The model provides content-dependent, hierarchical phrasal reordering with generalization based on features automatically learned from a real-world bitext. We present an algorithm to extract all reordering events of neighbor blocks from bilingual data. In our experiments on Chinese-to-English translation, this MaxEnt-based reordering model obtains significant improvements in BLEU score on the NIST MT-05 and IWSLT-04 tasks.

    Partial matching strategy for phrase-based statistical machine translation

    This paper presents a partial matching strategy for phrase-based statistical machine translation (PBSMT). Source phrases which do not appear in the training corpus can be translated by word substitution according to partially matched phrases. The advantage of this method is that it can alleviate the data sparseness problem if the amount of bilingual corpus is limited. We incorporate our approach into the state-of-the-art PBSMT system Moses and achieve statistically significant improvements on both small and large corpora.

    Improving Statistical Machine Translation using Lexicalized Rule Selection

    This paper proposes a novel lexicalized approach for rule selection for syntax-based statistical machine translation (SMT). We build maximum entropy (MaxEnt) models which combine rich context information for selecting translation rules during decoding. We successfully integrate the MaxEnt-based rule selection models into the state-of-the-art syntax-based SMT model. Experiments show that our lexicalized approach for rule selection achieves statistically significant improvements over the state-of-the-art SMT system.