42 research outputs found

    Enhancing scarce-resource language translation through pivot combinations

    Chinese and Spanish are the most spoken languages in the world. However, little machine translation research has been done for this language pair. We experiment with the parallel Chinese-Spanish corpus (United Nations) to explore alternative SMT strategies that consist of using a pivot language. In particular, two well-known pivoting alternatives are shown: the cascade system and the pseudo-corpus. As pivot languages we use English, Arabic and French. Results show that English is the best pivot language between Chinese and Spanish. As a new strategy, we propose a combination of the pivot strategies, which is capable of substantially outperforming the direct translation strategy. Postprint (published version)
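The cascade strategy named in this abstract chains two translation steps through the pivot language. A minimal sketch of that idea follows, assuming 1-best output at each step; the lookup-table "translators" and their tiny vocabularies are hypothetical stand-ins for real SMT systems and are not taken from the paper.

```python
def make_lookup_translator(table):
    """Build a toy word-for-word translator (hypothetical stand-in for an SMT system)."""
    def translate(tokens):
        # Unknown tokens are passed through unchanged.
        return [table.get(tok, tok) for tok in tokens]
    return translate


def cascade(source_to_pivot, pivot_to_target):
    """Compose two translators: source -> pivot -> target."""
    def translate(tokens):
        return pivot_to_target(source_to_pivot(tokens))
    return translate


# Toy vocabularies, invented for illustration only.
zh_en = make_lookup_translator({"你好": "hello", "世界": "world"})
en_es = make_lookup_translator({"hello": "hola", "world": "mundo"})

zh_es = cascade(zh_en, en_es)
result = zh_es(["你好", "世界"])
print(result)
```

The pseudo-corpus alternative would instead translate the pivot side of a source-pivot corpus into the target language and train a direct system on the resulting pairs; that step is not shown here.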

    Research on multi-modal sentiment feature learning of social media content

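    Social media has become the main platform for public discussion and information transmission in modern society. Sentiment analysis of social media therefore has great application value for public opinion monitoring, commercial product guidance, stock market prediction and similar fields. However, the multi-modal nature of social media content (text, images, etc.) exposes many limitations of traditional single-modal sentiment analysis methods, and multi-modal sentiment analysis has great theoretical value for understanding and analyzing cross-media content. The key problem that distinguishes multi-modal sentiment analysis from single-modal methods is how to combine heterogeneous multi-modal sentiment information to obtain the overall sentiment polarity while also accounting for how each individual modality expresses sentiment. To address this problem, and exploiting the correlation and hierarchical abstraction that multi-modal social media content exhibits in expressing sentiment, a set of multi-modal sentiment feature learning and fusion methods for social media is proposed to perform multi-modal sentiment analysis. The main contents and innovations...
    Degree: Master of Engineering. Department and major: School of Information Science and Technology, Pattern Recognition and Intelligent Systems. Student ID: 3152013115327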

    Extracting spatial relations from document for geographic information retrieval

    IEEE Geoscience and Remote Sensing Society (IEEE GRSS); East China Norm. Univ., Sch. Resour. Environ. Sci.; Shanghai Urban Dev. Inf. Res. Cent.; The Geographical Society of Shanghai; East China Univ. Sci. Technol., Bus. Sch.
    Geographic information retrieval (GIR) is developed to retrieve geographical information from unstructured text (commonly web documents). Previous research focuses on applying traditional information retrieval (IR) techniques to GIR, such as ranking geographic relevance by the vector space model (VSM). In many cases, these keyword-based methods cannot support spatial queries very well. For example, when searching for documents on "debris flow took place in Hunan last year", the documents selected in this way may only contain the words "debris flow" and "Hunan" rather than refer to a "debris flow" that actually occurred in "Hunan". The lack of spatial relations between thematic activities (debris flow) and geographic entities (Hunan) is the key reason for this problem. In this paper, we present a kernel-based approach and apply it in a support vector machine (SVM) to extract spatial relations from free text for further GIS service and spatial reasoning. First, we analyze the characteristics of spatial relation expressions in natural language and find two types of spatial relations: topology and direction. Both are used to qualitatively describe the relative positions of spatial objects to each other. Then we explore the use of the dependency tree (a dependency tree represents the grammatical dependencies in a sentence and can be generated by a syntax parser) to identify these spatial relations. We observe that the features required to find a relationship between two spatial named entities in the same sentence are typically captured by the shortest path between the two entities in the dependency tree. Therefore, we construct a shortest path dependency kernel for SVM to complete the task. The experimental results show that our dependency tree kernel achieves a significant improvement over the previous method.
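The shortest path between two entities in a dependency tree, which this abstract uses as the basis for its kernel, can be found with a breadth-first search over the tree's edges. The sketch below is an illustration only: the function name, the edge format, and the hand-written parse are assumptions, and the kernel and SVM steps described in the paper are not shown.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the token path from source to target, treating tree edges as undirected."""
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)
    if source not in graph or target not in graph:
        return None

    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical, hand-written (head, dependent) edges for
# "A debris flow occurred in Hunan"; a real parser may produce a different tree.
edges = [
    ("occurred", "flow"),
    ("flow", "debris"),
    ("flow", "A"),
    ("occurred", "in"),
    ("in", "Hunan"),
]
path = shortest_dependency_path(edges, "flow", "Hunan")
print(path)
```

In this toy parse the path runs through the verb and the preposition, which is the kind of evidence linking the thematic activity to the geographic entity.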

    Fault-Tolerant Learning for Term Extraction


    Which is More Suitable for Chinese Word Segmentation, the Generative Model or the Discriminative One?

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    The Construction of a Dictionary for a Two-layer Chinese Morphological Analyzer

    PACLIC 20 / Wuhan, China / 1-3 November, 200