967 research outputs found

    Enhancement of lexical concepts using cross-lingual web mining

    Full text link

    Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval

    Get PDF
    Although more and more language pairs are covered by machine translation services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application which needs translation functionality of a relatively low level of sophistication since current models for information retrieval (IR) are still based on a bag-of-words. The Web provides a vast resource for the automatic construction of parallel corpora which can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this paper, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.Comment: 37 page

    A Survey on Semantic Processing Techniques

    Full text link
    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyzed five semantic processing tasks, e.g., word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missed in the published version due to the publication policies. Please contact Prof. Erik Cambria for detail

    A Survey of Biological Entity Recognition Approaches

    Get PDF
    There has been growing interest in the task of Named Entity Recognition (NER) and a lot of research has been done in this direction in last two decades. Particularly, a lot of progress has been made in the biomedical domain with emphasis on identifying domain-specific entities and often the task being known as Biological Named Entity Recognition (BER). The task of biological entity recognition (BER) has been proved to be a challenging task due to several reasons as identified by many researchers. The recognition of biological entities in text and the extraction of relationships between them have paved the way for doing more complex text-mining tasks and building further applications. This paper looks at the challenges perceived by the researchers in BER task and investigates the works done in the domain of BER by using the multiple approaches available for the task

    Experiences in evaluating multilingual and text-image information retrieval

    Get PDF
    23 pages, 8 figures.One important step during the development of information retrieval (IR) processes is the evaluation of the output regarding the information needs of the user. The "high quality" of the output is related to the integration of different methods to be applied in the IR process and the information included in the retrieved documents, but how can "quality" be measured? Although some of these methods can be tested in a stand-alone way, it is not always clear what will happen when several methods are integrated. For this reason, much effort has been put into establishing a good combination of several methods or to correctly tuning some of the algorithms involved. The current approach is to measure the precision and recall figures yielded when different combinations of methods are included in an IR process. In this article, a short description of the current techniques and methods included in an IR system is given, paying special attention to the multilingual aspect of the problem. Also a discussion of their influence on the final performance of the IR process is presented by explaining previous experiences in the evaluation process followed in two projects (MIRACLE and OmniPaper) related to multilingual information retrieval.This work has been partially supported by the projects OmniPaper (European Union, 5th Framework Programme for Research and Technological Development, IST-2001-32174), NEDINE (E-Content project Ref.: 22225), and GPS Project—Software Process Management Platform: modeling, reuse, and measurement (National Research Plan, TIN2004-07083).Publicad

    Mining Social Media and Structured Data in Urban Environmental Management to Develop Smart Cities

    Get PDF
    This research presented the deployment of data mining on social media and structured data in urban studies. We analyzed urban relocation, air quality and traffic parameters on multicity data as early work. We applied the data mining techniques of association rules, clustering and classification on urban legislative history. Results showed that data mining could produce meaningful knowledge to support urban management. We treated ordinances (local laws) and the tweets about them as indicators to assess urban policy and public opinion. Hence, we conducted ordinance and tweet mining including sentiment analysis of tweets. This part of the study focused on NYC with a goal of assessing how well it heads towards a smart city. We built domain-specific knowledge bases according to widely accepted smart city characteristics, incorporating commonsense knowledge sources for ordinance-tweet mapping. We developed decision support tools on multiple platforms using the knowledge discovered to guide urban management. Our research is a concrete step in harnessing the power of data mining in urban studies to enhance smart city development
    • …
    corecore