46 research outputs found

    Building an Interactive Multilingual Information Retrieval Tool

    Growing demands on the Internet have led users to access information expressed in languages other than their own, which has given rise to cross-lingual information retrieval (CLIR). CLIR is established as a major topic in Information Retrieval (IR). One approach to CLIR uses different translation methods to translate queries, documents and indexes into other languages. Queries submitted to search engines suffer from untranslatable query keys (i.e., words missing from the dictionary) and from translation ambiguity, meaning difficulty in choosing between alternative translations. Our approach in this thesis is to build and develop a software tool (MORTAJA-IR-TOOL), a new information retrieval tool written in Java with JDK 1.6. The tool has several features: it provides a systematic multilingual scheme that serves as the basis for translation in CLIR, and it stems the words entered in the query as a stage preceding translation. The proposed query translation method was evaluated against a baseline translation that uses a machine-readable dictionary, in a user-centered experiment, and the improvement was 8.96%. The impact of stemming the query words on the quality of retrieval of matching data in the other language was also evaluated, and the improvement was 4.14%. Finally, the combination of the proposed stemming and translation methods (MORTAJA-IR-TOOL) was evaluated, and the improvement in the rate of retrieved data was 15.86%. Keywords: Cross lingual information retrieval, CLIR, Information Retrieval, IR, Translation, stemming.
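As a rough illustration of the pipeline this abstract describes (stem the query words, then translate them with a dictionary), here is a minimal Python sketch. It is not the MORTAJA-IR-TOOL implementation, which the abstract says is written in Java. The suffix list, the dictionary entries, and the function names are all hypothetical and chosen only to show untranslatable keys and translation ambiguity.

```python
# Toy sketch: stem each query word, then look it up in a bilingual dictionary.
# The stemmer, the dictionary entries, and all names are hypothetical.

SUFFIXES = ("ing", "es", "s")  # assumed toy suffix list, not a real stemmer


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def translate_query(query, dictionary):
    """Return (translations per query word, words missing from the dictionary)."""
    translated, untranslatable = {}, []
    for word in query.lower().split():
        key = stem(word)
        if key in dictionary:
            # Several alternatives for one key model translation ambiguity.
            translated[word] = dictionary[key]
        else:
            # A missing key models an untranslatable query word.
            untranslatable.append(word)
    return translated, untranslatable


# Hypothetical English-to-Arabic entries; "book" has two alternatives.
TOY_DICTIONARY = {
    "book": ["كتاب", "حجز"],
    "search": ["بحث"],
}

translations, missing = translate_query("searching books online", TOY_DICTIONARY)
```

Without the stemming step, "searching" and "books" would both miss the dictionary, which is the kind of effect the stemming evaluation in the abstract measures.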

    Advances in Meta-Heuristic Optimization Algorithms in Big Data Text Clustering

    This paper presents a comprehensive survey of meta-heuristic optimization algorithms for text clustering applications and highlights their main procedures. These Artificial Intelligence (AI) algorithms are recognized as promising swarm intelligence methods because of their success in solving machine learning problems, especially text clustering problems. The paper reviews all of the relevant literature on meta-heuristic-based text clustering applications, including many variants, such as basic, modified, hybridized, and multi-objective methods. The main procedures of text clustering are also given, along with critical discussion. The review reports the advantages and disadvantages of these methods and recommends potential future research paths. The main keywords considered in this paper are text, clustering, meta-heuristic, optimization, and algorithm.

    Transfer Learning in Natural Language Processing through Interactive Feedback

    Machine learning models cannot easily adapt to new domains and applications. This drawback becomes detrimental for natural language processing (NLP) because language is perpetually changing. Across disciplines and languages, there are noticeable differences in content, grammar, and vocabulary. To overcome these shifts, recent NLP breakthroughs focus on transfer learning. Through clever optimization and engineering, a model can successfully adapt to a new domain or task. However, these modifications are still computationally inefficient or resource-intensive. Compared to machines, humans are more capable of generalizing knowledge across different situations, especially in low-resource ones. Therefore, the research on transfer learning should carefully consider how the user interacts with the model. The goal of this dissertation is to investigate “human-in-the-loop” approaches for transfer learning in NLP. First, we design annotation frameworks for inductive transfer learning, which is the transfer of models across tasks. We create an interactive topic modeling system for users to find topics useful for classifying documents in multiple languages. The user-constructed topic model improves classification accuracy and bridges cross-lingual gaps in knowledge. Next, we look at popular language models, like BERT, that can be applied to various tasks. While these models are useful, they still require a large amount of labeled data to learn a new task. To reduce labeling, we develop an active learning strategy which samples documents that surprise the language model. Users only need to annotate a small subset of these unexpected documents to adapt the language model for text classification. Then, we transition to user interaction in transductive transfer learning, which is the transfer of models across domains. We focus our efforts on low-resource languages to develop an interactive system for word embeddings.
In this approach, the feedback from bilingual speakers refines the cross-lingual embedding space for classification tasks. Subsequently, we look at domain shift for tasks beyond text classification. Coreference resolution is fundamental for NLP applications, like question-answering and dialogue, but the models are typically trained and evaluated on one dataset. We use active learning to find spans of text in the new domain for users to label. Furthermore, we provide important insights on annotating spans for domain adaptation. Finally, we summarize the contributions of each chapter. We focus on aspects like the scope of applications and model complexity. We conclude with a discussion of future directions. Researchers may extend the ideas in our thesis to topics like user-centric active learning and proactive learning
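The active learning idea in this abstract (ask users to label only the documents the model finds most unexpected) can be sketched in Python as follows. The abstract does not say how surprise is measured, so this sketch swaps in entropy-based uncertainty sampling, a common active learning baseline. The function names, document ids, and probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` documents with the highest entropy."""
    ranked = sorted(
        predictions,
        key=lambda doc_id: entropy(predictions[doc_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs: class probabilities per unlabeled document.
predictions = {
    "doc_a": [0.98, 0.02],  # confident prediction, low entropy
    "doc_b": [0.55, 0.45],  # near coin flip, high entropy
    "doc_c": [0.80, 0.20],
}

to_label = select_for_annotation(predictions, budget=2)
```

With these made-up numbers the two least certain documents are selected, so the annotation budget goes where the model is least sure of itself.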

    Ranking and Retrieval under Semantic Relevance

    This thesis presents a series of conceptual and empirical developments on the ranking and retrieval of candidates under semantic relevance. Part I of the thesis introduces the concept of uncertainty in various semantic tasks (such as recognizing textual entailment) in natural language processing, and the machine learning techniques commonly employed to model these semantic phenomena. A unified view of ranking and retrieval will be presented, and the trade-off between model expressiveness, performance, and scalability in model design will be discussed. Part II of the thesis focuses on applying these ranking and retrieval techniques to text: Chapter 3 examines the feasibility of ranking hypotheses given a premise with respect to a human's subjective probability of the hypothesis happening, effectively extending the traditional categorical task of natural language inference. Chapter 4 focuses on detecting situation frames for documents using ranking methods. Then we extend the ranking notion to retrieval, and develop both sparse (Chapter 5) and dense (Chapter 6) vector-based methods to facilitate scalable retrieval for potential answer paragraphs in question answering. Part III turns the focus to mentions and entities in text, while continuing the theme on ranking and retrieval: Chapter 7 discusses the ranking of fine-grained types that an entity mention could belong to, leading to state-of-the-art performance on hierarchical multi-label fine-grained entity typing. Chapter 8 extends the semantic relation of coreference to a cross-document setting, enabling models to retrieve from a large corpus, instead of in a single document, when resolving coreferent entity mentions

    Perspectives on Public Policy in Societal-Environmental Crises

    This is an open access book. Histories we tell never emerge in a vacuum, and history as an academic discipline that studies the past is highly sensitive to the concerns of the present and the heated debates that can divide entire societies. But does the study of the past also have something to teach us about the future? Can history help us in coping with the planetary crisis we are now facing? By analyzing historical societies as complex adaptive systems, we contribute to contemporary thinking about societal-environmental interactions in policy and planning and consider how environmental and climatic changes, whether sudden high impact events or more subtle gradual changes, impacted human responses in the past. We ask how societal perceptions of such changes affect behavioral patterns and explanatory rationalities in premodernity, and whether a better historical understanding of these relationships can inform our response to contemporary problems of similar nature and magnitude, such as adapting to climate change

    HPV vaccination: knowledge, attitudes and beliefs in the Chinese population

    Introduction: Cervical cancer is the fourth most common cancer in women worldwide. An estimated 62,000 cases of cervical cancer occur annually in China, accounting for 12% of global incidence. Virtually all cervical cancers are related to infection by Human Papilloma Virus (HPV): effective HPV vaccines have been developed and vaccination programmes introduced in many countries over the last decade. Given the burden of cervical cancer in China, it is imperative that effective primary and secondary prevention strategies are introduced. Effective introduction of HPV vaccination programmes will require education and information strategies that are informed by a comprehensive understanding of the knowledge, attitudes and beliefs about HPV infection and its relationship to cervical cancer in the Chinese population.
Aims and objectives: The aims of my thesis are: 1) to systematically review the evidence from the Chinese-language literature in relation to knowledge of and attitude towards HPV infection and HPV vaccination, and 2) to explore knowledge and attitudes about HPV infection, HPV vaccination and cervical screening amongst teenagers in Heilongjiang province in China.
Methods: I undertook a systematic literature review using two electronic Chinese databases – the ‘Chinese National Knowledge Infrastructure’ (CNKI) database and the ‘Wanfang’ database. These were searched from inception through November 30th 2012: MeSH terms were applied to both Chinese databases. Manual searching of relevant online journals was also undertaken. Following selection of papers based on pre-determined inclusion and exclusion criteria, quality assessment was carried out using a modified quality assessment checklist, and included studies were classified as good, fair or poor quality. Due to heterogeneity of populations and survey instruments a narrative approach was adopted for data synthesis. I also undertook a questionnaire survey of high-school students in China. Questions were designed based on the Health Belief Model, informed by findings from my systematic review, and refined through cognitive interviews prior to field work in early 2014. The survey targeted students in five public high schools in one middle-income city (Mudanjiang city) and two small counties (Ning’an and Hailin) of Heilongjiang province; 3788 young people aged 14-22 years participated. Descriptive statistical analysis was used to summarise demographic characteristics; initially differences were identified using the chi-square test. Factor analysis was applied to identify attitude patterns and logistic regression analysis models were applied to determine the association between attitude (potential predictors) and acceptability, attitude and levels of knowledge.
Results: Forty-seven articles met my inclusion criteria and were included in the systematic review. All included studies were published between 2006 and 2011; all were cross-sectional questionnaire surveys with sample sizes ranging from 100 to 9,865. The quality of included studies varied considerably. Included populations ranged from the general public to young people and health professionals. Awareness of HPV and knowledge of the relationship between HPV and cervical cancer, and of the sexually transmitted nature of HPV, were the main issues examined. Awareness of HPV was low among all non-health-professional groups. Similarly, understanding of the relationship between HPV infection and cervical cancer and of the sexually transmitted nature of HPV was low. However, significant differences in awareness and knowledge were found, based on urban/rural status, ethnicity and age. Uighur women had the lowest awareness and knowledge levels, followed by rural adult women, and teenagers. Acceptability of HPV vaccination varied in terms of the vaccine target recipients (whether for adult women themselves or for their daughters) and between health professionals and the general public. Reported levels of HPV vaccine acceptability (for adult women themselves and for their daughters) were higher in North China compared to South China. Health professionals were less willing to accept the vaccine for their daughters than they were to receive it themselves. The cost, source and appropriate age for HPV vaccination were also frequently examined issues. Importantly, a high proportion of the health professionals believed that the appropriate age for vaccination was over 18 years old for girls. 3788 participants aged 14-22 years were included in the questionnaire survey, with 54% females and 20% urban students. Overall awareness of HPV was 13.2% and acceptability of the HPV vaccine was 68%. Knowledge levels varied in different content areas; for example 74% of respondents knew that HPV vaccination is not 100% effective against cervical cancer while only 6% knew that poor personal hygiene did not increase the risk of contracting HPV infection. Attitudes towards HPV infection and vaccination were also interesting and novel; the greatest concern about HPV vaccination was minor side effects (72%). The highest-rated source of recommendations about HPV vaccination was parents (66%), while there were concerns expressed about ‘gossip’ in relation to HPV vaccination (51%). No urban/rural differences were found in knowledge and attitudes; gender differences existed, but depended on specific circumstances. Participants who were willing to accept HPV vaccination were more likely to be influenced by others, to report high perceived severity of HPV and cervical cancer, to perceive benefits of HPV vaccination and to score well on knowledge questions. Participants with high knowledge scores for HPV infection and vaccination were more likely to consider HPV infection and cervical cancer to be serious, and were less likely to associate HPV infection with stigma. Participants who had high levels of awareness of HPV infection were more likely to be influenced by others in relation to accepting HPV vaccination.
Discussion: My thesis has produced novel findings in relation to HPV vaccination knowledge, attitudes and beliefs in China. Low levels of awareness and knowledge amongst Chinese people may be influenced by traditional Chinese culture, which perhaps makes people more reluctant to consider issues related to sexual practices. Another possible explanation is that people tended to under-report knowledge of HPV when answering the questions in the survey in order to conform to social norms in China, where these topics are highly sensitive. High levels of acceptability of HPV vaccines may have also been influenced by ‘ways of thinking’ among Chinese people; their natural inclination is to accept all recommendations for vaccination from government agencies – so they may not have thought hard about this choice. There is optimism in the Chinese population that cancer can be prevented by vaccination – indeed, they are inclined to believe it will prevent disease that can generate serious health impacts in the future. Nevertheless, some Chinese people have conservative attitudes towards the effectiveness of HPV vaccination and some suspicion of the drug companies which produce these vaccines. There were significant methodological issues in my comparisons of Western and Chinese literature. Western literature is more likely to comprise good quality studies – typically there are better-defined sampling frames, more valid and reliable instruments and robust theoretical frameworks. The difference in quality between Chinese and Western literature arises from the stricter rules for reporting and evaluation in Western publications and the relatively low publishing standards in Chinese literature. My thesis also details a number of methodological issues which arose in conducting my questionnaire survey – ideally, I would like to follow up the work I have done with a multi-centre population-based study among teenagers in China (an idea which I will pursue once I return to China). This would hopefully provide better quality information on the influences of factors such as socio-economic status and family background in determining acceptability of HPV vaccination. Nevertheless, my relatively modest, school-based study has, I believe, produced results which add to the information available to health care planners and policy makers in the field of HPV vaccination in China.
Conclusion: My systematic review is, to my knowledge, the first to identify and synthesise findings about knowledge of and attitude towards HPV infection and vaccination in the Chinese literature – as such, it addresses a gap in currently available evidence. Although there are methodological limitations in Chinese literature (with more poor quality studies), the results still have implications for further health education intervention programmes and health policy. My questionnaire survey was also a ‘first’ in many ways – it explored attitudes towards HPV vaccines based on the Health Belief Model among Chinese teenagers and examined HPV related stigma among mainland Chinese teenagers. Low levels of awareness and knowledge and conservative attitudes towards sexually related infections suggest the impact of Chinese traditional culture and a range of other social and financial constraints in China. Hence, there is a great deal to be done before HPV vaccination can be implemented in China – there are educational needs, and in many areas societal and cultural attitudes need to be challenged. Significant changes are also needed in government policy and investment – these are major challenges for health care in China, and I sincerely hope my thesis will contribute to these important debates.

    A Comparative Study of Text Detection and Recognition Approaches for Restricted Computing Scenarios

    Advisors: Ricardo da Silva Torres, Allan da Silva Pinto. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Texts are fundamental elements for effective communication in our daily lives. The mobility of people and vehicles in urban environments and the search for a product of interest on a supermarket shelf are examples of activities in which understanding the textual elements present in the environment is essential to succeed in the task. Recently, several advances in computer vision have been reported in the literature, with the development of algorithms and methods that aim to recognize objects and texts in scenes. However, text detection and recognition are still open problems because several factors act as sources of variability during scene text generation and capture, which can significantly impact the detection and recognition rates of current algorithms. Examples of these factors include different shapes of textual elements (e.g., circular or curved), font styles and sizes, texture, color, and brightness and contrast variation, among others. In addition, recent state-of-the-art methods based on deep learning demand high computational processing costs, which hinders their use in restricted computing scenarios. This dissertation presents a comparative study of text detection and recognition techniques, considering methods based on deep learning and methods that use classical machine learning algorithms. This dissertation also presents an algorithm for fusing bounding boxes, based on genetic programming (GP), developed both to act as a post-processing step after detection and to exploit the complementarity of the text detection algorithms investigated in this dissertation. According to the comparative study presented in this work, the methods based on deep learning are more effective and less efficient than the classical text detection methods investigated, considering the adopted metrics. Furthermore, the proposed GP-based fusion algorithm was able to learn complementary information from the methods investigated in this dissertation, which resulted in an improvement of precision and recall rates. The experiments were conducted considering text detection problems involving horizontal, vertical and arbitrary orientations.
Master's degree in Computer Science (Mestre em Ciência da Computação). CAPE
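To make the idea of fusing bounding boxes from complementary detectors concrete, here is a minimal Python sketch. It is not the dissertation's GP-based method, which learns its fusion rule through genetic programming. Instead it swaps in a fixed, hand-written rule: merge two boxes when their intersection over union (IoU) reaches a threshold. The box format `(x1, y1, x2, y2)`, the threshold value, and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(boxes, threshold=0.5):
    """Greedily merge each box into the first fused box it overlaps enough."""
    fused = []
    for box in boxes:
        for i, kept in enumerate(fused):
            if iou(box, kept) >= threshold:
                # Replace the kept box by the smallest box enclosing both.
                fused[i] = (min(box[0], kept[0]), min(box[1], kept[1]),
                            max(box[2], kept[2]), max(box[3], kept[3]))
                break
        else:
            fused.append(tuple(box))
    return fused


# Hypothetical detections of the same scene from two text detectors.
detector_a = [(0, 0, 10, 10)]
detector_b = [(1, 1, 11, 11), (50, 50, 60, 60)]
merged = fuse_boxes(detector_a + detector_b)
```

A learned fusion rule, such as the GP-based one the abstract describes, would replace the fixed IoU test and enclosing-box merge with operations selected from training data.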

    Arabic named entity recognition

    This doctoral thesis describes the research carried out to determine the best techniques for building an Arabic Named Entity Recognizer. Such a system would be able to identify and classify the named entities found in open-domain Arabic text. The Named Entity Recognition (NER) task helps other Natural Language Processing tasks (for example, Information Retrieval, Question Answering, Machine Translation, etc.) achieve better results thanks to the enrichment it adds to the text. The literature contains various works that investigate the NER task for a specific language or from a language-independent perspective. However, very few works studying this task for Arabic have been published so far. Arabic has a special orthography and a complex morphology, and these aspects pose new challenges for NER research. A complete investigation of NER for Arabic would not only provide the techniques needed to achieve high performance, but would also provide an error analysis and a discussion of the results that benefit the NER research community. The main objective of this thesis is to meet that need. To that end we have: 1. Produced a study of the different aspects of Arabic related to this task; 2. Analyzed the state of the art in NER; 3. Carried out a comparison of the results obtained by different machine learning techniques; 4. Developed a method based on combining different classifiers, where each classifier deals with a single class of named entities and uses the feature set and machine learning technique best suited to that class. Our experiments were evaluated on nine test sets.
Benajiba, Y. (2009). Arabic named entity recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8318 Palanci

    Introduction: Ways of Machine Seeing

    How do machines, and, in particular, computational technologies, change the way we see the world? This special issue brings together researchers from a wide range of disciplines to explore the entanglement of machines and their ways of seeing from new critical perspectives. This 'editorial' is for a special issue of AI & Society, which includes contributions from: María Jesús Schultz Abarca, Peter Bell, Tobias Blanke, Benjamin Bratton, Claudio Celis Bueno, Kate Crawford, Iain Emsley, Abelardo Gil-Fournier, Daniel Chávez Heras, Vladan Joler, Nicolas Malevé, Lev Manovich, Nicholas Mirzoeff, Perle Møhl, Bruno Moreschi, Fabian Offert, Trevor Paglan, Jussi Parikka, Luciana Parisi, Matteo Pasquinelli, Gabriel Pereira, Carloalberto Treccani, Rebecca Uliasz, and Manuel van der Veen