Building an Interactive Multilingual Information Retrieval Tool
Growing demand on the Internet has led users to seek information expressed in languages other than their own, which gave rise to cross-lingual information retrieval (CLIR). CLIR is now established as a major topic in information retrieval (IR). One approach to CLIR uses various translation methods to translate queries, documents, and indexes into other languages. Queries submitted to search engines suffer from untranslatable query keys (i.e., words missing from the dictionary) and from translation ambiguity, that is, the difficulty of choosing among translation alternatives. In this thesis we build and develop a software tool (MORTAJA-IR-TOOL), a new information retrieval tool written in Java with JDK 1.6. Among its features, the tool provides a multilingual systematic translation scheme that serves as the basis for translation in CLIR, and it stems the words entered in the query as a stage preceding translation. We evaluated the proposed systematic query translation against a baseline translation that uses a machine-readable dictionary, in a user-centered experiment; the improvement was 8.96%. We also evaluated the effect of stemming the query words on the quality of retrieval of matching data in the other language; the improvement was 4.14%. Finally, we evaluated the combination of the proposed stemming and translation (MORTAJA-IR-TOOL), which improved the rate of retrieved data by 15.86%. Keywords: cross-lingual information retrieval, CLIR, information retrieval, IR, translation, stemming.
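The stem-then-translate flow described in this abstract can be illustrated with a minimal sketch. It is not the MORTAJA-IR-TOOL implementation (which the abstract says is written in Java); the suffix list, the toy dictionary, and the function names `stem` and `translate_query` are all hypothetical and chosen only to show how stemming before dictionary lookup reduces untranslatable query keys, and how several candidates for one key produce translation ambiguity.

```python
# Hypothetical toy resources: neither the suffix list nor the dictionary
# comes from the thesis; they only illustrate the flow.
SUFFIXES = ("ing", "ed", "es", "s")

TOY_DICTIONARY = {
    "retriev": ["استرجاع"],
    "document": ["وثيقة", "مستند"],  # two candidates: translation ambiguity
}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def translate_query(query: str, dictionary: dict) -> tuple:
    """Stem each query word, then look the stem up in the dictionary.

    Returns (translated, untranslated): translated is a list of
    (original_word, candidate_translations); untranslated lists the
    words whose stem is missing from the dictionary.
    """
    translated, untranslated = [], []
    for token in query.lower().split():
        candidates = dictionary.get(stem(token))
        if candidates:
            translated.append((token, candidates))
        else:
            untranslated.append(token)
    return translated, untranslated


if __name__ == "__main__":
    hits, misses = translate_query("Retrieving documents quickly", TOY_DICTIONARY)
    print(hits)
    print(misses)
```

Without the stemming step, neither "retrieving" nor "documents" would match a dictionary key in this toy setup, so both would count as untranslatable.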
Advances in Meta-Heuristic Optimization Algorithms in Big Data Text Clustering
This paper presents a comprehensive survey of meta-heuristic optimization algorithms in text clustering applications and highlights their main procedures. These Artificial Intelligence (AI) algorithms are recognized as promising swarm intelligence methods due to their successful ability to solve machine learning problems, especially text clustering problems. This paper reviews all of the relevant literature on meta-heuristic-based text clustering applications, including many variants, such as basic, modified, hybridized, and multi-objective methods. In addition, the main procedures of text clustering and critical discussions are given. Hence, this review reports the advantages and disadvantages of these methods and recommends potential future research paths. The main keywords that have been considered in this paper are text, clustering, meta-heuristic, optimization, and algorithm.
Transfer Learning in Natural Language Processing through Interactive Feedback
Machine learning models cannot easily adapt to new domains and applications. This drawback becomes detrimental for natural language processing (NLP) because language is perpetually changing. Across disciplines and languages, there are noticeable differences in content, grammar, and vocabulary. To overcome these shifts, recent NLP breakthroughs focus on transfer learning. Through clever optimization and engineering, a model can successfully adapt to a new domain or task. However, these modifications are still computationally inefficient or resource-intensive. Compared to machines, humans are more capable at generalizing knowledge across different situations, especially in low-resource ones. Therefore, the research on transfer learning should carefully consider how the user interacts with the model. The goal of this dissertation is to investigate “human-in-the-loop” approaches for transfer learning in NLP.
First, we design annotation frameworks for inductive transfer learning, which is the transfer of models across tasks. We create an interactive topic modeling system for users to find topics useful for classifying documents in multiple languages. The user-constructed topic model improves classification accuracy and bridges cross-lingual gaps in knowledge. Next, we look at popular language models, like BERT, that can be applied to various tasks. While these models are useful, they still require a large amount of labeled data to learn a new task. To reduce labeling, we develop an active learning strategy which samples documents that surprise the language model. Users only need to annotate a small subset of these unexpected documents to adapt the language model for text classification.
Then, we transition to user interaction in transductive transfer learning, which is the transfer of models across domains. We focus our efforts on low-resource languages to develop an interactive system for word embeddings. In this approach, the feedback from bilingual speakers refines the cross-lingual embedding space for classification tasks. Subsequently, we look at domain shift for tasks beyond text classification. Coreference resolution is fundamental for NLP applications, like question-answering and dialogue, but the models are typically trained and evaluated on one dataset. We use active learning to find spans of text in the new domain for users to label. Furthermore, we provide important insights on annotating spans for domain adaptation.
Finally, we summarize the contributions of each chapter. We focus on aspects like the scope of applications and model complexity. We conclude with a discussion of future directions. Researchers may extend the ideas in our thesis to topics like user-centric active learning and proactive learning.
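The active learning idea mentioned in this abstract, sampling the documents that most surprise the model so that users annotate only a small subset, can be sketched as follows. This is an assumption-laden illustration: "surprise" is approximated here by the entropy of the model's predicted class probabilities, which is a common uncertainty-sampling criterion and not necessarily the one used in the dissertation. The names `entropy` and `select_for_annotation` and the example probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` documents with the highest entropy.

    `predictions` maps a document id to the model's predicted class
    probabilities for that document (hypothetical input format).
    """
    ranked = sorted(predictions, key=lambda doc_id: entropy(predictions[doc_id]),
                    reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical model outputs for three unlabeled documents.
    predictions = {
        "doc_a": [0.5, 0.5],   # model is maximally unsure
        "doc_b": [0.9, 0.1],
        "doc_c": [1.0, 0.0],   # model is certain
    }
    print(select_for_annotation(predictions, budget=2))
```

The selected documents would then be shown to annotators, and the newly labeled examples added to the training set before the next round.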
Ranking and Retrieval under Semantic Relevance
This thesis presents a series of conceptual and empirical developments on the ranking and retrieval of candidates under semantic relevance. Part I of the thesis introduces the concept of uncertainty in various semantic tasks (such as recognizing textual entailment) in natural language processing, and the machine learning techniques commonly employed to model these semantic phenomena. A unified view of ranking and retrieval will be presented, and the trade-off between model expressiveness, performance, and scalability in model design will be discussed.
Part II of the thesis focuses on applying these ranking and retrieval techniques to text: Chapter 3 examines the feasibility of ranking hypotheses given a premise with respect to a human's subjective probability of the hypothesis happening, effectively extending the traditional categorical task of natural language inference. Chapter 4 focuses on detecting situation frames for documents using ranking methods. Then we extend the ranking notion to retrieval, and develop both sparse (Chapter 5) and dense (Chapter 6) vector-based methods to facilitate scalable retrieval for potential answer paragraphs in question answering.
Part III turns the focus to mentions and entities in text, while continuing the theme on ranking and retrieval: Chapter 7 discusses the ranking of fine-grained types that an entity mention could belong to, leading to state-of-the-art performance on hierarchical multi-label fine-grained entity typing. Chapter 8 extends the semantic relation of coreference to a cross-document setting, enabling models to retrieve from a large corpus, instead of in a single document, when resolving coreferent entity mentions
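The dense vector-based retrieval mentioned for Chapter 6 can be pictured with a small sketch: the query and each candidate paragraph are represented as vectors, and paragraphs are ranked by cosine similarity to the query. The vectors, identifiers, and function names below are hypothetical, and the thesis's actual encoders and index structures are not described in the abstract; this brute-force ranking only illustrates the scoring step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, paragraph_vecs, k):
    """Return the ids of the k paragraphs most similar to the query."""
    scored = [(cosine(query_vec, vec), pid) for pid, vec in paragraph_vecs.items()]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:k]]


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings for three paragraphs.
    paragraph_vecs = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}
    print(retrieve([1.0, 0.1], paragraph_vecs, k=2))
```

A scalable system would replace the exhaustive loop with an approximate nearest-neighbor index, but the ranking criterion stays the same.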
Perspectives on Public Policy in Societal-Environmental Crises
This is an open access book. Histories we tell never emerge in a vacuum, and history as an academic discipline that studies the past is highly sensitive to the concerns of the present and the heated debates that can divide entire societies. But does the study of the past also have something to teach us about the future? Can history help us in coping with the planetary crisis we are now facing? By analyzing historical societies as complex adaptive systems, we contribute to contemporary thinking about societal-environmental interactions in policy and planning and consider how environmental and climatic changes, whether sudden high impact events or more subtle gradual changes, impacted human responses in the past. We ask how societal perceptions of such changes affect behavioral patterns and explanatory rationalities in premodernity, and whether a better historical understanding of these relationships can inform our response to contemporary problems of similar nature and magnitude, such as adapting to climate change
HPV vaccination: knowledge, attitudes and beliefs in the Chinese population
Introduction
Cervical cancer is the fourth most common cancer in women worldwide. An estimated
62,000 cases of cervical cancer occur annually in China, accounting for 12% of global
incidence. Virtually all cervical cancers are related to infection by Human Papilloma Virus
(HPV): effective HPV vaccines have been developed and vaccination programmes
introduced in many countries over the last decade. Given the burden of cervical cancer in
China, it is imperative that effective primary and secondary prevention strategies are
introduced. Effective introduction of HPV vaccination programmes will require education
and information strategies that are informed by a comprehensive understanding of the
knowledge, attitudes and beliefs about HPV infection and its relationship to cervical cancer
in the Chinese population.
Aims and objectives
The aims of my thesis are: 1) to systematically review the evidence from the Chinese-language
literature in relation to knowledge of and attitude towards HPV infection and HPV
vaccination, and 2) to explore knowledge and attitudes about HPV infection, HPV
vaccination and cervical screening amongst teenagers in Heilongjiang province in China.
Methods
I undertook a systematic literature review using two electronic Chinese databases – the
‘Chinese National Knowledge Infrastructure’ (CNKI) database and the ‘Wanfang’ database.
These were searched from inception through November 30th 2012: MeSH terms were
applied to both Chinese databases. Manual searching of relevant online journals was also
undertaken. Following selection of papers based on pre-determined inclusion and exclusion
criteria, quality assessment was carried out using a modified quality assessment checklist,
and included studies were classified as good, fair or poor quality. Due to heterogeneity of
populations and survey instruments a narrative approach was adopted for data synthesis.
I also undertook a questionnaire survey of high-school students in China. Questions were
designed based on the Health Belief Model, informed by findings from my systematic
review, and refined through cognitive interviews prior to field work in early 2014. The
survey targeted students in five public high schools in one middle-income city (Mudanjiang
city) and two small counties (Ning’an and Hailin) of Heilongjiang province; 3788 young
people aged 14-22 years participated. Descriptive statistical analysis was used to summarise
demographic characteristics; initially differences were identified using the chi-square test.
Factor analysis was applied to identify attitude patterns and logistic regression analysis
models were applied to determine the association between attitude (potential predictors) and
acceptability, attitude and levels of knowledge.
Results
Forty-seven articles met my inclusion criteria and were included in the systematic review.
All included studies were published between 2006 and 2011; all were cross-sectional
questionnaire surveys with sample sizes ranging from 100 to 9,865. The quality of included
studies varied considerably. Included populations ranged from the general public, to young
people, and health professionals. Awareness of HPV and knowledge of the relationship
between HPV and cervical cancer, and of the sexually transmitted nature of HPV, were the
main issues examined. Awareness of HPV was low among all non-health-professional
groups. Similarly, understanding of the relationship between HPV infection and cervical
cancer and of the sexually transmitted nature of HPV was low. However, significant
differences in awareness and knowledge were found, based on urban/rural status, ethnicity
and age. Uighur women had the lowest awareness and knowledge levels, followed by rural
adult women, and teenagers.
Acceptability of HPV vaccination varied in terms of the vaccine target recipients (whether
adult women themselves or their daughters), and between health professionals and the general
public. Reported levels of HPV vaccine acceptability (for adult women themselves and for
their daughters) were higher in North China compared to South China. Health professionals
were less willing to accept the vaccine for their daughters than they were to receive it
themselves. The cost, source and appropriate age for HPV vaccination were also frequently
examined issues. Importantly, a high proportion of the health professionals believed that the
appropriate age for vaccination was over 18 years old for girls.
A total of 3,788 participants aged 14-22 years were included in the questionnaire survey, with 54%
females and 20% urban students. Overall awareness of HPV was 13.2% and acceptability of
the HPV vaccine was 68%. Knowledge levels varied in different content areas; for example
74% of respondents knew that HPV vaccination is not 100% effective against cervical cancer
while only 6% knew that poor personal hygiene did not increase the risk of contracting HPV
infection. Attitudes towards HPV infection and vaccination were also interesting and novel;
the greatest concern about HPV vaccination was minor side effects (72%). The highest-rated
source of recommendations about HPV vaccination was parents (66%), while there were
concerns expressed about ‘gossip’ in relation to HPV vaccination (51%). No urban/rural
differences were found in knowledge and attitudes - gender differences existed, but
depended on specific circumstances. Participants who were willing to accept HPV
vaccination were more likely to be influenced by others, to report high perceived severity of
HPV and cervical cancer, to perceive benefits of HPV vaccination and to score well on
knowledge questions. Participants with high knowledge scores for HPV infection and
vaccination were more likely to consider HPV infection and cervical cancer to be serious,
and were less likely to associate HPV infection with stigma. Participants who had high levels
of awareness of HPV infection were more likely to be influenced by others in relation to
accepting HPV vaccination.
Discussion
My thesis has produced novel findings in relation to HPV vaccination knowledge,
attitudes and beliefs in China. Low levels of awareness and knowledge amongst Chinese
people may be influenced by traditional Chinese culture, which perhaps makes people more
reluctant to consider issues related to sexual practices. Another possible explanation is that
people tended to under-report knowledge of HPV when answering the questions in the
survey in order to conform to social norms in China - these topics are highly sensitive in
China.
High levels of acceptability of HPV vaccines may have also been influenced by ‘ways of
thinking’ among Chinese people; their natural inclination is to accept all recommendations
for vaccination from government agencies – so they may not have thought hard about this
choice. There is optimism in the Chinese population that cancer can be prevented by
vaccination – indeed, they are inclined to believe it will prevent disease that can generate
serious health impacts in the future. Nevertheless, some Chinese people have conservative
attitudes towards the effectiveness of HPV vaccination and some suspicion of the drug
companies which produce these vaccines.
There were significant methodological issues in my comparisons of Western and Chinese
literature. Western literature is more likely to comprise good quality studies – typically there
are better-defined sampling frames, more valid and reliable instruments and robust
theoretical frameworks. The difference in quality between Chinese and Western literature
arises from the stricter rules for reporting and evaluation in western publications and the
relatively low publishing standards in Chinese literature.
My thesis also details a number of methodological issues which arose in conducting my
questionnaire survey – ideally, I would like to follow up the work I have done with a multi-centre
population-based study among teenagers in China (an idea which I will pursue once I
return to China). This would hopefully provide better quality information on the influences
of factors such as socio-economic status and family background in determining acceptability
of HPV vaccination. Nevertheless, my relatively modest, school-based study has, I believe,
produced results which add to the information available to health care planners and policy
makers in the field of HPV vaccination in China.
Conclusion
My systematic review is, to my knowledge, the first to identify and synthesise findings about
knowledge of and attitude towards HPV infection and vaccination in the Chinese literature –
as such, it addresses a gap in currently available evidence. Although there are
methodological limitations in Chinese literature (with more poor quality studies), the results
still have implications for further health education intervention programmes and health
policy.
My questionnaire survey was also a ‘first’ in many ways – it explored attitudes towards HPV
vaccines based on the Health Belief Model among Chinese teenagers and examined HPV-related
stigma among mainland Chinese teenagers. Low levels of awareness and knowledge and
conservative attitudes towards sexually related infections suggest the impact of Chinese
traditional culture and a range of other social and financial constraints in China. Hence, there
is a great deal to be done before HPV vaccination can be implemented in China – there are
educational needs, and in many areas societal and cultural attitudes need to be challenged.
Significant changes are also needed in government policy and investment – these are major
challenges for health care in China, and I sincerely hope my thesis will contribute to these
important debates.
A comparative study of text detection and recognition approaches for restricted computing scenarios
Advisors: Ricardo da Silva Torres, Allan da Silva Pinto. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Texts are fundamental elements for effective communication in our daily lives. The mobility of people and vehicles in urban environments and the search for a product of interest on a supermarket shelf are examples of activities in which understanding the textual elements present in the environment is essential to succeed in the task. Recently, several advances in computer vision have been reported in the literature, with the development of algorithms and methods that aim to recognize objects and texts in scenes. However, text detection and recognition are still open problems due to several factors that act as sources of variability during scene text generation and capture, which can significantly impact the detection and recognition rates of current algorithms. Examples of these factors include different shapes of textual elements (e.g., circular or curved), font styles and sizes, texture, color, and brightness and contrast variation, among others. In addition, recent state-of-the-art methods based on deep learning demand high computational processing costs, which hinders their use in restricted computing scenarios. This dissertation presents a comparative study of text detection and recognition techniques, considering both methods based on deep learning and methods that use classical machine learning algorithms.
This dissertation also presents an algorithm for fusing bounding boxes, based on genetic programming (GP), developed to act as a post-processing step for a single text detector and to explore the complementarity of the text detection algorithms investigated in this dissertation. According to the comparative study presented in this work, the methods based on deep learning are more effective and less efficient than the classic text detection methods investigated in this work, considering the adopted metrics. Furthermore, the proposed GP-based fusion algorithm was able to learn complementary information from the methods investigated in this dissertation, which resulted in an improvement of precision and recall rates. The experiments were conducted considering text detection problems involving horizontal, vertical and arbitrary orientations. Master's degree (Mestrado) in Computer Science (Ciência da Computação). CAPE
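To make the idea of fusing bounding boxes from complementary detectors concrete, here is a minimal sketch. It is explicitly not the GP-based fusion method of the dissertation: it swaps in a fixed hand-written rule (merge two boxes into their enclosing box when their intersection over union reaches a threshold), whereas the dissertation learns the fusion with genetic programming. The box format `(x1, y1, x2, y2)`, the threshold of 0.5, and the function names are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(boxes_a, boxes_b, threshold=0.5):
    """Fuse detections from two detectors with a fixed rule.

    Each box from detector A is merged with the first still-unmatched box
    from detector B whose IoU reaches `threshold`; the merged box is the
    smallest box enclosing both. Unmatched boxes from either detector are
    kept unchanged.
    """
    fused = []
    unmatched_b = list(boxes_b)
    for a in boxes_a:
        match = next((b for b in unmatched_b if iou(a, b) >= threshold), None)
        if match is None:
            fused.append(a)
        else:
            unmatched_b.remove(match)
            fused.append((min(a[0], match[0]), min(a[1], match[1]),
                          max(a[2], match[2]), max(a[3], match[3])))
    return fused + unmatched_b


if __name__ == "__main__":
    detector_a = [(0, 0, 10, 10)]
    detector_b = [(1, 0, 11, 10), (50, 50, 60, 60)]
    print(fuse(detector_a, detector_b))
```

Keeping unmatched boxes from both detectors favors recall, while the IoU threshold controls how aggressively overlapping detections are collapsed; a learned fusion would tune such decisions from data instead of fixing them by hand.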
Arabic named entity recognition
This doctoral thesis describes the research carried out to determine the best techniques
for building a Named Entity Recognizer for Arabic. Such a system would be able to
identify and classify the named entities found in an open-domain Arabic text.
The Named Entity Recognition (NER) task helps other Natural Language Processing tasks
(for example, Information Retrieval, Question Answering, Machine Translation, etc.)
achieve better results thanks to the enrichment it adds to the text. The literature
contains various works that investigate the NER task for a specific language or from a
language-independent perspective. However, to date, very few works studying this task
for Arabic have been published.
Arabic has a special orthography and a complex morphology, and these aspects bring new
challenges for research on the NER task. A complete investigation of NER for Arabic
would not only provide the techniques needed to achieve high performance, but would
also provide an error analysis and a discussion of the results that benefit the NER
research community. The main objective of this thesis is to meet that need. To this
end we have:
1. Prepared a study of the different aspects of Arabic related to this task;
2. Analyzed the state of the art of NER;
3. Carried out a comparison of the results obtained by different machine learning
techniques;
4. Developed a method based on the combination of different classifiers, where each
classifier deals with a single class of named entities and uses the feature set and
the machine learning technique most suitable for the named entity class in question.
Our experiments have been evaluated on nine test sets.
Benajiba, Y. (2009). Arabic named entity recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8318
Palanci
Introduction: Ways of Machine Seeing
How do machines, and, in particular, computational technologies, change the way we see the world? This special issue brings together researchers from a wide range of disciplines to explore the entanglement of machines and their ways of seeing from new critical perspectives.
This 'editorial' is for a special issue of AI & Society, which includes contributions from: María Jesús Schultz Abarca, Peter Bell, Tobias Blanke, Benjamin Bratton, Claudio Celis Bueno, Kate Crawford, Iain Emsley, Abelardo Gil-Fournier, Daniel Chávez Heras, Vladan Joler, Nicolas Malevé, Lev Manovich, Nicholas Mirzoeff, Perle Møhl, Bruno Moreschi, Fabian Offert, Trevor Paglan, Jussi Parikka, Luciana Parisi, Matteo Pasquinelli, Gabriel Pereira, Carloalberto Treccani, Rebecca Uliasz, and Manuel van der Veen