232 research outputs found
Cross-Lingual and Cross-Chronological Information Access to Multilingual Historical Documents
In this chapter, we present our work in realizing information access across different languages and periods. Nowadays, digital collections of historical documents have to handle materials written in many different languages in different time periods. Even in a particular language, there are significant differences over time in terms of grammar, vocabulary and script. Our goal is to develop a method to access digital collections in a wide range of periods from ancient to modern. We introduce an information extraction method for digitized ancient Mongolian historical manuscripts for reducing labour-intensive analysis. The proposed method performs computerized analysis on Mongolian historical documents. Named entities such as personal names and place names are extracted by employing support vector machine. The extracted named entities are utilized to create a digital edition that reflects an ancient Mongolian historical manuscript written in traditional Mongolian script. The Text Encoding Initiative guidelines are adopted to encode the named entities, transcriptions and interpretations of ancient words. A web-based prototype system is developed for utilizing digital editions of ancient Mongolian historical manuscripts as scholarly tools. The proposed prototype has the capability to display and search traditional Mongolian text and its transliteration in Latin letters along with the highlighted named entities and the scanned images of the source manuscript
Studies on User Intent Analysis and Mining
Predicting the goals of users can be extremely useful in e-commerce, online entertainment, information retrieval, and many other online services and applications. In this thesis, we study the task of user intent understanding, trying to bridge the gap between user expressions to online services and their goals behind it. As far as we know, most of the existing user intent studies are focusing on web search and social media domain. Studies on other areas are not enough. For example, as people more and more rely our daily life on cellphone, our information needs expressing to mobile devices and related services are increasing dramatically. Studies of user intent mining on mobile devices are not much. And the intentions of using mobile devices are different from the ones we use web search engine or social network. So we cannot directly apply the existing user intention to this area. Besides, user's intents are not stable but changing over time. And different interests will impact each other. Modeling such kind of dynamic user interests can help accurately understand and predict user's intent. But there're few existing works in this area. Moreover, user intent could be explicitly or implicitly expressed by users. The implicit intent expression is more close to human's natural language and also have great value to recognize and mine. To make further studies of these challenges, we first try to answer the question of “What is the user intent?” By referring amount of previous studies, we give our definition of user intent as “User intent is a task-specific, predefined or latent concept, topic or knowledge-base that is under an expression from a user who is trying to express his goal of information or service need.“ Then, we focus on the driving scenario when a user using cellphone and study the user intent in this domain. As far as we know, it is the first time of user intent analysis and categorization in this domain. And we also build a dataset of user input and related intent category and attributes by crowdsourcing and carefully handcraft. With the user intent taxonomy and dataset in hand, we conduct a user intent classification and user intent attribute recognition by supervised machine learning models. To classify the user intent for a user intent query, we use a convolutional neural network model to build a multi-class classifier. And then we use a sequential labeling method to recognize the intent attribute in the query. The experiment results show that our proposed method outperforms several baseline models in precision, recall, and F-score. In addition, we study the implicit user intent mining method through web search log data. By using a Restricted Boltzmann Machine, we make use of the correlation of query and click information to learn the latent intent behind a user web search. We propose a user intent prediction model on online discussion forum using Multivariate Hawkes Process. It dynamically models user intentions change and interact over time.The method models both of the internal and external factors of user's online forum response motivations, and also integrated the time decay fact of user's interests. We also present a data visualization method, using an enriched domain ontology to highlight the domain-specific words and entity relations within an article.Ph.D., Information Studies -- Drexel University, 201
Recommended from our members
Where are you talking about? Advances and Challenges of Geographic Analysis of Text with Application to Disease Monitoring
The Natural Language Processing task we focus on in this thesis is Geoparsing. Geoparsing is the process of extraction and grounding of toponyms (place names). Consider this sentence: "The victims of the Spanish earthquake off the coast of Malaga were of American and Mexican origin." Four toponyms will be extracted (called Geotagging) and grounded to their geographic coordinates (called Toponym Resolution). However, our research goes further than any previous work by showing how to distinguish the literal place(s) of the event (Spain, Malaga) from other linguistic types/uses such as nationalities (Mexican, American), improving downstream task accuracy. We consolidate and extend the Standard Evaluation Framework, discuss key research problems, then present concrete solutions in order to advance each stage of geoparsing. For geotagging, as well as training a SOTA neural Location-NER tagger, we simplify Metonymy Resolution with a novel minimalist feature extraction combined with an LSTM-based classifier, matching SOTA results. For toponym resolution, we deploy the latest deep learning methods to achieve SOTA performance by augmenting neural models with hitherto unused geographic features called Map Vectors. With each research project, we provide high-quality datasets and system prototypes, further building resources in this field. We then show how these geoparsing advances coupled with our proposed Intra-Document Analysis can be used to associate news articles with locations in order to monitor the spread of public health threats. To this end, we evaluate our research contributions with production data from a real-time downstream application to improve geolocation of news events for disease monitoring. The data was made available to us by the Joint Research Centre (JRC), which operates one such system called MediSys that processes incoming news articles in order to monitor threats to public health and make these available to a variety of governmental, business and non-profit organisations. We also discuss steps towards an end-to-end, automated news monitoring system and make actionable recommendations for future work. In summary, the thesis aims are twofold: (1) Generate original geoparsing research aimed at advancing each stage of the pipeline by addressing pertinent challenges with concrete solutions and actionable proposals. (2) Demonstrate how this research can be applied to news event monitoring to increase the efficacy of existing biosurveillance systems, e.g. European Commission’s MediSys.I was generously funded by DREAM CDT, which was funded by NERC of UKRI
Traces of the Animal Past
Understanding the relationships between humans and animals is essential to a full understanding of both our present and our shared past. Across the humanities and social sciences, researchers have embraced the ‘animal turn,’ a multispecies approach to scholarship, with historians at the forefront of new research in human-animal studies that blends traditional research methods with interdisciplinary theoretical frameworks that decenter humans in historical narratives. These exciting approaches come with core methodological challenges for scholars seeking to better understand the past from non-anthropocentric perspectives.
Whether in a large public archive, a small private collection, or the oral histories of living memories, stories of animals are mediated by the humans who have inscribed the records and organized archival collections. In oral histories, the place of animals in the past are further refracted by the frailty of human memory and recollection. Only traces remain for researchers to read and interpret.
Bringing together seventeen original essays by a leading group of international scholars, Traces of the Animal Past showcases the innovative methods historians use to unearth and explain how animals fit into our collective histories. Situating the historian within the narrative, bringing transparency to methodological processes, and reflecting on the processes and procedures of current research, this book presents new approaches and new directions for a maturing field of historical inquiry
The Classification of Religions: A domain-analytic examination of the history and epistemology of the classification of religions within the Religious Studies discipline
While religion is a part of every culture and is entangled in many facets of the lives of those who are religious, the scientific study of religion and the Religious Studies discipline are fairly new, only developing in the mid to late nineteenth century. One of the contributions that the scientific study of religions has made is the development of different approaches for classifying religions. As a multidisciplinary field, Religious Studies and the classification of religions has been influenced by philosophy, psychology, history, sociology and anthropology. This study, using the domain-analytic paradigm, traces the development of the Religious Studies discipline and the classification of religions, analyzes the epistemological assumptions behind the prominent approaches used to classify religions and briefly examines their relation to the Library of Congress, Dewey Decimal and Universal Decimal classifications
- …