1,974 research outputs found

    Knowledge extraction from fictional texts

    Knowledge extraction from text is a key task in natural language processing. It involves many sub-tasks, such as taxonomy induction, named entity recognition and typing, relation extraction, and knowledge canonicalization. By constructing structured knowledge from natural language text, knowledge extraction becomes a key asset for search engines, question answering and other downstream applications. However, current knowledge extraction methods mostly focus on prominent real-world entities, with Wikipedia and mainstream news articles as sources. The constructed knowledge bases therefore lack information about long-tail domains, with fiction and fantasy as archetypes. Fiction and fantasy are core parts of human culture, spanning literature, movies, TV series, comics and video games. With thousands of fictional universes created, knowledge from fictional domains is the subject of search-engine queries, by fans as well as cultural analysts. Unlike in the real-world domain, knowledge extraction in specific domains such as fiction and fantasy has to tackle several key challenges:
    - Training data: Sources for fictional domains mostly come from books and fan-built content, which are sparse and noisy and contain difficult text structures, such as dialogues and quotes. Training data for key tasks such as taxonomy induction, named entity typing or relation extraction is also not available.
    - Domain characteristics and diversity: Fictional universes can be highly sophisticated, containing entities, social structures and sometimes languages that are completely different from the real world. State-of-the-art methods for knowledge extraction make assumptions about entity-class, subclass and entity-entity relations that are often invalid for fictional domains. Because fictional domains span different genres, models must also transfer across domains.
    - Long fictional texts: While state-of-the-art models have limitations on the input sequence length, it is essential to develop methods that can deal with very long texts (e.g. entire books), to capture multiple contexts and leverage widely spread cues.
    This dissertation addresses the above challenges by developing new methodologies that advance the state of the art in knowledge extraction for fictional domains.
    - The first contribution is a method, called TiFi, for constructing type systems (taxonomy induction) for fictional domains. By tapping noisy fan-built content from online communities such as Wikia, TiFi induces taxonomies through three main steps: category cleaning, edge cleaning and top-level construction. Exploiting a variety of features from the original input, TiFi is able to construct taxonomies for a diverse range of fictional domains with high precision.
    - The second contribution is a comprehensive approach, called ENTYFI, for named entity recognition and typing in long fictional texts. Built on 205 automatically induced high-quality type systems for popular fictional domains, ENTYFI exploits the overlap and reuse of these fictional domains on unseen texts. By combining different typing modules with a consolidation stage, ENTYFI is able to do fine-grained entity typing in long fictional texts with high precision and recall.
    - The third contribution is an end-to-end system, called KnowFi, for extracting relations between entities in very long texts such as entire books. KnowFi leverages background knowledge from 142 popular fictional domains to identify interesting relations and to collect distant training samples. KnowFi devises a similarity-based ranking technique to reduce false positives in training samples and to select potential text passages that contain seed pairs of entities. By training a hierarchical neural network for all relations, KnowFi is able to infer relations between entity pairs across long fictional texts, and achieves gains over the best prior methods for relation extraction.
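    The abstract above says that KnowFi selects text passages containing seed pairs of entities and ranks them by similarity to reduce false positives. The following is only a minimal sketch of that general idea, not the dissertation's method: the bag-of-words cosine score, the function names (`tokenize`, `cosine`, `rank_passages`), the cue-word string and the toy passages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_passages(passages, seed_pair, relation_cues):
    """Keep passages that mention both seed entities, best match first.

    `relation_cues` is a hypothetical string of words associated with the
    target relation; a real system would likely use learned representations.
    """
    head, tail = (entity.lower() for entity in seed_pair)
    cue_vec = Counter(tokenize(relation_cues))
    candidates = [p for p in passages if head in p.lower() and tail in p.lower()]
    scored = [(cosine(Counter(tokenize(p)), cue_vec), p) for p in candidates]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy data with invented entity names (not taken from any real corpus).
passages = [
    "Aria is the niece of Borin and his heir.",
    "Aria and Borin walked past the gate.",
    "Coren cooked rabbit stew.",
]
ranked = rank_passages(passages, ("Aria", "Borin"), "niece uncle heir family")
```

    In this toy run, the passage that states the family relation ranks above the one where the two names merely co-occur, which is the kind of false positive a ranking step is meant to push down.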

    sWOM and Online Shopping within a Disease Menace: The Case of Vietnam

    Although electronic word-of-mouth via social networking sites (sWOM) has greatly driven online shopping, its importance in shopping decisions during the coronavirus disease (COVID-19) pandemic has not been holistically considered. Based on the necessity of sWOM, uses and gratifications theory (UGT), and health belief theory (HBT), this study frames a model of consumer shopping tendency toward sWOM in the context of the pandemic. A web-based survey was designed to collect data from 403 respondents who were inclined to patronize e-stores during the pandemic. The measurement model was then examined using a two-step method of structural equation modeling. The findings indicate that sWOM is an influential communication mode for online shopping during the pandemic. sWOM is of primary importance to information quality. Moreover, utilitarian value, social value, perceived threat, and self-efficacy toward shopping tendency are significantly motivated by sWOM. Information quality, utilitarian value, social value, and perceived threat are major predictors of shopping tendency during COVID-19. Finally, theoretical and practical implications are discussed.

    Unearthing Common Inconsistency for Generalisable Deepfake Detection

    Deepfakes have been around for several years, yet efficient detection techniques that generalize across different manipulation methods still require further research. Current image-level detection methods fail to generalize to unseen domains, owing to the domain shift caused by CNNs' strong inductive bias towards Deepfake texture, whereas video-level detection shows potential for both generalization across multiple domains and robustness to compression. We argue that although distinct face manipulation tools have different inherent biases, they all disrupt the consistency between frames, which is a natural characteristic shared by authentic videos. Inspired by this, we propose a detection approach, termed unearthing-common-inconsistency (UCI), that captures the frame inconsistency broadly present across different forgery techniques. Concretely, the UCI network, based on self-supervised contrastive learning, can better distinguish temporal consistency between real and fake videos from multiple domains. We introduce a temporally-preserved module that applies spatial noise perturbations, directing the model's attention towards temporal information. Subsequently, leveraging a multi-view cross-correlation learning module, we extensively learn the disparities in temporal representations between genuine and fake samples. Extensive experiments demonstrate the generalization ability of our method on unseen Deepfake domains.
    Comment: 9 pages, 2 figures and 5 tables
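    The abstract above rests on the observation that manipulated videos disrupt the consistency between frames. As a rough illustration of what a frame-consistency signal can look like, the sketch below averages the cosine distance between consecutive per-frame feature vectors. This is a deliberately simple stand-in and not the UCI network, which the abstract describes as a learned, contrastive model; the function names and the toy feature vectors are assumptions.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def temporal_inconsistency(frame_features):
    """Mean cosine distance between consecutive per-frame feature vectors.

    `frame_features` is a list of vectors, one per frame, assumed to come
    from some upstream feature extractor (hypothetical here).
    """
    pairs = list(zip(frame_features, frame_features[1:]))
    if not pairs:
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy feature sequences: one stable, one that flips direction every frame.
steady = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
jumpy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
steady_score = temporal_inconsistency(steady)
jumpy_score = temporal_inconsistency(jumpy)
```

    A higher score means less frame-to-frame agreement. A hand-crafted score like this would not be expected to generalize the way a learned representation does, which is the gap the abstract's approach targets.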

    Preconcentration of Arsenic Species in Environmental Waters by Solid Phase Extraction Using Metal-loaded Chelating Resins

    Joint Research on Environmental Science and Technology for the Earth『Annual Report of FY 2002, The Core University Program between Japan Society for the Promotion of Science (JSPS) and National Centre for Natural Science and Technology (NCST)』pp.20-23, Core University Program Office, Fujita Laboratory, Dept. of Environmental Engineering, Osaka University, 200

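    Geomatic analysis of the match between hospital locations in the city of Hanoi (Vietnam) and the population's health care needs [Analyse géomatique de la correspondance entre la localisation des hôpitaux de la ville d'Hanoi (Viêt-nam) et les besoins de la population en soins de santé]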

    At present, the health care system in the city of Hanoi no longer meets the demand for services expressed by the population. Studies of the relationship between the supply of and demand for services can provide additional information for analysing the situation and support better decision-making for the development of the system. Our study focuses in particular on the spatial relationships between the population, health infrastructure and the environment. The objective of this study is to use geomatics to develop a model for evaluating medical care services based on the characteristics of the population and those of the existing medical care system. The model has two parts: determining the demand for care and defining the supply of services offered by the health system. From cartographic, socio-economic and hospital data, indicators expressing demand and indicators tied to the accessibility and availability of hospital supply were calculated. A multivariate analysis then yielded preliminary results. These results show the current state of medical care services in the city. Hospitals are, in most cases, far from the areas where demand is high. Demand is concentrated in the city centre, where the results show a deficiency. In areas of urban expansion, the supply of adequate medical care must be improved; likewise, in several areas, the quality of supply must be raised.

    Effects of peer feedback on Taiwanese adolescents’ English speaking practices and development

    This thesis explores the impact of peer feedback on two secondary level classrooms studying English as a foreign language in Taiwan. The effectiveness of teacher-led feedback has consistently been the focus of the relevant literature, but relatively few studies have experimentally investigated the impact of peer-led feedback on learning. This research is based on the belief that investigating the process of peer-led feedback, as well as the effectiveness of peer-led correction, will enhance our understanding of learners’ communicative interactions. These data allow us to provide suggestions for successful second/foreign language learning. This study followed a mixed-methods quasi-experimental design involving a variety of data collection and analysis techniques. Observations of peer-peer dialogues taken from a Year 7 and a Year 8 class were analysed using content analysis, in order to classify the types of peer feedback provided by the Year 7 and Year 8 learners. Pre- and post-measures, including English speaking tests, questionnaires, and checklists, were examined with non-parametric statistical tests to explore any changes in the learners’ speaking development after the quasi-experiment. Key findings included the frequency and distribution of seven types of peer feedback, as used by the Year 7 and Year 8 learners, and the statistical results that revealed the differences between the pre- and post-measures. Among the seven types of peer feedback (translation, confirmation, completion, explicit indication, explicit correction, explanation and recasts), explicit correction and translation were the two techniques used most frequently by the learners. Post-test results indicated an improvement in the learners’ speaking performance. The results of pre- and post-questionnaires and pre- and post-checklists showed different levels of change in the learners’ self-evaluation of their own ability to speak English, as well as their attitudes towards corrective feedback. These results allow us to gain insight into the nature of peer interaction in communicative speaking activities as well as learners’ motives behind their feedback behaviours. Additionally, the results shed light on learners’ opinions towards corrective feedback that they received or provided in peer interaction. Further, the results yield a deepened understanding of the impacts of peer feedback on L2 development by examining changes in learners’ speaking performance, self-confidence in speaking English and self-evaluation of their own ability to speak English after a peer-led correction treatment. In conclusion, the study suggests that adolescent learners are willing and able to provide each other with feedback in peer interaction. The feedback they deliver successfully helps their peers to attend to form and has positive impacts on their peers’ English-speaking performance. Moreover, the study provides explanations for learners’ preference for certain types of feedback techniques, which hopefully helps to tackle the mismatch between teachers’ intentions and learners’ expectations of corrective feedback in L2 classrooms.
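    The abstract above reports that pre- and post-measures were examined with non-parametric statistical tests, without naming them. As one simple example of such a test for paired scores, the sketch below implements an exact two-sided sign test. The choice of the sign test is an assumption made here for illustration (the thesis may have used a different test), and the scores are made-up toy values, not data from the study.

```python
from math import comb


def sign_test(pre, post):
    """Two-sided exact sign test on paired scores; tied pairs are dropped.

    Returns the p-value under the null hypothesis that an increase and a
    decrease are equally likely for each learner.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        return 1.0
    increases = sum(d > 0 for d in diffs)
    tail = min(increases, n - increases)
    # Probability of a split at least this lopsided, doubled for two sides.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)


# Hypothetical speaking-test scores for eight learners (invented values).
pre_scores = [10, 12, 9, 14, 11, 13, 8, 10]
post_scores = [12, 14, 11, 15, 13, 15, 10, 12]
p_value = sign_test(pre_scores, post_scores)
```

    The sign test uses only the direction of each change, so it needs no distributional assumptions about the scores. A rank-based test such as the Wilcoxon signed-rank test would also use the size of each change.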

    Some field experience with subsynchronous vibration of centrifugal compressors

    Many large chemical fertilizer plants producing 1000 tons of NH3 per day and 1700 tons of urea per day were built in China in the 1970s. During operation, subsynchronous vibration occasionally occurs in some of the large turbine-compressor sets and has resulted in heavy economic losses. Two cases of subsynchronous vibration are described: self-excited vibration of the low-pressure (LP) cylinder of one kind of N2-H2 multistage compressor, and forced subsynchronous vibration of the high-pressure (HP) cylinder of the CO2 compressor.