2,092 research outputs found
Knowledge extraction from fictional texts
Knowledge extraction from text is a key task in natural language processing that involves many sub-tasks, such as taxonomy induction, named entity recognition and typing, relation extraction, and knowledge canonicalization. By constructing structured knowledge from natural language text, knowledge extraction becomes a key asset for search engines, question answering and other downstream applications. However, current knowledge extraction methods mostly focus on prominent real-world entities, with Wikipedia and mainstream news articles as sources. The constructed knowledge bases therefore lack information about long-tail domains, with fiction and fantasy as archetypes. Fiction and fantasy are core parts of human culture, spanning literature, movies, TV series, comics and video games. With thousands of fictional universes having been created, knowledge about fictional domains is the subject of search-engine queries, by fans as well as cultural analysts. Unlike in the real-world domain, knowledge extraction in specific domains such as fiction and fantasy has to tackle several key challenges:
- Training data: Sources for fictional domains mostly come from books and fan-built content, which are sparse and noisy and contain difficult text structures, such as dialogues and quotes. Training data for key tasks such as taxonomy induction, named entity typing or relation extraction is also unavailable.
- Domain characteristics and diversity: Fictional universes can be highly sophisticated, containing entities, social structures and sometimes languages that are completely different from the real world. State-of-the-art methods for knowledge extraction make assumptions about entity-class, subclass and entity-entity relations that are often invalid for fictional domains. Because fictional domains span different genres, models must also transfer across domains.
- Long fictional texts: While state-of-the-art models have limitations on the input sequence length, it is essential to develop methods that can deal with very long texts (e.g. entire books), to capture multiple contexts and leverage widely spread cues.
This dissertation addresses the above challenges by developing new methodologies that advance the state of the art in knowledge extraction for fictional domains:
- The first contribution is a method, called TiFi, for constructing type systems (taxonomy induction) for fictional domains. By tapping noisy fan-built content from online communities such as Wikia, TiFi induces taxonomies through three main steps: category cleaning, edge cleaning and top-level construction. Exploiting a variety of features from the original input, TiFi is able to construct taxonomies for a diverse range of fictional domains with high precision.
- The second contribution is a comprehensive approach, called ENTYFI, for named entity recognition and typing in long fictional texts. Built on 205 automatically induced high-quality type systems for popular fictional domains, ENTYFI exploits the overlap and reuse of these fictional domains on unseen texts. By combining different typing modules with a consolidation stage, ENTYFI is able to do fine-grained entity typing in long fictional texts with high precision and recall.
- The third contribution is an end-to-end system, called KnowFi, for extracting relations between entities in very long texts such as entire books. KnowFi leverages background knowledge from 142 popular fictional domains to identify interesting relations and to collect distant training samples. KnowFi devises a similarity-based ranking technique to reduce false positives in training samples and to select potential text passages that contain seed pairs of entities.
By training a hierarchical neural network for all relations, KnowFi is able to infer relations between entity pairs across long fictional texts, and achieves gains over the best prior methods for relation extraction.
sWOM and Online Shopping within a Disease Menace: The Case of Vietnam
Although electronic word-of-mouth via social networking sites (sWOM) has greatly stimulated online shopping, its importance in shopping decisions during the coronavirus disease (COVID-19) pandemic has not been holistically considered. Drawing on the necessity of sWOM, uses and gratifications theory (UGT), and health belief theory (HBT), this study frames a model of consumer shopping tendency toward sWOM in the context of the pandemic. A web-based survey was designed to collect data from 403 respondents who were inclined to patronize e-stores during the pandemic. The measurement model was then examined using a two-step method of structural equation modeling. The findings indicate that sWOM is an influential communication mode for online shopping during the pandemic. sWOM is of primary importance to information quality. Moreover, sWOM significantly motivates utilitarian value, social value, perceived threat, and self-efficacy toward shopping tendency. Information quality, utilitarian value, social value, and perceived threat are major predictors of shopping tendency during COVID-19. Finally, theoretical and practical implications are discussed.
Unearthing Common Inconsistency for Generalisable Deepfake Detection
Deepfakes have existed for several years, yet efficient detection techniques
that generalize across different manipulation methods still require further
research. While current image-level detection methods fail to generalize to
unseen domains, owing to the domain shift caused by CNNs' strong inductive
bias towards Deepfake texture, video-level detection shows the potential for
both generalization across multiple domains and robustness to compression. We
argue that although distinct face manipulation tools have different inherent
biases, they all disrupt the consistency between frames, which is a natural
characteristic shared by authentic videos. Inspired by this, we propose a
detection approach, termed unearthing-common-inconsistency (UCI), that
captures the frame inconsistency that broadly exists across different forgery
techniques. Concretely, the UCI network, based on self-supervised contrastive
learning, can better distinguish temporal consistency between real and fake
videos from multiple domains. We introduce a temporally-preserved module that
applies spatial noise perturbations, directing the model's attention towards
temporal information. Subsequently, leveraging a multi-view cross-correlation
learning module, we extensively learn the disparities in temporal
representations between genuine and fake samples. Extensive experiments
demonstrate the generalization ability of our method on unseen Deepfake
domains.
Comment: 9 pages, 2 figures and 5 tables
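The idea that manipulation disrupts consistency between frames can be illustrated with a toy score. The sketch below is not the UCI network: it assumes each frame has already been reduced to a feature vector (the vectors here are made up) and simply averages the cosine similarity of consecutive frames. The names `cosine` and `temporal_consistency` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def temporal_consistency(frame_features):
    """Mean cosine similarity between consecutive frame feature vectors.

    Illustrative assumption: higher values mean smoother frame-to-frame
    change. Requires at least two frames.
    """
    if len(frame_features) < 2:
        raise ValueError("need at least two frames")
    sims = [cosine(a, b) for a, b in zip(frame_features, frame_features[1:])]
    return sum(sims) / len(sims)


# Made-up per-frame features: one sequence drifts smoothly, one flips abruptly.
smooth = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
jittery = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

print(temporal_consistency(smooth) > temporal_consistency(jittery))
```

A learned detector would replace the hand-made vectors with representations trained to separate real from fake videos; the score above only shows what "frame inconsistency" means numerically.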
Preconcentration of Arsenic Species in Environmental Waters by Solid Phase Extraction Using Metal-loaded Chelating Resins
Joint Research on Environmental Science and Technology for the Earth, Annual Report of FY 2002, The Core University Program between Japan Society for the Promotion of Science (JSPS) and National Centre for Natural Science and Technology (NCST), pp.20-23, Core University Program Office, Fujita Laboratory, Dept. of Environmental Engineering, Osaka University, 200
Geomatic analysis of the correspondence between the location of hospitals in the city of Hanoi (Vietnam) and the population's health care needs
Currently, in the city of Hanoi, the health care system no longer meets the demand for services expressed by the population. Studies of the relationship between the supply of and demand for services can provide additional information for analysing the situation and support better decision-making for the development of the system. Our study focuses in particular on the spatial relationships between the population, health infrastructure and the environment. The objective of this study is to develop, using geomatics, a model for evaluating medical care services based on the characteristics of the population and those of the existing medical care system. This model consists of two parts: determining the demand for care and defining the supply of services offered by the health system. From cartographic, socio-economic and hospital data, indicators expressing demand and indicators related to the accessibility and availability of hospital supply were calculated. A multivariate analysis then made it possible to estimate preliminary results. These results show the current state of medical care services in the city. They show that hospitals are, in most cases, far from the areas where demand is high. Demand is concentrated in the city centre, and the results obtained there show a deficiency. In areas of urban expansion, the supply of adequate medical care must be improved; likewise, in several areas, the quality of the supply must be raised.
Effects of peer feedback on Taiwanese adolescents' English speaking practices and development
This thesis explores the impact of peer feedback on two secondary level classrooms
studying English as a foreign language in Taiwan. The effectiveness of teacher-led
feedback has consistently been the focus of the relevant literature but relatively fewer
studies have experimentally investigated the impact of peer-led feedback on learning.
This research is based on the belief that the investigation of the process of peer-led
feedback, as well as the effectiveness of peer-led correction, will enhance our
understanding of learners' communicative interactions. These data will allow us the
opportunity to provide suggestions for successful second/foreign language learning.
This study was conducted following a mixed-methods quasi-experimental design
involving a variety of data collection and analysis techniques. Observations of peer-peer
dialogues taken from a Year 7 and a Year 8 class were analysed using content
analysis, in order to classify the types of peer feedback provided by the Year 7 and
Year 8 learners. Pre- and post-measures, including English speaking tests,
questionnaires, and checklists, were examined with non-parametric statistical tests
used to explore any changes in relation to the learners' speaking development after
the quasi-experiment. Key findings included frequency and distribution of seven
types of peer feedback, as used by the Year 7 and Year 8 learners, and the statistical
results that revealed the differences between the pre- and post-measures. Among the
seven types of peer feedback (translation, confirmation, completion, explicit
indication, explicit correction, explanation and recasts), explicit correction and
translation were the two techniques used most frequently by the learners. Post-test
results indicated an improvement in the learners' speaking performance. The results
of pre- and post-questionnaires and pre- and post-checklists showed different levels
of change in the learners' self-evaluation of their own ability to speak English,
as well as their attitudes towards corrective feedback.
These results allow us to gain insight into the nature of peer interaction in
communicative speaking activities as well as learners' motives behind their feedback
behaviours. Additionally, the results shed light on learners' opinions towards
corrective feedback that they received or provided in peer interaction. Further, the
results yield a deepened understanding of impacts of peer feedback on L2
development by examining changes in learners' speaking performance, self-confidence
in speaking English and self-evaluation of their own ability to speak
English after a peer-led correction treatment. In conclusion, the study suggests that
adolescent learners are willing and able to provide each other with feedback in peer
interaction. The feedback that they delivered successfully helps their peers to attend
to form and has positive impacts on their peers' English-speaking performance.
Moreover, the study provides explanations for learners' preference for certain types
of feedback techniques, which hopefully helps to tackle the mismatch between
teachers' intentions and learners' expectations of corrective feedback in the L2
classrooms.
Validating Problem Solving Competency Instrument in the New General Education Curriculum
Problem solving is a crucial skill for students learning and living in the 21st century. To enhance this skill, students need to be confronted with a problem situation and then solve the problem. The 2018 general education curriculum has been developed according to the competency approach. As a result, the instruction and assessment systems need to be adapted to align with the requirements of the new curriculum. The purpose of the present study is to develop and validate the problem solving competency (PSC) instrument based on the general requirements of this competency in the general education curriculum in Vietnam. The results of Exploratory Factor Analysis (EFA) show that the instrument can be divided into three different components with good factor loadings to measure the problem solving competency of Vietnamese students. The instrument is reliable and valid. Reliability analysis using Cronbach's alpha revealed satisfactory internal consistency for each factor, with values ranging from .670 to .812.
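For readers unfamiliar with the reliability statistic mentioned above, the following is a minimal sketch of how Cronbach's alpha is commonly computed for one factor: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The response matrix is invented for illustration and is not data from the study; the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for both the items and the totals.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented responses: 5 respondents x 3 items on a 1-5 scale (illustration only).
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

In an instrument with several factors, the function would be applied to the items of each factor separately, which is how a range of values such as the one reported arises.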
Some field experience with subsynchronous vibration of centrifugal compressors
Many large chemical fertilizer plants producing 1000 tons of NH3 per day and 1700 tons of urea per day were constructed in China in the 1970s. During operation, subsynchronous vibration occurs occasionally in some of the large turbine-compressor sets and has resulted in heavy economic losses. Two cases of subsynchronous vibration are described: self-excited vibration of the low-pressure (LP) cylinder of one kind of N2-H2 multistage compressor, and forced subsynchronous vibration of the high-pressure (HP) cylinder of the CO2 compressor.