6,154 research outputs found
Knowledge-Driven Implicit Information Extraction
Natural language is a powerful tool developed by humans over hundreds of thousands of years. The extensive usage and flexibility of the language, the creativity of human beings, and the social, cultural, and economic changes of daily life have added new constructs, styles, and features to the language. One such feature is its ability to express ideas, opinions, and facts implicitly. This feature is used extensively in day-to-day communication, for example: 1) when expressing sarcasm, 2) when trying to recall forgotten things, 3) when conveying descriptive information, 4) when emphasizing the features of an entity, and 5) when communicating a common understanding. Consider the tweet "New Sandra Bullock astronaut lost in space movie looks absolutely terrifying" and the text snippet extracted from a clinical narrative "He is suffering from nausea and severe headaches. Dolasteron was prescribed." The tweet contains an implicit mention of the entity Gravity, and the clinical snippet contains an implicit mention of the relationship between the medication Dolasteron and the clinical condition nausea. Such implicit references to entities and relationships are common in daily communication, and they add value to conversations. However, extracting implicit constructs has not received enough attention in the information extraction literature. This dissertation focuses on extracting implicit entities and relationships from clinical narratives and extracting implicit entities from tweets. When people use implicit constructs in their daily communication, they assume shared knowledge with the audience about the subject being discussed. This shared knowledge helps to decode implicitly conveyed information. For example, the Twitter user above assumed that his or her audience knows that the actress Sandra Bullock starred in the movie Gravity and that it is a movie about space exploration.
The clinical professional who wrote the clinical narrative above assumed that the reader knows that Dolasteron is an anti-nausea drug. An audience without such domain knowledge may not correctly decode the information conveyed in these examples. This dissertation demonstrates manifestations of implicit constructs in text, studies their characteristics, and develops a software solution capable of extracting implicit information from text. The developed solution starts by acquiring the knowledge relevant to the implicit information extraction problem: domain knowledge, contextual knowledge, and linguistic knowledge. The acquired knowledge can take different syntactic forms, such as a text snippet, structured knowledge represented in standard knowledge representation languages such as the Resource Description Framework (RDF), or other custom formats. Hence, the acquired knowledge is pre-processed to create models that can be processed by machines; such models provide the infrastructure to perform implicit information extraction. This dissertation focuses on three use cases of implicit information and demonstrates the applicability of the developed solution to each of them: 1) implicit entity linking in clinical narratives, 2) implicit entity linking in Twitter, and 3) implicit relationship extraction from clinical narratives. The evaluations are conducted on relevant annotated datasets for implicit information, and they demonstrate the effectiveness of the developed solution in extracting implicit information from text.
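A toy illustration of this knowledge-driven decoding, assuming a hand-built mini knowledge base; the entities, facts, and scoring function below are illustrative stand-ins, far simpler than the dissertation's actual models:

```python
# Toy sketch of knowledge-driven implicit entity linking: score each
# candidate entity by how much the input text overlaps with the facts
# stored for that entity. KB below is a hand-built illustration, not
# the dissertation's knowledge base.
KB = {
    "Gravity": {"sandra", "bullock", "astronaut", "space", "movie"},
    "Speed": {"sandra", "bullock", "bus", "bomb", "movie"},
}

def link_implicit_entity(text: str) -> str:
    """Return the entity whose associated facts best overlap the text."""
    words = set(text.lower().split())
    return max(KB, key=lambda entity: len(KB[entity] & words))

tweet = "New Sandra Bullock astronaut lost in space movie looks absolutely terrifying"
print(link_implicit_entity(tweet))  # -> Gravity
```

The point of the sketch is only the shape of the problem: the tweet never names Gravity, so the link can be made only through knowledge shared between writer and reader.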
INNOVATION AND KNOWLEDGE TRANSFER MECHANISMS IN AN “ENGAGED” UNIVERSITY. THE CASE OF THE “FEDERICO II” SAN GIOVANNI HUB (SGH)
What happens when a former industrial area (abandoned for nearly 20 years) is replaced by a knowledge-intensive Hub hosting a University Campus, research centres and laboratories, firms, and hybrid advanced education programmes delivered in partnership with global-scale companies?
The present research aims to define the scope of this emerging phenomenon, occurring in a peripheral suburb in the east area of the city of Naples (Italy) and characterised by the settlement of a knowledge-intensive Hub involving innovation, technology, and knowledge transfer processes.
The main subject of the study is the San Giovanni a Teduccio “Federico II” University Hub, a university campus and research centre located in a peripheral urban suburb in the east area of Naples, herein named the San Giovanni Hub (“SGH”) or simply the “Hub”.
Text mining for social sciences: new approaches
The rise of the Internet has brought about an important change in the way we look at the world, and consequently in the way we measure it. As of June 2018, more than 55% of the world's population had Internet access. It follows that, every day, we are able to quantify what more than four billion people do, and how and when they do it. This means data.
The availability of all these data raises more than one question: How do we manage them? How do we treat them? How do we extract information from them? Now, more than ever, we need to think about new rules, new methods, and new procedures for handling this huge amount of data, which is characterized by being unstructured, raw, and messy.
One of the most interesting challenges in this field concerns the implementation of processes for deriving information from textual sources, a process known as Text Mining. Born in the mid-1990s, Text Mining is a prolific field which has evolved, thanks to technological progress, from Automatic Text Analysis, a set of methods for the description and analysis of documents.
Textual data, even when transformed into a structured format, present several critical issues, as they are characterized by high dimensionality and noise. Moreover, online texts, such as social media posts or blog comments, are most of the time very short, which makes the encoded matrices even sparser. These issues call for new and advanced solutions for treating Web data that can overcome them while still returning the information contained in the texts. The objective is to propose a fast and scalable method able to deal with the characteristics of online texts, and therefore with big and sparse matrices. To that end, we propose a procedure that runs from the collection of the texts to the interpretation of the results. Its innovative parts are the choice of the weighting scheme for the term-document matrix and the co-clustering approach used for data classification. To verify the validity of the procedure, we test it in two real applications: one concerning safety and health at work, and another regarding the Brexit vote. We show how the technique works on different types of texts, allowing us to obtain meaningful results.
For the reasons described above, in this research work we implement and test on real datasets a new procedure for the content analysis of textual data, using a two-way approach from the Text Clustering field. As will be shown in the following pages, Text Clustering is an unsupervised classification process that reproduces the internal structure of the data by dividing the texts into groups on the basis of lexical similarities. Text Clustering is mostly used for content analysis, and it may be applied to the classification of words, documents, or both. In the latter case we speak of two-way clustering, which is the specific approach implemented in this research work for the treatment of the texts.
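A minimal sketch of such a two-way pipeline, using TF-IDF weighting and spectral co-clustering as common stand-ins; the abstract does not specify the thesis's actual weighting scheme or co-clustering algorithm, and the toy corpus below is invented for illustration:

```python
# Sketch of a two-way (co-clustering) pipeline for short texts.
# TF-IDF and SpectralCoclustering are illustrative stand-ins for
# the weighting scheme and co-clustering method of the thesis.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import SpectralCoclustering

docs = [
    "workplace safety rules and health inspections",
    "health and safety training courses",
    "the brexit vote and the referendum campaign",
    "voters discuss the brexit referendum outcome",
]

# Weighted (sparse) term-document matrix: documents x terms.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Co-clustering assigns documents and terms to clusters simultaneously,
# so each document cluster comes with its characteristic vocabulary.
model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)
print(model.row_labels_)   # one cluster label per document
```

The design choice mirrored here is the one the abstract highlights: clustering rows and columns jointly, so sparse short texts and their discriminating terms are grouped in one step.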
To organize the research work, we divided it into two parts: a first part on theory and a second on application. The first part contains a preliminary chapter reviewing the literature on Automatic Text Analysis in the context of the data revolution, and a second chapter in which the new procedure for text co-clustering is proposed. The second part concerns the application of the proposed techniques to two different sets of texts, one composed of news articles and the other of tweets. The idea is to test the same procedure on different types of texts, in order to verify the validity and robustness of the method.
The narrative interview for the assessment of the assisted person: structure, method and data analysis
Background and aim: If it is true that the impact of the symptoms of a disease is perceived differently by each person, and that experiences of suffering are incommunicable, it is equally true that narration provides an understandable representation, which derives from the network of representations that are part of a personal history. The aim of this study was to offer an in-depth analysis of the “narrative interview” collected during the assessment of a 74-year-old diabetic woman. Methods: A case study was conducted by a nurse with advanced expertise in conducting narrative interviews. Content analysis and meaning analysis were performed using a Grounded Theory approach and in accordance with Gee’s Poetic Method. Results: After the diagnosis, the patient felt disbelief, anger, and confusion. The illness forced her to change her life, habits, and social role, with great suffering. However, she adjusted to this new condition and, thanks to her strong and positive attitude and the social support she received, succeeded in activating her “post-traumatic growth”. Conclusions: A good narrative interview starts long before the interview itself, and it requires: specific training in the use of the instrument; the strengthening of specific skills (e.g. active listening); the choice of the optimal setting and timing for the patient; and the ability to encourage the expression of subjective experience and to analyse the patient’s words through a subjective lens, reflecting the uniqueness of each illness experience.
System Learning of User Interactions
The case presented in this paper describes an early prototype, and next steps, for developing a user-adaptive recommender system using semantic analysis and matching of user profiles and content. Machine learning methods optimize semantic analysis and matching based on implicit and explicit feedback from users. The constant interaction with users provides a valuable data source that is used to improve human-computer interaction and to adapt to specific user preferences. This can lead to, among other things, higher accuracy and relevance in content matching, more intuitive graphical user interfaces, improved system performance, and better prioritization of tasks.
Enabling entity retrieval by exploiting Wikipedia as a semantic knowledge source
This dissertation research, PanAnthropon FilmWorld, aims to demonstrate direct retrieval of entities and related facts by exploiting Wikipedia as a semantic knowledge source, with the film domain as its proof-of-concept domain of application. To this end, a semantic knowledge base concerning the film domain has been constructed with the data extracted/derived from 10,640 Wikipedia pages on films and additional pages on film awards. The knowledge base currently contains 209,266 entities and 2,345,931 entity-centric facts. Both the knowledge base and the corresponding semantic search interface are based on the coherent classification of entities. Entity-centric facts are also consistently represented as tuples. The semantic search interface (http://dlib.ischool.drexel.edu:8080/sofia/PA/) supports multiple types of semantic search functions, which go beyond the traditional keyword-based search function, including the main General Entity Retrieval Query (GERQ) function, which is concerned with retrieving all entities that match the specified entity type, subtype, and semantic conditions and thus corresponds to the main research problem. Two types of evaluation have been performed in order to evaluate (1) the quality of information extraction and (2) the effectiveness of information retrieval using the semantic interface. The first type of evaluation has been performed by inspecting 11,495 film-centric facts concerning 100 films. The results have confirmed high data quality with 99.96% average precision and 99.84% average recall. The second type of evaluation has been performed by conducting an experiment with human subjects. The experiment involved having the subjects perform a retrieval task by using both the PanAnthropon interface and the Internet Movie Database (IMDb) interface and comparing their task performance between the two interfaces. The results have confirmed higher effectiveness of the PanAnthropon interface vs. the IMDb interface (83.11% vs. 
40.78% average precision; 83.55% vs. 40.26% average recall). Moreover, the subjects’ responses to the post-task questionnaire indicate that the subjects found the PanAnthropon interface to be highly usable and easily understandable as well as highly effective. The main contribution from this research therefore consists in achieving the set research goal, namely, demonstrating the utility and feasibility of semantics-based direct entity retrieval.
Ph.D., Information Studies -- Drexel University, 201
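The GERQ function described above can be sketched over a toy set of entity-centric fact tuples; the schema, facts, and function below are illustrative assumptions, not the PanAnthropon knowledge base or its interface:

```python
# Toy sketch of a General Entity Retrieval Query (GERQ) over
# entity-centric fact tuples (entity, type, subtype, attribute, value).
# The facts below are invented for illustration.
facts = [
    ("Gravity", "film", "sci-fi", "director", "Alfonso Cuaron"),
    ("Gravity", "film", "sci-fi", "year", "2013"),
    ("Speed", "film", "action", "year", "1994"),
]

def gerq(etype, subtype=None, **conditions):
    """Entities of the given type/subtype satisfying every semantic condition."""
    # Start with all entities of the requested type (and subtype, if given).
    hits = {e for e, t, st, _, _ in facts
            if t == etype and (subtype is None or st == subtype)}
    # Intersect with the entities matching each attribute=value condition.
    for attr, value in conditions.items():
        hits &= {e for e, _, _, a, v in facts if a == attr and v == value}
    return hits

print(gerq("film", subtype="sci-fi", year="2013"))  # -> {'Gravity'}
```

The sketch only shows the query shape the abstract describes: filtering by entity type, subtype, and semantic conditions over consistently represented tuples.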
The discourse of tourism and national heritage: a contrastive study from a cultural perspective
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Filosofía y Letras, Departamento de Filología Inglesa. Date of defence: 20-11-2014.
This thesis presents a research study in the field of online tourism promotion. It
focuses on the national online promotion of UNESCO World Heritage Sites, in two
different types of websites –institutional and commercial– from three countries, Great
Britain, Spain and Romania. The study analyses the way each country presents its
national landmarks and combines various modes to create a virtual brochure with a
promotional message from both institutional and commercial positions. For this, it studies
the organization of the websites and their webpages, as well as the lexico-grammatical
and visual features of the promotional messages. Results of the different analyses are
interpreted from a cultural perspective.
The theoretical framework for the analysis is Systemic Functional Linguistics. The
linguistic text is analysed following Halliday’s theory of the metafunctions (1985, 1994;
Halliday and Matthiessen 2004). Thus, the analysis focuses on the ideational,
interpersonal and textual meanings of the verbal message. Analysis of the visual text
applies Kress and van Leeuwen’s model (1996, 2006), studying the same types of
meanings realised visually.
The results of the different analyses are compared from two perspectives: in relation
to the types of websites and to the countries in which they were produced. Comparison
between institutional and commercial websites reveals a pattern in which the similarities
seem to be related to characteristics typical of web organization and layout, tourist
promotion and specific topic, while differences reflect the types of websites and their
functions. However, when the websites are compared from the point of view of the
different countries, a number of national characteristics of web promotion, common to
the two functions of websites are revealed. These are further interpreted from a cultural
point of view, showing that the findings can be accounted for by the context dimension of
cultural variability (Hall 1976, 2000; Hall and Hall 1990). The British and Spanish sets of
websites are, in general, consistent with the literature on intercultural communication
consulted (Hall 2000; Würtz 2005; Neuliep 2006; Şerbănescu 2007), whereas the
Romanian sets do not follow the pattern of their usual classification as a high-context
culture, but combine features of both low- and high-context cultures. The consistencies seem to
indicate the stability of British and Spanish cultures. At the same time, departure from the
cultural contextual patterns exists in all the cases analysed. These inconsistencies can be
explained by cultural changes and influences due to globalization, and internal changes in
terms of politics, economy and society. They also indicate that cultural patterns can be
affected by the medium of communication (Internet) and the context of communication
(types of promotion).
Findings from the thesis emphasize the need for an understanding of multimodality
and interculturality in online tourism promotion, especially as applied to creating an
image or brand for a country's successful international promotion. They show that
Systemic Functional Linguistics offers a useful tool from both theoretical and practical
perspective which can be applied to areas like composition of promotional messages,
online promotion, tourism discourse and its strategies, or intercultural communication.
Automatic Document Summarization Using Knowledge Based System
This dissertation describes a knowledge-based system to create abstractive summaries of documents by generalizing new concepts, detecting main topics and creating new sentences. The proposed system is built on the Cyc development platform that consists of the world’s largest knowledge base and one of the most powerful inference engines. The system is unsupervised and domain independent. Its domain knowledge is provided by the comprehensive ontology of common sense knowledge contained in the Cyc knowledge base. The system described in this dissertation generates coherent and topically related new sentences as a summary for a given document. It uses syntactic structure and semantic features of the given documents to fuse information. It makes use of the knowledge base as a source of domain knowledge. Furthermore, it uses the reasoning engine to generalize novel information.
The proposed system consists of three main parts: knowledge acquisition, knowledge discovery, and knowledge representation. Knowledge acquisition derives the syntactic structure of each sentence in the document and maps words and their syntactic relationships into the Cyc knowledge base. Knowledge discovery abstracts novel concepts not explicitly mentioned in the document by exploring the ontology of the mapped concepts, and derives the main topics described in the document by clustering the concepts. Knowledge representation creates new English sentences to summarize the main concepts and their relationships. The syntactic structure of the newly created sentences is extended beyond simple subject-predicate-object triplets by incorporating adjective and adverb modifiers, allowing the system to create more complex sentences. The proposed system was implemented and tested. Test results show that the system is capable of creating new sentences that include abstracted concepts not mentioned in the original document, and of combining information from different parts of the document text to compose a summary.
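The extended triplet structure can be illustrated with a toy generator; the `Triple` class and the example values below are hypothetical stand-ins, and the Cyc-backed generation machinery is not reproduced here:

```python
# Toy stand-in for the extended sentence structure described above:
# a subject-predicate-object triplet enriched with adjective and
# adverb modifiers, rendered as an English sentence.
from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    subj_adj: str = ""   # adjective modifying the subject
    pred_adv: str = ""   # adverb modifying the predicate

    def to_sentence(self) -> str:
        subj = f"{self.subj_adj} {self.subject}".strip()
        pred = f"{self.pred_adv} {self.predicate}".strip()
        return f"The {subj} {pred} the {self.obj}."

t = Triple("committee", "approved", "budget",
           subj_adj="new", pred_adv="unanimously")
print(t.to_sentence())  # -> The new committee unanimously approved the budget.
```

The point is only structural: moving from bare subject-predicate-object triplets to modifier-enriched ones is what lets generated summary sentences become more complex.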
An Investigation Into The Blogging Practices Of Academics And Researchers
This research project investigated the experiences of academics and researchers using blogs to support their practice. The three research questions addressed: the academics' and researchers' motivations for beginning and maintaining a blog; the contribution of blogging to their learning in the profession; and the challenges they experienced.
The research questions were investigated using several methods. Five datasets were collected from 26 participants. A questionnaire was first administered to collect background information about the bloggers, and was analysed quantitatively. Then, an initial unstructured interview of one open-ended question was conducted by email. The unstructured interview was analysed using descriptive phenomenology. A follow-on semi-structured interview was conducted and analysed by applying thematic analysis. Blog content was collected in parallel: textual extracts were analysed using discourse analysis and visual extracts by applying thematic/saliency analysis.
Results revealed varied reasons for beginning a blog. For example, the blog can be used as a repository of 'half-baked' ideas. Blogging contributed to the academics' and researchers' learning in the profession in multiple ways. Academic bloggers, for example, can quickly reach a wider audience compared to other forms of academic publishing. Among the challenges, there were concerns over managing confidential information in public, and intellectual property issues. Regarding the methodological contribution of the research, suggestions on strategies for mixing and matching different research methods for data collection and analysis have been provided.
An empirically-grounded framework of blog use in academia and research has been derived based on research findings and scholarship models in the literature. The framework describes how characteristics of digital scholarship such as openness and sharing, are manifested through blogging. The framework can be used to guide academics and researchers who are interested in taking up blogging as a scholarly practice.
Finally, empirically-grounded guidelines on using blogs in academia and research have been derived. The guidelines were evaluated by four practitioners. Future work includes recruiting more practitioners to evaluate the guidelines.