377 research outputs found

    Quantitative Redundancy in Partial Implications

    Get PDF
    We survey the different properties of an intuitive notion of redundancy, as a function of the precise semantics given to the notion of partial implication. The final version of this survey will appear in the Proceedings of the Int. Conf. Formal Concept Analysis, 2015.Comment: Int. Conf. Formal Concept Analysis, 201

    Analyzing Twitter Feeds to Facilitate Crises Informatics and Disaster Response During Mass Emergencies

    Get PDF
    It is a common practice these days for general public to use various micro-blogging platforms, predominantly Twitter, to share ideas, opinions and information about things and life. Twitter is also being increasingly used as a popular source of information sharing during natural disasters and mass emergencies to update and communicate the extent of the geographic phenomena, report the affected population and casualties, request or provide volunteering services and to share the status of disaster recovery process initiated by humanitarian-aid and disaster-management organizations. Recent research in this area has affirmed the potential use of such social media data for various disaster response tasks. Even though the availability of social media data is massive, open and free, there is a significant limitation in making sense of this data because of its high volume, variety, velocity, value, variability and veracity. The current work provides a comprehensive framework of text processing and analysis performed on several thousands of tweets shared on Twitter during natural disaster events. Specifically, this work em- ploys state-of-the-art machine learning techniques from natural language processing on tweet content to process the ginormous data generated at the time of disasters. This study shall serve as a basis to provide useful actionable information to the crises management and mitigation teams in planning and preparation of effective disaster response and to facilitate the development of future automated systems for handling crises situations

    Knowledge-Driven Implicit Information Extraction

    Get PDF
    Natural language is a powerful tool developed by humans over hundreds of thousands of years. The extensive usage, flexibility of the language, creativity of the human beings, and social, cultural, and economic changes that have taken place in daily life have added new constructs, styles, and features to the language. One such feature of the language is its ability to express ideas, opinions, and facts in an implicit manner. This is a feature that is used extensively in day to day communications in situations such as: 1) expressing sarcasm, 2) when trying to recall forgotten things, 3) when required to convey descriptive information, 4) when emphasizing the features of an entity, and 5) when communicating a common understanding. Consider the tweet New Sandra Bullock astronaut lost in space movie looks absolutely terrifying and the text snippet extracted from a clinical narrative He is suffering from nausea and severe headaches. Dolasteron was prescribed . The tweet has an implicit mention of the entity Gravity and the clinical text snippet has implicit mention of the relationship between medication Dolasteron and clinical condition nausea . Such implicit references of the entities and the relationships are common occurrences in daily communication and they add value to conversations. However, extracting implicit constructs has not received enough attention in the information extraction literature. This dissertation focuses on extracting implicit entities and relationships from clinical narratives and extracting implicit entities from Tweets. When people use implicit constructs in their daily communication, they assume the existence of a shared knowledge with the audience about the subject being discussed. This shared knowledge helps to decode implicitly conveyed information. For example, the above Twitter user assumed that his/her audience knows that the actress Sandra Bullock starred in the movie Gravity and it is a movie about space exploration. The clinical professional who wrote the clinical narrative above assumed that the reader knows that Dolasteron is an anti-nausea drug. The audience without such domain knowledge may not have correctly decoded the information conveyed in the above examples. This dissertation demonstrates manifestations of implicit constructs in text, studies their characteristics, and develops a software solution that is capable of extracting implicit information from text. The developed solution starts by acquiring relevant knowledge to solve the implicit information extraction problem. The relevant knowledge includes domain knowledge, contextual knowledge, and linguistic knowledge. The acquired knowledge can take different syntactic forms such as a text snippet, structured knowledge represented in standard knowledge representation languages such as the Resource Description Framework (RDF) or other custom formats. Hence, the acquired knowledge is pre-processed to create models that can be processed by machines. Such models provide the infrastructure to perform implicit information extraction. This dissertation focuses on three different use cases of implicit information and demonstrates the applicability of the developed solution in these use cases. They are: 1) implicit entity linking in clinical narratives, 2) implicit entity linking in Twitter, and 3) implicit relationship extraction from clinical narratives. The evaluations are conducted on relevant annotated datasets for implicit information and they demonstrate the effectiveness of the developed solution in extracting implicit information from text

    Building News Measures from Textual Data and an Application to Volatility Forecasting

    Get PDF
    We retrieve news stories and earnings announcements of the S&P 100 constituents from two professional news providers, along with tenmacroeconomic indicators.We also gather data fromGoogle Trends about these firms’ assets as an index of retail investors’ attention. Thus, we create an extensive and innovative database that contains precise information with which to analyze the link between news and asset price dynamics. We detect the sentiment of news stories using a dictionary of sentiment-related words and negations and propose a set of more than five thousand information-based variables that provide natural proxies for the information used by heterogeneousmarket players. We first shed light on the impact of information measures on daily realized volatility and select them by penalized regression. Then,we performa forecasting exercise and showthat themodel augmentedwith news-related variables provides superior forecasts

    FINE-GRAINED EMOTION DETECTION IN MICROBLOG TEXT

    Get PDF
    Automatic emotion detection in text is concerned with using natural language processing techniques to recognize emotions expressed in written discourse. Endowing computers with the ability to recognize emotions in a particular kind of text, microblogs, has important applications in sentiment analysis and affective computing. In order to build computational models that can recognize the emotions represented in tweets we need to identify a set of suitable emotion categories. Prior work has mainly focused on building computational models for only a small set of six basic emotions (happiness, sadness, fear, anger, disgust, and surprise). This thesis describes a taxonomy of 28 emotion categories, an expansion of these six basic emotions, developed inductively from data. This set of 28 emotion categories represents a set of fine-grained emotion categories that are representative of the range of emotions expressed in tweets, microblog posts on Twitter. The ability of humans to recognize these fine-grained emotion categories is characterized using inter-annotator reliability measures based on annotations provided by expert and novice annotators. A set of 15,553 human-annotated tweets form a gold standard corpus, EmoTweet-28. For each emotion category, we have extracted a set of linguistic cues (i.e., punctuation marks, emoticons, emojis, abbreviated forms, interjections, lemmas, hashtags and collocations) that can serve as salient indicators for that emotion category. We evaluated the performance of automatic classification techniques on the set of 28 emotion categories through a series of experiments using several classifier and feature combinations. Our results shows that it is feasible to extend machine learning classification to fine-grained emotion detection in tweets (i.e., as many as 28 emotion categories) with results that are comparable to state-of-the-art classifiers that detect six to eight basic emotions in text. Classifiers using features extracted from the linguistic cues associated with each category equal or better the performance of conventional corpus-based and lexicon-based features for fine-grained emotion classification. This thesis makes an important theoretical contribution in the development of a taxonomy of emotion in text. In addition, this research also makes several practical contributions, particularly in the creation of language resources (i.e., corpus and lexicon) and machine learning models for fine-grained emotion detection in text

    Text Mining to Support Knowledge Discovery from Electronic Health Records

    Get PDF
    The use of electronic health records (EHRs) has grown rapidly in the last decade. The EHRs are no longer being used only for storing information for clinical purposes but the secondary use of the data in the healthcare research has increased rapidly as well. The data in EHRs are recorded in a structured manner as much as possible, however, many EHRs often also contain large amount of unstructured free‐text. The structured and unstructured clinical data presents several challenges to the researchers since the data are not primarily collected for research purposes. The issues related to structured data can be missing data, noise, and inconsistency. The unstructured free-text is even more challenging to use since they often have no fixed format and may vary from clinician to clinician and from database to database. Text and data mining techniques are increasingly being used to effectively and efficiently process large EHRs for research purposes. Most of the me

    Knowledge Modelling and Learning through Cognitive Networks

    Get PDF
    One of the most promising developments in modelling knowledge is cognitive network science, which aims to investigate cognitive phenomena driven by the networked, associative organization of knowledge. For example, investigating the structure of semantic memory via semantic networks has illuminated how memory recall patterns influence phenomena such as creativity, memory search, learning, and more generally, knowledge acquisition, exploration, and exploitation. In parallel, neural network models for artificial intelligence (AI) are also becoming more widespread as inferential models for understanding which features drive language-related phenomena such as meaning reconstruction, stance detection, and emotional profiling. Whereas cognitive networks map explicitly which entities engage in associative relationships, neural networks perform an implicit mapping of correlations in cognitive data as weights, obtained after training over labelled data and whose interpretation is not immediately evident to the experimenter. This book aims to bring together quantitative, innovative research that focuses on modelling knowledge through cognitive and neural networks to gain insight into mechanisms driving cognitive processes related to knowledge structuring, exploration, and learning. The book comprises a variety of publication types, including reviews and theoretical papers, empirical research, computational modelling, and big data analysis. All papers here share a commonality: they demonstrate how the application of network science and AI can extend and broaden cognitive science in ways that traditional approaches cannot

    Adaptive sentiment analysis

    Get PDF
    Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods have decent performance if they are targeted at a specific domain and writing style, they do not usually work well with texts that are originated outside of their domain boundaries. Often there is a need to perform sentiment analysis in a domain where no labelled document is available. To address this scenario, researchers have proposed many domain adaptation or unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match conventional supervised sentiment analysis methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (aka word embeddings). The induced lexicon has quality on a par with a handcrafted one and could be used directly in a lexiconbased algorithm for sentiment analysis, but we find that a two-stage bootstrapping strategy could further boost the sentiment classification performance. Compared to existing methods, such an end-to-end nearly-unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold. First, we develop a new sentiment analysis system which can — in a nearlyunsupervised manner—adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business), and investigate particularly the temporal dynamics of sentiment in such contexts
    corecore