11 research outputs found

    A Big Data Platform for Real Time Analysis of Signs of Depression in Social Media

    In this paper we propose a scalable platform for real-time processing of Social Media data. The platform ingests huge amounts of contents, such as Social Media posts or comments, and can support Public Health surveillance tasks. The processing and analytical needs of multiple screening tasks can easily be handled by incorporating user-defined execution graphs. The design is modular and supports different processing elements, such as crawlers to extract relevant contents or classifiers to categorise Social Media. We describe here an implementation of a use case built on the platform that monitors Social Media users and detects early signs of depression. This work was funded by FEDER/Ministerio de Ciencia, Innovación y Universidades - Agencia Estatal de Investigación/ Project (RTI2018-093336-B-C21). Our research also receives financial support from the Consellería de Educación, Universidade e Formación Profesional (accreditation 2019–2022 ED431G-2019/04, ED431C 2018/29, ED431C 2018/19) and the European Regional Development Fund (ERDF), which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.

    Enhancing the interactivity of a clinical decision support system by using knowledge engineering and natural language processing

    Mental illness is a serious health problem and it affects many people. Increasingly, Clinical Decision Support Systems (CDSS) are being used for diagnosis and it is important to improve the reliability and performance of these systems. Missing a potential clue or a wrong diagnosis can have a detrimental effect on the patient's quality of life and could lead to a fatal outcome. The context of this research is the Galatean Risk and Safety Tool (GRiST), a mental-health-risk assessment system. Previous research has shown that the success of a CDSS depends on its ease of use, reliability and interactivity. This research addresses these concerns for GRiST by deploying data mining techniques. Clinical narratives and numerical data have both been analysed for this purpose. Clinical narratives have been processed by natural language processing (NLP) technology to extract knowledge from them. SNOMED-CT was used as a reference ontology and the performance of the different extraction algorithms has been compared. A new Ensemble Concept Mining (ECM) method has been proposed, which may eliminate the need for domain-specific phrase annotation. Word embedding has been used to filter phrases semantically and to build a semantic representation of each of the GRiST ontology nodes. The Chi-square and FP-growth methods have been used to find relationships between GRiST ontology nodes. Interesting patterns have been found that could be used to provide real-time feedback to clinicians. Information gain has been used efficaciously to explain the differences between the clinicians and the consensus risk. A new risk management strategy has been explored by analysing repeat assessments. A few novel methods have been proposed to perform automatic background analysis of the patient data and improve the interactivity and reliability of GRiST and similar systems.

    Mining Social Media to Understand Consumers' Health Concerns and the Public's Opinion on Controversial Health Topics.

    Social media websites are increasingly used by the general public as a venue to express health concerns and discuss controversial medical and public health issues. This information could be utilized for the purposes of public health surveillance as well as solicitation of public opinions. In this thesis, I developed methods to extract health-related information from multiple sources of social media data, and conducted studies to generate insights from the extracted information using text-mining techniques. To understand the availability and characteristics of health-related information in social media, I first identified the users who seek health information online and participate in online health communities, and analyzed their motivations and behavior through two case studies of user-created groups on MedHelp and a diabetes online community on Twitter. Through a review of tweets mentioning eye-related medical concepts identified by MetaMap, I diagnosed the common reasons why tweets are mislabeled by natural language processing tools tuned for biomedical texts, and trained a classifier to exclude tweets that are not medically relevant, increasing the precision of the extracted data. Furthermore, I conducted two studies to evaluate the effectiveness of understanding public opinions on controversial medical and public health issues from social media information using text-mining techniques. The first study applied topic modeling and text summarization to automatically distill users' key concerns about the purported link between autism and vaccines. The outputs of the two methods cover most of the public concerns about MMR vaccines reported in previous survey studies. In the second study, I estimated the public's view on the Affordable Care Act (ACA) by applying sentiment analysis to four years of Twitter data, and demonstrated that the rates of positive/negative responses measured by tweet sentiment are in general agreement with the results of the Kaiser Family Foundation Poll.
Finally, I designed and implemented a system which can automatically collect and analyze online news comments to help researchers, public health workers, and policy makers to better monitor and understand the public's opinion on issues such as controversial health-related topics. PhD, Information, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120714/1/owenliu_1.pd

    The Genitive Ratio and its Applications

    The genitive ratio (GR) is a novel method of classifying nouns as animate, concrete or abstract. English has two genitive (possessive) constructions: possessive-s (the boy's head) and possessive-of (the head of the boy). There is compelling evidence that preference for possessive-s is strongly influenced by the possessor's animacy. A corpus analysis that counts each genitive construction in three conditions (definite, indefinite and no article) confirms that occurrences of possessive-s decline as the animacy hierarchy progresses from animate through concrete to abstract. A computer program (Animyser) is developed to obtain results-counts from phrase-searches of Wikipedia that provide multiple genitive ratios for any target noun. Key ratios are identified and algorithms developed, with specific applications achieving classification accuracies of over 80%. The algorithms, based on logistic regression, produce a score of relative animacy that can be applied to individual nouns or to texts. The genitive ratio is a tool with potential applications in any research domain where the relative animacy of language might be significant. Three such applications exemplify that. Combining GR analysis with other factors might enhance established co-reference (anaphora) resolution algorithms. In sentences formed from pairings of animate with concrete or abstract nouns, the animate noun is usually salient, more likely to be the grammatical subject or thematic agent, and to co-refer with a succeeding pronoun or noun-phrase. Two experiments, online sentence production and corpus-based, demonstrate that the GR algorithm reliably predicts the salient noun. Replication of the online experiment in Italian suggests that the GR might be applied to other languages by using English as a 'bridge'. In a mental health context, studies have indicated that Alzheimer's patients' language becomes progressively more concrete; depressed patients' language more abstract. 
Analysis of sample texts suggests that the GR might monitor the prognosis of both illnesses, facilitating timely clinical interventions.

    What text mining analysis of psychotherapy records can tell us about therapy process and outcome

    Increasing demand for mental health treatment and the transfer of a large portion of our lives online has led to the development of a growing range of computerized psychological therapy programmes. We are also creating and storing data at ever increasing rates, a trend that has led to the development of sophisticated textual analysis approaches. This thesis sits at the intersection of these evolving areas. It is an exploratory analysis of how text mining analysis can be applied to online cognitive behaviour therapy. The project emerged as a collaboration between UCL and two commercial partners, Ieso Digital Health and Linguamatics. Ieso Digital Health provide online cognitive behaviour therapy via an online instant messaging platform and Linguamatics are the developers of the text mining software I2E. The involvement of the two industrial partners in this project shaped two major components of this research: the data studied and the platform for textual analysis. Linguistic analysis of textual data in mental health is a wide and variable field that brings together a variety of methods and data formats. These are broadly introduced in Chapter 1, and Chapter 2 provides a systematic review of research on the analysis of language used within therapeutic exchanges during mental health treatment. The research carried out in this thesis involved the development of a number of linguistic features within I2E and statistical analyses to explore their association with mental health outcomes and the development of predictive models of outcome. The results (Chapters 4-10) suggested that there were statistically significant associations between selected language features and therapy outcome scores but that these language features did not fare well as predictors of outcome when developed models were externally validated. These results and recommendations for the application of text mining in therapy transcripts are discussed in Chapter 11.

    Cyber Threat Intelligence based Holistic Risk Quantification and Management


    Congress UPV Proceedings of the 21st International Conference on Science and Technology Indicators

    This is the book of proceedings of the 21st Science and Technology Indicators Conference that took place in València (Spain) from 14th to 16th of September 2016. The conference theme for this year, ‘Peripheries, frontiers and beyond’, aimed to study the development and use of Science, Technology and Innovation indicators in spaces that have not been the focus of current indicator development, for example, in the Global South, or the Social Sciences and Humanities. The exploration to the margins and beyond proposed by the theme has brought to the STI Conference an interesting array of new contributors from a variety of fields and geographies. This year’s conference had a record 382 registered participants from 40 different countries, including 23 European, 9 American, 4 Asia-Pacific, and 4 African and Near Eastern countries. About 26% of participants came from outside of Europe. There were also many participants (17%) from organisations outside academia including governments (8%), businesses (5%), foundations (2%) and international organisations (2%). This is particularly important in a field that is practice-oriented. The chapters of the proceedings attest to the breadth of issues discussed: infrastructure, benchmarking and use of innovation indicators, societal impact and mission-oriented research, mobility and careers, social sciences and the humanities, participation and culture, gender, and altmetrics, among others. We hope that the diversity of this Conference has fostered productive dialogues and synergistic ideas and made a contribution, small as it may be, to the development and use of indicators that, being more inclusive, will foster a more inclusive and fair world.

    Recognition of off-line handwritten cursive text

    The author presents novel algorithms to design unconstrained handwriting recognition systems, organized in three parts. In Part One, novel algorithms are presented for processing of Arabic text prior to recognition. Algorithms are described to convert a thinned image of a stroke to a straight line approximation. Novel heuristic algorithms and novel theorems are presented to determine start and end vertices of an off-line image of a stroke. A straight line approximation of an off-line stroke is converted to a one-dimensional representation by a novel algorithm which aims to recover the original sequence of writing. The resulting ordering of the stroke segments is a suitable preprocessed representation for subsequent handwriting recognition algorithms as it helps to segment the stroke. The algorithm was tested against one data set of isolated handwritten characters and another data set of cursive handwriting, each provided by 20 subjects, and has been 91.9% and 91.8% successful for these two data sets, respectively. In Part Two, an entirely novel fuzzy set-sequential machine character recognition system is presented. Fuzzy sequential machines are defined to work as recognizers of handwritten strokes. An algorithm to obtain a deterministic fuzzy sequential machine from a stroke representation, that is capable of recognizing that stroke and its variants, is presented. An algorithm is developed to merge two fuzzy machines into one machine. The learning algorithm is a combination of many described algorithms. The system was tested against isolated handwritten characters provided by 20 subjects, resulting in a 95.8% recognition rate, which is encouraging and shows that the system is highly flexible in dealing with shape and size variations. In Part Three, an entirely novel text recognition system is also presented, capable of recognizing highly variable off-line handwritten Arabic cursive text. This system is an extension of the above recognition system.
Tokens are extracted from a one-dimensional representation of a stroke. Fuzzy sequential machines are defined to work as recognizers of tokens. It is shown how to obtain a deterministic fuzzy sequential machine from a token representation that is capable of recognizing that token and its variants. An algorithm for token learning is presented. The tokens of a stroke are re-combined to meaningful strings of tokens. Algorithms to recognize and learn token strings are described. The recognition stage uses algorithms of the learning stage. The process of extracting the best set of basic shapes which represent the best set of token strings that constitute an unknown stroke is described. A method is developed to extract lines from pages of handwritten text, arrange main strokes of extracted lines in the same order as they were written, and present secondary strokes to main strokes. Presented secondary strokes are combined with basic shapes to obtain the final characters by formulating and solving assignment problems for this purpose. Some secondary strokes which remain unassigned are individually manipulated. The system was tested against the handwritings of 20 subjects yielding overall subword and character recognition rates of 55.4% and 51.1%, respectively.

    The social and psychological work of metaphor: a corpus linguistic investigation

    This thesis investigates the triangular relationship between metaphor use, community, and state of mind, to ask the question: what social and psychological work does metaphor do in the computer-mediated discourse setting of an online forum? The thesis goes beyond the finding and grouping of metaphors for analysis to consider the pattern of metaphor use over time in terms of (i) surrounding language style; (ii) density of use; and (iii) use by different participant groups. In achieving its aim the thesis provides insights into (i) the effect of metaphor use in terms of state of mind; (ii) the role of metaphor in the characterisation of a community; and (iii) methods for considering linguistic metaphor in naturally occurring discourse in terms of its psychological effect, which also creates insights into metaphor theory. The primary novel contribution of the thesis is to combine an analysis of metaphor use with an analysis of the language style that surrounds it, using established research relating language style to state of mind to consider the social and psychological work that metaphor does. The primary prediction of the investigation is that where metaphor is used to characterise a concept, the surrounding language will be of a style that has been found to be associated with better mental health. This is related to and supported by the second novel contribution of the thesis, which is to consider the role of metaphor in the formation and evolution of a community over time, by considering change in density of metaphor and other key variables in the data as a whole, and for comparative participant groups. The third novel contribution of the thesis is that, alongside more established corpus linguistic techniques, new techniques from the fast-evolving areas of data science and natural language processing are explored and evaluated in terms of (i) finding metaphors in the corpora; (ii) analysing language style; and (iii) diachronic analysis.
It is shown that use of the identified dominant metaphor themes in each community co-occurs with specific language styles associated with mental health, and that this work of metaphor evolves over time as a consensus which becomes normative within the group for a period, such that it shapes community members as well as being shaped by them, while the flexibility of metaphor still leaves that work open to further evolution. The adaptation and prominence of particular metaphor themes over time to do particular work in each forum also underpin the characterisation of it as a particular community.