281 research outputs found

    PaperRobot: Incremental Draft Generation of Scientific Ideas

    We present a PaperRobot who performs as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, by combining graph attention and contextual text attention; (3) incrementally writing some key elements of a new paper based on memory-attention networks: from the input title along with predicted related entities to generate a paper abstract, from the abstract to generate conclusion and future work, and finally from future work to generate a title for a follow-on paper. Turing Tests, where a biomedical domain expert is asked to compare a system output and a human-authored string, show that PaperRobot-generated abstracts, conclusion and future work sections, and new titles are chosen over human-written ones up to 30%, 24% and 12% of the time, respectively. Comment: 12 pages. Accepted by ACL 2019. Code and resources are available at https://github.com/EagleW/PaperRobo

    Computational Methods for Analyzing Health News Coverage

    Researchers who investigate the media's coverage of health have historically relied on keyword searches to retrieve relevant health news coverage, and on manual content analysis methods to categorize and score health news text. These methods are problematic. Manual content analysis methods are labor intensive, time consuming, and inherently subjective because they rely on human coders to review, score, and annotate content. Retrieving relevant health news coverage using keywords can be challenging because manually defining an optimal keyword query, especially for complex health topics and media analysis concepts, can be very difficult, and the optimal query may vary based on when the news was published, the type of news published, and the target audience of the news coverage. This dissertation research investigated computational methods that can assist health news investigators by facilitating these tasks. The first step was to identify the research methods currently used by investigators, and the research questions and health topics researchers tend to investigate. To capture this information, an extensive literature review of health news analyses was performed. No literature review of this type and scope could be found in the research literature. This review confirmed that researchers overwhelmingly rely on manual content analysis methods to analyze the text of health news coverage, and on keyword searching to identify relevant health news articles. To investigate the use of computational methods for facilitating these tasks, classifiers that categorize health news on relevance to the topic of obesity, and on news framing, were developed and evaluated. The obesity news classifier developed for this dissertation outperformed alternative methods, including searching based on keyword appearance. Classifying on the framing of health news proved to be a more difficult task. The news framing classifiers performed well, but the results suggest that the underlying features of health news coverage that contribute to the framing of health news are a richer and more useful source of framing information than binary news framing classifications. The third step in this dissertation was to use the findings of the literature review and the classifier studies to design the SalientHealthNews system. The purpose of SalientHealthNews is to facilitate the use of computational and data mining techniques for health news investigation, hypothesis testing, and hypothesis generation. To illustrate the use of SalientHealthNews' features and algorithms, it was used to generate preliminary data for a study investigating how framing features vary in health and obesity news coverage that discusses populations with health disparities. This research contributes to the study of the media's coverage of health by providing a detailed description of how health news is studied and what health news topics are investigated, then by demonstrating that certain tasks performed in health news analyses can be facilitated by computational methods, and lastly by describing the design of a system that will facilitate the use of computational and data mining techniques for the study of health news. These contributions should further the study of health news by expanding the methods available to health news analysis researchers. This will lead to researchers being better equipped to accurately and consistently evaluate the media's coverage of health. Knowledge of the quality of health news coverage should in turn lead to better informed health journalists, healthcare providers, and healthcare consumers, ultimately improving individual and public health.

    Smart literature review: a practical topic modelling approach to exploratory literature review


    N-gram analysis of 970 microbial organisms reveals presence of biological language models

    Background: It has been suggested previously that genome and proteome sequences show characteristics typical of natural-language texts such as "signature-style" word usage indicative of authors or topics, and that the algorithms originally developed for natural language processing may therefore be applied to genome sequences to draw biologically relevant conclusions. Following this approach of 'biological language modeling', statistical n-gram analysis has been applied for comparative analysis of whole proteome sequences of 44 organisms. It has been shown that a few particular amino acid n-grams are found in abundance in one organism but occur very rarely in other organisms, thereby serving as genome signatures. At that time proteomes of only 44 organisms were available, thereby limiting the generalization of this hypothesis. Today nearly 1,000 genome sequences and corresponding translated sequences are available, making it feasible to test the existence of biological language models over the evolutionary tree. Results: We studied whole proteome sequences of 970 microbial organisms using n-gram frequencies and cross-perplexity employing the Biological Language Modeling Toolkit and Patternix Revelio toolkit. Genus-specific signatures were observed even in a simple unigram distribution. By taking the statistical n-gram model of one organism as reference and computing the cross-perplexity of all other microbial proteomes with it, cross-perplexity was found to be predictive of branch distance in the phylogenetic tree. For example, a 4-gram model from the proteome of Shigellae flexneri 2a, which belongs to the Gammaproteobacteria class, showed a self-perplexity of 15.34 while the cross-perplexity of other organisms was in the range of 15.59 to 29.5 and was proportional to their branching distance in the evolutionary tree from S. flexneri. The organisms of this genus, which happen to be pathotypes of E. coli, also have the closest perplexity values with E. coli. Conclusion: Whole proteome sequences of microbial organisms have been shown to contain particular n-gram sequences in abundance in one organism but occurring very rarely in other organisms, thereby serving as proteome signatures. Further, it has also been shown that perplexity, a statistical measure of similarity of n-gram composition, can be used to predict evolutionary distance within a genus in the phylogenetic tree.
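The cross-perplexity idea in the abstract above can be illustrated with a minimal sketch. This is not the Biological Language Modeling Toolkit or the authors' method: the function names, the add-one smoothing, and the toy sequences are all assumptions made for illustration. The sketch trains an n-gram model on one reference sequence and scores another sequence against it as 2 raised to the negative mean log2 probability.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def train_ngram_counts(sequence, n):
    """Count n-grams and their (n-1)-symbol contexts in a reference sequence."""
    positions = range(len(sequence) - n + 1)
    ngrams = Counter(sequence[i:i + n] for i in positions)
    contexts = Counter(sequence[i:i + n - 1] for i in positions)
    return ngrams, contexts


def perplexity(model, sequence, n, alphabet=AMINO_ACIDS):
    """Perplexity of `sequence` under `model`, with add-one smoothing.

    Add-one smoothing is an assumption of this sketch; it only keeps
    unseen n-grams from receiving zero probability.
    """
    ngrams, contexts = model
    vocab = len(alphabet)
    log_prob = 0.0
    total = 0
    for i in range(len(sequence) - n + 1):
        gram = sequence[i:i + n]
        p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + vocab)
        log_prob += math.log2(p)
        total += 1
    if total == 0:
        raise ValueError("sequence is shorter than n")
    return 2 ** (-log_prob / total)


# Toy strings, not real proteomes (hypothetical data for illustration only).
reference = "MKVLAAGIVMKVLAAGIV"
other = "WWWWPPPPCCCCHHHH"

model = train_ngram_counts(reference, n=2)
self_ppl = perplexity(model, reference, n=2)
cross_ppl = perplexity(model, other, n=2)
print(f"self-perplexity:  {self_ppl:.2f}")
print(f"cross-perplexity: {cross_ppl:.2f}")
```

A sequence that shares few n-grams with the reference scores a higher cross-perplexity than the reference scores against itself, which is the property the abstract relates to branch distance.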

    Explainability for Machine Learning Models: From Data Adaptability to User Perception

    This thesis explores the generation of local explanations for already deployed machine learning models, aiming to identify optimal conditions for producing meaningful explanations considering both data and user requirements. The primary goal is to develop methods for generating explanations for any model while ensuring that these explanations remain faithful to the underlying model and comprehensible to the users. The thesis is divided into two parts. The first enhances a widely used rule-based explanation method. It then introduces a novel approach for evaluating the suitability of linear explanations to approximate a model. Additionally, it conducts a comparative experiment between two families of counterfactual explanation methods to analyze the advantages of one over the other. The second part focuses on user experiments to assess the impact of three explanation methods and two distinct representations. These experiments measure how users perceive their interaction with the model in terms of understanding and trust, depending on the explanations and representations. This research contributes to better explanation generation, with potential implications for enhancing the transparency, trustworthiness, and usability of deployed AI systems. Comment: PhD Thesis

    Tackling Wicked Food Issues: Applying the Wicked Problems Approach in Higher Education to Promote Healthy Eating Habits in American School Children

    Life-long healthy eating habits linked with sustainable local agricultural practices, as “wicked problems” in the United States, are intractable, on-going, and high-stakes issues. An interdisciplinary university course was developed to engage students in participatory research and fieldwork on the inextricably linked dimensions of food, health, and sustainability. Students worked with community partners, stakeholders, and experts to address the specific interdisciplinary issues of diet and promotion of healthy eating habits in American school children. Using a “bottom-up” approach, students co-developed projects with stakeholders (including school children) to empower movement for change. This interactive research process created an iterative feedback loop which fostered more inclusive and creative projects to meliorate the wicked problem at hand. Project proposals ranged from the creation of an interactive website intended for school children, to field trips to local farming communities, to “how-to” workshops for gardening and meal planning, to local tastings. Projects were, in the end, shared with and vetted by community partners for future co-implementation. Using food as an interdisciplinary agent to bring collaboration to fruition, the results of this work indicate higher education could be more effective in preparing students for our 21st century food challenges by developing experiential learning courses in partnership with food communities.

    Factors that influence physical activity: Exploring the impact of demographic and built environment variables for the communities of Osceola, Independence and West Liberty, Iowa

    The aim of this study was to examine recreational activity patterns and their relationship to the built environment, spatial accessibility and socio-demographic status across three different rural communities in Iowa. Data on recreational activities were derived from the results of a transportation survey (telephone, online, and mail) that is conducted on an annual basis by Iowa State University Associate Professor Julia M. Badenhope as part of the Iowa Living Roadways 'Community Visioning Program General Survey'. The data for the three communities pertaining to this study came from the 2008 and 2010 survey results. The study sample contained 178, 105 and 160 randomly selected survey respondents for the three communities of Osceola, Independence and West Liberty, Iowa. The methodology presented could be easily adopted and implemented in future projects examining the relationship between the built environment and recreational activities. Respondents, along with their corresponding demographic information and activity levels, and existing park locations were mapped using Geographic Information Systems (GIS). The Network Analyst extension tool was used to measure different socio-demographic, spatial and physical factors that could potentially influence physical activities such as walking, biking, and running, and these measurements were analyzed using SPSS and JMP in order to obtain the statistical significance. In addition, Anselin's Local Moran's I was utilized to measure spatial autocorrelation in order to establish the presence of clusters within the communities based on the respondents' recreational activity levels. Statistical analyses indicated no significant relationship among the different demographic variables and the levels of recreational activities among the survey respondents of the communities of Osceola and Independence. An association was, however, found between gender and walking (2-sided p-value=0.0008) for the community of West Liberty, Iowa. Spatial analyses in conjunction with statistical results indicated a significant difference for the respondents of Osceola in terms of the shortest distance to a recreational facility and the two activities of running (2-sided p-value=0.0034) and biking (2-sided p-value=0.0247). For the City of West Liberty, a significant relationship was found for the shortest distance to a recreational facility and overall exercise (2-sided p-value=0.0079). In general, these results indicate that respondents of Osceola living in close proximity to recreational facilities are more likely to run and bike, while proximity can also influence overall physical activities, including 'other' activities, for the City of West Liberty. Additionally, there was no evidence of significant clustering for the attribute of recreational activity levels. Overall, the study indicated that the relationship between the rural environment and demographic variables and recreational activity levels is not direct, and more research is required to effectively measure the discrepancy in the level of physical activity in regards to the built environment and demographic variables.

    Social Media Analysis for Social Good

    Data on social media is abundant and offers valuable information that can be utilised for a range of purposes. Users share their experiences and opinions on various topics, ranging from their personal life to the community and the world, in real-time. In comparison to conventional data sources, social media is cost-effective to obtain, is up-to-date and reaches a larger audience. Analysing this rich data source can contribute to solving societal issues and promote social impact in an equitable manner. In this thesis, I present my research in exploring innovative applications using natural language processing (NLP) and machine learning to identify patterns and extract actionable insights from social media data to ultimately make a positive impact on society. First, I evaluate the impact of an intervention program aimed at promoting inclusive and equitable learning opportunities for underrepresented communities using social media data. Second, I develop EmoBERT, an emotion-based variant of the BERT model, for detecting fine-grained emotions to gauge the well-being of a population during significant disease outbreaks. Third, to improve public health surveillance on social media, I demonstrate how emotions expressed in social media posts can be incorporated into health mention classification using an intermediate task fine-tuning and multi-feature fusion approach. I also propose a multi-task learning framework to model the literal meanings of disease and symptom words to enhance the classification of health mentions. Fourth, I create a new health mention dataset to address the imbalance in health data availability between developing and developed countries, providing a benchmark alternative to the traditional standards used in digital health research. Finally, I leverage the power of pretrained language models to analyse religious activities, recognised as social determinants of health, during disease outbreaks.