
    Reactive control and reasoning assistance for scientific laboratory instruments

    Scientific laboratory instruments that are involved in chemical or physical sample identification frequently require substantial human preparation, attention, and interactive control during their operation. Successful real-time analysis of incoming data that supports such interactive control requires: (1) a clear recognition of variance of the data from expected results; and (2) rapid diagnosis of possible alternative hypotheses which might explain the variance. Such analysis then aids in decisions about modifying the experiment protocol, as well as being a goal in itself. This paper reports on a collaborative project at the NASA Ames Research Center between artificial intelligence researchers and planetary microbial ecologists. Our team is currently engaged in developing software that autonomously controls science laboratory instruments and that provides analysis of the real-time data in support of dynamic refinement of the experiment control. The first two instruments to which this technology has been applied are a differential thermal analyzer (DTA) and a gas chromatograph (GC). Coupled together, they form a new geochemistry and microbial analysis tool that is capable of rapid identification of the organic and mineralogical constituents in soils. The thermal decomposition of the minerals and organics, and the attendant release of evolved gases, provides data about the structural and molecular chemistry of the soil samples.

    Modeling Global Syntactic Variation in English Using Dialect Classification

    This paper evaluates global-scale dialect identification for 14 national varieties of English as a means of studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.

    Topic-dependent sentiment analysis of financial blogs

    While most work on sentiment analysis in the financial domain has focused on content from traditional finance news, in this work we concentrate on a more subjective source of information: blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full document classification and that word-based approaches perform better than sentence-based or paragraph-based approaches.
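
One way to picture a word-based topic-specific sub-document is a fixed window of words around each mention of the company. The sketch below is illustrative only: the abstract does not describe the extraction procedure, so the function name `word_window_subdocument`, the tokenization, and the `window` parameter are all assumptions rather than the authors' method.

```python
import re


def word_window_subdocument(text, topic_terms, window=5):
    """Keep only words within `window` tokens of a topic-term mention.

    Hypothetical helper: the matching rule (case-insensitive, exact token)
    and the window size are assumptions made for illustration.
    """
    # Simple word tokenizer; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    terms = {t.lower() for t in topic_terms}

    keep = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in terms:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            keep.update(range(lo, hi))

    # Overlapping windows are merged; original word order is preserved.
    return " ".join(tokens[i] for i in sorted(keep))


post = "Markets fell today. I think Acme will recover soon. Elsewhere, Globex cut its dividend."
print(word_window_subdocument(post, ["Acme"], window=3))
```

The resulting sub-document, rather than the full post, would then be passed to whatever sentiment classifier is being trained; sentence-based or paragraph-based variants would swap the token window for the enclosing sentence or paragraph.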

    More blogging features for author identification

    In this paper we present a novel improvement in authorship identification for personal blogs. The improvement comes from a hybrid collection of linguistic features that best capture the style of users in diary blogs. The feature sets comprise LIWC with its psychological background, a collection of syntactic and part-of-speech (POS) features, and misspelling features. Furthermore, we analyze the contribution of each feature set to the final result and compare the outcomes of using different combinations of the selected feature sets. Our new categorization of misspelled words, which are mapped into numerical features, noticeably enhances the classification results. The paper also confirms the best ranges of several parameters that affect the final result of authorship identification, such as the number of authors, the number of words in each post, and the number of documents/posts for each author/user. The results and evaluation show that the utilized features are compact, while their performance is highly comparable with that of other, much larger feature sets.