13,985 research outputs found

    Matching MEDLINE/PubMed data with Web of Science (WoS): a routine in R language

    We present a novel routine, namely medlineR, based on the R language, that enables the user to match data from MEDLINE/PubMed with records indexed in the ISI Web of Science (WoS) database. The matching allows exploiting the rich and controlled vocabulary of Medical Subject Headings (MeSH) of MEDLINE/PubMed with additional fields of WoS. The integration provides data (e.g. citation data, lists of cited references, lists of the addresses of authors’ host organisations, WoS subject categories) to perform a variety of scientometric analyses. This brief communication describes medlineR, the methodology on which it relies, and the steps the user should follow to perform the matching across the two databases. In order to specify the differences from Leydesdorff and Opthof (2013), we conclude the brief communication by testing the routine on the case of the "Brugada Syndrome".
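    The abstract does not show medlineR's interface, but a minimal sketch of the kind of title-based record matching such a routine might perform is given below; the data frames, field names, and normalisation rule are illustrative assumptions, not the routine's actual code.

    # Hypothetical sketch of matching MEDLINE/PubMed records to WoS records
    # by normalised title; medlineR's real matching logic may differ.
    normalise_title <- function(x) {
      tolower(gsub("[^[:alnum:] ]", "", x))  # drop punctuation, lower-case
    }

    match_pubmed_wos <- function(pubmed, wos) {
      # 'pubmed' and 'wos' are assumed to be data frames with a 'title' column
      pubmed$key <- normalise_title(pubmed$title)
      wos$key    <- normalise_title(wos$title)
      merge(pubmed, wos, by = "key", suffixes = c(".pubmed", ".wos"))
    }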

    Computation of multiple correspondence analysis, with code in R

    The generalization of simple correspondence analysis, for two categorical variables, to multiple correspondence analysis, where there may be three or more variables, is not straightforward, both from a mathematical and a computational point of view. In this paper we detail the exact computational steps involved in performing a multiple correspondence analysis, including the special aspects of adjusting the principal inertias to correct the percentages of inertia, supplementary points and subset analysis. Furthermore, we give the algorithm for joint correspondence analysis, where the cross-tabulations of all unique pairs of variables are analysed jointly. The code in the R language for every step of the computations is given, as well as the results of each computation.
    Keywords: adjustment of principal inertias, Burt matrix, correspondence analysis, multiple correspondence analysis, R language, singular value decomposition, subset analysis
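    As a companion to the paper's step-by-step account, here is a minimal sketch of correspondence analysis of the indicator matrix via the singular value decomposition; it follows the standard computation rather than the paper's exact code, and omits the inertia adjustment and subset analysis.

    # Minimal MCA sketch: CA of the indicator matrix via the SVD
    # (standard computation; omits adjusted inertias and subset analysis).
    mca_indicator <- function(df) {
      # Full dummy coding of every categorical variable
      Z <- model.matrix(~ . - 1, data = df,
                        contrasts.arg = lapply(df, contrasts, contrasts = FALSE))
      P <- Z / sum(Z)                       # correspondence matrix
      r <- rowSums(P); c <- colSums(P)      # row and column masses
      S <- diag(1 / sqrt(r)) %*% (P - r %o% c) %*% diag(1 / sqrt(c))
      dec <- svd(S)                         # SVD of standardised residuals
      list(inertias  = dec$d^2,             # principal inertias
           row_coord = diag(1 / sqrt(r)) %*% dec$u %*% diag(dec$d))
    }

    # Toy example with two categorical variables
    df <- data.frame(smoke = factor(c("yes", "no", "no", "yes", "no")),
                     sex   = factor(c("m", "f", "m", "f", "f")))
    mca_indicator(df)$inertias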

    Web Scraping in the R Language: A Tutorial

    Information Systems researchers can now more easily access vast amounts of data on the World Wide Web to answer both familiar and new questions with more rigor, precision, and timeliness. The main goal of this tutorial is to explain how Information Systems researchers can automatically “scrape” data from the web using the R programming language. This article provides a conceptual overview of the web scraping process. The tutorial then discusses two R packages useful for web scraping, “rvest” and “xml2”, and provides simple examples involving both. The tutorial concludes with an example of a complex web scraping task involving retrieving data from Bayt.com, a leading employment website in the Middle East.
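    As a flavour of the workflow the tutorial covers, here is a minimal rvest/xml2 sketch; the target site (a public scraping sandbox) and the CSS selectors are illustrative assumptions, not the article's Bayt.com example.

    # Minimal web-scraping sketch with rvest (which parses HTML via xml2).
    # The URL and selectors are illustrative; check a site's terms of use
    # and robots.txt before scraping it.
    library(rvest)

    page <- read_html("http://quotes.toscrape.com")   # fetch and parse HTML

    quotes  <- html_text2(html_elements(page, ".quote .text"))
    authors <- html_text2(html_elements(page, ".quote .author"))

    head(data.frame(author = authors, quote = quotes))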

    Some Notes on the Past and Future of Lisp-Stat

    Lisp-Stat was originally developed as a framework for experimenting with dynamic graphics in statistics. To support this use, it evolved into a platform for more general statistical computing. The choice of the Lisp language as the basis of the system was in part coincidence and in part a very deliberate decision. This paper describes the background behind the choice of Lisp, as well as the advantages and disadvantages of this choice. The paper then discusses some lessons that can be drawn from experience with Lisp-Stat and with the R language to guide future development of Lisp-Stat, R, and similar systems.

    Using garch algorithm to analyze data in R language

    One of the challenging aspects of conditionally heteroskedastic series is that if we plot the correlogram of a series with volatility, we might still see what appears to be a realisation of stationary discrete white noise. That is, the volatility itself is hard to detect purely from the correlogram, despite the fact that the series is most definitely non-stationary, as its variance is not constant in time. ARCH and GARCH models have therefore become important tools in the analysis of time series data, particularly in financial applications. These models are especially useful when the goal of the study is to analyze and forecast volatility. This paper gives the motivation behind the simplest GARCH model and illustrates its usefulness in examining portfolio risk. An ARCH (autoregressive conditional heteroskedasticity) model is a model for the variance of a time series. ARCH models are used to describe a changing, possibly volatile variance. Although an ARCH model could possibly be used to describe a gradually increasing variance over time, most often it is used in situations in which there may be short periods of increased variation. (Gradually increasing variance connected to a gradually increasing mean level might be better handled by transforming the variable.) In this article we show what ARCH and GARCH models are, how they are helpful for analyzing economic and financial data, and how to use them in RStudio.
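    As one concrete way to do this in R, the sketch below simulates a GARCH(1,1) series and refits it with the rugarch package; the article's own code may use a different package, and the parameter values here are arbitrary assumptions.

    # Hedged sketch: simulate a GARCH(1,1) process, then recover its
    # parameters with the 'rugarch' package (one common choice in R).
    library(rugarch)

    set.seed(1)
    n <- 2000
    omega <- 0.1; alpha <- 0.1; beta <- 0.8   # arbitrary "true" values
    z    <- rnorm(n)
    sig2 <- numeric(n); eps <- numeric(n)
    sig2[1] <- omega / (1 - alpha - beta)     # unconditional variance
    eps[1]  <- sqrt(sig2[1]) * z[1]
    for (t in 2:n) {
      sig2[t] <- omega + alpha * eps[t - 1]^2 + beta * sig2[t - 1]
      eps[t]  <- sqrt(sig2[t]) * z[t]
    }

    spec <- ugarchspec(variance.model = list(model = "sGARCH",
                                             garchOrder = c(1, 1)),
                       mean.model = list(armaOrder = c(0, 0),
                                         include.mean = FALSE))
    fit <- ugarchfit(spec, data = eps)
    coef(fit)   # estimated omega, alpha1, beta1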

    On testing the significance of sets of genes

    This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways derived from biological databases. Our starting point is the interesting Gene Set Enrichment Analysis (GSEA) procedure of Subramanian et al. [Proc. Natl. Acad. Sci. USA 102 (2005) 15545--15550]. We study the problem in some generality and propose two potential improvements to GSEA: the maxmean statistic for summarizing gene-sets, and restandardization for more accurate inferences. We discuss a variety of examples and extensions, including the use of gene-set scores for class predictions. We also describe a new R language package GSA that implements our ideas.
    Comment: Published at http://dx.doi.org/10.1214/07-AOAS101 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
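    A minimal sketch of the maxmean summary, as we read its definition in the paper, is given below; the full procedure, including restandardization, is implemented in the authors' GSA package.

    # Sketch of the maxmean gene-set statistic (our reading of the paper's
    # definition; the GSA package on CRAN implements the full procedure).
    maxmean <- function(z) {
      s_pos <- mean(pmax(z, 0))    # average positive part of gene scores
      s_neg <- mean(pmax(-z, 0))   # average negative part (absolute value)
      if (s_pos >= s_neg) s_pos else -s_neg
    }

    set.seed(1)
    z <- rnorm(50, mean = 0.2)     # hypothetical per-gene test statistics
    maxmean(z)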