
    The Role of Preprocessing for Word Representation Learning in Affective Tasks

    Affective tasks, including sentiment analysis, emotion classification, and sarcasm detection, have drawn a lot of attention in recent years due to a broad range of useful applications in various domains. The main goal of affect detection tasks is to recognize states such as mood, sentiment, and emotions from textual data (e.g., news articles or product reviews). Despite the importance of utilizing preprocessing steps in different stages (i.e., word representation learning and building a classification model) of affect detection tasks, this topic has not been studied well. To that end, we explore whether applying various preprocessing methods (stemming, lemmatization, stopword removal, punctuation removal and so on) and their combinations in different stages of the affect detection pipeline can improve the model performance. There are many preprocessing approaches that can be utilized in affect detection tasks. However, their influence on the final performance depends on the type of preprocessing and the stages at which it is applied. Moreover, the preprocessing impacts vary across different affective tasks. Our analysis provides thorough insights into how preprocessing steps can be applied in building an affect detection pipeline and their respective influence on performance.
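    The kind of preprocessing combinations the abstract describes can be sketched in a few lines. The sketch below is illustrative only: the stopword list and suffix-stripping "stemmer" are toy stand-ins (a real pipeline would use a library such as NLTK or spaCy), and the step names are hypothetical, not taken from the paper.

    ```python
    import itertools
    import string

    # Toy stopword list, for illustration only.
    STOPWORDS = {"the", "a", "an", "is", "are", "to", "of"}

    def remove_punctuation(tokens):
        stripped = (t.strip(string.punctuation) for t in tokens)
        return [t for t in stripped if t]

    def remove_stopwords(tokens):
        return [t for t in tokens if t not in STOPWORDS]

    def crude_stem(tokens):
        # Naive suffix stripping, a stand-in for a real stemmer.
        out = []
        for t in tokens:
            for suffix in ("ing", "ed", "s"):
                if t.endswith(suffix) and len(t) > len(suffix) + 2:
                    t = t[: -len(suffix)]
                    break
            out.append(t)
        return out

    STEPS = {"punct": remove_punctuation, "stop": remove_stopwords, "stem": crude_stem}

    def preprocess(text, step_names):
        tokens = text.lower().split()
        for name in step_names:
            tokens = STEPS[name](tokens)
        return tokens

    # Enumerate every combination of steps, as the study does across pipeline stages.
    text = "The reviewers are praising the products!"
    for r in range(len(STEPS) + 1):
        for combo in itertools.combinations(STEPS, r):
            print(combo, preprocess(text, combo))
    ```

    Each combination yields a different token stream, and the paper's point is that which combination helps depends on the task and on whether it is applied before embedding training or before classification.
    
    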

    Hand classification of fMRI ICA noise components

    We present a practical "how-to" guide to help determine whether single-subject fMRI independent components (ICs) characterise structured noise or not. Manual identification of signal and noise after ICA decomposition is required for efficient data denoising: to train supervised algorithms, to check the results of unsupervised ones, or to manually clean the data. In this paper we describe the main spatial and temporal features of ICs and provide general guidelines on how to evaluate these. Examples of signal and noise components are provided from a wide range of datasets (3T data, including examples from the UK Biobank and the Human Connectome Project, and 7T data), together with practical guidelines for their identification. Finally, we discuss how the data quality, data type and preprocessing can influence the characteristics of the ICs and present examples of particularly challenging datasets.
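    One temporal feature commonly inspected when labelling ICs by hand is the power spectrum of the component timecourse: BOLD fluctuations are slow, so a component dominated by high-frequency power is suspect. The sketch below is a minimal illustration of that single heuristic, not the paper's procedure; the cutoff frequency, threshold, and TR are assumed values chosen for the toy example.

    ```python
    import numpy as np

    def high_freq_fraction(timecourse, tr, cutoff_hz=0.1):
        """Fraction of spectral power above cutoff_hz; BOLD signal is mostly slower."""
        freqs = np.fft.rfftfreq(len(timecourse), d=tr)
        power = np.abs(np.fft.rfft(timecourse - timecourse.mean())) ** 2
        return power[freqs > cutoff_hz].sum() / power.sum()

    def label_component(timecourse, tr, threshold=0.65):
        # Hypothetical single-feature rule: heavy high-frequency power suggests noise.
        # Real classification also weighs spatial maps, edge fraction, spikes, etc.
        return "noise" if high_freq_fraction(timecourse, tr) > threshold else "signal"

    rng = np.random.default_rng(0)
    tr = 0.72                                   # seconds; fast-TR acquisition assumed
    t = np.arange(400) * tr
    slow_ic = np.sin(2 * np.pi * 0.03 * t)      # slow, BOLD-like oscillation
    noisy_ic = rng.standard_normal(400)         # broadband, noise-like timecourse
    print(label_component(slow_ic, tr), label_component(noisy_ic, tr))
    ```

    In practice such features feed a trained classifier or a manual decision, as the guide describes; no single threshold separates all components.
    
    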

    A statistical framework for the analysis of microarray probe-level data

    In microarray technology, a number of critical steps are required to convert the raw measurements into the data relied upon by biologists and clinicians. These data manipulations, referred to as preprocessing, influence the quality of the ultimate measurements and studies that rely upon them. Standard operating procedure for microarray researchers is to use preprocessed data as the starting point for the statistical analyses that produce reported results. This has prevented many researchers from carefully considering their choice of preprocessing methodology. Furthermore, the fact that the preprocessing step affects the stochastic properties of the final statistical summaries is often ignored. In this paper we propose a statistical framework that permits the integration of preprocessing into the standard statistical analysis flow of microarray data. This general framework is relevant in many microarray platforms and motivates targeted analysis methods for specific applications. We demonstrate its usefulness by applying the idea in three different applications of the technology. Comment: Published at http://dx.doi.org/10.1214/07-AOAS116 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Data Preprocessing and Homogeneity: The Influence on Robustness and Modeling by PLS Via NIR of Fish Burgers

    Fish burgers, as new products, require that their shelf life be investigated. Sensory results usually do not follow a homogeneous profile, as they measure human perception. Since the sensory and physicochemical monitoring of shelf life takes time and considerable investment, Near Infrared (NIR) spectroscopy offers a fast instrumental technique that can access multiple parameters from the sample at the same time. In order to replace traditional methods and improve mathematical modeling, the objective of this study is to estimate the influence of data preprocessing and homogeneity (Kolmogorov–Smirnov) on the quality parameters of Partial Least Squares modeling. Calibration and validation models were evaluated by means of correlation coefficient, rank, robustness and Residual Prediction Deviation. All the preprocessing methods available in the Opus Lab® software were tested and compared. 72 readings from 8 samples of refrigerated grass carp burgers provided the data on water activity, rancid taste, pH and thiobarbituric acid reactive substances. The preprocessing methods available were Standard Normal Variate, Multiplicative Scatter Correction, 2nd derivative, 1st derivative, Straight Line Subtraction and Min/Max. Each chosen preprocessing generated a model with different parameters. The homogeneity of the data proved to have a direct influence on robustness, confirming the challenge of fitting sensory results into Partial Least Squares prediction models. New possibilities to investigate meat products were shown. DOI: http://dx.doi.org/10.17807/orbital.v11i6.123
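    Two of the spectral preprocessing methods the study compares, Standard Normal Variate (SNV) and Multiplicative Scatter Correction (MSC), can be sketched directly in NumPy. The toy spectra below are fabricated for illustration; this is not the Opus Lab® implementation, and real NIR data would have instrument-specific scatter.

    ```python
    import numpy as np

    def snv(spectra):
        """Standard Normal Variate: centre and scale each spectrum individually."""
        mu = spectra.mean(axis=1, keepdims=True)
        sd = spectra.std(axis=1, keepdims=True)
        return (spectra - mu) / sd

    def msc(spectra, reference=None):
        """Multiplicative Scatter Correction against a reference (mean) spectrum."""
        ref = spectra.mean(axis=0) if reference is None else reference
        corrected = np.empty_like(spectra, dtype=float)
        for i, s in enumerate(spectra):
            slope, intercept = np.polyfit(ref, s, 1)  # fit s ≈ slope*ref + intercept
            corrected[i] = (s - intercept) / slope
        return corrected

    # Toy spectra: a clean band shape plus a copy with multiplicative and
    # additive scatter (1.5x gain, +0.2 baseline).
    base = np.sin(np.linspace(0, np.pi, 50))
    spectra = np.vstack([base, 1.5 * base + 0.2])
    print(np.allclose(snv(spectra)[0], snv(spectra)[1]))  # True: scatter removed
    print(np.allclose(msc(spectra)[0], msc(spectra)[1]))  # True: scatter removed
    ```

    Both corrections collapse the scattered copy back onto the clean one here; on real spectra the two methods behave similarly but not identically, which is why studies like this one compare them empirically.
    
    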

    EXPLORATION, NORMALIZATION, AND GENOTYPE CALLS OF HIGH DENSITY OLIGONUCLEOTIDE SNP ARRAY DATA

    In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications of microarrays are becoming more and more popular. In this paper we describe a preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease. In particular we describe methodology useful for preprocessing Affymetrix SNP chips and obtaining genotype calls with the preprocessed data. We demonstrate how our procedure improves existing approaches using data from three relatively large studies, including one in which a large number of independent calls are available. Software implementing these ideas is available from the Bioconductor oligo package.
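    The core of genotype calling from preprocessed allele intensities can be illustrated with a deliberately simplified rule: threshold the log-ratio of the two allele intensities. This is a toy sketch only; the cutoff `delta` and the intensity values are assumptions for illustration, whereas real callers such as those in the Bioconductor oligo package fit statistical models across many arrays rather than using a fixed threshold.

    ```python
    import numpy as np

    def call_genotypes(a_intensity, b_intensity, delta=1.0):
        """Toy genotype caller: threshold the log2 allele-intensity ratio.

        A strong excess of allele A signal suggests AA, a strong excess of
        allele B suggests BB, and a balanced ratio suggests the heterozygote AB.
        """
        m = np.log2(a_intensity) - np.log2(b_intensity)
        return np.where(m > delta, "AA", np.where(m < -delta, "BB", "AB"))

    # Fabricated preprocessed intensities for three SNP probes on one sample.
    a = np.array([4000.0, 1000.0, 120.0])
    b = np.array([110.0, 950.0, 3800.0])
    print(call_genotypes(a, b))  # → ['AA' 'AB' 'BB']
    ```

    The log-ratio (often called M in this literature) is exactly the quantity whose stochastic behaviour depends on the preprocessing choices the paper studies, which is why calling and preprocessing are treated together.
    
    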

    2D Elastic Full-Waveform Tomography of Vibro-Seismic Data in Crystalline Host Rock at the GFZ-Underground-Lab, Freiberg

    The main objective of this work is the application of 2D elastic full-waveform inversion (FWI) to seismic data recorded at the GFZ Underground Laboratory, Freiberg. The thesis investigates the influence of different preprocessing parameters on the field data, the resolution potential, and strategies for better convergence of the FWI in synthetic studies. Afterwards, the work focuses on the application of the FWI to the field data.