Normalization of oligonucleotide arrays based on the least-variant set of genes
<p>Abstract</p> <p>Background</p> <p>It is well known that the normalization step of microarray data processing makes a difference in the downstream analysis. All normalization methods rely on certain assumptions, so differences in results can be traced to different sensitivities to violation of those assumptions. Illustrating this lack of robustness, in a striking spike-in experiment all existing normalization methods failed because of an imbalance between up- and down-regulated genes. It therefore remains important to develop a normalization method that is robust against violation of the standard assumptions.</p> <p>Results</p> <p>We develop a new algorithm based on identification of the least-variant set (LVS) of genes across the arrays. The array-to-array variation is evaluated from a robust linear model fit of pre-normalized probe-level data. The LVS genes are then used as a reference set for a non-linear normalization. The method is applicable to any existing expression summary, such as MAS5 or RMA.</p> <p>Conclusion</p> <p>We show that LVS normalization outperforms other normalization methods when the standard assumptions are not satisfied. In the complex spike-in study, LVS performs similarly to the ideal (in practice unknown) housekeeping-gene normalization. An R package called lvs is available at <url>http://www.meb.ki.se/~yudpaw</url>.</p>
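The LVS procedure described above can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the paper evaluates array-to-array variation via a robust linear model on probe-level data and applies a non-linear normalization, whereas this sketch uses plain per-gene variance and a straight-line fit.

```python
import numpy as np

def lvs_normalize(expr, lvs_fraction=0.4):
    """Normalize arrays (columns of `expr`, genes in rows, log scale)
    against a gene-wise median reference using the least-variant set
    (LVS) of genes."""
    expr = np.asarray(expr, dtype=float)
    # 1. Pick the least-variant fraction of genes across arrays.
    variability = expr.var(axis=1)
    n_lvs = max(2, int(lvs_fraction * expr.shape[0]))
    lvs_idx = np.argsort(variability)[:n_lvs]

    # 2. Map each array onto the reference using only the LVS genes
    #    (a linear stand-in for the paper's non-linear normalization).
    reference = np.median(expr, axis=1)
    normalized = np.empty_like(expr)
    for j in range(expr.shape[1]):
        slope, intercept = np.polyfit(expr[lvs_idx, j], reference[lvs_idx], 1)
        normalized[:, j] = slope * expr[:, j] + intercept
    return normalized
```

Because only the least-variant genes drive the fit, arrays with many genuinely differentially expressed genes do not distort the normalization curve.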
Understanding the Role of Layer Normalization in Label-Skewed Federated Learning
Layer normalization (LN) is a widely adopted deep learning technique,
especially in the era of foundation models. Recently, LN has been shown to be
surprisingly effective in federated learning (FL) with non-i.i.d. data.
However, exactly why and how it works remains mysterious. In this work, we
reveal the profound connection between layer normalization and the label shift
problem in federated learning. To understand layer normalization better in FL,
we identify the key contributing mechanism of normalization methods in FL,
called feature normalization (FN), which applies normalization to the latent
feature representation before the classifier head. Although LN and FN do not
improve expressive power, they control feature collapse and local overfitting
to heavily skewed datasets, and thus accelerate global training. Empirically,
we show that normalization leads to drastic improvements on standard benchmarks
under extreme label shift. Moreover, we conduct extensive ablation studies to
understand the critical factors of layer normalization in FL. Our results
verify that FN is an essential ingredient inside LN to significantly improve
the convergence of FL while remaining robust to learning rate choices,
especially under extreme label shift where each client has access to few
classes. Our code is available at
\url{https://github.com/huawei-noah/Federated-Learning/tree/main/Layer_Normalization}.
Comment: accepted at TML
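The feature normalization (FN) mechanism the abstract identifies can be sketched as below. Treating FN as a plain L2 rescaling of the penultimate features is a simplifying assumption here; the paper's exact variant lives inside the network and may include mean-centering as in LN.

```python
import numpy as np

def feature_normalize(h, eps=1e-8):
    """Feature normalization (FN): rescale each latent feature vector to
    unit L2 norm before it reaches the classifier head."""
    norms = np.linalg.norm(h, axis=-1, keepdims=True)
    return h / (norms + eps)

def classify(h, W, b):
    # Logits are computed on normalized features, so feature magnitudes
    # cannot blow up on a heavily label-skewed client; this is the
    # feature-collapse control the abstract refers to.
    return feature_normalize(h) @ W + b
```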
Use of genomic DNA control features and predicted operon structure in microarray data analysis: ArrayLeaRNA – a Bayesian approach
<p>Abstract</p> <p>Background</p> <p>Microarrays are widely used for the study of gene expression; however, deciding whether observed differences in expression are significant remains a challenge.</p> <p>Results</p> <p>A computing tool (ArrayLeaRNA) has been developed for gene expression analysis. It implements a Bayesian approach based on the Gumbel distribution, using printed genomic DNA control features for normalization and for estimation of the parameters of the Bayesian model, together with prior knowledge from predicted operon structure. The method is compared with two other approaches: classical LOWESS normalization followed by a two-fold cut-off criterion, and the OpWise method (Price, et al. 2006. BMC Bioinformatics. 7, 19), a published Bayesian approach that also uses predicted operon structure. The three methods were compared on experimental datasets with prior knowledge of gene expression. With ArrayLeaRNA, data normalization is carried out according to the genomic DNA features, which reflect the behaviour of equally transcribed genes; the statistical significance of differences in expression is likewise based on the variability of these equally transcribed features. The operon information helps the classification of genes with low-confidence measurements.</p> <p>ArrayLeaRNA is implemented in Visual Basic and freely available as an Excel add-in at <url>http://www.ifr.ac.uk/safety/ArrayLeaRNA/</url>.</p> <p>Conclusion</p> <p>We have introduced a novel Bayesian model and demonstrated that it is a robust method for analysing microarray expression profiles. ArrayLeaRNA showed a considerable improvement in data normalization, in the estimation of the experimental variability intrinsic to each hybridization, and in the establishment of a clear boundary between non-changing and differentially expressed genes.
The method is applicable to data derived from hybridizations of labelled cDNA samples as well as from hybridizations of labelled cDNA with genomic DNA, and can be used for the analysis of datasets where differentially regulated genes predominate.</p>
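The core normalization idea, scaling each array by its genomic DNA control features, can be sketched as follows. This is a minimal sketch of the normalization step only; ArrayLeaRNA's Gumbel-based Bayesian model on top of it is not reproduced here.

```python
import numpy as np

def control_feature_normalize(signal, control_idx):
    """Scale each array by the median of its genomic-DNA control
    features.  These controls hybridize equally on every array, so
    their median estimates the array-specific intensity level.

    signal: (n_features, n_arrays) intensity matrix.
    control_idx: row indices of the printed genomic DNA controls.
    """
    signal = np.asarray(signal, dtype=float)
    ctrl_median = np.median(signal[control_idx, :], axis=0)  # one per array
    return signal / ctrl_median
```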
ProNormz – An integrated approach for human proteins and protein kinases normalization
Abstract: The task of recognizing and normalizing protein name mentions in biomedical literature is challenging and important for text-mining applications such as protein–protein interaction extraction, pathway reconstruction and many more. In this paper, we present ProNormz, an integrated approach for human protein (HP) tagging and normalization. In Homo sapiens, a large number of biological processes are regulated through post-translational phosphorylation by a large gene family, the protein kinases. Recognition and normalization of human protein kinases (HPKs) is therefore important for extracting information on their regulatory mechanisms from biomedical literature. ProNormz distinguishes HPKs from other HPs in addition to tagging and normalization; to our knowledge, it is the first normalization system able to make this distinction. ProNormz incorporates a specialized synonym dictionary for human proteins and protein kinases, a set of 15 string-matching rules and a disambiguation module to achieve normalization. Experimental results on the benchmark BioCreative II training and test datasets show that our integrated approach achieves fairly good performance and outperforms more sophisticated semantic-similarity and disambiguation systems presented in the BioCreative II GN task. As a freely available web tool, ProNormz is useful to developers as an extensible gene normalization implementation, to researchers as a standard for comparing their innovative techniques, and to biologists for the normalization and categorization of HP and HPK mentions in biomedical literature. URL: http://www.biominingbu.org/pronormz
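The dictionary-plus-rules normalization strategy can be sketched as below. The matching rules shown are illustrative only, not ProNormz's 15 string-matching rules, and the toy synonym entries stand in for its specialized dictionary.

```python
import re

def normalize_mention(mention, synonym_dict):
    """Dictionary-based normalization sketch: try the exact mention,
    then a lower-cased variant, then a punctuation-stripped variant
    against a synonym dictionary of canonical identifiers."""
    candidates = [
        mention,
        mention.lower(),
        re.sub(r"[-_\s]", "", mention.lower()),  # e.g. "ERK-1" -> "erk1"
    ]
    for c in candidates:
        if c in synonym_dict:
            return synonym_dict[c]
    return None  # unresolved mention, left for a disambiguation module
```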
An Analysis of Scale Invariance in Object Detection - SNIP
An analysis of different techniques for recognizing and detecting objects
under extreme scale variation is presented. Scale specific and scale invariant
design of detectors are compared by training them with different configurations
of input data. By evaluating the performance of different network architectures
for classifying small objects on ImageNet, we show that CNNs are not robust to
changes in scale. Based on this analysis, we propose to train and test
detectors on the same scales of an image-pyramid. Since small and large objects
are difficult to recognize at smaller and larger scales respectively, we
present a novel training scheme called Scale Normalization for Image Pyramids
(SNIP) which selectively back-propagates the gradients of object instances of
different sizes as a function of the image scale. On the COCO dataset, our
single model performance is 45.7% and an ensemble of 3 networks obtains an mAP
of 48.3%. We use off-the-shelf ImageNet-1000 pre-trained models and only train
with bounding box supervision. Our submission won the Best Student Entry in the
COCO 2017 challenge. Code will be made available at
\url{http://bit.ly/2yXVg4c}.
Comment: CVPR 2018, camera ready version
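SNIP's central training rule, selecting which instances contribute gradients at each pyramid scale, can be sketched as follows. The size measure (square root of box area) matches the paper's convention, but the range endpoints used here are illustrative, not the paper's values.

```python
def snip_valid(boxes, image_scale, valid_range):
    """SNIP instance selection: at a given image-pyramid scale, only
    object instances whose rescaled size falls inside a valid range
    receive gradients; the rest are ignored during training.

    boxes: list of (w, h) box sizes in original-image pixels.
    image_scale: resize factor for this pyramid level.
    valid_range: (min_size, max_size) in rescaled pixels.
    """
    lo, hi = valid_range
    keep = []
    for w, h in boxes:
        size = ((w * image_scale) * (h * image_scale)) ** 0.5  # sqrt of area
        keep.append(lo <= size <= hi)
    return keep
```

At a high-resolution level the large instances fall outside the range, and at a low-resolution level the small ones do, so each object is effectively learned near a single "normalized" scale.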
Empirical comparison of cross-platform normalization methods for gene expression data
<p>Abstract</p> <p>Background</p> <p>Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing-based methods. Researchers who perform high-throughput gene expression assays often deposit their data in public databases, but heterogeneity of measurement platforms leads to challenges for the combination and comparison of data sets. Researchers wishing to perform cross-platform normalization face two major obstacles. First, a choice must be made about which method or methods to employ. Nine are currently available, and no rigorous comparison exists. Second, software for the selected method must be obtained and incorporated into a data analysis workflow.</p> <p>Results</p> <p>Using two publicly available cross-platform testing data sets, cross-platform normalization methods are compared based on inter-platform concordance and on the consistency of gene lists obtained with transformed data. Scatter and ROC-like plots are produced, and new statistics based on those plots are introduced to measure the effectiveness of each method. Bootstrapping is employed to obtain distributions for those statistics. The consistency of platform effects across studies is explored theoretically and with respect to the testing data sets.</p> <p>Conclusions</p> <p>Our comparisons indicate that four methods, DWD, EB, GQ, and XPN, are generally effective, while the remaining methods do not adequately correct for platform effects. Of the four successful methods, XPN generally shows the highest inter-platform concordance when treatment groups are equally sized, while DWD is most robust to differently sized treatment groups and consistently shows the smallest loss in gene detection. We provide an R package, CONOR, capable of performing the nine cross-platform normalization methods considered. The package can be downloaded at <url>http://alborz.sdsu.edu/conor</url> and is available from CRAN.</p>
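The platform-effect problem these methods address can be illustrated with the simplest possible baseline: standardizing each gene within each platform before merging. This is an illustrative baseline only, not one of the nine methods (DWD, EB, GQ, XPN, ...) compared in the paper.

```python
import numpy as np

def per_platform_standardize(expr, platforms):
    """Baseline cross-platform adjustment: standardize each gene to
    zero mean and unit variance within each platform, cancelling
    platform-specific location and scale effects before data sets
    are merged.

    expr: (n_genes, n_samples) expression matrix.
    platforms: per-sample platform labels.
    """
    expr = np.asarray(expr, dtype=float)
    platforms = np.asarray(platforms)
    out = np.empty_like(expr)
    for p in np.unique(platforms):
        cols = platforms == p
        block = expr[:, cols]
        mu = block.mean(axis=1, keepdims=True)
        sd = block.std(axis=1, keepdims=True)
        sd[sd == 0] = 1.0  # guard against constant genes
        out[:, cols] = (block - mu) / sd
    return out
```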
Non-Standard Vietnamese Word Detection and Normalization for Text-to-Speech
Converting written texts into their spoken forms is an essential problem in
any text-to-speech (TTS) system. However, building an effective text
normalization solution for a real-world TTS system faces two main challenges:
(1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates,
ranges, scores, and abbreviations, and (2) transforming NSWs into pronounceable
syllables, such as URLs, email addresses, hashtags, and contact names. In this
paper, we propose a new two-phase normalization approach to deal with these
challenges. First, a model-based tagger is designed to detect NSWs. Then,
depending on NSW types, a rule-based normalizer expands those NSWs into their
final verbal forms. We conducted three empirical experiments for NSW detection
using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF
models on a manually annotated dataset including 5819 sentences extracted from
Vietnamese news articles. In the second phase, we propose a forward
lexicon-based maximum matching algorithm to split hashtags, emails, URLs,
and contact names. The experimental results of the tagging phase show that the
average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%,
reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our
approach has low sentence error rates, at 8.15% with CRF and 7.11% with
BiLSTM-CNN-CRF taggers, and only 6.67% with the BERT-BiGRU-CRF tagger.
Comment: The 14th International Conference on Knowledge and Systems
Engineering (KSE 2022)
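The second phase's forward lexicon-based maximum matching can be sketched as below. The single-character fallback and the toy lexicon are illustrative assumptions; the paper's lexicon and handling of unmatched spans may differ.

```python
def forward_max_match(text, lexicon, max_len=10):
    """Forward lexicon-based maximum matching: at each position greedily
    take the longest lexicon entry; fall back to one character when
    nothing matches.  Used in the spirit of the paper's second phase to
    split hashtags, emails and URLs into pronounceable words."""
    tokens, i = [], 0
    while i < len(text):
        # try the longest window first, shrinking until a match is found
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in lexicon:
                tokens.append(piece)
                i += length
                break
    return tokens
```

Because matching is greedy from the left, a hashtag like "#stayhome" splits into dictionary words in a single pass, which is why the approach is fast enough for a production TTS front end.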