
Bayesian estimation of nonsynonymous/synonymous rate ratios for pairwise sequence comparisons.

Author: Bajgain
Buschiazzo
García-Gil
Goldman
K. Angelis
Kimura
Li
M. dos Reis
Messier
Nei
Nielsen
Novaes
Schenekar
Yang
Z. Yang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 18/04/2014

The nonsynonymous/synonymous rate ratio (ω = d(N)/d(S)) is an important measure of the mode and strength of natural selection acting on nonsynonymous mutations in protein-coding genes. The simplest such analysis is the estimation of the d(N)/d(S) ratio using two sequences. Both heuristic counting methods and the maximum-likelihood (ML) method based on a codon substitution model are widely used for such analysis. However, these methods do not have desirable statistical properties: the estimates can be zero or infinity in some data sets, so their means and variances are infinite. In large genome-scale comparisons, such extreme estimates (either 0 or ∞) of ω and sequence distance (t) are common. Here, we implement a Bayesian method to estimate ω and t in pairwise sequence comparisons. Using a combination of computer simulation and real data analysis, we show that the Bayesian estimates have better statistical properties than the ML estimates, because the prior on ω and t shrinks the posterior of those parameters away from extreme values. We also calculate the posterior probability for ω > 1 as a Bayesian alternative to the likelihood ratio test. The new method is computationally efficient and may be useful for genome-scale comparisons of protein-coding gene sequences.
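The shrinkage effect described in this abstract can be illustrated with a toy model. The sketch below is not the codon substitution model used in the paper: it assumes, purely for illustration, that synonymous and nonsynonymous differences are Poisson counts with means `t * s_sites` and `omega * t * n_sites`, and that ω and t have gamma priors with shape and rate 1.1. The function name, the priors, the grid bounds, and the grid resolution are all hypothetical choices.

```python
import math


def log_poisson(k, mean):
    """Log probability of observing k events under a Poisson with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_gamma_prior(x, shape=1.1, rate=1.1):
    """Unnormalised log density of a gamma prior (assumed, not taken from the paper)."""
    return (shape - 1.0) * math.log(x) - rate * x


def toy_posterior_omega(n_diff, s_diff, n_sites, s_sites,
                        omega_max=5.0, t_max=0.5, grid=300):
    """Grid approximation of the posterior mean of omega and of P(omega > 1).

    Toy model: s_diff ~ Poisson(t * s_sites), n_diff ~ Poisson(omega * t * n_sites).
    The grid truncates omega at omega_max and t at t_max, so results are approximate.
    """
    omegas = [omega_max * (i + 0.5) / grid for i in range(grid)]
    ts = [t_max * (j + 0.5) / grid for j in range(grid)]
    total = weighted = above_one = 0.0
    for w in omegas:
        for t in ts:
            log_post = (log_poisson(s_diff, t * s_sites)
                        + log_poisson(n_diff, w * t * n_sites)
                        + log_gamma_prior(w) + log_gamma_prior(t))
            p = math.exp(log_post)
            total += p
            weighted += w * p
            if w > 1.0:
                above_one += p
    return weighted / total, above_one / total


# No synonymous differences: the naive ratio (n_diff/n_sites)/(s_diff/s_sites)
# divides by zero, but the posterior mean stays finite.
mean_inf, p_inf = toy_posterior_omega(n_diff=3, s_diff=0, n_sites=300, s_sites=100)

# No nonsynonymous differences: the naive ratio is exactly 0,
# but the posterior mean is pulled slightly above zero by the prior.
mean_zero, p_zero = toy_posterior_omega(n_diff=0, s_diff=5, n_sites=300, s_sites=100)
```

In both edge cases the point estimate from the counting ratio is extreme, while the grid posterior gives a finite, positive summary and a probability for ω > 1 that can be read directly.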

Crossref

UCL Discovery

PubMed Central

Queen Mary Research Online

A method for detecting and correcting feature misidentification on expression microarrays

Author: Botstein David
Brown Patrick O
Diehn Maximilian
Fero Michael J
Schaner Marci
Sikic Branimir I
Tu I-Ping
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Much of the microarray data published at Stanford is based on mouse and human arrays produced under controlled and monitored conditions at the Brown and Botstein laboratories and at the Stanford Functional Genomics Facility (SFGF). Nevertheless, as large datasets based on the Stanford Human array began to accumulate, a small but significant number of discrepancies were detected that required a serious attempt to track down the original source of error. Due to a controlled process environment, sufficient data was available to accurately track the entire process leading up to the final expression data. In this paper, we describe our statistical methods to detect the inconsistencies in microarray data that arise from process errors, and discuss our technique to locate and fix these errors. RESULTS: To date, the Brown and Botstein laboratories and the Stanford Functional Genomics Facility have together produced 40,000 large-scale (10–50,000 feature) cDNA microarrays. By applying the heuristic described here, we have been able to check most of these arrays for misidentified features, and have been able to confidently apply fixes to the data where needed. Out of the 265 million features checked in our database, problems were detected and corrected on 1.3 million of them. CONCLUSION: Process errors in any genome-scale high-throughput production regime can lead to subsequent errors in data analysis. We show the value of tracking multi-step high-throughput operations by using this knowledge to detect and correct misidentified data on gene expression microarrays.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

GraphCombEx: A Software Tool for Exploration of Combinatorial Optimisation Properties of Large Graphs

Author: A Rosete-Suárez
AL Barabási
C Bachmaier
C Binucci
D Brélaz
D Chalupa
D Holten
David Chalupa
DJ Watts
DS Johnson
F Schreiber
G Csardi
I Xenarios
I Xenarios
I Xenarios
J Ellson
J Leskovec
J Leskovec
J Pattillo
JC Culberson
JS Turner
K Sugiyama
Ken A Hawick
L Salwinski
LM Abualigah
LM Abualigah
MEJ Newman
MM Halldórsson
MR Garey
MY Becker
P Bonami
P Csermely
R Albert
R Tamassia
U Brandes
U Brandes
V Chvátal
W Czech
Y Khosiawan
Z Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/01/2018

We present a prototype of a software tool for exploration of multiple combinatorial optimisation problems in large real-world and synthetic complex networks. Our tool, called GraphCombEx (an acronym of Graph Combinatorial Explorer), provides a unified framework for scalable computation and presentation of high-quality suboptimal solutions and bounds for a number of widely studied combinatorial optimisation problems. Efficient representation and applicability to large-scale graphs and complex networks are particularly considered in its design. The problems currently supported include maximum clique, graph colouring, maximum independent set, minimum vertex clique covering, minimum dominating set, as well as the longest simple cycle problem. Suboptimal solutions and intervals for optimal objective values are estimated using scalable heuristics. The tool is designed with extensibility in mind, with a view to adding further problems and new fast, high-performance heuristics in the future. GraphCombEx has already been successfully used as a support tool in a number of recent research studies using combinatorial optimisation to analyse complex networks, indicating its promise as a research software tool.
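To make the idea of an "interval for the optimal objective value" concrete, here is a minimal sketch for graph colouring. It is not GraphCombEx's implementation, and the abstract does not say which heuristics the tool uses. The sketch assumes two textbook greedy heuristics: a greedy clique gives a lower bound on the chromatic number, and a greedy colouring gives an upper bound. The function names and the adjacency representation (a dict mapping each vertex to a set of neighbours) are hypothetical.

```python
def greedy_colouring(adj):
    """Colour vertices in order of decreasing degree; returns {vertex: colour index}."""
    order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    colour = {}
    for u in order:
        used = {colour[v] for v in adj[u] if v in colour}
        c = 0
        while c in used:  # smallest colour not used by an already coloured neighbour
            c += 1
        colour[u] = c
    return colour


def greedy_clique(adj):
    """Grow a clique greedily, trying high-degree vertices first."""
    order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    clique = []
    for u in order:
        if all(u in adj[v] for v in clique):  # u must be adjacent to every member
            clique.append(u)
    return clique


def chromatic_interval(adj):
    """Return (lower, upper) bounds on the chromatic number of the graph."""
    lower = len(greedy_clique(adj))
    upper = max(greedy_colouring(adj).values()) + 1
    return lower, upper


# A 5-cycle: chromatic number 3, largest clique 2.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
# A triangle with one pendant vertex: chromatic number 3, largest clique 3.
triangle_tail = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

On the 5-cycle the two bounds leave a gap, while on the triangle with a pendant vertex they coincide and the heuristic solution is provably optimal.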

arXiv.org e-Print Archive

Crossref

VBN

A Faster Method to Estimate Closeness Centrality Ranking

Author: Gera Ralucca
Iyengar S. R. S.
Saxena Akrati
Publication date: 07/06/2017

Closeness centrality is one way of measuring how central a node is in a given network. The closeness centrality measure assigns a centrality value to each node based on its accessibility to the whole network. In real-life applications, we are mainly interested in ranking nodes based on their centrality values. The classical method to compute the rank of a node first computes the closeness centrality of all nodes and then compares them to get its rank. Its time complexity is $O(n \cdot m + n)$, where $n$ represents the total number of nodes and $m$ represents the total number of edges in the network. In the present work, we propose a heuristic method to quickly estimate the closeness rank of a node in $O(\alpha \cdot m)$ time, where $\alpha = 3$. We also propose an extended, improved method using a uniform sampling technique. This method better estimates the rank and has time complexity $O(\alpha \cdot m)$, where $\alpha \approx 10$–$100$. This is an excellent improvement over the classical centrality ranking method. The efficiency of the proposed methods is verified on real-world scale-free social networks using absolute and weighted error functions.
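For reference, the classical baseline described in this abstract can be sketched as follows: run a breadth-first search from every node, compute each node's closeness, then count how many nodes score higher than the node of interest. This is a minimal illustration for an unweighted graph, not the authors' code and not their proposed heuristic. The function names and the adjacency-dict representation are assumptions, as is the closeness definition used here (reachable nodes divided by total distance).

```python
from collections import deque


def closeness(adj, source):
    """Closeness of `source`: (reachable nodes) / (sum of BFS distances)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def closeness_rank(adj, node):
    """Classical ranking: one BFS per node, then compare all scores.

    Rank 1 is the most central node; tied nodes share the same rank.
    """
    scores = {u: closeness(adj, u) for u in adj}  # n BFS runs, O(m) each
    return 1 + sum(1 for s in scores.values() if s > scores[node])


# Path graph 0 - 1 - 2 - 3 - 4: the middle node is the most central.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

Because every node requires its own traversal, the cost grows with the product of node and edge counts, which is the expense the abstract's $O(\alpha \cdot m)$ estimators aim to avoid.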

arXiv.org e-Print Archive

Calhoun, Institutional Archive of the Naval Postgraduate School