18 research outputs found

A consensus‑based ensemble approach to improve transcriptome assembly

Author: Behera Sairam
Cahoon Edgar B.
Deogun Jitender S.
Kapil Kushagra
Li Xiangjun
Moriyama Etsuko N.
Shanklin John
Voshall Adam
Yu Xiao‑Hong
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2021

Background: Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high-quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms, where adequate reference genomes are often not available. Different methods produce different transcriptome models, and there is no easy way to determine which are more accurate. Furthermore, alternative-splicing events make the assembly problem harder. While benchmarking transcriptome assemblies is critical, it is also not trivial due to the general lack of true reference transcriptomes. Results: In this study, we first provide a pipeline to generate simulated benchmark transcriptomes and the corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches, including both de novo and genome-guided methods. The results showed that assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or, for genome-guided methods, when the reference is not available from the same genome. To improve transcriptome assembly performance by leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble. Conclusions: Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any de novo assembler we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best-performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy, for both de novo and genome-guided assemblers, can improve transcriptome assembly. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/
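
The consensus idea in this abstract, keeping only contigs on which several assemblies agree, can be illustrated with a minimal Python sketch. It is not the ConSemble script: the function name `consensus_contigs`, the exact-string comparison of contigs, the toy sequences, and the `min_support=3` threshold are all assumptions made for illustration.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support=3):
    """Return the contigs reported by at least `min_support` assemblies.

    `assemblies` is a list of contig collections, one per assembler.
    Contigs are compared by exact string equality (an assumption of
    this sketch, not necessarily how a real pipeline matches them).
    """
    support = Counter()
    for contigs in assemblies:
        # Count each contig once per assembly, even if it is emitted twice.
        support.update(set(contigs))
    return {contig for contig, n in support.items() if n >= min_support}


# Hypothetical output of four assemblers (toy sequences).
assemblies = [
    ["ATGGCC", "ATGTTT", "ATGAAA"],
    ["ATGGCC", "ATGTTT"],
    ["ATGGCC", "ATGCCC"],
    ["ATGGCC", "ATGTTT", "ATGCCC"],
]

consensus = consensus_contigs(assemblies, min_support=3)
print(sorted(consensus))
```

Raising `min_support` trades recall for precision: contigs produced by only one or two assemblers are dropped, which is how a consensus step can remove incorrectly assembled contigs while keeping most correct ones.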

DigitalCommons@University of Nebraska

Directory of Open Access Journals

M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species

Author: A Darling
A Delcher
AE Darling
B Morgenstern
B Raphael
C Grasso
C Notredame
D Ferre
DA Nix
EP Rocha
I Ovcharenko
J Choudhuri
J Deogun
JD Thompson
K Katoh
K Liolos
K Rutherford
L Florea
L Wang
M Blanchette
M Brudno
M Hohl
M Margulies
M Waterman
N Bray
NT Perna
P Chain
RL Tatusov
S Batzoglou
S Schwartz
T Carver
Todd J Treangen
W Huang
Xavier Messeguer
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Due to recent advances in whole genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow for in-depth studies of evolution in closely related species through multiple whole genome comparisons. RESULTS: To facilitate such comparisons, we present an interactive multiple genome comparison and alignment tool, M-GCAT, that can efficiently construct multiple genome comparison frameworks in closely related species. M-GCAT is able to compare and identify highly conserved regions in up to 20 closely related bacterial species in minutes on a standard computer, and as many as 90 (containing 75 cloned genomes from a set of 15 published enterobacterial genomes) in an hour. M-GCAT also incorporates a novel comparative genomics data visualization interface that allows the user to examine the conserved regions and gene annotations both globally and locally. CONCLUSION: M-GCAT is an interactive comparative genomics tool well suited for quickly generating multiple genome comparison frameworks and alignments among closely related species. M-GCAT is freely available for download for academic and non-commercial use at:

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Self-protection against business logic vulnerabilities

Author: Deogun D
Khakpour N
Weyns D
Zeller S
Publication venue: 'American College of Medical Physics (ACMP)'

Newcastle University E-Prints

Cluster-Based Adaptive Information Retrieval

Author: Jay N. Bhuyan
Jitender S. Deogun
Vijay V. Raghavan

This paper discusses the issues involved in the design of a complete information retrieval system based on user-oriented clustering schemes. Clusters are constructed taking into account the users' perception of similarity between documents. The system accumulates feedback from the users and employs it to construct user-oriented clusters. An optimization function to improve the effectiveness of the clustering process is developed. A retrieval process based on the clustering scheme is described. The system developed is experimentally validated and compared with existing systems.

1 Introduction

An information retrieval (IR) system is characterized by a collection of documents and a set of users who perform queries on the collection to fulfill their information needs. To improve the efficiency of retrieval, it has been proposed that documents which are generally retrieved together in response to some query should be kept close together within the system in the form of clusters [28, 30]..

CiteSeerX

How do Companies Strategize Today?

Author: B Worthen
CK Prahalad
Lenovo
M El Namaki
McKinsey & Company
N Deogun
The Carlyle Group
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Minimum Eccentricity Shortest Path Problem: An Approximation Algorithm and Relation with the k-Laminarity Problem

Author: C Lekkerkerker
D Aingworth
DG Corneil
FF Dragan
G Bacsó
JS Deogun
K Yamazaki
N Robertson
S Yan
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Feature selection with adjustable criteria

Author: G.H. John
G.V. Trunk
J.G. Dy
J.S. Deogun
M. Dash
N. Zhong
P.M. Narendra
R. Bellman
W. Ziarko
Y.Y. Yao
Publication venue: Springer-Verlag
Publication date: 01/01/2005

Abstract. We present a study on a rough set based approach for feature selection. Instead of using significance or support, the Parameterized Average Support Heuristic (PASH) considers the overall quality of the potential set of rules. It produces a set of rules with a balanced support distribution over all decision classes. The adjustable parameters of PASH can help users with different levels of approximation needs to extract predictive rules that may be ignored by other methods. This paper fine-tunes the PASH heuristic and provides experimental results for PASH.

CiteSeerX

Crossref

A Feature Selection Algorithm Based on Discernibility Matrix

Author: A. Skowron
F. Provost
H. Liu
J. Deogun
K. Hu
K. Thangavel
N. Zhong
P. Langley
Q. Shen
R. Jensen
Z. Pawlak
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

Crossref

A Prospective Look at the Usefulness of Separately Reporting Goodwill Charges: An Evaluation of 'Cash Earnings'

Author: A Bary
E Fama
E Macdonald
F Mishkin
Karen J. Tucker
L Johnson
N Deogun
N Jegadeesh
P Elgers
P Hopkins
Pricewaterhousecoopers
R Sloan
Ray J. Pfeiffer Jr.
S Basu
S P Kothari
V Bernard
William D. Brown Jr.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000

Crossref

On-line Algorithms for a Single Machine Scheduling Problem

Author: AAB Pritsker
AR Karlin
C. N. Potts
D Gross
F. Glover
JS Deogun
L Hall
L. Schrage
M. E. Posner
MI Dessouky
PG Gazmuri
R Kincaid
RL Graham
RM Karp
S Chand
TE Phipps
Publication venue: Kluwer Academic Press, Chapter
Publication date: 01/01/1995

The study of on-line algorithms is an increasingly significant branch of computer science. In this paper, we apply the theory of on-line algorithms to job scheduling. In particular, we study nonpreemptive single-machine scheduling of independent jobs with arbitrary release dates to minimize the total completion time. We design and analyze two on-line algorithms that make scheduling decisions without knowing about jobs that will arrive in the future.

Keywords: job scheduling, on-line algorithm, c-competitiveness

1 Introduction

Given a sequence of requests, an on-line algorithm is one that responds to each request in the order it appears in the sequence, without knowledge of any request that follows it. For instance, in the bin packing problem, a list $L = (a_1, a_2, \ldots, a_n)$ of reals in $(0, 1]$ needs to be packed into the minimum number of unit-capacity bins. An on-line bin packing algorithm packs $a_i$, where $i$ starts from 1, without knowing about $a_{i+1}, \ldots$ ..
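
The on-line bin packing setting described in this excerpt can be made concrete with a short Python sketch. It uses the classical First Fit rule, chosen here only as a familiar example; it is not one of the two scheduling algorithms the paper designs. The function name `first_fit_online` and the floating-point tolerance `eps` are assumptions of the sketch.

```python
def first_fit_online(items, capacity=1.0, eps=1e-9):
    """Pack items one at a time using the First Fit rule.

    Each item is placed as soon as it arrives, using only the items
    seen so far. Returns a list of bins, each a list of item sizes.
    `eps` is a tolerance for floating-point sums (an assumption of
    this sketch).
    """
    bins = []   # contents of each open bin
    loads = []  # current total size in each bin
    for size in items:
        if not 0 < size <= capacity:
            raise ValueError(f"item size {size} is outside (0, {capacity}]")
        # Put the item in the first bin that still has room for it.
        for i, load in enumerate(loads):
            if load + size <= capacity + eps:
                bins[i].append(size)
                loads[i] += size
                break
        else:
            # No existing bin fits the item, so open a new one.
            bins.append([size])
            loads.append(size)
    return bins


packed = first_fit_online([0.5, 0.7, 0.5, 0.3])
print(len(packed), packed)
```

The loop never looks ahead in `items`, which is what makes the procedure on-line: the placement of each item is fixed before the next one is revealed.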

CiteSeerX

Crossref