
    Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks

    BACKGROUND: The sequencing of the human genome has enabled us to access a comprehensive list of genes (both experimental and predicted) for further analysis. While a majority of the approximately 30,000 known and predicted human coding genes are characterized and have been assigned at least one function, a fair number of genes (about 12,000) remain without any annotation. The recent sequencing of other genomes has provided a huge amount of auxiliary sequence data which could help in the characterization of the human genes. Clustering these sequences into families is one of the first steps in performing comparative studies across several genomes. RESULTS: Here we report a novel clustering algorithm (CLUGEN) that has been used to cluster sequences of experimentally verified and predicted proteins from all sequenced genomes, using a novel distance metric: a neural network score between a pair of protein sequences. This distance metric is based on the pairwise sequence similarity score and the similarity between their domain structures; it is the probability that a pair of protein sequences belong to the same InterPro family/domain, which facilitates the modelling of the transitive closure of homology to detect remote homologues. Hierarchical average-linkage clustering is applied with the new distance metric. CONCLUSION: Benchmarking studies of our algorithm against those reported in the literature show that our algorithm provides clustering results with lower false positive and false negative rates. The clustering algorithm has been applied to cluster several eukaryotic genomes and several dozen prokaryotic genomes.
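    The abstract describes average-linkage hierarchical clustering driven by a learned pairwise distance. A minimal sketch of that setup, assuming the neural network's output is available as a function p_same(a, b) giving the probability that two sequences belong to the same InterPro family (the function name and the cut threshold below are hypothetical placeholders, not part of CLUGEN's published interface):

```python
# Sketch: average-linkage clustering over the distance d = 1 - P(same family).
# p_same(a, b) is assumed to be the neural-network probability that sequences
# a and b share an InterPro family/domain (placeholder, supplied by the caller).
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import average, fcluster
from scipy.spatial.distance import squareform


def cluster_sequences(seqs, p_same, cut=0.5):
    n = len(seqs)
    dist = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        d = 1.0 - p_same(seqs[i], seqs[j])
        dist[i, j] = dist[j, i] = d
    # Condensed distance matrix -> average-linkage dendrogram.
    linkage = average(squareform(dist, checks=False))
    # Cut the tree at a distance threshold to obtain flat family labels.
    return fcluster(linkage, t=cut, criterion="distance")
```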

    Heap Reference Analysis Using Access Graphs

    Despite significant progress in the theory and practice of program analysis, analysing properties of heap data has not reached the same level of maturity as the analysis of static and stack data. The spatial and temporal structure of stack and static data is well understood, while that of heap data seems arbitrary and is unbounded. We devise bounded representations which summarize properties of the heap data. This summarization is based on the structure of the program which manipulates the heap. The resulting summary representations are certain kinds of graphs called access graphs. The boundedness of these representations and the monotonicity of the operations to manipulate them make it possible to compute them through data flow analysis. An important application which benefits from heap reference analysis is garbage collection, where currently liveness is conservatively approximated by reachability from program variables. As a consequence, current garbage collectors leave a lot of garbage uncollected, a fact which has been confirmed by several empirical studies. We propose the first ever end-to-end static analysis to distinguish live objects from reachable objects. We use this information to make dead objects unreachable by modifying the program. This application is interesting because it requires discovering data flow information representing complex semantics. In particular, we discover four properties of heap data: liveness, aliasing, availability, and anticipability. Together, they cover all combinations of directions of analysis (i.e. forward and backward) and confluence of information (i.e. union and intersection). Our analysis can also be used for plugging memory leaks in C/C++ programs. Comment: Accepted for printing by ACM TOPLAS. This version incorporates referees' comments.
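    The four heap properties above span both directions of analysis and both confluences. The sketch below is a generic round-robin data-flow solver parameterised along exactly those two axes; the interface (node list, successor map, per-node transfer function, initial value) is an illustrative textbook formulation, not the paper's access-graph machinery:

```python
# Generic iterative data-flow solver: direction and confluence are parameters,
# covering forward/backward x union/intersection as mentioned in the abstract.
def solve_dataflow(nodes, succ, transfer, init, forward=True, union=True):
    # Derive the predecessor map from the successor map.
    pred = {n: [m for m in nodes if n in succ[m]] for n in nodes}
    sources = pred if forward else succ   # where each node's input flows from
    # `init` is the caller's starting value for every node (e.g. the empty set
    # for may-analyses, the universal set for must-analyses).
    out = {n: set(init) for n in nodes}
    changed = True
    while changed:                         # round-robin iteration to a fixed point
        changed = False
        for n in nodes:
            incoming = [out[m] for m in sources[n]]
            if incoming:
                acc = set(incoming[0])
                for s in incoming[1:]:
                    acc = (acc | s) if union else (acc & s)
            else:
                acc = set()
            new = set(transfer(n, acc))    # node-local transfer function
            if new != out[n]:
                out[n], changed = new, True
    return out
```

    Classic variable liveness, for instance, is the backward/union instance of this scheme; the paper's heap analyses additionally track access paths, which the access graphs keep bounded.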

    Improving the Consistency of the Failure Mode Effect Analysis (FMEA) Documents in Semiconductor Manufacturing

    Digitalization of causal domain knowledge is crucial, especially since including causal domain knowledge in data analysis processes helps to avoid biased results. To extract such knowledge, Failure Mode Effect Analysis (FMEA) documents represent a valuable data source. Originally, FMEA documents were designed to be produced and interpreted exclusively by human domain experts. As a consequence, these documents often suffer from data consistency issues. This paper argues that, due to the transitive perception of causal relations, discordant and merged information cases are likely to occur. Thus, we propose to improve the consistency of FMEA documents as a step towards more efficient use of causal domain knowledge. In contrast to other work, this paper focuses on the consistency of the causal relations expressed in FMEA documents. To this end, based on an explicit scheme of inconsistency types derived from the causal perspective, novel methods to enhance the data quality in FMEA documents are presented. Improving data quality will significantly benefit downstream tasks such as root cause analysis and automatic process control.
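    As a rough illustration of the kind of consistency checking described, one could model each FMEA row as a directed cause-to-effect edge and flag simple discordances such as a relation recorded in both directions or a causal cycle. The representation and the two inconsistency types below are assumptions chosen for illustration, not the paper's explicit scheme:

```python
# Illustrative sketch (not the paper's scheme): FMEA rows as cause -> effect
# edges; flag (1) relations recorded in both directions and (2) causal cycles.
from collections import defaultdict


def find_discordant_relations(rows):
    """rows: iterable of (cause, effect) string pairs extracted from FMEA documents."""
    edges = set(rows)
    both_ways = {(a, b) for (a, b) in edges if (b, a) in edges and a < b}

    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)

    def on_cycle(node):
        # Depth-first search: does `node` reach itself through at least one edge?
        stack, seen = list(graph[node]), set()
        while stack:
            cur = stack.pop()
            if cur == node:
                return True
            if cur not in seen:
                seen.add(cur)
                stack.extend(graph[cur])
        return False

    cyclic = {n for n in graph if on_cycle(n)}
    return both_ways, cyclic
```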

    Functional Classification of Immune Regulatory Proteins

    The members of the immunoglobulin superfamily (IgSF) control innate and adaptive immunity and are prime targets for the treatment of autoimmune diseases, infectious diseases, and malignancies. We describe a computational method, termed the Brotherhood algorithm, which utilizes intermediate sequence information to classify proteins into functionally related families. This approach identifies functional relationships within the IgSF and predicts additional receptor-ligand interactions. As a specific example, we examine the nectin/nectin-like family of cell adhesion and signaling proteins and propose receptor-ligand interactions within this family. Guided by the Brotherhood approach, we present the high-resolution structural characterization of a homophilic interaction involving the class-I MHC-restricted T-cell-associated molecule, which we now classify as a nectin-like family member. The Brotherhood algorithm is likely to have a significant impact on structural immunology by identifying those proteins and complexes for which structural characterization will be particularly informative.
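    The role of intermediate sequence information can be illustrated with a small transitive-similarity sketch: link two proteins when they are directly similar or when both are strongly similar to a common intermediate, then read candidate families off the connected components. The similarity function and thresholds here are hypothetical, not the published Brotherhood parameters:

```python
# Sketch of intermediate-sequence ("brotherhood"-style) family detection.
# similarity(a, b) is assumed to return a normalised score in [0, 1].
from itertools import combinations

import networkx as nx


def families_via_intermediates(proteins, similarity, direct_t=0.5, via_t=0.7):
    g = nx.Graph()
    g.add_nodes_from(proteins)
    for a, b in combinations(proteins, 2):
        if similarity(a, b) >= direct_t:
            g.add_edge(a, b)                      # direct relationship
        elif any(similarity(a, m) >= via_t and similarity(m, b) >= via_t
                 for m in proteins if m not in (a, b)):
            g.add_edge(a, b)                      # related via an intermediate
    return list(nx.connected_components(g))
```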

    Leveraging Intermediate Artifacts to Improve Automated Trace Link Retrieval

    Software traceability establishes a network of connections between diverse artifacts such as requirements, design, and code. However, given the cost and effort of creating and maintaining trace links manually, researchers have proposed automated approaches using information retrieval techniques. Current approaches focus almost entirely upon generating links between pairs of artifacts and have not leveraged the broader network of interconnected artifacts. In this paper we investigate the use of intermediate artifacts to enhance the accuracy of the generated trace links – focusing on paths consisting of source, target, and intermediate artifacts. We propose and evaluate combinations of techniques for computing semantic similarity, scaling scores across multiple paths, and aggregating results from multiple paths. We report results from five projects, including one large industrial project. We find that leveraging intermediate artifacts improves the accuracy of end-to-end trace retrieval across all datasets and accuracy metrics. After further analysis, we discover that leveraging intermediate artifacts is only helpful when a project’s artifacts share a common vocabulary, which tends to occur in refinement and decomposition hierarchies of artifacts. Given our hybrid approach that integrates both direct and transitive links, we observed little to no loss of accuracy when intermediate artifacts lacked a shared vocabulary with source or target artifacts.
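    A minimal sketch of a hybrid direct-plus-transitive trace score in the spirit of this approach; the specific scaling (product along a path) and aggregation (maximum over paths) choices are assumptions rather than the paper's exact techniques:

```python
# Hybrid trace score: take the best of the direct similarity and the best
# source -> intermediate -> target path score.
def hybrid_trace_score(source, target, intermediates, sim):
    """sim(a, b) is assumed to return a semantic similarity in [0, 1]."""
    direct = sim(source, target)
    # Scale each path score as the product of its two hops.
    path_scores = [sim(source, mid) * sim(mid, target) for mid in intermediates]
    transitive = max(path_scores, default=0.0)
    # Hybrid: never worse than the direct score when intermediates do not help.
    return max(direct, transitive)
```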