Process-oriented Iterative Multiple Alignment for Medical Process Mining

Author: Burd Randall S.
Chen Shuhong
Marsic Ivan
Yang Sen
Zhou Moliang
Publication date: 15/09/2017

Adapted from biological sequence alignment, trace alignment is a process mining technique used to visualize and analyze workflow data. Any analysis done with this method, however, is affected by the alignment quality. The best existing trace alignment techniques use progressive guide-trees to heuristically approximate the optimal alignment in O(N^2 L^2) time. These algorithms are heavily dependent on the selected guide-tree metric, often return sum-of-pairs-score-reducing errors that interfere with interpretation, and are computationally intensive for large datasets. To alleviate these issues, we propose process-oriented iterative multiple alignment (PIMA), which contains specialized optimizations to better handle workflow data. We demonstrate that PIMA is a flexible framework capable of achieving better sum-of-pairs score than existing trace alignment algorithms in only O(N L^2) time. We applied PIMA to analyzing medical workflow data, showing how iterative alignment can better represent the data and facilitate the extraction of insights from data visualization.
Comment: accepted at ICDMW 201
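
The sum-of-pairs score named in the abstract above can be illustrated with a minimal sketch. This is not the PIMA implementation: the function names and the match/mismatch/gap values below are assumptions chosen for illustration, and each character stands in for one workflow activity in an already-aligned trace.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol for this sketch


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score one pair of aligned symbols (scoring values are illustrative)."""
    if a == GAP and b == GAP:
        return 0  # two gaps facing each other contribute nothing
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **scores):
    """Sum pair_score over every pair of rows in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned traces must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **scores)
    return total


# Three aligned traces; each letter is a hypothetical activity label.
example = ["AB-C", "ABDC", "A-DC"]
print(sum_of_pairs(example))
```

Scoring a single alignment this way visits every pair of rows in every column, which is why the score serves as a quality measure rather than as something to recompute naively inside an inner loop.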

arXiv.org e-Print Archive

Crossref

George Washington University: Health Sciences Research Commons (HSRC)

Automated Protein Structure Classification: A Survey

Author: Hassanzadeh Oktie
Publication date: 01/01/2008

Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.
Comment: 14 pages, Technical Report CSRG-589, University of Toront

arXiv.org e-Print Archive

CiteSeerX

Transposing from the laboratory to the classroom to generate authentic research experiences for undergraduates.

Author: Burnette James M
Wessler Susan R
Publication venue: eScholarship, University of California
Publication date: 01/02/2013

Large lecture classes and standardized laboratory exercises are characteristic of introductory biology courses. Previous research has found that these courses do not adequately convey the process of scientific research and the excitement of discovery. Here we propose a model that provides beginning biology students with an inquiry-based, active learning laboratory experience. The Dynamic Genome course replicates a modern research laboratory focused on eukaryotic transposable elements where beginning undergraduates learn key genetics concepts, experimental design, and molecular biological skills. Here we report on two key features of the course, a didactic module and the capstone original research project. The module is a modified version of a published experiment where students experience how virtual transposable elements from rice (Oryza sativa) are assayed for function in transgenic Arabidopsis thaliana. As part of the module, students analyze the phenotypes and genotypes of transgenic plants to determine the requirements for transposition. After mastering the skills and concepts, students participate in an authentic research project where they use computational analysis and PCR to detect transposable element insertion site polymorphism in a panel of diverse maize strains. As a consequence of their engagement in this course, students report large gains in their ability to understand the nature of research and demonstrate that they can apply that knowledge to independent research projects.

PubMed Central

eScholarship - University of California

Data-driven network alignment

Author: Gu Shawn
Milenkovic Tijana
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2020

Biological network alignment (NA) aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for transfer of functional knowledge between the aligned nodes. However, current NA methods do not end up aligning functionally related nodes. A likely reason is that they assume it is topologically similar nodes that are functionally related. However, we show that this assumption does not hold well. So, a paradigm shift is needed with how the NA problem is approached. We redefine NA as a data-driven framework, TARA (daTA-dRiven network Alignment), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity, like traditional NA methods do. TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns. We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that are predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. Clearly, TARA as currently implemented uses topological but not protein sequence information for this task. We find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance.

arXiv.org e-Print Archive

Directory of Open Access Journals