3,746 research outputs found

    Some studies on protein structure alignment algorithms

    The alignment of two protein structures is a fundamental problem in structural bioinformatics. Their structural similarity carries with it the connotation of similar functional behavior that could be exploited in various applications. A plethora of algorithms, including one by us, is a testament to the importance of the problem. In this thesis, we propose a novel approach to measure the effectiveness of a sample of four such algorithms, DALI, TM-align, CE and EDAlignsse, for detecting structural similarities among proteins. The underlying premise is that structural proximity should translate into spatial proximity. To verify this, we carried out extensive experiments with five different datasets, each consisting of proteins from two to six different families. In addition, we have focused on computational methods for aligning multiple protein structures. This problem is known to be NP-complete, so there is room for solutions that improve on, or at least match, the existing ones. Such a solution is presented in this thesis. We use a heuristic algorithm, the progressive multiple alignment approach, to compute the multiple alignment. We used the root mean square deviation (RMSD) as a measure of alignment quality and report this measure for a large and varied number of alignments. We also compared the execution times of our algorithm with those of the well-known algorithm MUSTANG for all the tested alignments.
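
    As a rough illustration of the quality measure mentioned above, the sketch below computes RMSD over two already-paired sets of C-alpha coordinates; the arrays, their pairing, and the prior superposition are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def rmsd(coords_a: np.ndarray, coords_b: np.ndarray) -> float:
    """Root mean square deviation between two paired (N, 3) coordinate sets.

    Assumes the atom correspondence has already been established by the
    alignment algorithm and that both sets are in the same frame
    (i.e. already superposed).
    """
    diff = coords_a - coords_b
    return float(np.sqrt((diff * diff).sum() / len(coords_a)))

# Toy usage with made-up coordinates:
a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.2, 0.1]])
b = np.array([[0.1, 0.0, 0.0], [1.4, 0.1, 0.0], [3.1, 0.0, 0.0]])
print(rmsd(a, b))
```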

    Grassmann Learning for Recognition and Classification

    Computational performance associated with high-dimensional data is a common challenge for real-world classification and recognition systems. Subspace learning has received considerable attention as a means of finding an efficient low-dimensional representation that leads to better classification and efficient processing. A Grassmann manifold is a smooth manifold whose points represent subspaces and where the relationship between points is defined through mappings of orthogonal basis matrices. Grassmann learning involves embedding high-dimensional subspaces and kernelizing the embedding onto a projection space where distance computations can be performed effectively. In this dissertation, Grassmann learning and its benefits towards action classification and face recognition in terms of accuracy and performance are investigated and evaluated. Grassmannian Sparse Representation (GSR) and Grassmannian Spectral Regression (GRASP) are proposed as Grassmann-inspired subspace learning algorithms. GSR is a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations using least-squares loss ℓ1-norm minimization for improved classification. GRASP is a novel subspace learning algorithm that leverages the benefits of Grassmann manifolds and Spectral Regression in a framework that supports high discrimination between classes and achieves computational benefits by using manifold modeling and avoiding eigen-decomposition. The effectiveness of GSR and GRASP is demonstrated for computationally intensive classification problems: (a) multi-view action classification using the IXMAS Multi-View dataset, the i3DPost Multi-View dataset, and the WVU Multi-View dataset, (b) 3D action classification using the MSRAction3D and MSRGesture3D datasets, and (c) face recognition using the AT&T Face Database, Labeled Faces in the Wild (LFW), and the Extended Yale Face Database B (YALE). Additional contributions include the definition of Motion History Surfaces (MHS) and Motion Depth Surfaces (MDS) as descriptors suitable for activity representations in video sequences and 3D depth sequences. An in-depth analysis of Grassmann metrics is applied to high-dimensional data with different levels of noise and data distributions, revealing that standardized Grassmann kernels are favorable over geodesic metrics on a Grassmann manifold. Finally, an extensive performance analysis is made that supports Grassmann subspace learning as an effective approach for classification and recognition.
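
    To make the Grassmann metrics discussed above concrete, the sketch below computes the principal angles between two subspaces, the geodesic (arc-length) distance built from them, and the projection kernel; this is a generic illustration under assumed orthonormal basis inputs, not the dissertation's implementation.

```python
import numpy as np

def principal_angles(Ya: np.ndarray, Yb: np.ndarray) -> np.ndarray:
    """Principal angles between the column spans of two orthonormal (D, p) bases."""
    # Singular values of Ya^T Yb are the cosines of the principal angles.
    cosines = np.linalg.svd(Ya.T @ Yb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def geodesic_distance(Ya, Yb) -> float:
    """Arc-length (geodesic) distance on the Grassmann manifold."""
    theta = principal_angles(Ya, Yb)
    return float(np.sqrt((theta ** 2).sum()))

def projection_kernel(Ya, Yb) -> float:
    """Projection (embedding) kernel: squared Frobenius norm of Ya^T Yb."""
    return float(np.linalg.norm(Ya.T @ Yb, "fro") ** 2)

# Toy usage: two 3-dimensional subspaces of R^10, obtained via QR.
rng = np.random.default_rng(0)
Ya, _ = np.linalg.qr(rng.standard_normal((10, 3)))
Yb, _ = np.linalg.qr(rng.standard_normal((10, 3)))
print(geodesic_distance(Ya, Yb), projection_kernel(Ya, Yb))
```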

    Article Segmentation in Digitised Newspapers

    Digitisation projects preserve and make available vast quantities of historical text. Among these, newspapers are an invaluable resource for the study of human culture and history. Article segmentation identifies each region in a digitised newspaper page that contains an article. Digital humanities, information retrieval (IR), and natural language processing (NLP) applications over digitised archives improve access to text and allow automatic information extraction. The lack of article segmentation impedes these applications. We contribute a thorough review of the existing approaches to article segmentation. Our analysis reveals divergent interpretations of the task, and inconsistent and often ambiguously defined evaluation metrics, making comparisons between systems challenging. We solve these issues by contributing a detailed task definition that examines the nuances and intricacies of article segmentation that are not immediately apparent. We provide practical guidelines on handling borderline cases and devise a new evaluation framework that allows insightful comparison of existing and future approaches. Our review also reveals that the lack of large datasets hinders meaningful evaluation and limits machine learning approaches. We solve these problems by contributing a distant supervision method for generating large datasets for article segmentation. We manually annotate a portion of our dataset and show that our method produces article segmentations over characters nearly as well as costly human annotators. We reimplement the seminal textual approach to article segmentation (Aiello and Pegoretti, 2006) and show that it does not generalise well when evaluated on a large dataset. We contribute a framework for textual article segmentation that divides the task into two distinct phases: block representation and clustering. We propose several techniques for block representation and contribute a novel highly-compressed semantic representation called similarity embeddings. We evaluate and compare different clustering techniques, and innovatively apply label propagation (Zhu and Ghahramani, 2002) to spread headline labels to similar blocks. Our similarity embeddings and label propagation approach substantially outperforms Aiello and Pegoretti but still falls short of human performance. Exploring visual approaches to article segmentation, we reimplement and analyse the state-of-the-art Bansal et al. (2014) approach. We contribute an innovative 2D Markov model approach that captures reading order dependencies and reduces the structured labelling problem to a Markov chain that we decode with Viterbi (1967). Our approach substantially outperforms Bansal et al., achieves accuracy as good as human annotators, and establishes a new state of the art in article segmentation. Our task definition, evaluation framework, and distant supervision dataset will encourage progress in the task of article segmentation. Our state-of-the-art textual and visual approaches will allow sophisticated IR and NLP applications over digitised newspaper archives, supporting research in the digital humanities
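
    As a rough sketch of the label-propagation idea described above (Zhu and Ghahramani, 2002), the snippet below spreads headline labels to similar blocks through a row-normalised similarity matrix; the similarity matrix, the seed labels, and the iteration count are illustrative assumptions, not the thesis's configuration.

```python
import numpy as np

def propagate_labels(similarity, seed_labels, n_classes, n_iter=100):
    """Spread labels from seed blocks to the rest via a similarity graph.

    similarity  : (N, N) non-negative block-to-block similarity matrix
    seed_labels : dict {block_index: class_index} for labelled blocks (e.g. headlines)
    """
    # Row-normalise the similarity matrix into a transition matrix.
    T = similarity / similarity.sum(axis=1, keepdims=True)
    F = np.zeros((len(similarity), n_classes))
    for i, c in seed_labels.items():
        F[i, c] = 1.0
    for _ in range(n_iter):
        F = T @ F
        # Clamp the labelled blocks back to their known labels each step.
        for i, c in seed_labels.items():
            F[i, :] = 0.0
            F[i, c] = 1.0
    return F.argmax(axis=1)

# Toy usage: 4 text blocks, where blocks 0 and 3 are headlines of articles 0 and 1.
S = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.8],
              [0.0, 0.1, 0.8, 1.0]])
print(propagate_labels(S, {0: 0, 3: 1}, n_classes=2))
```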

    Computational Approaches to Simulation and Analysis of Large Conformational Transitions in Proteins

    In a typical living cell, millions to billions of proteins (nanomachines that fluctuate and cycle among many conformational states) convert available free energy into mechanochemical work. A fundamental goal of biophysics is to ascertain how 3D protein structures encode specific functions, such as catalyzing chemical reactions or transporting nutrients into a cell. Protein dynamics span femtosecond timescales (i.e., covalent bond oscillations) to large conformational transition timescales in, and beyond, the millisecond regime (e.g., glucose transport across a phospholipid bilayer). Actual transition events are fast but rare, occurring orders of magnitude faster than typical metastable equilibrium waiting times. Equilibrium molecular dynamics (EqMD) can capture atomistic detail and solute-solvent interactions, but even the microseconds of sampling attainable nowadays still fall orders of magnitude short of transition timescales, especially for large systems, rendering observations of such "rare events" difficult or effectively impossible. Advanced path-sampling methods exploit reduced physical models or biasing to produce plausible transitions while balancing accuracy and efficiency, but quantifying their accuracy relative to other numerical and experimental data has been challenging. Indeed, new horizons in elucidating protein function necessitate that present methodologies be revised to more seamlessly and quantitatively integrate a spectrum of methods, both numerical and experimental. In this dissertation, experimental and computational methods are put into perspective using the enzyme adenylate kinase (AdK) as an illustrative example. We introduce Path Similarity Analysis (PSA), an integrative computational framework developed to quantify transition path similarity. PSA not only reliably distinguished AdK transitions by the originating method, but also traced pathway differences between two methods back to charge-charge interactions (neglected by the stereochemical model, but not by the all-atom force field) in several conserved salt bridges. Cryo-electron microscopy maps of the transporter Bor1p are directly incorporated into EqMD simulations using MD flexible fitting to produce viable structural models and infer a plausible transport mechanism. Conforming to the theme of integration, a short compendium of an exploratory project, developing a hybrid atomistic-continuum method, is presented, including initial results and a novel fluctuating hydrodynamics model with corresponding numerical code.
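
    Path Similarity Analysis compares transition paths as curves in configuration space; one common path metric is the discrete Hausdorff distance, sketched below under the assumption that frames are already superposed. Whether this exact metric and representation match the PSA implementation described above is an assumption.

```python
import numpy as np

def hausdorff(path_a: np.ndarray, path_b: np.ndarray) -> float:
    """Discrete Hausdorff distance between two paths.

    Each path is an (n_frames, n_coords) array of flattened coordinates,
    assumed to be already superposed onto a common reference frame.
    """
    # Pairwise Euclidean distances between all frame pairs.
    d = np.linalg.norm(path_a[:, None, :] - path_b[None, :, :], axis=-1)
    # Largest of the two directed nearest-neighbour distances.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Toy usage with two short "paths" in a 3-coordinate space.
p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
q = np.array([[0.0, 0.5, 0.0], [1.0, 0.5, 0.0], [2.0, 0.5, 0.0], [3.0, 0.5, 0.0]])
print(hausdorff(p, q))
```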

    Engineering Principles Of Photosystems And Their Practical Application As Long-Lived Charge Separation In Maquettes

    Light-activated electron transfer reactions between cofactors embedded in proteins serve as the central mechanism underlying numerous biological processes essential to the survival and prosperity of most organisms on this planet. These processes range from navigation, to DNA repair, to metabolism, and to solar energy conversion. Their proper functioning relies on the creation, by the arrays of cofactors in photosystems, of a charge-separated state lasting for a necessary length of time, from tens of nanoseconds to hundreds of milliseconds. In spite of decades of experiments and theoretical frameworks providing detailed and extensive descriptions of the behavior of photosystems, a coherent and systematic understanding is lacking of the underlying structural and chemical engineering principles that govern the performance of charge separation in photosystems, evaluated by the fraction of the input energy made available by the photosystem for its intended function. This thesis aims to establish a set of engineering principles for natural and man-made photosystems based on the fundamental theories of electron transfer and the biophysical and biochemical constraints imposed by the protein environment, and then to apply these engineering principles to design and construct man-made photosystems that excel in charge separation while incurring minimal cost in their construction. Using the fundamental theories of electron transfer, this thesis develops an efficient computational algorithm that returns a set of guidelines for engineering optimal light-driven charge separation in cofactor-based photosystems. This thesis then examines the validity of these guidelines in natural photosystems, discovering significant editing and updating of these guidelines imposed by the biological environment in which photosystems are engineered by nature. This thesis then organizes the two layers of engineering principles into a concise set of rules and demonstrates that they can be applied as guidelines to the practical construction of highly efficient man-made photosystems. To test these engineering guidelines in practice, the first ever donor-pigment-acceptor triad is constructed in a maquette and successfully separates charges stably for >300 ms, establishing the world record in a triad. Finally, this work looks ahead to the engineering of the prescribed optimal tetrads in maquettes, identifying what is in place and what challenges yet remain.
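
    As a worked illustration of the kind of rate estimate behind such engineering guidelines, the sketch below evaluates the empirical Moser-Dutton ruler for exergonic electron tunnelling in proteins; the particular coefficients, and whether this thesis uses exactly this parameterisation, are assumptions, not the work's own expression.

```python
def log10_ket(distance_A: float, delta_G_eV: float, lam_eV: float) -> float:
    """Empirical Moser-Dutton ruler for exergonic electron tunnelling in proteins.

    distance_A : edge-to-edge cofactor distance in Angstroms
    delta_G_eV : driving force (negative for downhill transfer), in eV
    lam_eV     : reorganisation energy, in eV
    Returns log10 of the rate constant in s^-1.
    """
    return 13.0 - 0.6 * (distance_A - 3.6) - 3.1 * (delta_G_eV + lam_eV) ** 2 / lam_eV

# Toy usage: a 10 A gap, -0.1 eV driving force, 0.7 eV reorganisation energy.
print(10 ** log10_ket(10.0, -0.1, 0.7))  # rate of order 1e7 s^-1
```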

    Improving nonlinear search with Self-Organizing Maps - Application to Magnetic Resonance Relaxometry

    Quantification of myelin in vivo is crucial for understanding neurological diseases such as multiple sclerosis (MS). Multi-Component Driven Equilibrium Single Pulse Observation of T1 and T2 (mcDESPOT) is a rapid and precise method for voxel-wise determination of the longitudinal and transverse relaxation times. Briefly, mcDESPOT couples sets of SPGR (spoiled gradient-recalled echo) and bSSFP (fully balanced steady-state free precession) data, acquired over a range of flip angles (α) with constant interpulse spacing (TR), to derive 6 parameters (free-water T1 and T2, myelin-associated water T1 and T2, relative myelin-associated water volume fraction, and the myelin-associated water proton residence time) based on water exchange models. However, this procedure is computationally expensive and extremely difficult, owing to the need to find the best fit to the 24 MRI signal volumes in a search over a nonlinear 6-dimensional space of model parameters. In this context, the aim of this work is to improve the efficiency and accuracy of mcDESPOT using the tissue information contained in the acquired sets of signals (SPGR and bSSFP). The basic hypothesis is that similar acquired signals refer to tissue portions with similar features, which translate into similar parameters. This similarity can be used to drive the nonlinear mcDESPOT fitting, leading the optimization algorithm (which is based on a stochastic region contraction approach) to look for a solution (i.e. the 6-parameter vector) also in regions defined by previously computed solutions of other voxels with similar signals. For this reason, we clustered the sets of SPGR and bSSFP signals using a Self-Organizing Map (SOM), a neural network that trains itself in an unsupervised manner through competitive learning. The similarity information obtained from the SOM was then used to suggest solutions to the optimization algorithm accordingly. A first validation phase with in silico data was performed to evaluate the performance of the SOM and of the modified method, SOM+mcDESPOT. The latter was further validated using real magnetic resonance images. The last step consisted of applying SOM+mcDESPOT to a group of healthy subjects and a group of MS patients to look for differences in myelin-associated water fraction values between the two groups. The validation phases with in silico data verified the initial hypothesis: in more than 74% of cases, the correct solution for a given voxel lies in the region dictated by the cluster to which that voxel is mapped. Adding the information from similar solutions extracted from that cluster helps to improve the signal fitting and the accuracy of the determination of the 7 parameters. This result holds even when the data are corrupted by a high level of noise (SNR = 50). Using real images confirmed the strengths of SOM+mcDESPOT highlighted by the in silico data. The application of SOM+mcDESPOT to the controls and the MS patients first of all yielded more plausible results than traditional mcDESPOT. Moreover, a statistically significant difference in myelin-associated water fraction values in the normal-appearing white matter was found between the two groups: the MS patients show lower fraction values than the normal subjects, indicating abnormally low myelin content in the normal-appearing white matter of MS patients.
    In conclusion, we proposed the novel method SOM+mcDESPOT, which is able to extract and exploit the information contained in the MRI signals to appropriately drive the optimization algorithm implemented in mcDESPOT. In so doing, the overall accuracy of the method improves, both in the signal fitting and in the determination of the 7 parameters. Thus, the potential of SOM+mcDESPOT could prove crucial in improving the indirect quantification of myelin in both healthy subjects and patients.
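
    To make the clustering step concrete, a minimal one-dimensional SOM training loop is sketched below; the feature vectors stand in for the concatenated SPGR and bSSFP signals of each voxel, and the map size, learning rate, and neighbourhood schedule are illustrative assumptions rather than the parameters used in this work.

```python
import numpy as np

def train_som(signals, n_nodes=16, n_epochs=20, lr0=0.5, sigma0=4.0, seed=0):
    """Train a 1D SOM on voxel signal vectors and return the node weight vectors.

    signals : (n_voxels, n_features) array, e.g. concatenated SPGR + bSSFP samples.
    """
    rng = np.random.default_rng(seed)
    weights = signals[rng.choice(len(signals), n_nodes, replace=False)].astype(float)
    node_idx = np.arange(n_nodes)
    for epoch in range(n_epochs):
        # Decay the learning rate and neighbourhood width over epochs.
        lr = lr0 * (1.0 - epoch / n_epochs)
        sigma = sigma0 * (1.0 - epoch / n_epochs) + 1e-3
        for x in signals[rng.permutation(len(signals))]:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((node_idx - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def map_to_cluster(signals, weights):
    """Assign each voxel to the SOM node (cluster) with the closest weight vector."""
    d = ((signals[:, None, :] - weights[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Toy usage with random stand-in "signals" (200 voxels, 24 samples each).
sig = np.random.default_rng(1).standard_normal((200, 24))
w = train_som(sig)
print(map_to_cluster(sig, w)[:10])
```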

    Detecting cells and cellular activity from two-photon calcium imaging data

    To understand how networks of neurons process information, it is essential to monitor their activity in living tissue. Information is transmitted between neurons by electrochemical impulses called action potentials or spikes. Calcium-sensitive fluorescent probes, which emit a characteristic pulse of fluorescence in response to a spike, are used to visualise spiking activity. Combined with two-photon microscopy, they enable the spiking activity of thousands of neurons to be monitored simultaneously at single-cell and single-spike resolution. In this thesis, we develop signal processing tools for detecting cells and cellular activity from two-photon calcium imaging data. Firstly, we present a method to detect the locations of cells within a video. In our framework, an active contour evolves guided by a model-based cost function to identify a cell boundary. We demonstrate that this method, which includes no assumptions about typical cell shape or temporal activity, is able to detect cells with varied properties from real imaging data. Once the location of a cell has been identified, its spiking activity must be inferred from the fluorescence signal. We present a metric that quantifies the similarity between inferred spikes and the ground truth. The proposed metric assesses the similarity of pulse trains obtained from convolution of the spike trains with a smoothing pulse, whose width is derived from the statistics of the data. We demonstrate that the proposed metric is more sensitive than existing metrics to the temporal and rate precision of inferred spike trains. Finally, we extend an existing framework for spike inference to accommodate a wider class of fluorescence signals. Our method, which is based on finite rate of innovation theory, exploits the known parametric structure of the signal to infer the unknown spike times. On in vitro imaging data, we demonstrate that the updated algorithm outperforms a state-of-the-art approach.
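
    A minimal sketch of the pulse-train comparison described above: two binary spike trains are convolved with a smoothing pulse and compared by normalised correlation. The Gaussian pulse shape and width used here are illustrative assumptions, not the statistics-derived pulse of the thesis.

```python
import numpy as np

def smooth_spike_train(spikes: np.ndarray, width: float) -> np.ndarray:
    """Convolve a binary spike train with a Gaussian smoothing pulse."""
    half = int(4 * width)
    t = np.arange(-half, half + 1)
    pulse = np.exp(-t ** 2 / (2.0 * width ** 2))
    return np.convolve(spikes, pulse, mode="same")

def spike_similarity(spikes_a, spikes_b, width=5.0) -> float:
    """Normalised correlation between smoothed versions of two spike trains."""
    a = smooth_spike_train(spikes_a, width)
    b = smooth_spike_train(spikes_b, width)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy usage: ground-truth spikes vs. an inferred train shifted by about two samples.
truth = np.zeros(200); truth[[20, 80, 150]] = 1
inferred = np.zeros(200); inferred[[22, 82, 151]] = 1
print(spike_similarity(truth, inferred))
```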

    Connected Attribute Filtering Based on Contour Smoothness
