621 research outputs found

    Clustering exact matches of pairwise sequence alignments by weighted linear regression

    Background: At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when one is available. The interest is not in a detailed base-level alignment between a contig and the reference genome, but in a rough estimate of where the contig aligns to the reference, specifically the starting and ending positions of that region. This information is very useful for ordering the contigs and facilitates post-assembly analysis such as gap closure and repeat resolution. Programs such as BLAST and MUMmer can quickly align and identify high-similarity segments between two sequences. Seen in a dot plot, these segments tend to agglomerate along a diagonal, but they can also be disrupted by gaps or shifted away from the main diagonal because of mismatches between the contig and the reference. Visually inspecting dot plots to identify the regions covered by the large number of contigs in a sequence assembly project is tedious and practically impossible. A forced global alignment between a contig and the reference is not only time-consuming but often meaningless. Results: We have developed an algorithm that uses the coordinates of all the exact matches or high-similarity local alignments, clusters them with respect to the main diagonal in the dot plot using a weighted linear regression technique, and identifies the starting and ending coordinates of the region of interest. Conclusion: This algorithm complements existing pairwise sequence alignment packages by replacing the time-consuming seed extension phase with a weighted linear regression on the alignment seeds. Experiments showed that the gain in execution time can be substantial without compromising accuracy. This method should be of great utility to sequence assembly and genome comparison projects.
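The clustering step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the match tuple layout, the `tolerance` parameter, the use of match length as the regression weight, and the weighted-median seeding of the main diagonal are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical match record: (reference_start, contig_start, length).
Match = Tuple[int, int, int]


def weighted_median(values_weights: List[Tuple[float, float]]) -> float:
    """Return the value at which the cumulative weight first reaches half the total."""
    items = sorted(values_weights)
    half = sum(w for _, w in items) / 2.0
    running = 0.0
    for value, weight in items:
        running += weight
        if running >= half:
            return value
    return items[-1][0]


def weighted_line_fit(matches: List[Match]) -> Tuple[float, float]:
    """Fit contig_pos = intercept + slope * ref_pos, weighting each match by its length."""
    w_sum = sum(l for _, _, l in matches)
    x_mean = sum(x * l for x, _, l in matches) / w_sum
    y_mean = sum(y * l for _, y, l in matches) / w_sum
    sxx = sum(l * (x - x_mean) ** 2 for x, _, l in matches)
    sxy = sum(l * (x - x_mean) * (y - y_mean) for x, y, l in matches)
    slope = sxy / sxx  # assumes at least two distinct reference positions
    return y_mean - slope * x_mean, slope


def aligned_region(matches: List[Match], tolerance: float = 50.0):
    """Estimate where the contig lands on the reference: (ref_start, ref_end, slope)."""
    # Seed the main diagonal with the length-weighted median of (ref - contig) offsets,
    # so that a few far-off matches cannot tilt the regression line.
    diagonal = weighted_median([(x - y, l) for x, y, l in matches])
    kept = [m for m in matches if abs((m[0] - m[1]) - diagonal) <= tolerance]
    intercept, slope = weighted_line_fit(kept)
    # Keep only matches that lie close to the fitted line.
    kept = [m for m in kept if abs(m[1] - (intercept + slope * m[0])) <= tolerance]
    ref_start = min(x for x, _, _ in kept)
    ref_end = max(x + l for x, _, l in kept)
    return ref_start, ref_end, slope
```

Seeding with a weighted median is a design choice of this sketch: a plain least-squares fit over all matches is easily dominated by a single distant spurious match.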

    ABAplus: Attack Reversal in Abstract and Structured Argumentation with Preferences

    We present ABAplus, a system that implements reasoning with the argumentation formalism ABA+. ABA+ is a structured argumentation formalism that extends Assumption-Based Argumentation (ABA) with preferences and accounts for preferences via attack reversal. ABA+ also admits as an instance Preference-based Argumentation, which accounts for preferences by reversing attacks in abstract argumentation (AA). ABAplus readily implements attack reversal in both AA and ABA-style structured argumentation. ABAplus affords computation, visualisation and comparison of extensions under five argumentation semantics. It is available both as a stand-alone system and as a web application.
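As a rough illustration of attack reversal in the abstract (AA) setting only, the sketch below assumes the usual reading: when an argument attacks one that is strictly preferred to it, the attack is reversed. The function name and data layout are hypothetical and are not the ABAplus interface. ABA+ itself applies the idea at the level of assumptions, which this sketch does not cover.

```python
from typing import Set, Tuple

Attack = Tuple[str, str]       # (attacker, target)
Preference = Tuple[str, str]   # (more_preferred, less_preferred), strict


def reverse_attacks(attacks: Set[Attack], strictly_preferred: Set[Preference]) -> Set[Attack]:
    """Return the attack relation after reversing attacks on strictly preferred targets."""
    result = set()
    for attacker, target in attacks:
        if (target, attacker) in strictly_preferred:
            # The target is strictly preferred to its attacker: reverse the attack.
            result.add((target, attacker))
        else:
            # No strict preference for the target: the attack stands as given.
            result.add((attacker, target))
    return result
```

For example, with attacks `a -> b` and `b -> c` and the single preference `b > a`, the first attack becomes `b -> a` and the second is unchanged.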

    Word correlation matrices for protein sequence analysis and remote homology detection

    Background: Classification of protein sequences is a central problem in computational biology. Currently, among computational methods, discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences are usually computationally expensive. Results: In this work we present a novel kernel for protein sequences based on average word similarity between two sequences. We show that this kernel gives rise to a feature space that allows analysis of discriminative features and fast classification of new sequences. We demonstrate the performance of our approach on a widely used benchmark setup for protein remote homology detection. Conclusion: Our word correlation approach provides highly competitive performance compared with state-of-the-art methods for protein remote homology detection. The learned model is interpretable in terms of biologically meaningful features. In particular, analysis of discriminative words allows the identification of characteristic regions in biological sequences. Because of its high computational efficiency, our method can be applied to ranking of potential homologs in large databases.

    Physicochemical property distributions for accurate and rapid pairwise protein homology detection

    Background: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection. Results: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost. Conclusions: A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.
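The 4-mer feature construction described in this abstract might look roughly like the sketch below. Several choices are assumptions, since the abstract does not specify them: the Kyte-Doolittle hydropathy scale stands in for the property, a 4-mer is scored by its mean residue value, the null distribution weights all 20^4 4-mers equally, and the z-scores are binned with arbitrary edges.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Kyte-Doolittle hydropathy values, used here only as an example property scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def null_stats(scale: Dict[str, float], k: int = 4) -> Tuple[float, float]:
    """Mean and standard deviation of the k-mer score over all possible k-mers."""
    scores = [sum(scale[c] for c in kmer) / k for kmer in product(scale, repeat=k)]
    mu = sum(scores) / len(scores)
    sigma = (sum((s - mu) ** 2 for s in scores) / len(scores)) ** 0.5
    return mu, sigma


def feature_vector(
    seq: str,
    scale: Dict[str, float] = KYTE_DOOLITTLE,
    k: int = 4,
    edges: Sequence[float] = (-2.0, -1.0, 0.0, 1.0, 2.0),  # assumed z-score bin edges
) -> List[float]:
    """Histogram of null-normalized k-mer scores, as fractions of all k-mers in seq."""
    mu, sigma = null_stats(scale, k)
    n_kmers = len(seq) - k + 1  # seq must contain at least k standard residues
    counts = [0] * (len(edges) + 1)
    for i in range(n_kmers):
        score = sum(scale[c] for c in seq[i:i + k]) / k
        z = (score - mu) / sigma
        counts[sum(z >= e for e in edges)] += 1  # index of the bin containing z
    return [c / n_kmers for c in counts]
```

The resulting fixed-length vector is the kind of input an SVM could consume; the classifier itself is outside this sketch.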

    A SOM-based Chan–Vese model for unsupervised image segmentation

    Active Contour Models (ACMs) constitute an efficient energy-based image segmentation framework. They usually treat segmentation as an optimization problem, formulated in terms of a suitable functional constructed so that its minimum is achieved at a contour that closely approximates the actual object boundary. However, for existing ACMs, handling images that contain objects characterized by many different intensities still represents a challenge. In this paper, we propose a novel ACM that combines, in a global and unsupervised way, the advantages of the Self-Organizing Map (SOM) within the level set framework of a state-of-the-art unsupervised global ACM, the Chan–Vese (C–V) model. We term our proposed model the SOM-based Chan–Vese (SOMCV) active contour model. It works by explicitly integrating the global information coming from the weights (prototypes) of the neurons in a trained SOM to help choose whether to shrink or expand the current contour during the optimization process, which is performed iteratively. The proposed model can handle images that contain objects characterized by complex intensity distributions, and is at the same time robust to additive noise. Experimental results show the high accuracy of the segmentation results obtained by the SOMCV model on several synthetic and real images, when compared to the Chan–Vese model and other image segmentation models.
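For orientation, the sketch below shows only the data term of the classical two-phase Chan–Vese model that SOMCV builds on, not SOMCV itself. It assumes a piecewise-constant fit with one mean intensity per region and deliberately omits the contour-length penalty and the level set machinery. Function names and the nested-list image format are illustrative.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]
Mask = List[List[bool]]


def region_mean(image: Image, inside: Mask, value: bool) -> Optional[float]:
    """Mean intensity of the pixels whose mask entry equals `value` (None if empty)."""
    pixels = [p for row, mrow in zip(image, inside) for p, m in zip(row, mrow) if m == value]
    return sum(pixels) / len(pixels) if pixels else None


def chan_vese_data_step(image: Image, inside: Mask) -> Tuple[Mask, Optional[float], Optional[float]]:
    """One update of the region assignment using only the C-V data term.

    c1 and c2 are the mean intensities inside and outside the current contour.
    A pixel is placed inside when it is closer (in squared error) to c1 than to c2.
    """
    c1 = region_mean(image, inside, True)
    c2 = region_mean(image, inside, False)
    if c1 is None or c2 is None:
        return inside, c1, c2  # one region is empty; nothing to compare against
    new_inside = [[(p - c1) ** 2 < (p - c2) ** 2 for p in row] for row in image]
    return new_inside, c1, c2
```

With a single mean per region, an object made of several distinct intensities is poorly described, which is the limitation the abstract says the SOM prototypes are meant to address.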

    An average/deprivation/inequality (ADI) analysis of chronic disease outcomes and risk factors in Argentina

    Background: Recognition of the global economic and epidemiological burden of chronic non-communicable diseases has increased in recent years. However, much of the research on this issue remains focused on individual-level risk factors and neglects the underlying social patterning of risk factors and disease outcomes. Methods: Secondary analysis of Argentina's 2005 Encuesta Nacional de Factores de Riesgo (National Risk Factor Survey, N = 41,392) using a novel analytical strategy first proposed by the United Nations Development Programme (UNDP), which we here refer to as the Average/Deprivation/Inequality (ADI) framework. The analysis focuses on two risk factors (unhealthy diet and obesity) and one related disease outcome (diabetes), a notable health concern in Latin America. Logistic regression is used to examine the interplay between socioeconomic and demographic factors. The ADI analysis then uses the results from the logistic regression to identify the most deprived, the best-off, and the difference between the two ideal types. Results: Overall, 19.9% of the sample reported being in poor/fair health, 35.3% reported not eating any fruits or vegetables in five days of the week preceding the interview, 14.7% had a BMI of 30 or greater, and 8.5% indicated that a health professional had told them that they have diabetes or high blood pressure. However, significant variation is hidden by these summary measures. Educational attainment displayed the strongest explanatory power throughout the models, followed by household income, with both factors highlighting the social patterning of risk factors and disease outcomes. As educational attainment and household income increase, the probability of poor health, unhealthy diet, obesity, and diabetes decreases. The analyses also point toward important provincial effects and reinforce the notion that both compositional factors (i.e., characteristics of individuals) and contextual factors (i.e., characteristics of places) are important in understanding the social patterning of chronic diseases. Conclusion: The application of the ADI framework enables identification of the regions or groups worst-off for each outcome measure under study. This can be used to highlight the variation embedded within national averages; as such, it encourages a social perspective on population health indicators that is particularly attuned to issues of inequity. The ADI framework is an important tool in the evaluation of policies aiming to prevent or control chronic non-communicable diseases.

    Measurement of the Bottom-Strange Meson Mixing Phase in the Full CDF Data Set

    We report a measurement of the bottom-strange meson mixing phase \beta_s using the time evolution of B0_s -> J/\psi (-> \mu+ \mu-) \phi (-> K+ K-) decays in which the quark-flavor content of the bottom-strange meson is identified at production. This measurement uses the full data set of proton-antiproton collisions at sqrt(s) = 1.96 TeV collected by the Collider Detector experiment at the Fermilab Tevatron, corresponding to 9.6 fb^-1 of integrated luminosity. We report confidence regions in the two-dimensional space of \beta_s and the B0_s decay-width difference \Delta\Gamma_s, and measure \beta_s in [-\pi/2, -1.51] U [-0.06, 0.30] U [1.26, \pi/2] at the 68% confidence level, in agreement with the standard model expectation. Assuming the standard model value of \beta_s, we also determine \Delta\Gamma_s = 0.068 +- 0.026 (stat) +- 0.009 (syst) ps^-1 and the mean B0_s lifetime, \tau_s = 1.528 +- 0.019 (stat) +- 0.009 (syst) ps, which are consistent and competitive with determinations by other experiments. Comment: 8 pages, 2 figures, Phys. Rev. Lett. 109, 171802 (2012)

    Erosion characteristics and floc strength of Athabasca river cohesive sediments: towards managing sediment-related issues

    Purpose: Most of Canada’s tar sands operations are located in the Athabasca river basin. Cohesive sediments deposited in the Athabasca river and its tributaries are a potential source of PAHs in the basin. The erosional behavior of cohesive sediments depends not only on fluid turbulence but also on sediment structure, particularly the influence of organic content. This research aims to describe this behavior in Athabasca river sediments. Methods: An experimental study of cohesive sediment dynamics in one of the tributaries, the Muskeg river, was carried out in a rotating annular flume. Varying the shear stress allowed the erosional strength to be determined for beds with different consolidation periods. Particle size measurements were made with a laser diffraction device operated in continuous flow-through mode. Optical analyses of flocs (ESEM and TEM) were performed on samples taken at the end of the experiments. Results: An inverse relationship between suspended sediment concentration (SS) and the consolidation period was found. This research relates the differences to the increasing organic content of the sediments with consolidation period. The particle size measurements during the experiments showed differences in floc strength that are also related to changing organic content over different consolidation periods. ESEM and TEM observations confirm the structural differences for beds with different consolidation periods. The effects of SFGL on floc structure and on biostabilization of the bed are discussed. Conclusions: The consolidation period should be taken into account when modeling the erosion of cohesive sediments in the Athabasca river. For transport models of pollutants (PAHs), it is highly recommended to consider the organic content of flocs, particularly algae, in the resuspension module. Environment Canada, CONACY

    Jet energy measurement with the ATLAS detector in proton-proton collisions at √s = 7 TeV

    The jet energy scale and its systematic uncertainty are determined for jets measured with the ATLAS detector at the LHC in proton-proton collision data at a centre-of-mass energy of √s = 7 TeV corresponding to an integrated luminosity of 38 pb⁻¹. Jets are reconstructed with the anti-kt algorithm with distance parameters R = 0.4 or R = 0.6. Jet energy and angle corrections are determined from Monte Carlo simulations to calibrate jets with transverse momenta pT ≥ 20 GeV and pseudorapidities |η| < 4.5. The jet energy systematic uncertainty is estimated using the single isolated hadron response measured in situ and in test-beams, exploiting the transverse momentum balance between central and forward jets in events with dijet topologies and studying systematic variations in Monte Carlo simulations. The jet energy uncertainty is less than 2.5% in the central calorimeter region (|η| < 0.8) for jets with 60 ≤ pT < 800 GeV, and is maximally 14% for pT < 30 GeV in the most forward region 3.2 ≤ |η| < 4.5. The jet energy is validated for jet transverse momenta up to 1 TeV to the level of a few percent using several in situ techniques by comparing a well-known reference such as the recoiling photon pT, the sum of the transverse momenta of tracks associated to the jet, or a system of low-pT jets recoiling against a high-pT jet. More sophisticated jet calibration schemes are presented based on calorimeter cell energy density weighting or hadronic properties of jets, aiming for an improved jet energy resolution and a reduced flavour dependence of the jet response. The systematic uncertainty of the jet energy determined from a combination of in situ techniques is consistent with the one derived from single hadron response measurements over a wide kinematic range. The nominal corrections and uncertainties are derived for isolated jets in an inclusive sample of high-pT jets. Special cases such as event topologies with close-by jets, or selections of samples with an enhanced content of jets originating from light quarks, heavy quarks or gluons are also discussed and the corresponding uncertainties are determined. © 2013 CERN for the benefit of the ATLAS collaboration

    Search for R-parity-violating supersymmetry in events with four or more leptons in √s = 7 TeV pp collisions with the ATLAS detector

    A search for new phenomena in final states with four or more leptons (electrons or muons) is presented. The analysis is based on 4.7 fb⁻¹ of √s = 7 TeV proton-proton collisions delivered by the Large Hadron Collider and recorded with the ATLAS detector. Observations are consistent with Standard Model expectations in two signal regions: one that requires moderate values of missing transverse momentum and another that requires large effective mass. The results are interpreted in a simplified model of R-parity-violating supersymmetry in which a 95% CL exclusion region is set for charged wino masses up to 540 GeV. In an R-parity-violating MSUGRA/CMSSM model, values of m_1/2 up to 820 GeV are excluded for 10 < tan β < 40.