635 research outputs found
Identification and correction of systematic error in high-throughput sequence data
A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed “next-gen” sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position-specific (depending on the location in the read) and sequence-specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of _systematic_ error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that quality scores at systematic error sites do not account for the extent of errors. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq). Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low-coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.
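To illustrate what a "statistically unlikely accumulation of errors" at one site could mean, here is a minimal Python sketch. It is not SysCall and not the classifier described in the abstract. It assumes that base-call errors are independent across reads with a fixed per-base error rate; the function names, the 1% error rate and the significance threshold are all hypothetical choices made for illustration.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def unlikely_mismatch_pileup(depth: int, mismatches: int,
                             per_base_error: float = 0.01,
                             alpha: float = 1e-6) -> bool:
    """Flag a site whose mismatch count is improbable under independent errors.

    per_base_error and alpha are illustrative assumptions, not values
    taken from the abstract.
    """
    return binomial_tail(depth, mismatches, per_base_error) < alpha


# 20 mismatching reads out of 100 is far more than a 1% error rate explains.
print(unlikely_mismatch_pileup(depth=100, mismatches=20))
# A single mismatch out of 100 reads is unremarkable.
print(unlikely_mismatch_pileup(depth=100, mismatches=1))
```

A flagged site could still be either a true heterozygous site or a systematic error. Telling the two apart is the job of the classifier the abstract describes, which this sketch does not attempt.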
Evaluating the Accuracy of Value-at-Risk Forecasts: New Multilevel Tests
We propose independence and conditional coverage tests aimed at evaluating the accuracy of Value-at-Risk (VaR) forecasts from the same model at different confidence levels. The proposed procedures are multilevel tests, i.e., joint tests of several quantiles corresponding to different confidence levels. In a comprehensive Monte Carlo exercise, we document the superiority of the proposed tests over existing multilevel tests. In an empirical application, we illustrate the implementation of the tests using several VaR models and daily data for 15 MSCI world indices.
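For readers unfamiliar with coverage testing, the sketch below shows the basic idea at a single confidence level. It uses Kupiec's proportion-of-failures likelihood-ratio test, a standard unconditional coverage test, and not the multilevel tests proposed in the abstract. The function names and the example numbers are hypothetical.

```python
from math import erfc, log, sqrt


def _xlogy(x: float, y: float) -> float:
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(y)


def kupiec_pof(violations: int, n_obs: int, var_level: float):
    """Kupiec proportion-of-failures test for a single VaR confidence level.

    Returns (LR statistic, p-value). Under correct coverage the statistic is
    asymptotically chi-square with one degree of freedom.
    """
    p = 1.0 - var_level            # expected violation probability
    pi_hat = violations / n_obs    # observed violation frequency
    ll_null = _xlogy(n_obs - violations, 1 - p) + _xlogy(violations, p)
    ll_alt = _xlogy(n_obs - violations, 1 - pi_hat) + _xlogy(violations, pi_hat)
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)  # guard against rounding below 0
    p_value = erfc(sqrt(lr / 2.0))            # chi-square(1) survival function
    return lr, p_value


# Hypothetical example: 10 violations in 250 days for a 99% VaR
# (about 2.5 violations would be expected).
lr, p_value = kupiec_pof(violations=10, n_obs=250, var_level=0.99)
print(lr, p_value)
```

A small p-value suggests that the observed violation frequency is inconsistent with the stated confidence level. The test says nothing about whether violations cluster in time, which is what independence tests address.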
High- and Low-Frequency Correlations in European Government Bond Spreads and Their Macroeconomic Drivers
We propose adopting high-frequency DCC-MIDAS models to estimate high- and low-frequency correlations in the 10-year government bond spreads for Belgium, France, Italy, the Netherlands, and Spain relative to Germany, from June 1, 2007 to May 31, 2012. The high-frequency component, reflecting financial market conditions, is evaluated at 15-minute frequency, while the low-frequency component, fixed through a month, depends on country-specific macroeconomic conditions. We find strong links between spread volatility and worsening macroeconomic fundamentals; in the presence of similar macroeconomic fundamentals, relative spreads move together; the increasing correlation in spreads during the outbreak of the sovereign debt crisis cannot be ascribed entirely to macroeconomic factors but also reflects changes in market liquidity.
Doing the right thing or doing things right: what is better for a successful manufacturing reshoring?
The article concerns the revision of earlier decisions to offshore production activities (so-called “relocation of second degree”); more specifically, it focuses on “reshoring” (also referred to as “relocation to the home country”, “back-reshoring” or “back-shoring”). The research aims to investigate what types of mistakes occur along the decision-making and implementation process and how they affect the outcome, in terms of success or failure, of a relocation strategy. A multiple case study involving four companies in the fashion industry from Portugal and Italy was conducted. The cross-case analysis made it possible to differentiate decision-making mistakes from implementation mistakes and to assess differences and similarities among the cases in terms of content of the relocation, drivers and outcomes. The research contributes to previous literature on reshoring by bringing evidence of different types of mistakes to be considered, thus requiring further conceptualization of the reshoring process. Managers and entrepreneurs should consider the importance of doing things right during implementation as well, which is too often underestimated. The present article is the first in the reshoring literature to bring evidence of cases of failure in relocation decisions and to discriminate among different kinds of mistakes.
Collaborate for what: a structural topic model analysis on CDP data
The aim of this paper is to understand why firms engage with their suppliers to collaborate for sustainability. To this end, we use the Carbon Disclosure Project (CDP) Supply Chain dataset and apply the Structural Topic Model to 1) identify the topics discussed in an open-ended question about climate-related supplier engagement and 2) estimate the differences in the discussion of such topics between CDP members and non-members (focal firms and first-tier suppliers, respectively). The analysis highlights that the two most prevalent reasons firms engage with their suppliers relate to several aspects of supply chain management and to the efficiency of the mobility of goods and services. It further shows that first-tier suppliers do not have established capabilities and are therefore still in the course of improving their processes. Focal firms, by contrast, have more structured capabilities to manage supplier engagement for information collection. This study demonstrates how big data and machine learning methods can be applied to analyse unstructured textual data from traditional surveys. Salvatore, C.; Madonna, A.; Bianchi, A.; Boffelli, A.; Kalchschmidt, M. (2022). Collaborate for what: a structural topic model analysis on CDP data. In 4th International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat Politècnica de València. 139-146. https://doi.org/10.4995/CARMA2022.2022.1507413914
Mbyá ethnicity in Puerto Iguazú. Tourist exploitation of/in indigenous communities at the triple border (Misiones, Argentina)
The international border that divides Argentina from Paraguay and Brazil is distinctive in being both a tourist area (it is the site of Iguazú Falls, one of the seven wonders of the world) and a hydrological one (it is crossed by the Guaraní Aquifer). These characteristics bring a significant flow of people in which distinct actors come into contact: tourists, hotel entrepreneurs and their workers, local society and the Mbyá Guaraní communities that live there. In this context, the members of the indigenous communities in the area have found in tourism a new source of subsistence. Although they enter into capitalist forms as unequal actors in this exploitation, they find diverse strategies to carry out tourist activities, all of them shaped by the offer of their ethnicity.
MetMap Enables Genome-Scale Methyltyping for Determining Methylation States in Populations
The ability to assay genome-scale methylation patterns using high-throughput sequencing makes it possible to carry out association studies to determine the relationship between epigenetic variation and phenotype. While bisulfite sequencing can determine a methylome at high resolution, cost inhibits its use in comparative and population studies. MethylSeq, based on sequencing of fragment ends produced by a methylation-sensitive restriction enzyme, is a method for methyltyping (survey of methylation states) and is a site-specific and cost-effective alternative to whole-genome bisulfite sequencing. Despite its advantages, the use of MethylSeq has been restricted by biases in MethylSeq data that complicate the determination of methyltypes. Here we introduce a statistical method, MetMap, that produces corrected site-specific methylation states from MethylSeq experiments and annotates unmethylated islands across the genome. MetMap integrates genome sequence information with experimental data, in a statistically sound and cohesive Bayesian Network. It infers the extent of methylation at individual CGs and across regions, and serves as a framework for comparative methylation analysis within and among species. We validated MetMap's inferences with direct bisulfite sequencing, showing that the methylation status of sites and islands is accurately inferred. We used MetMap to analyze MethylSeq data from four human neutrophil samples, identifying novel, highly unmethylated islands that are invisible to sequence-based annotation strategies. The combination of MethylSeq and MetMap is a powerful and cost-effective tool for determining genome-scale methyltypes suitable for comparative and association studies
Functionally conserved enhancers with divergent sequences in distant vertebrates
Conserved transcription factor binding motifs in the five zebrafish/mouse syntenic enhancers. Identical n-mers (n ≥ 7) identified in the zebrafish, mouse, and human sequences of the five syntenic CNS were examined for the presence of transcription factor binding motifs; only motifs with E-value E ≤ 0.1 are shown. (XLSX 15 kb)