
Identification and correction of systematic error in high-throughput sequence data

Author: Dario Boffelli
David I. K. Martin
Frazer Meacham
Joseph Dhahbi
Lior Pachter
Meromit Singer
Publication date: 01/01/2011

A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed “next-gen” sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position-specific (depending on the location in the read) and sequence-specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technologies sequencing platforms. We describe a new type of _systematic_ error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur at approximately 1 in 1000 base pairs, and that quality scores at systematic error sites do not account for the extent of errors. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq). Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low-coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.
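The abstract does not spell out SysCall's statistical model. As a rough illustration of what a "statistically unlikely accumulation of errors" at one site means, the sketch below asks how probable the observed number of mismatching calls would be if each read erred independently at the average rate implied by its Phred quality score. The helper names are hypothetical and the independence assumption is ours. A low p-value alone cannot separate systematic error from a true heterozygous site; that distinction is what the paper's classifier addresses.

```python
import math
from typing import Sequence


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def binomial_sf(k: int, n: int, p: float) -> float:
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def excess_mismatch_pvalue(qualities: Sequence[int], mismatches: int) -> float:
    """Hypothetical helper: p-value that `mismatches` non-reference calls at a
    site arise from sequencing error alone.

    Assumes independent errors at the mean error rate implied by the reported
    base qualities of the reads covering the site (an illustrative assumption,
    not SysCall's model).
    """
    n = len(qualities)
    mean_err = sum(phred_to_error(q) for q in qualities) / n
    return binomial_sf(mismatches, n, mean_err)


# 40 reads at Q30 (error rate 0.001) with 8 mismatches is far beyond what
# the reported qualities predict, so the p-value is vanishingly small.
print(excess_mismatch_pvalue([30] * 40, 8))
```

Under this model, a site where quality scores "do not account for the extent of errors" is exactly one with a tiny p-value despite high reported qualities.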

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Caltech Authors

Nature Precedings


Evaluating the Accuracy of Value-at-Risk Forecasts: New Multilevel Tests

Author: Boffelli S.
Leccadito A.
Urga G.
Publication venue: 'Elsevier BV'
Publication date: 01/04/2014

We propose independence and conditional coverage tests aimed at evaluating the accuracy of Value-at-Risk (VaR) forecasts from the same model at different confidence levels. The proposed procedures are multilevel tests, i.e., joint tests of several quantiles corresponding to different confidence levels. In a comprehensive Monte Carlo exercise, we document the superiority of the proposed tests with respect to existing multilevel tests. In an empirical application, we illustrate the implementation of the tests using several VaR models and daily data for 15 MSCI world indices.
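The multilevel tests themselves are not specified in the abstract. For orientation, the sketch below implements the standard single-level building block, Kupiec's proportion-of-failures (unconditional coverage) likelihood-ratio test, which checks whether the VaR exceedance rate at one confidence level matches its nominal value. Running it level by level is the naive alternative to a joint multilevel test; it is not the authors' procedure, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def _xlogy(x: float, y: float) -> float:
    # Convention 0 * log(0) = 0, as in the binomial likelihood.
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_pof(hits: Sequence[int], alpha: float) -> Tuple[float, float]:
    """Kupiec unconditional coverage LR test for a single VaR level.

    hits:  1 if the loss exceeded the VaR forecast on that day, else 0.
    alpha: nominal exceedance probability (e.g. 0.01 for a 99% VaR).
    Returns (LR statistic, asymptotic chi-square(1) p-value).
    """
    t = len(hits)
    x = sum(hits)
    pi_hat = x / t  # observed exceedance rate (MLE under the alternative)
    ll_null = _xlogy(t - x, 1 - alpha) + _xlogy(x, alpha)
    ll_alt = _xlogy(t - x, 1 - pi_hat) + _xlogy(x, pi_hat)
    # Clamp tiny negative values caused by floating-point rounding.
    lr = max(0.0, -2.0 * (ll_null - ll_alt))
    # Chi-square(1) survival function: P(X > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# 30 exceedances in 1000 days for a 99% VaR (10 expected): strong rejection.
print(kupiec_pof([1] * 30 + [0] * 970, 0.01))
```

A model would be checked at, say, 95%, 99% and 99.5% by building one hit sequence per level. Because those hit sequences are nested and therefore dependent, separate single-level tests do not combine cleanly, which is one motivation for joint multilevel tests.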

City Research Online

Crossref


High- and Low-Frequency Correlations in European Government Bond Spreads and Their Macroeconomic Drivers

Author: Boffelli S.
Skintzi V. D.
Urga G.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 16/12/2015

We propose to adopt high-frequency DCC-MIDAS models to estimate high- and low-frequency correlations in the 10-year government bond spreads of Belgium, France, Italy, the Netherlands, and Spain relative to Germany, from June 1, 2007 to May 31, 2012. The high-frequency component, reflecting financial market conditions, is evaluated at a 15-minute frequency, while the low-frequency component, held fixed within each month, depends on country-specific macroeconomic conditions. We find strong links between spread volatility and worsening macroeconomic fundamentals; in the presence of similar macroeconomic fundamentals, relative spreads move together; and the increasing correlation in spreads during the burst of the sovereign debt crisis cannot be ascribed entirely to macroeconomic factors, but rather to changes in market liquidity.

City Research Online

Crossref

Doing the right thing or doing things right: what is better for a successful manufacturing reshoring?

Author: Boffelli Albachiara
Fratocchi Luciano
Kalchschmidt Matteo
Silva Susana C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/03/2021

The article concerns the revision of earlier decisions to offshore production activities (so-called “relocation of second degree”); more specifically, it focuses on “reshoring” (also referred to as “relocation to the home country”, “back-reshoring” or “back-shoring”). The research aims to investigate what types of mistakes occur along the decision-making and implementation process and how they affect the outcome, in terms of success or failure, of a relocation strategy. A multiple case study involving four companies in the fashion industry from Portugal and Italy was conducted. The cross-case analysis allowed us to differentiate decision-making mistakes from implementation mistakes and to assess differences and similarities among the cases in terms of the content of the relocation, its drivers, and its outcomes. The research contributes to previous literature on reshoring by bringing evidence of different types of mistakes to be considered, thus requiring further conceptualization of the reshoring process. Managers and entrepreneurs should recognize the importance of doing things right during implementation as well, an aspect too often underestimated. The present article is the first in the reshoring literature to bring evidence of cases of failure in relocation decisions and to discriminate among different kinds of mistakes.

Repositório Institucional da Universidade Católica Portuguesa

Collaborate for what: a structural topic model analysis on CDP data

Author: Bianchi Annamaria
Boffelli Albachiara
Kalchschmidt Matteo
Madonna Alice
Salvatore Camilla
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 20/09/2022

The aim of this paper is to understand why firms engage with their suppliers to collaborate on sustainability. To this end, we use the Carbon Disclosure Project (CDP) Supply Chain dataset and apply the Structural Topic Model to 1) identify the topics discussed in an open-ended question related to climate-related supplier engagement and 2) estimate the differences in the discussion of such topics between CDP members and non-members, i.e., focal firms and first-tier suppliers, respectively. The analysis highlights that the two most prevalent reasons firms engage with their suppliers relate to several aspects of supply chain management and to the efficiency of the mobility of goods and services. It is further noted that first-tier suppliers do not have established capabilities and, therefore, are still in the course of improving their processes. By contrast, focal firms have more structured capabilities, which they use to manage supplier engagement for information collection. This study demonstrates how big data and machine learning methods can be applied to analyse unstructured textual data from traditional surveys.

Salvatore, C.; Madonna, A.; Bianchi, A.; Boffelli, A.; Kalchschmidt, M. (2022). Collaborate for what: a structural topic model analysis on CDP data. In 4th International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat Politècnica de València. 139-146. https://doi.org/10.4995/CARMA2022.2022.1507413914

RiuNet

Mbyá ethnicity in Puerto Iguazu: tourist exploitation of/in indigenous communities on the triple border (Misiones, Argentina)

Author: Boffelli Clara
Cantore Alfonsina
Publication venue: 'Editorial de la Facultad de Filosofia y Letras - Universidad de Buenos Aires'
Publication date: 27/10/2017

The international border that divides Argentina from Paraguay and Brazil has the particularity of being both a tourist border (it is the site of Iguazú Falls, one of the seven wonders of the world) and a hydrological border (it is crossed by the Guaraní Aquifer). These characteristics bring a significant flow of people in which distinct actors come into contact: tourists, hotel entrepreneurs and their workers, local society, and the Mbyá Guaraní communities that live there. In this context, the indigenous communities of the area have found in tourism a new source of subsistence. Although they enter capitalist forms of exploitation as unequal actors, they find diverse strategies to carry out tourist activities, all of them imbued with the offer of their ethnicity.

Revistas Científicas de Filo (Facultad de Filosofía y Letras, UBA - Universidad de Buenos Aires)

MetMap Enables Genome-Scale Methyltyping for Determining Methylation States in Populations

Author: Boffelli Dario
Dhahbi Joseph
Martin David I. K.
Pachter Lior
Schroth Gary P.
Schönhuth Alexander
Singer Meromit
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/08/2010

The ability to assay genome-scale methylation patterns using high-throughput sequencing makes it possible to carry out association studies to determine the relationship between epigenetic variation and phenotype. While bisulfite sequencing can determine a methylome at high resolution, its cost inhibits its use in comparative and population studies. MethylSeq, based on sequencing of fragment ends produced by a methylation-sensitive restriction enzyme, is a method for methyltyping (survey of methylation states) and is a site-specific and cost-effective alternative to whole-genome bisulfite sequencing. Despite its advantages, the use of MethylSeq has been restricted by biases in MethylSeq data that complicate the determination of methyltypes. Here we introduce a statistical method, MetMap, that produces corrected site-specific methylation states from MethylSeq experiments and annotates unmethylated islands across the genome. MetMap integrates genome sequence information with experimental data in a statistically sound and cohesive Bayesian network. It infers the extent of methylation at individual CGs and across regions, and serves as a framework for comparative methylation analysis within and among species. We validated MetMap's inferences with direct bisulfite sequencing, showing that the methylation status of sites and islands is accurately inferred. We used MetMap to analyze MethylSeq data from four human neutrophil samples, identifying novel, highly unmethylated islands that are invisible to sequence-based annotation strategies. The combination of MethylSeq and MetMap is a powerful and cost-effective tool for determining genome-scale methyltypes suitable for comparative and association studies.
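MethylSeq reads start at fragment ends created by a methylation-sensitive restriction enzyme, so only the CGs inside that enzyme's recognition sites are assayed directly. The sketch below makes this concrete, assuming HpaII (recognition site CCGG, whose cut is blocked when the internal CG is methylated); the abstract does not name the enzyme, and the helper names are hypothetical. It is a preprocessing illustration only, not MetMap's Bayesian network.

```python
import re
from typing import List


def cg_positions(seq: str) -> List[int]:
    """0-based positions of the C in every CG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def restriction_sites(seq: str, site: str = "CCGG") -> List[int]:
    """0-based start positions of (possibly overlapping) recognition sites.

    The default CCGG (HpaII) is an assumption for illustration.
    """
    pattern = "(?=" + re.escape(site.upper()) + ")"  # lookahead allows overlaps
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def interrogated_cgs(seq: str, site: str = "CCGG") -> List[int]:
    """CG positions that fall inside a recognition site, i.e. the only CGs
    whose methylation state can affect cutting by the assumed enzyme."""
    offset = site.upper().find("CG")
    if offset < 0:
        raise ValueError("recognition site contains no CG")
    return [start + offset for start in restriction_sites(seq, site)]


seq = "AACCGGTTCGAACCGG"
print(cg_positions(seq))      # all CGs in the sequence
print(interrogated_cgs(seq))  # the subset a CCGG-cutting enzyme can report on
```

The gap between the two lists is why a site-specific assay needs sequence information alongside read counts, which is the kind of integration the abstract attributes to MetMap.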

Directory of Open Access Journals

PubMed Central

Caltech Authors

Functionally conserved enhancers with divergent sequences in distant vertebrates

Author: Alexander Poliakov
Dario Boffelli
Inna Dubchak
Nadav Ahituv
Nir Oksenberg
Sachiko Takayama
Seok-Jin Heo
Song Yang
Publication venue: Springer Nature
Publication date: 01/10/2015

Conserved transcription factor binding motifs in the five zebrafish/mouse syntenic enhancers. Identical n-mers (n ≥ 7) identified in the zebrafish, mouse, and human sequences of the five syntenic CNS were examined for the presence of transcription factor binding motifs; only motifs with E-value E ≤ 0.1 are shown. (XLSX 15 kb)
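The legend describes a search for identical n-mers (n ≥ 7) present in the zebrafish, mouse and human copies of each element. Because any shared n-mer with n ≥ 7 contains at least one shared 7-mer, intersecting the sets of 7-mers from the three sequences recovers every candidate region. The sketch below is a minimal version assuming exact matches on the given strand only (no reverse complements); the function names are illustrative, and the subsequent motif scan with E-values is not reproduced.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq (case-insensitive)."""
    s = seq.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def shared_kmers(seqs: Iterable[str], k: int = 7) -> Set[str]:
    """k-mers present in every sequence (exact match, forward strand only)."""
    it = iter(seqs)
    try:
        common = kmers(next(it), k)
    except StopIteration:
        return set()
    for s in it:
        common &= kmers(s, k)  # keep only k-mers seen in all sequences so far
    return common


# Hypothetical toy sequences standing in for zebrafish, mouse and human copies.
toy = ["AAAAGATTACAGGG", "CCGATTACATT", "TTTGATTACACCC"]
print(shared_kmers(toy, 7))
```

Longer shared n-mers appear as runs of overlapping shared 7-mers; confirming that a whole run is identical in all three species would need a further extension step.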

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

The Francis Crick Institute