22 research outputs found
Highly sensitive feature detection for high resolution LC/MS
<p>Abstract</p> <p>Background</p> <p>Liquid chromatography coupled to mass spectrometry (LC/MS) is an important analytical technology for metabolomics and similar experiments. Determining the boundaries, centres and intensities of the two-dimensional signals in the LC/MS raw data is called feature detection. Reliable feature detection is mandatory for the subsequent analysis of complex samples such as plant extracts, which may contain hundreds of compounds corresponding to thousands of features.</p> <p>Results</p> <p>We developed a new feature detection algorithm, <it>centWave</it>, for high-resolution LC/MS data sets. It collects regions of interest (partial mass traces) in the raw data and applies continuous wavelet transformation and, optionally, Gaussian fitting in the chromatographic domain. We evaluated the algorithm on dilution series and mixtures of seed and leaf extracts, and estimated recall, precision and F-score of seed- and leaf-specific features in two experiments of different complexity.</p> <p>Conclusion</p> <p>The new feature detection algorithm meets the requirements of current metabolomics experiments. <it>centWave</it> can detect close-by and partially overlapping features and achieved the highest overall recall and precision of the evaluated algorithms, <it>matchedFilter</it> (the original algorithm of <it>XCMS</it>) and the centroidPicker from <it>MZmine</it>. The <it>centWave</it> algorithm was integrated into the Bioconductor R package <it>XCMS</it> and is available from <url>http://www.bioconductor.org/</url>.</p>
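The chromatographic-domain peak picking described above can be illustrated with a minimal continuous-wavelet-transform sketch. This is not the centWave implementation (which lives in the XCMS R package); the Mexican-hat kernel, the scale, and the synthetic Gaussian trace below are illustrative assumptions:

```python
import math

def mexican_hat(t, scale):
    # Ricker (Mexican hat) wavelet, a kernel commonly used for
    # CWT-based chromatographic peak detection.
    x = t / scale
    return (1 - x * x) * math.exp(-x * x / 2)

def cwt_response(signal, scale):
    # Convolve the signal with the wavelet at a single scale.
    n = len(signal)
    half = int(4 * scale)
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(-half, half + 1):
            if 0 <= i + j < n:
                acc += signal[i + j] * mexican_hat(j, scale)
        out.append(acc)
    return out

# Synthetic chromatographic trace: one Gaussian peak at index 50.
trace = [math.exp(-((i - 50) ** 2) / (2 * 4.0 ** 2)) for i in range(100)]
resp = cwt_response(trace, scale=4.0)
apex = max(range(len(resp)), key=lambda i: resp[i])
print(apex)  # the CWT response maximum recovers the peak apex
```

When the wavelet scale matches the chromatographic peak width, the response is maximal at the peak centre, which is why multi-scale CWT methods are robust against noise and baseline drift.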
Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content
Increasingly, scholarly articles contain URI references to "web at large" resources, including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they report on. A reader who visits a web-at-large resource by following a URI reference in an article, some time after its publication, is led to believe that the resource's content is representative of what the author originally referenced. However, due to the dynamic nature of the web, that may very well not be the case. We reuse a dataset from a previous study in which several authors of this paper were involved, and investigate to what extent the textual content of web-at-large resources referenced in a vast collection of Science, Technology, and Medicine (STM) articles published between 1997 and 2012 has remained stable since the publication of the referencing article. We do so in a two-step approach that relies on various well-established similarity measures to compare textual content. In the first step, we use 19 web archives to find snapshots of referenced web-at-large resources whose textual content is representative of the state of the resource around the time of publication of the referencing paper. We find that representative snapshots exist for about 30% of all URI references. In the second step, we compare the textual content of representative snapshots with that of their live web counterparts. We find that for over 75% of references the content has drifted away from what it was when referenced. These results raise significant concerns regarding the long-term integrity of the web-based scholarly record and call for the deployment of techniques to combat these problems.
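The kind of textual comparison the two-step approach relies on can be sketched with one simple, well-established measure. The abstract does not name specific measures, so Jaccard word-set similarity and the snapshot/live strings below are purely illustrative assumptions:

```python
def jaccard(a: str, b: str) -> float:
    # Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|.
    # A low score between an archived snapshot and the live page
    # signals content drift.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not (wa or wb):
        return 1.0
    return len(wa & wb) / len(wa | wb)

# Hypothetical archived snapshot vs. its live web counterpart.
snapshot = "the project wiki documents the 2010 release of the toolkit"
live = "the project site now hosts a commercial product catalog"
score = jaccard(snapshot, live)
print(round(score, 2))
```

A production pipeline would normalise the extracted text (boilerplate removal, tokenisation) before comparing, and would combine several such measures rather than rely on one.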
Understanding Learner’s Drop-Out in MOOCs
This paper focuses on anticipating drop-out among MOOC learners and helping identify the reasons behind it. The main reasons relate to course design and learners' behavior, according to the requirements of the MOOC provider OpenClassrooms. Two critical business needs are identified in this context: first, the accurate detection of at-risk droppers, which allows sending automated motivational feedback to prevent drop-out; second, the investigation of possible drop-out reasons, which allows making the necessary personalized interventions. To meet these needs, we present a supervised machine-learning drop-out prediction system that uses predictive algorithms (Random Forest and Gradient Boosting) for automated intervention solutions, and explicative algorithms (Logistic Regression and Decision Tree) for personalized intervention solutions. The experiments cover three main axes: (1) implementing a reliable drop-out prediction system that detects at-risk droppers at different specified instants throughout the course; (2) introducing and testing the effect of advanced features related to the trajectories of learners' engagement with the course (backward jumps, frequent jumps, inactivity-time evolution); (3) offering preliminary insight into how readable classifiers can help determine possible reasons for drop-out. The findings across these experimental axes demonstrate the viability of the expected intervention strategies.
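To make the "explicative" side concrete, here is a minimal logistic-regression sketch in plain Python. It is not the OpenClassrooms system: the gradient-descent trainer, the two engagement features (inactivity weeks, backward jumps), and the toy labels are all illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=3000):
    # Plain stochastic-gradient-descent logistic regression: readable
    # ("explicative") enough that the learned weights show which
    # features push a learner toward dropping out.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Toy per-learner features: (inactivity weeks, backward jumps),
# illustrative stand-ins for the engagement-trajectory features above.
X = [(0, 1), (1, 2), (3, 0), (4, 1), (0, 0), (5, 3)]
y = [0, 0, 1, 1, 0, 1]  # 1 = dropped out

w, b = train_logistic(X, y)
p_risk = sigmoid(w[0] * 4 + w[1] * 2 + b)  # a highly inactive learner
print(p_risk > 0.5)
```

In practice an at-risk score above a chosen threshold would trigger the automated motivational feedback, while the weight signs and magnitudes support the personalized-intervention analysis.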
Quality of Experience for Personalized Sightseeing Tours: Studies and Proposition for an Evaluation Method
Directed partial correlation : inferring large-scale gene regulatory network through induced topology disruptions
Inferring regulatory relationships among many genes based on their temporal variation in transcript abundance has been a popular research topic. Due to the nature of microarray experiments, classical tools for time series analysis lose power, since the number of variables far exceeds the number of samples. In this paper, we describe some of the existing multivariate inference techniques that are applicable to hundreds of variables and show the potential challenges for small-sample, large-scale data. We propose a directed partial correlation (DPC) method as an efficient and effective solution to regulatory network inference using these data. Specifically for genomic data, the proposed method is designed to deal with large-scale datasets. It combines the efficiency of partial correlation for setting up network topology by testing conditional independence with the concept of Granger causality to assess topology change under induced interruptions. The idea is that when a transcription factor is induced artificially within a gene network, the resulting disruption of the network signifies a gene's role in transcriptional regulation. Benchmarking results using GeneNetWeaver, the simulator for the DREAM challenges, provide strong evidence of the outstanding performance of the proposed DPC method. When applied to real biological data, the inferred starch metabolism network in Arabidopsis reveals many biologically meaningful network modules worthy of further investigation. These results collectively suggest that DPC is a versatile tool for genomics research. The R package DPC is available for download (http://code.google.com/p/dpcnet/).
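The partial-correlation building block used for testing conditional independence can be illustrated with the standard first-order formula. This sketch is not the DPC package; the three toy series, in which a common driver z induces a spurious x-y correlation, are illustrative assumptions:

```python
import math

def pearson(x, y):
    # Ordinary Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    # First-order partial correlation of x and y controlling for z:
    # r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# z drives both x and y (plus small independent noise), so the raw
# x-y correlation is high, but controlling for z should shrink it
# toward zero, flagging the x-y link as indirect.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ex = [0.2, -0.2, 0.2, -0.2, 0.2, -0.2, 0.2, -0.2]
ey = [0.2, 0.2, -0.2, -0.2, 0.2, 0.2, -0.2, -0.2]
x = [zi + e for zi, e in zip(z, ex)]
y = [zi + e for zi, e in zip(z, ey)]

print(round(pearson(x, y), 2), round(partial_corr(x, y, z), 2))
```

In a network-inference setting, an edge between two genes would be kept only if their correlation survives conditioning on the other candidate regulators; DPC then adds the Granger-style induction test on top of this topology.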