663 research outputs found

    Random Forests for Big Data

    Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involves massive data, but it also often includes online data and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method that handles regression problems, as well as two-class and multi-class classification problems, in a single and versatile framework. Focusing on classification problems, this paper proposes a selective review of available proposals that deal with scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities, such as out-of-bag error and variable importance, are addressed in these methods. Then, we formulate various remarks for random forests in the Big Data context. Finally, we experiment with five variants on two massive datasets (15 and 120 million observations), a simulated one as well as real-world data. One variant relies on subsampling, while three others are related to parallel implementations of random forests and involve either various adaptations of the bootstrap to Big Data or "divide-and-conquer" approaches. The fifth variant relates to online learning of random forests. These numerical experiments highlight the relative performance of the different variants, as well as some of their limitations.

    Self-organizing map for labeled graphs (Carte auto-organisatrice pour graphes étiquetés)

    In many real-world case studies, the analysis of graph data is not limited to knowledge of the graph alone. Additional information is often available on the vertices, and the user wishes to combine this information with the structure of the graph itself in order to understand the whole of the data at hand. This is the problem we address in this article, focusing on a data mining method that combines clustering (unsupervised classification) and visualization: self-organizing maps. We explain how kernel methods make it possible to efficiently combine information of various kinds (graph, numerical variables, factors, textual variables...) to dissect the structure of the data and provide a simplified representation of it. Our approach is illustrated on several examples: a first example, on simulated data, shows how the algorithm behaves. A second example illustrates the method on a real graph of several hundred vertices, which models a corpus of medieval documents.

    Multiple kernel self-organizing maps

    In a number of real-life applications, the user is interested in analyzing several sources of information together: a graph combined with the additional information known on its nodes, numerical variables measured on individuals and factors describing these individuals... Combining all sources of information can help the user better understand the dataset as a whole. The present article focuses on this issue, using self-organizing maps. The use of a kernel version of the algorithm allows us to combine various types of information and automatically tune the data combination. This approach is illustrated on a simulated example.

    Unsupervised clustering of a co-expression graph with metadata for microRNA detection (Classification non supervisée d'un graphe de co-expression avec des méta-données pour la détection de micro-ARNs)

    In this article we present a method for unsupervised clustering of the vertices of a graph, used in a specific biological context. The goal is to detect probable microRNAs in an unsupervised manner. To do so, we use a multiple-kernel approach that integrates information about the co-expression graph with additional information about the vertices of this graph. The approach is made robust by a technique for bagging clusterings. The results yield groups of potential miRNAs, some of which make it possible to discriminate true miRNAs from false positives with good confidence.

    Multi-image flock size estimation with CountEm: A case study with half a million Common Eiders and Greater Snow Geese

    Many of the methods used for estimating population size from ecological surveys have limitations on precision, cost, and/or applicability. The CountEm method was proposed recently for estimating the number of individuals in large groups from single images. It is simple and efficient, and can be applied to any species. Here we present a case study by applying CountEm to a real ecological survey with 278 images of Greater Snow Geese (Anser caerulescens atlanticus) and Common Eiders (Somateria mollissima) flocks taken from fixed-wing aircraft in Eastern Canada. First, we evaluated the precision and counting time of CountEm on single images. Second, we developed and tested a new multi-image version of the CountEm software. We show that flock sizes of N > 35,000 can be estimated on single images in ~5 min, from counting a sample of ~200 birds, yielding relative SEs in the 5%–10% range. Processing times increased to 10–20 min when simultaneously processing large numbers of images that contained over half a million birds with only modest increases in relative SE (range: 10%–15%). Our results suggest that CountEm may be used to save time and resources if incorporated into monitoring programs that utilize imagery in the abundance estimates. Funding: Proyecto Puente (Contrato Programa Gobierno de Cantabria-Universidad de Cantabria), Consejería de Universidades, Igualdad, Cultura y Deporte del Gobierno de Cantabria.

    Investigation of liver alcohol dehydrogenase catalysis using an NADH biomimetic and comparison with a synthetic zinc model complex

    We have compared the catalytic activity of horse liver alcohol dehydrogenase (LADH) with a synthetic zinc model complex in the presence of N-benzyl-1,4-dihydronicotinamide (BNAH), a cofactor which serves as a biomimetic for the natural cofactor NADH. We have used five different substrates (benzaldehyde, p-anisaldehyde, 4-nitrobenzaldehyde, 2-pyridine carboxaldehyde, and 5-pyrimidine carboxaldehyde) in this study. These substrates vary in their substituent inductive effect, which is the ability to donate electron density to, or withdraw it from, their carbonyl functional group. Our results reveal that in the presence of NADH, geometric factors (induced fit of the substrate and cofactor in the enzyme active site) are vital. However, reactivity assays show that in the presence of BNAH, there is a strong correlation between substrate electronic environment and the observed catalytic rate, i.e. the more electron withdrawn the substrate, the greater the speed at which the reduction reaction occurs. NMR spectroscopy reveals that a synthetic zinc model complex catalyzes the reduction of substrates in a manner consistent with the LADH enzyme.

    Dissection of PIM serine/threonine kinases in FLT3-ITD–induced leukemogenesis reveals PIM1 as regulator of CXCL12–CXCR4-mediated homing and migration

    FLT3-ITD–mediated leukemogenesis is associated with increased expression of oncogenic PIM serine/threonine kinases. To dissect their role in FLT3-ITD–mediated transformation, we performed bone marrow reconstitution assays. Unexpectedly, FLT3-ITD cells deficient for PIM1 failed to reconstitute lethally irradiated recipients, whereas lack of PIM2 induction did not interfere with FLT3-ITD–induced disease. PIM1-deficient bone marrow showed defects in homing and migration and displayed decreased surface CXCR4 expression and impaired CXCL12–CXCR4 signaling. Through small interfering RNA–mediated knockdown, chemical inhibition, expression of a dominant-negative mutant, and/or reexpression in knockout cells, we found PIM1 activity to be essential for proper CXCR4 surface expression and migration of cells toward a CXCL12 gradient. Purified PIM1 led to the phosphorylation of serine 339 in the CXCR4 intracellular domain in vitro, a site known to be essential for normal receptor recycling. In primary leukemic blasts, high levels of surface CXCR4 were associated with increased PIM1 expression, and this could be significantly reduced by a small molecule PIM inhibitor in some patients. Our data suggest that PIM1 activity is important for homing and migration of hematopoietic cells through modification of CXCR4. Because CXCR4 also regulates homing and maintenance of cancer stem cells, PIM1 inhibitors may exert their antitumor effects in part by interfering with interactions with the microenvironment.

    Synthesis, Characterization, and Computational Study of Three-Coordinate SNS Copper(I) Complexes Based on Bis-Thione Ligand Precursors

    A series of tridentate pincer ligands, each possessing two sulfur and one nitrogen donor (SNS), based on bis-imidazolyl or bis-triazolyl salts were metallated with CuCl2 to give new tridentate SNS pincer copper(I) complexes [(SNS)Cu]+. These orange complexes exhibit a three-coordinate pseudo-trigonal-planar geometry at copper. During the formation of these copper(I) complexes, disproportionation is observed as the copper(II) salt precursor is converted into the Cu(I) [(SNS)Cu]+ cation and the [CuCl4]2– counteranion. The [(SNS)Cu]+ complexes were characterized with single crystal X-ray diffraction, electrospray mass spectrometry, EPR spectroscopy, attenuated total reflectance infrared spectroscopy, UV–Vis spectroscopy, cyclic voltammetry, and elemental analysis. The EPR spectra are consistent with anisotropic Cu(II) signals with four hyperfine splittings in the lower-field region (g||) and g values consistent with the presence of the tetrachlorocuprate. Various electronic transitions are apparent in the UV–Vis spectra of the complexes and originate in the copper-containing cations and anions. Density functional calculations support the nature of the SNS binding, allowing assignment of a number of features present in the UV–Vis and IR spectra and cyclic voltammograms of these complexes.

    Can anxiety and depression influence six-minute walking test performance in post-surgical heart valve patients? A pilot study

    Various functional indicators are utilized to measure outcome in cardiac rehabilitation. Little information exists regarding the role played by psychological variables during the rehabilitative period after cardiac valve surgery. The present study aims at exploring the relationship between different levels of functional capacity, measured by the six-minute walking test (6MWT), and emotional aspects such as anxiety and depression. Materials and methods. At the beginning and at the end of the rehabilitative programme, 126 post-surgical heart valve patients underwent: 1) the 6MWT; 2) an assessment of anxiety and depression (A-D Questionnaire according to the CBA-2.0 Primary Scale). Results. Cardiac rehabilitation was associated with a general and significant improvement in the 6MWT (273±98 metres versus 363±96; p<0.001) and the functional performance parameters (diastolic blood pressure, p<0.001, and fatigue, p<0.001). Simultaneously there was a significant improvement of patient-reported quality of life, revealed by the A-D questionnaire in both male and female patients. The Depression Questionnaire score is predictive of functional capacity. It was demonstrated that, no matter what the clinical condition of the patient, the depression score influences the patient's performance during the 6MWT, not only with regard to the distance covered (p=.008), but also to fatigue as expressed by the Borg RPE index (p=.044). Conclusion. Depression, an emotional variable self-evaluated by the standardized questionnaire, can, even if only partially, influence the 6MWT, a functional indicator of exercise tolerance widely utilized in cardiac rehabilitation.

    Syntheses and characterization of three- and five-coordinate copper(II) complexes based on SNS pincer ligand precursors

    A series of tridentate pincer ligands, each possessing two sulfur- and one nitrogen-donor functionalities (SNS), based on a bis-imidazolyl precursor were metallated with CuCl2 to give new tridentate SNS pincer copper(II) complexes [(SNS)CuCl2]. These purple complexes exhibit a five-coordinate pseudo-square pyramidal geometry at the copper center. The [(SNS)CuCl2] complexes were characterized with single crystal X-ray diffraction, electrospray mass spectrometry, EPR spectroscopy, attenuated total reflectance infrared spectroscopy, UV–Vis spectroscopy, cyclic voltammetry, and elemental analysis. The EPR spectra are consistent with typical anisotropic Cu(II) signals with four hyperfine splittings in the lower-field region (g||). Various electronic transitions are apparent in the UV–Vis spectra of the complexes and originate from d-to-d transitions or various charge transfer transitions. We performed computational studies to understand the influence that structural constraints internal to our tridentate SNS ligand precursors have on the oxidation state of the resulting bound copper complex. We have determined that a d9 copper(II) metal center is better situated than a d10 copper(I) center to bind our tridentate SNS ligand set when it does not contain an internal CH2 group. Without this methylene linker, the SNS ligand forces the N and S atoms into a T-shaped arrangement about the metal center.