    The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

    Large language models are commonly trained on a mixture of filtered web data and curated high-quality corpora, such as social media conversations, books, or technical papers. This curation process is believed to be necessary to produce performant models with broad zero-shot generalization abilities. However, as larger models requiring pretraining on trillions of tokens are considered, it is unclear how scalable curation is and whether we will soon run out of unique high-quality data. At variance with previous beliefs, we show that properly filtered and deduplicated web data alone can lead to powerful models, even significantly outperforming state-of-the-art models trained on The Pile. Despite extensive filtering, the high-quality data we extract from the web is still plentiful, and we are able to obtain five trillion tokens from CommonCrawl. We publicly release an extract of 600 billion tokens from our RefinedWeb dataset, along with 1.3B and 7.5B parameter language models trained on it.
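
The abstract does not spell out the deduplication procedure. As an illustrative sketch only, the snippet below shows one common technique for near-duplicate removal: MinHash signatures over word 5-grams, followed by a greedy keep-first pass. The shingle size, the 64 hash functions, the 0.8 threshold, and all function names are assumptions, not RefinedWeb's actual settings.

```python
import hashlib
import re

def shingles(text: str, n: int = 5) -> set[str]:
    """Word n-grams of a lowercased document (the whole text if it is shorter than n words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def minhash_signature(items: set[str], num_perm: int = 64) -> list[int]:
    """One minimum per seeded 64-bit hash; the seeds stand in for random permutations."""
    return [
        min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in items
        )
        for seed in range(num_perm)
    ]

def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity of the sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def dedup(docs: list[str], threshold: float = 0.8, n: int = 5) -> list[str]:
    """Greedily keep each document unless it near-duplicates one already kept (O(N^2))."""
    kept, kept_sigs = [], []
    for doc in docs:
        sig = minhash_signature(shingles(doc, n))
        if all(estimated_jaccard(sig, k) < threshold for k in kept_sigs):
            kept.append(doc)
            kept_sigs.append(sig)
    return kept
```

The pairwise comparison here is quadratic in the number of documents; at web scale, signatures are usually bucketed with locality-sensitive hashing (banding) so that only candidate pairs are compared.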

    The Falcon Series of Open Language Models

    We introduce the Falcon series: 7B, 40B, and 180B parameter causal decoder-only models trained on diverse, high-quality corpora predominantly assembled from web data. The largest model, Falcon-180B, has been trained on over 3.5 trillion tokens of text, making this the largest openly documented pretraining run. Falcon-180B significantly outperforms models such as PaLM or Chinchilla, and improves upon concurrently developed models such as LLaMA 2 or Inflection-1. It nears the performance of PaLM-2-Large at a reduced pretraining and inference cost, making it, to our knowledge, one of the three best language models in the world along with GPT-4 and PaLM-2-Large. We report detailed evaluations, as well as a deep dive into the methods and custom tooling employed to pretrain Falcon. Notably, we report on our custom distributed training codebase, which allowed us to efficiently pretrain these models on up to 4,096 A100 GPUs on AWS cloud infrastructure with limited interconnect. We release a 600B-token extract of our web dataset, as well as the Falcon-7/40/180B models, under a permissive license to foster open science and accelerate the development of an open ecosystem of large language models.
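
In a causal decoder-only model, each position can attend only to itself and to earlier positions. The framework-free sketch below illustrates that masking with single-head scaled dot-product attention; it is a generic textbook formulation with hypothetical names, not Falcon's implementation, whose attention variant and kernels the abstract does not describe.

```python
import math

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats); q and k share dimension d.
    Position t only sees positions 0..t, which makes generation autoregressive.
    """
    T, d = len(q), len(q[0])
    out = []
    for t in range(T):
        # Scores are computed only for visible positions j <= t (the causal mask).
        scores = [sum(q[t][i] * k[j][i] for i in range(d)) / math.sqrt(d) for j in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of the visible value vectors.
        out.append([sum(weights[j] * v[j][i] for j in range(t + 1)) for i in range(len(v[0]))])
    return out
```

Changing a later value vector leaves all earlier outputs unchanged, which is the property that lets such a model be trained on all positions of a sequence in parallel while still predicting each token from its prefix only.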

    Brain metastasis and renal cell carcinoma: prognostic scores assessment in the era of targeted therapies

    Aim: This study aimed to explore several brain metastasis prognostic scores in patients with renal cell carcinoma. Patients and Methods: We retrospectively analyzed data from 93 patients with metastatic renal cell carcinoma who were diagnosed with brain metastases between October 2005 and July 2016. Potential prognostic factors (RTOG RPA, BS-BM, and a newly developed score, CERENAL) were analyzed. Results: A total of 75 patients received targeted therapy. All scores showed prognostic value for progression-free survival after first-line treatment, with CERENAL being the sole independent prognostic factor associated with a longer duration of first-line treatment. Both RTOG RPA and CERENAL were potential prognosticators of overall survival, whereas only the CERENAL score was associated with prolonged disease-specific survival. Conclusion: Several prognostic scores can be useful for predicting the survival of patients with brain metastases from renal cancer, especially the newly developed CERENAL score.
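
For reference, the two established scores compared in the study can be computed from a few clinical variables. The sketch below follows the standard published definitions of RTOG RPA and BS-BM as we understand them; the `Patient` fields and function names are illustrative, and CERENAL is not reproduced because the abstract does not give its criteria.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    kps: int                  # Karnofsky performance status (0-100)
    age: int                  # age in years
    primary_controlled: bool  # primary tumour controlled
    extracranial_mets: bool   # extracranial metastases present

def rtog_rpa_class(p: Patient) -> int:
    """RTOG recursive partitioning analysis class: 1 (best) to 3 (worst)."""
    if p.kps < 70:
        return 3
    if p.age < 65 and p.primary_controlled and not p.extracranial_mets:
        return 1
    return 2

def bs_bm(p: Patient) -> int:
    """Basic Score for Brain Metastases: 0 (worst) to 3 (best); defined for KPS 50-100."""
    return (1 if p.kps >= 80 else 0) + int(p.primary_controlled) + int(not p.extracranial_mets)
```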

    Towards predicting the microstructural evolution of model ferritic alloys under irradiation with a hybrid AKMC-OKMC approach

    This PhD thesis first aimed to accelerate an atomistic kinetic Monte Carlo (AKMC) model simulating the microstructural evolution of FeCuMnNiSiP model alloys, representative of reactor pressure vessel steels, under neutron irradiation. This acceleration was required to reach, in a reasonable computing time, doses and fluxes comparable to experimental ones. To this end, the LAKIMOCA code was first optimized algorithmically; the various optimizations sped up the code by a factor of 7. Since this speed-up was not sufficient, the approach retained was to develop a hybrid between atomistic kinetic Monte Carlo (AKMC) and object kinetic Monte Carlo (OKMC). Parameterizing the object model provided a better understanding of the macro-events involved in the simulations, but proved very difficult once the chemical complexity of the objects became too high. Nevertheless, the hybrid approach accelerated the computations by about two orders of magnitude, making it possible to reach doses corresponding to 40 years of irradiation in service. These results highlighted several limitations of the model and of its parameterization. The model's difficulty in reproducing flux effects was overcome by adding an absorber that reduces the grain boundary sink strength, and by introducing traps to account for impurities in pure iron. High-dose simulations of FeCuMnNiSiP model alloys also revealed differences between the simulated microstructures and those observed experimentally. Therefore, in a second stage, a new cohesive model based on pair interactions that depend on the local concentration was developed and parameterized.
While the new cohesive model is numerically heavier than the previous one, coupling it with the hybrid approach made it possible to reach the target dose. The results are in better agreement with recent DFT calculations and with experimental microstructures.
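
Both AKMC and OKMC advance the system with the residence-time (BKL, or n-fold way) algorithm: choose one event with probability proportional to its rate, then advance the clock by an exponentially distributed time step. A minimal sketch of one such step follows; it is generic, not LAKIMOCA's code, and it assumes the event rates have already been computed (for example from migration energies through an Arrhenius law, which the cohesive model would supply).

```python
import math
import random

def kmc_step(rates, rng):
    """One residence-time kinetic Monte Carlo step.

    rates: list of rates (1/s) of all events currently possible.
    rng: a random.Random instance.
    Returns (index of the chosen event, time increment in seconds).
    """
    total = sum(rates)
    # Select event i with probability rates[i] / total.
    target = rng.random() * total
    acc = 0.0
    chosen = len(rates) - 1  # fallback guards against floating-point round-off
    for i, r in enumerate(rates):
        acc += r
        if target < acc:
            chosen = i
            break
    # Exponentially distributed waiting time with mean 1 / total (1 - u avoids log(0)).
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt
```

In AKMC the events are individual atomic jumps, whereas in OKMC they act on whole objects such as clusters; the same step function applies to both, only the event catalogue and its rates differ.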

    GNN-based structural information to improve DNN-based basal ganglia segmentation in children following early brain lesion

    Analyzing the basal ganglia following an early brain lesion is crucial given their notable role in sensorimotor function. However, segmenting these subcortical structures on MRI is challenging in children and is further complicated by the presence of a lesion. Although current deep neural networks (DNNs) perform well at segmenting subcortical brain structures in healthy brains, they lack robustness to lesion variability, which leads to structural inconsistencies. Given the established spatial organization of the basal ganglia, we propose enhancing DNN-based segmentation through post-processing with a graph neural network (GNN). The GNN performs node classification on graphs that encode both class probabilities and spatial information about the regions segmented by the DNN. In this study, we focus on neonatal arterial ischemic stroke (NAIS) in children. The approach is evaluated on both healthy children and children after NAIS using three DNN backbones: U-Net, UNETr, and MSGSE-Net. The results show improved segmentation performance, with an increase in the median Dice score of up to 4% and a reduction in the median Hausdorff distance (HD) of up to 93% for healthy children (from 36.45 to 2.57) and up to 91% for children with NAIS (from 40.64 to 3.50). The performance of the method is compared with that of atlas-based methods. Severe cases of neonatal stroke lead to a decline in performance in the injured hemisphere, without negatively affecting the segmentation of the contralesional hemisphere. Furthermore, the approach is robust to small training datasets, a widespread challenge in the medical field, particularly in pediatrics and for rare pathologies.
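
The two metrics reported above can be sketched on voxel sets as follows, together with a check of the reported relative HD reductions. This is a brute-force illustration with hypothetical function names; distances are in voxel units, and the abstract does not state whether HD was computed in millimetres or as a percentile variant.

```python
import math

Voxel = tuple[int, int, int]

def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def hausdorff(a: set[Voxel], b: set[Voxel]) -> float:
    """Symmetric Hausdorff distance between two non-empty voxel sets (O(|A||B|))."""
    def directed(x, y):
        # Worst case, over points of x, of the distance to the nearest point of y.
        return max(min(math.dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))

def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before
```

Because HD is a worst-case distance, a single stray fragment far from the true structure dominates it, which plausibly explains why correcting structural inconsistencies can shrink HD much more than it raises Dice: (36.45 - 2.57) / 36.45 ≈ 0.93 and (40.64 - 3.50) / 40.64 ≈ 0.91.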

    Detecting cerebral palsy in neonatal stroke children: GNN-based detection considering the structural organization of basal ganglia

    As a long-term consequence of neonatal arterial ischaemic stroke (NAIS), the presence of cerebral palsy (CP) depends on the structural integrity of brain areas, especially the basal ganglia. Yet, establishing an early diagnosis of CP from conventional structural MRI remains challenging. In this study, we introduce a graph neural network-based classification to recognize NAIS children and, above all, to detect children with CP among them. From the structural MRI of 68 seven-year-old children and the corresponding segmentation of their basal ganglia, we construct graphs in which nodes represent structures and node and edge attributes carry structural information (volumes, distances). The proposed method achieves a classification accuracy of 86% for the detection of NAIS and 89% for the detection of CP among neonatal stroke children.
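
The graph construction described above can be sketched in plain Python: one node per segmented structure carrying its volume, and edges carrying inter-centroid distances. The message-passing layer and logistic readout are an untrained toy with made-up weights (`w_self`, `w_neigh`, `bias`) and an assumed 1 / (1 + distance) edge weighting; they illustrate graph-level classification in general, not the authors' architecture.

```python
import math

def build_graph(structures):
    """Build node and edge attributes from segmented structures.

    structures: dict mapping a structure name to (volume, (x, y, z) centroid).
    Volumes are assumed to be normalized beforehand (hypothetical preprocessing).
    Returns node features {name: [volume]} and edge features {(u, v): distance}
    for every ordered pair of distinct structures.
    """
    nodes = {name: [vol] for name, (vol, _) in structures.items()}
    edges = {
        (u, v): math.dist(structures[u][1], structures[v][1])
        for u in structures
        for v in structures
        if u != v
    }
    return nodes, edges

def message_passing(nodes, edges, w_self=1.0, w_neigh=0.5):
    """One mean-aggregation step; each neighbour's features are scaled by 1 / (1 + distance)."""
    updated = {}
    for u, h in nodes.items():
        incoming = [
            [x / (1.0 + d) for x in nodes[v]]
            for (src, v), d in edges.items()
            if src == u
        ]
        if incoming:
            agg = [sum(col) / len(incoming) for col in zip(*incoming)]
        else:
            agg = [0.0] * len(h)
        updated[u] = [math.tanh(w_self * a + w_neigh * b) for a, b in zip(h, agg)]
    return updated

def graph_score(nodes, edges, bias=0.0):
    """Mean-pool the updated node features and map them to a probability."""
    h = message_passing(nodes, edges)
    values = [x for vec in h.values() for x in vec]
    pooled = sum(values) / len(values)
    return 1.0 / (1.0 + math.exp(-(pooled + bias)))
```

In a trained model the scalar weights would be learned matrices and several layers would be stacked; the graph-level readout (pooling followed by a classifier) is what turns per-structure information into a single NAIS or CP decision per child.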

    Hand function after neonatal stroke: a graph model based on basal ganglia and thalami structure

    Introduction: Neonatal arterial ischemic stroke (NAIS) is a common model for studying the impact of a unilateral early brain insult on developmental brain plasticity and the emergence of long-term outcomes. Motor difficulties that may arise are typically related to poor function of the affected (contralesional) hand, but, surprisingly, also of the ipsilesional hand. Although many longitudinal studies after NAIS have shown that the occurrence of gross motor difficulties is comparatively easy to predict, accurately predicting hand motor function (for both hands) from morphometric MRI remains difficult. An association between the structural organization of the basal ganglia (BG) and thalamus and hand motor function seems intuitive, given their key role in sensorimotor function. Neuroimaging studies have frequently investigated these structures to evaluate the correlation between their volumes and motor function following early brain injury. However, the results have been controversial. We hypothesize that other structural parameters are involved. Method: The study involves 35 children (mean age 7.3 years, SD 0.4) with middle cerebral artery NAIS who underwent a structural T1-weighted 3D MRI and a clinical examination assessing manual dexterity with the Box and Block Test (BBT). Graphs are used to represent high-level structural information about the BG and thalami (volumes, elongations, distances) measured from the MRI. A graph neural network (GNN) is proposed to predict children's hand motor function through graph regression. To reduce the impact of external factors on motor function (such as behavior and cognition), we calculate a BBT score ratio for each child and hand. Results: The score ratios predicted by our method correlate significantly with the actual score ratios for both hands (p < 0.05), with relatively high prediction accuracy (mean L1 distance < 0.03).
The structural information seems to influence each hand's motor function differently: the affected hand's motor function is more correlated with the volume of the structures, while the ‘unaffected’ hand's function is more correlated with their elongation. Experiments emphasize the importance of considering the whole macrostructural organization of the basal ganglia and thalami networks, rather than volume alone, to predict hand motor function. Conclusion: There is a significant correlation between the structural characteristics of the basal ganglia/thalami and motor function in both hands. These results support the use of MRI macrostructural features of the basal ganglia and thalamus as an early biomarker for predicting motor function in both hands after early brain injury.
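
The abstract reports a correlation (p < 0.05) and a mean L1 distance between predicted and actual BBT score ratios. A sketch of these evaluation quantities follows, assuming Pearson's r (the abstract does not name the correlation test); since the score ratio itself is not defined in the abstract, any inputs to these hypothetical helpers are illustrative.

```python
import math

def mean_l1(pred, true):
    """Mean absolute (L1) distance between predicted and observed values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t statistic for testing r != 0 (valid for |r| < 1), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

The t statistic is compared with a Student t distribution with n - 2 degrees of freedom (for example, 33 if all 35 children contribute one value per hand) to obtain the p-value.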
