31 research outputs found

    The Software Heritage License Dataset (2022 Edition)

    Context: When software is released publicly, it is common to include with it either the full text of the license or licenses under which it is published, or a detailed reference to them. Therefore public licenses, including FOSS (free, open source software) licenses, are usually publicly available in source code repositories.
    Objective: To compile a dataset containing as many documents as possible that contain the text of software licenses, or references to the license terms. Once compiled, characterize the dataset so that it can be used for further research, or for practical purposes related to license analysis.
    Method: Retrieve from Software Heritage, the largest publicly available archive of FOSS source code, all versions of all files whose names are commonly used to convey licensing terms. All retrieved documents will be characterized in various ways, using automated and manual analyses.
    Results: The dataset consists of 6.9 million unique license files. Additional metadata about shipped license files is also provided, including file length measures, MIME type, SPDX license (detected using ScanCode), and oldest appearance, making the dataset ready to use in various contexts. The results of a manual analysis of 8102 documents are also included, providing a ground truth for further analysis. The dataset is released as open data as an archive file containing all deduplicated license files, plus several portable CSV files with metadata, referencing files via cryptographic checksums.
    Conclusions: Thanks to the extensive coverage of Software Heritage, the dataset presented in this paper covers a very large fraction of all software licenses for public code. We have assembled a large body of software licenses, characterized it quantitatively and qualitatively, and validated that it is mostly composed of licensing information and includes almost all known license texts. The dataset can be used to conduct empirical studies on open source licensing, to train automated license classifiers, for natural language processing (NLP) analyses of legal texts, and for historical and phylogenetic studies on FOSS licensing. It can also be used in practice to improve tools that detect licenses in source code.
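The file selection and deduplication steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the filename patterns in `LICENSE_NAME` are hypothetical guesses at "names commonly used to convey licensing terms", the input is an in-memory `dict` mapping paths to bytes rather than the Software Heritage archive, and the deduplication key is a plain SHA-1 of the raw file bytes, which may differ from the checksums used in the published CSV files.

```python
import hashlib
import re

# Hypothetical name patterns: the paper does not list its exact filename set,
# so these are illustrative guesses at common license file names.
LICENSE_NAME = re.compile(
    r"^(LICEN[CS]E|COPYING|COPYRIGHT|NOTICE)([.-].*)?$", re.IGNORECASE
)


def is_license_filename(path: str) -> bool:
    """Return True if the file's basename looks like a license file name."""
    basename = path.rsplit("/", 1)[-1]
    return bool(LICENSE_NAME.match(basename))


def dedup_license_files(files: dict[str, bytes]) -> dict[str, bytes]:
    """Keep one copy of each distinct license file, keyed by its SHA-1 hex digest."""
    unique: dict[str, bytes] = {}
    for path, content in files.items():
        if is_license_filename(path):
            digest = hashlib.sha1(content).hexdigest()
            # Identical contents found under different paths collapse to one entry.
            unique.setdefault(digest, content)
    return unique
```

Keying by content hash mirrors the idea of "referencing files via cryptographic checksums": metadata rows can point at a digest instead of at any one of the many paths where the same text was found.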

    Distributed optical fibre sensing for early detection of shallow landslides triggering

    A distributed optical fibre sensing system is used to measure landslide-induced strains on an optical fibre buried in a large-scale physical model of a slope. The fibre sensing cable is deployed at the predefined failure surface and interrogated by means of optical frequency domain reflectometry. The strain evolution is measured with centimetre spatial resolution until the occurrence of the slope failure. Standard legacy sensors measuring soil moisture and pore water pressure are installed at different depths and positions along the slope for comparison and validation. The evolution of the strain field is related to landslide dynamics with unprecedented resolution and insight. In fact, the results of the experiment clearly identify several phases within the evolution of the landslide and show that optical fibres can detect precursory signs of failure well before the collapse, paving the way for the development of more effective early warning systems.

    Assisting Software Developers With License Compliance

    Open source licensing determines, from a legal perspective, how open source systems are reused, distributed, and modified. While it facilitates rapid development, the legal language of these licenses can make them difficult for developers to understand. Because of misunderstandings, systems can incorporate licensed code in a way that violates the terms of the license. Such license incompatibilities can make it impossible to reuse a particular library without either relicensing the system or redesigning its architecture. Prior efforts have predominantly focused on license identification or on understanding the underlying phenomena, without reasoning about compatibility on a broad scale. The work in this dissertation first investigates the rationale of developers and identifies the areas in which developers struggle with respect to free/open source software licensing. First, we investigate the diffusion of licenses and the prevalence of license changes in a large-scale empirical study of 16,221 Java systems. We observed a clear lack of traceability and a lack of standardized licensing that led to difficulties and confusion for developers trying to reuse source code. We further investigated this difficulty by surveying the developers of the systems with license changes to understand why they first adopted a license and then changed it. Additionally, we analyzed issue trackers and legal mailing lists to extract licensing bugs. From these works, we identified key areas in which developers struggled and needed support. While developers need support to identify license incompatibilities and to understand both the cause and the implications of those incompatibilities, we observed that state-of-the-art license identification tools did not identify license exceptions.
Since these exceptions directly modify the license terms (either the permissions granted by the license or the restrictions it imposes), we proposed an approach that complements current license identification techniques by classifying license exceptions. The approach relies on supervised machine learners to classify licensing text, identifying either the particular license exception or the absence of one. Subsequently, we built an infrastructure to assist developers with evaluating license compliance warnings for their system. The infrastructure evaluates compliance across the dependency tree of a system to ensure that it is compliant with the licenses of all its dependencies. When an incompatibility is present, it reports the specific library or libraries and the conflicting license(s), so that developers can investigate these compliance warnings, which, if unresolved, would prevent distribution of their software. We conducted a study on 121,094 open source projects spanning 6 programming languages and demonstrate that the infrastructure is able to identify license incompatibilities between these projects and their dependencies.
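A minimal sketch of a dependency-tree compliance check of the kind described above, assuming a hypothetical `Package` tree and a toy `COMPATIBLE` table keyed by SPDX identifiers. The table is illustrative only and is not a statement of legal compatibility; the dissertation's actual infrastructure and compatibility rules are not reproduced here. The sketch compares every transitive dependency against the root project's license and treats unknown license pairs as warnings.

```python
from dataclasses import dataclass, field

# Toy compatibility table (illustrative only, not legal advice): maps
# (dependency license, project license) -> may the dependency be shipped in a
# project distributed under that project license? Real tools must also consider
# linking style, license exceptions, and "or later" version clauses.
COMPATIBLE = {
    ("MIT", "MIT"): True,
    ("MIT", "Apache-2.0"): True,
    ("MIT", "GPL-3.0-only"): True,
    ("Apache-2.0", "Apache-2.0"): True,
    ("Apache-2.0", "GPL-3.0-only"): True,
    ("Apache-2.0", "GPL-2.0-only"): False,
    ("GPL-3.0-only", "GPL-3.0-only"): True,
    ("GPL-3.0-only", "Apache-2.0"): False,
    ("GPL-3.0-only", "MIT"): False,
}


@dataclass
class Package:
    """Hypothetical package node: a name, an SPDX license ID, and direct deps."""
    name: str
    license: str
    deps: list["Package"] = field(default_factory=list)


def compliance_warnings(root: Package) -> list[tuple[str, str]]:
    """Walk the whole dependency tree and return (library, license) pairs whose
    license is not known to be compatible with the root project's license."""
    warnings: list[tuple[str, str]] = []
    seen: set[str] = set()
    stack = list(root.deps)
    while stack:
        pkg = stack.pop()
        if pkg.name in seen:  # shared dependencies are checked only once
            continue
        seen.add(pkg.name)
        # Unknown pairs default to False so a human can review them.
        if not COMPATIBLE.get((pkg.license, root.license), False):
            warnings.append((pkg.name, pkg.license))
        stack.extend(pkg.deps)
    return warnings
```

For example, an `Apache-2.0` application that depends on an `MIT` library, which in turn depends on a `GPL-3.0-only` library, yields a single warning naming the transitive `GPL-3.0-only` library, matching the idea of reporting "the specific library/libraries and the conflicting license(s)".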

    The Dairy Industry: Process, Monitoring, Standards, and Quality

    Sampling and analysis occur along the milk processing train: from collection at farm level, to intake at the dairy plant, through the processing steps, to the end products. Milk has a short shelf life; however, products such as milk powders have allowed a global industry to develop. Quality control tests are vital to support hygiene and food standards and to meet regulatory and customer demands. Multiple chemical and microbiological contamination tests are undertaken. Hazard analysis testing strategies are necessary, but some tests may be redundant; it is therefore vital to identify optimized quality control strategies. Testing time and turnaround time are rarely measured. The dairy industry is a traditional, low-margin commodity industry. The Industry 4.0 vision for dairy manufacturing is to introduce operational excellence and information and communications technologies. The dairy industry's response to Industry 4.0 consists predominantly of proactive maintenance and optimization of production and logistics chains, such as robotic milking machines and processing and packaging line automation, reinforced by sensors for rapid chemical and microbial analysis and by improved, real-time data management. This chapter reviews the processing trains and suggests ways to optimize them.

    Source Code and License Statement Co-Evolution

    SUMMARY Free software relies heavily on the reuse of software components available under a variety of licenses (e.g., Apache, BSD, GPL, or LGPL). Different licenses impose different limitations and conditions on the reuse and redistribution of a program, which makes it difficult to understand the legal constraints imposed on the final system. A file's license is specified by a license statement. License statements are text excerpts inserted at the top of source code or any other file, which specify the license under which the file may be reused, as well as the contributors who hold copyright on the file. License statements are not a static concept, because projects may update their licenses (version or type) or add contributors. Since these changes can have a major impact on a system in terms of its distribution and use, it is important to understand (1) when they occur during development relative to the system's evolution (license changes may happen during major modifications or independently of the system's changes), (2) how often they occur (rare vs. recurring), and (3) who makes them (experts vs. regular developers). First, we propose a meta-model for performing analyses that allow the detection of license issues; this meta-model also provides a structured source of information that can be used in license-related studies. Then, we present a study of the co-evolution of license statements and source code in seven OSS systems: JFreeChart, Jitsi, PHP, Rhino, Tomcat, XalanJ, and XercesJ. Our study shows that only in a few cases, in PHP, are the evolution of license statements and that of the software carefully planned and managed together just before major releases. In all systems, the developers who make the most source code changes are also the most active license maintainers. Our work makes it possible to understand when license statements are changed and to identify the developers who make these changes. From this perspective, our work is preliminary work toward better controlling the impact of these changes on the system, i.e., avoiding the introduction of inconsistencies, by proposing a methodology for managing license changes and rules for verifying license terms based on our meta-model.

ABSTRACT Open-source software (OSS) systems heavily rely on the reuse of software components made available under a variety of software licenses (e.g., Apache, BSD, GPL, or LGPL). Different licenses impose different limitations and conditions on program reuse and redistribution, thus making it difficult to understand the legal constraints for the final system. A file's license is specified using a license statement. License statements are snippets of text near the top of a source code or other file that specify the software license under which the file can be used, as well as which contributors own copyrights over the file. Such license statements are not static, because projects might update the licenses (version or type) or add contributors. Such changes can have a major impact on a software system, so it is important to understand when they happen during development (with major source code changes vs. independently), how often they happen (rare vs. recurring), and who performs them (experts vs. regular developers).
In this thesis, we first propose a meta-model based on previous work and on information gathered from license statements and text. We use the meta-model to determine which data must be analysed to study license evolution. Then, we perform a study on the co-evolution of license statements and source code in seven OSS systems: JFreeChart, Jitsi, PHP, Rhino, Tomcat, XalanJ, and XercesJ. Only in a few cases, in PHP, are license statement and software evolution carefully planned and managed together just before major releases. In all systems, the developers performing most of the commits are also the most active license maintainers. Thus, we are able to understand when license statements are changed, and we identified the developers who perform these changes. We consider our findings to be preliminary work toward better controlling the impact of license changes on the system (avoiding the risk of introducing inconsistencies) by verifying license changes using rules based on our meta-model. Indeed, we show that our meta-model could help detect license issues in studies related to licenses.
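One way to check a finding such as "the developers performing most of the commits are also the most active license maintainers" is to count, per author, all commits and the commits that touch license statements. The sketch below is hypothetical: the thesis relies on its meta-model, whereas here a simple regex (`LICENSE_LINE`) stands in for license-statement detection, and commits are given as `(author, changed_lines)` pairs rather than mined from a real repository.

```python
import re
from collections import Counter

# Hypothetical heuristic, not the thesis's meta-model: a changed line counts as
# touching a license statement if it mentions copyright or licensing terms.
LICENSE_LINE = re.compile(
    r"copyright|licen[cs]e|SPDX-License-Identifier", re.IGNORECASE
)


def rank_authors(commits):
    """commits: iterable of (author, changed_lines) pairs.

    Returns two Counters: total commits per author, and commits per author
    that change at least one license-statement line."""
    all_commits: Counter = Counter()
    license_commits: Counter = Counter()
    for author, changed_lines in commits:
        all_commits[author] += 1
        if any(LICENSE_LINE.search(line) for line in changed_lines):
            license_commits[author] += 1
    return all_commits, license_commits
```

Comparing `all_commits.most_common()` with `license_commits.most_common()` then shows whether the same authors lead both rankings; a real study would also need to separate license-version changes from mere copyright-holder additions, which this regex does not do.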

    HIV interactions with monocytes and dendritic cells: viral latency and reservoirs

    HIV is a devastating pathogen that causes serious immunological diseases in humans around the world. The virus is able to remain latent in an infected host for many years, allowing for its long-term survival and inevitably prolonging the infection process. The location and mechanisms of HIV latency are under investigation and remain important topics in the study of viral pathogenesis. Given that HIV is a blood-borne pathogen, a number of cell types have been proposed as sites of latency, including resting memory CD4+ T cells, peripheral blood monocytes, dendritic cells and macrophages in the lymph nodes, and haematopoietic stem cells in the bone marrow. This review summarizes the latest advances in the study of HIV interactions with monocytes and dendritic cells, and highlights the potential role of these cells as viral reservoirs and the effects of HIV-host-cell interactions on viral pathogenesis.