1,232 research outputs found

    A taxonomic look at instance-based stream classifiers

    Large numbers of data streams are today generated in many fields. A key challenge when learning from such streams is the problem of concept drift. Many methods, including many prototype methods, have been proposed in recent years to address this problem. This paper presents a refined taxonomy of instance selection and generation methods for the classification of data streams subject to concept drift. The taxonomy allows discrimination among a large number of methods which pre-existing taxonomies for offline instance selection methods did not distinguish. This makes possible a valuable new perspective on experimental results, and provides a framework for discussion of the concepts behind different algorithm-design approaches. We review a selection of modern algorithms for the purpose of illustrating the distinctions made by the taxonomy. We present the results of a numerical experiment which examined the performance of a number of representative methods on both synthetic and real-world data sets with and without concept drift, and discuss the implications for the directions of future research in light of the taxonomy. On the basis of the experimental results, we are able to give recommendations for the experimental evaluation of algorithms which may be proposed in the future. Funding: project RPG-2015-188, funded by The Leverhulme Trust, UK, and TIN 2015-67534-P from the Spanish Ministry of Economy and Competitiveness. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731593.

    Machine learning assists the classification of reports by citizens on disease-carrying mosquitoes

    Mosquito Alert (www.mosquitoalert.com/en) is an expert-validated citizen science platform for tracking and controlling disease-carrying mosquitoes. Citizens download a free app and use their phones to send reports of presumed sightings of two world-wide disease vector mosquito species (the Asian Tiger and the Yellow Fever mosquito). These reports are then supervised by a team of entomologists and, once validated, added to a database. As the platform prepares to scale to much larger geographical areas and user bases, the expert validation by entomologists becomes the main bottleneck. In this paper we describe the use of machine learning on the citizen reports to automatically validate a fraction of them, therefore allowing the entomologists either to deal with larger report streams or to concentrate on those that are more strategic, such as reports from new areas (so that early warning protocols are activated) or from areas with high epidemiological risks (so that control actions to reduce mosquito populations are activated). The current prototype flags a third of the reports as “almost certainly positive” with high confidence. It is currently being integrated into the main workflow of the Mosquito Alert platform. Postprint (published version)

    A Survey Paper on Software Bug Classification Techniques using Data Mining

    A software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result. When bugs arise, they have to be fixed, which is difficult. Most companies spend 40% of their costs on fixing bugs. The process of handling a bug for fixing is called bug triage or bug collection. Triaging incoming reports manually is error-prone and time-consuming, and software companies pay a large part of their costs dealing with these bugs. In this paper we classify bugs so that we can determine the class to which each bug belongs; after applying the classification, we can assign a particular bug to the appropriate developer for fixing. This is efficient. We use a combination of two classification techniques, naive Bayes (NB) and k-nearest neighbor (KNN). Nowadays companies use automatic bug triaging systems, whereas the traditional manual triaging system is inefficient and takes too much time. Triaging a bug requires the bug details, which are held in a bug repository. We also reduce the bug dataset, because having more data with unused information causes problems when assigning bugs. To implement this, we use instance selection and feature selection to reduce the bug data. This paper describes the whole procedure of bug assignment from start to finish, and the final result is shown as a graph. The graph represents the maximum probability of a class, that is, the class to which the bug belongs.
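    The abstract names naive Bayes and k-nearest neighbor as the two classifiers but does not say how they are combined. The following is a minimal sketch, assuming a simple average of the two class-probability estimates over bag-of-words features. The toy reports, the class labels, the 50/50 weighting, and all function names are illustrative assumptions, not the authors' method; the instance selection and feature selection steps are not shown.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real triage systems use richer features)."""
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, doc):
        log_scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.prior[c] / self.n)
            for w in tokenize(doc):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            log_scores[c] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb_ = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb_) if na and nb_ else 0.0

class KNN:
    """k-nearest neighbor over cosine similarity; probabilities are vote fractions."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, docs, labels):
        self.vecs = [Counter(tokenize(d)) for d in docs]
        self.labels = list(labels)
        self.classes = sorted(set(labels))
        return self

    def predict_proba(self, doc):
        q = Counter(tokenize(doc))
        ranked = sorted(zip(self.vecs, self.labels),
                        key=lambda pair: cosine(q, pair[0]), reverse=True)
        top = ranked[:self.k]
        votes = Counter(label for _, label in top)
        return {c: votes[c] / len(top) for c in self.classes}

def classify(doc, nb, knn):
    """Average the two probability estimates and return the most probable class."""
    p_nb, p_knn = nb.predict_proba(doc), knn.predict_proba(doc)
    combined = {c: 0.5 * p_nb[c] + 0.5 * p_knn[c] for c in nb.classes}
    return max(combined, key=combined.get)

# Hypothetical training reports and classes, for illustration only.
docs = [
    "app crashes on login screen",
    "null pointer crash when saving file",
    "button text overlaps on small screen",
    "wrong font color in settings dialog",
    "page load is very slow",
    "slow query makes search time out",
]
labels = ["crash", "crash", "ui", "ui", "performance", "performance"]

nb = NaiveBayes().fit(docs, labels)
knn = KNN(k=3).fit(docs, labels)
print(classify("crash when opening file", nb, knn))
```

    Once a report has a predicted class, a triage system could route it to a developer who handles that class; that routing step is outside this sketch.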

    Scalable Profiling and Visualization for Characterizing Microbiomes

    Metagenomics is the study of the combined genetic material found in microbiome samples, and it serves as an instrument for studying microbial communities, their biodiversities, and the relationships to their host environments. Creating, interpreting, and understanding microbial community profiles produced from microbiome samples is a challenging task as it requires large computational resources along with innovative techniques to process and analyze datasets that can contain terabytes of information. The community profiles are critical because they provide information about what microorganisms are present in the sample, and in what proportions. This is particularly important as many human diseases and environmental disasters are linked to changes in microbiome compositions. In this work we propose novel approaches for the creation and interpretation of microbial community profiles. This includes: (a) a cloud-based, distributed computational system that generates detailed community profiles by processing large DNA sequencing datasets against large reference genome collections, (b) the creation of Microbiome Maps: interpretable, high-resolution visualizations of community profiles, and (c) a machine learning framework for characterizing microbiomes from the Microbiome Maps that delivers deep insights into microbial communities. The proposed approaches have been implemented in three software solutions: Flint, a large scale profiling framework for commercial cloud systems that can process millions of DNA sequencing fragments and produces microbial community profiles at a very low cost; Jasper, a novel method for creating Microbiome Maps, which visualizes the abundance profiles based on the Hilbert curve; and Amber, a machine learning framework for characterizing microbiomes using the Microbiome Maps generated by Jasper with high accuracy. 
    Results show that Flint scales well for reference genome collections that are an order of magnitude larger than those used by competing tools, while using less than a minute to profile a million reads on the cloud with 65 commodity processors. Microbiome maps produced by Jasper are compact, scalable representations of extremely complex microbial community profiles with numerous demonstrable advantages, including the ability to display latent relationships that are hard to elicit. Finally, experiments show that by using images as input instead of unstructured tabular input, the carefully engineered software, Amber, can outperform other sophisticated machine learning tools available for classification of microbiomes.
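    To illustrate how an abundance profile can be laid out along a Hilbert curve, here is a minimal sketch using the standard index-to-coordinate conversion for the curve. It is not Jasper's implementation: the function names, the ordering of the abundance values, and the zero padding of unused cells are assumptions made for the example.

```python
def hilbert_d2xy(order, d):
    """Convert a position d along a Hilbert curve into (x, y) grid coordinates.

    The curve fills a 2**order by 2**order grid; d ranges over 0 .. 4**order - 1.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def abundance_map(abundances, order):
    """Place abundances[d] at the d-th cell of the curve; unused cells stay 0.0.

    How the values are ordered (for example by taxonomy) is left to the caller
    and is an assumption of this sketch.
    """
    n = 1 << order
    if len(abundances) > n * n:
        raise ValueError("too many values for a grid of this order")
    grid = [[0.0] * n for _ in range(n)]
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value
    return grid

# Four hypothetical relative abundances on a 2 x 2 grid.
for row in abundance_map([0.4, 0.3, 0.2, 0.1], order=1):
    print(row)
```

    Because consecutive positions on the curve are always adjacent grid cells, values that are close in the input ordering stay close in the image, which is the property that makes the layout useful as input to image-based classifiers.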

    MOLECULAR DIET ANALYSES OF NORTH AMERICAN BATS

    A food web is a model of the feeding relationships among organisms in an environment. The fidelity of this model is limited principally by the ability to detect these interactions. Researchers who study cryptic interactions such as nocturnal insectivory in bats typically rely on fecal samples to identify trophic connections. Historically these diet analyses were limited to morphological inspection of arthropod fragments; however, modern metabarcoding techniques have improved the richness and specificity of consumed prey: rather than bats foraging for a few arthropod orders, we observe hundreds of species among guano samples. Animal metabarcoding is not without bias; nevertheless, a decade of improvements upon such biases has focused largely on molecular portions while bioinformatic considerations remain unresolved. When researchers use distinct software to perform their analyses (tools that have not yet been compared in animal metabarcoding studies), it is unclear whether distinct perspectives between two experiments represent meaningful biological differences, or whether they arise because of the alternative programs and parameters deployed. We investigated three fundamental bioinformatic tasks that impact a metabarcoding experiment: sequence processing, database construction, and classification (Chapter I). These comparisons offer guidance regarding which steps are most sensitive to parameterization and are therefore in need of optimizing for individual experiments, as well as highlight areas that are in need of critical improvement. We applied these bioinformatic lessons to a molecular diet analysis of Indiana bats, the first ever for this endangered species (Chapter II). While management decisions currently focus on protecting roosting habitat, our molecular analyses provide evidence that site-specific data are needed to more effectively inform conservation practices.
    For example, while these bats forage a broad swath of the arthropod community, the molecular data suggest they rely on particular aquatic habitats that are not currently protected. Finally, we investigated the diets of New Hampshire bats by collaborating with citizen scientist volunteers throughout the state to perform an extensive sampling regime that spanned 20 locations over 2015 and 2016, and sequenced more than 900 guano samples (Chapter III). Molecular analysis of these data suggested these bats are foraging hundreds of arthropod species, including some turf and forest pests, demonstrating that our local bats provide ecosystem services. Individual diets varied across season and site, providing evidence of highly flexible and local foraging behaviors.