7,208 research outputs found

    Improving data extraction methods for large molecular biology datasets.

    In the past, an experiment involving a pairwise comparison normally involved one or a few dependent variables. Now, thousands of dependent variables can be measured simultaneously in a single experiment, be it detecting genes via a microarray experiment, sequencing genomes, or detecting microbial species based on DNA fragments using molecular techniques. How we analyze such large collections of data will be a major scientific focus over the next decade. Statistical methods that were once acceptable for comparing a few conditions are being revised to handle thousands of experiments. Molecular biology techniques that explored one gene or species have evolved and are now capable of generating complex datasets requiring new strategies and ways of thinking in order to discover biologically meaningful results. The central theme of this dissertation is to develop strategies that deal with a number of issues present in these large-scale datasets. In chapter 1, I describe a microarray analytical method that can be applied to low-replicate experiments. In chapters 2-4, the focus is how best to analyze data from ARISA (a PCR-based molecular method for rapidly generating a fingerprint of microbial diversity). Chapter 2 focuses on qualifying ARISA data so that the data best represent their biological source prior to further analysis. Chapter 3 focuses on how best to compare ARISA profiles to one another. Chapter 4 focuses on developing a software tool that implements the data processing and clustering strategies from chapters 2 and 3. The findings described herein provide the scientific community with improved analytical strategies in both the microarray and ARISA research areas.

    Data Mining and Machine Learning in Astronomy

    We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
    Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text.

    Hybrid approaches to optimization and machine learning methods: a systematic literature review

    Real problems are increasingly complex and require sophisticated models and algorithms capable of quickly dealing with large data sets and finding optimal solutions. However, there is no perfect method or algorithm; all of them have limitations that can be mitigated or eliminated by combining the strengths of different methodologies. The expectation is therefore to develop hybrid algorithms that take advantage of the potential and particularities of each method (optimization and machine learning) to integrate the methodologies and make them more efficient. This paper presents an extensive systematic and bibliometric literature review on hybrid methods involving optimization and machine learning techniques for clustering and classification. It aims to identify the potential of methods and algorithms to overcome the difficulties of one or both methodologies when combined. After the description of optimization and machine learning methods, a numerical overview of the works published since 1970 is presented, followed by an in-depth state-of-the-art review of the last three years. Furthermore, a SWOT analysis of the ten most cited algorithms in the collected database is performed, investigating the strengths and weaknesses of the pure algorithms and highlighting the opportunities and threats that have been explored with hybrid methods. This investigation made it possible to highlight the most notable works and discoveries involving hybrid methods for clustering and classification, and to point out the difficulties of the pure methods and algorithms that can be addressed by drawing on other methodologies, that is, through hybrid methods.
    Open access funding provided by FCT|FCCN (b-on). This work has been supported by FCT (Fundação para a Ciência e Tecnologia) within the R&D Units Project Scope: UIDB/00319/2020. Beatriz Flamia Azevedo is supported by FCT Grant Reference SFRH/BD/07427/2021. The authors are grateful to the Foundation for Science and Technology (FCT, Portugal) for financial support through national funds FCT/MCTES (PIDDAC) to CeDRI (UIDB/05757/2020 and UIDP/05757/2020) and SusTEC (LA/P/0007/2021).

    Machine Learning Techniques for Credit Card Fraud Detection

    The term "fraud" commonly brings credit card fraud to mind. With the significant increase in credit card transactions, credit card fraud has risen sharply in recent years. Fraud detection should therefore include monitoring the spending behavior of each customer in order to identify, prevent, and detect unwanted behavior. Because the credit card is the predominant payment method for both online and regular purchases, credit card fraud is rising steeply. Fraud detection is concerned not only with capturing fraudulent practices but also with discovering them as quickly as possible, because fraud costs businesses millions of dollars, the losses are rising over time, and this greatly affects the worldwide economy. In this paper we introduce 14 different techniques showing how data mining techniques can be successfully combined to obtain high fraud coverage with a high or low false rate, the advantages and disadvantages of each technique, and the data sets used by researchers.
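
    The trade-off between fraud coverage and false rate mentioned in this abstract can be illustrated with a small sketch. It assumes that "fraud coverage" means recall on the fraud class and that "false rate" means the false-positive rate; the abstract does not define either term, and the function name and labels below are hypothetical.

```python
def coverage_and_false_alarm(y_true, y_pred):
    """Return (fraud coverage, false-alarm rate) for binary labels.

    Assumption: 1 marks a fraudulent transaction, 0 a legitimate one.
    Coverage is taken to be recall; false-alarm rate is FP / (FP + TN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly passed
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm = fp / (fp + tn) if (fp + tn) else 0.0
    return coverage, false_alarm


# Invented toy labels, for illustration only.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
coverage, false_alarm = coverage_and_false_alarm(y_true, y_pred)
```

    Reporting both numbers together matters because a detector can reach high coverage simply by flagging most transactions, which drives the false-alarm rate up.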

    Roadway Traffic Analysis using Data Mining Techniques for Providing Safety Measures to Avoid Fatal Accidents

    Roadway traffic safety is a major concern for transportation governing agencies as well as ordinary citizens. Data mining is the extraction of hidden patterns from large databases. It is commonly used in marketing, surveillance, fraud detection, and scientific discovery. Within data mining, machine learning is a main research focus: it automatically learns to recognize complex patterns and make intelligent decisions based on data. Globalization has affected many countries. There has been a drastic increase in economic activity and consumption levels, leading to an expansion of travel and transportation. The increase in vehicles and traffic leads to road accidents. Considering the importance of road safety, the government is trying to identify the causes of road accidents in order to reduce the level of accidents. The exponential increase in accident data is making it difficult to analyse the factors causing road accidents. The paper describes how to mine frequent patterns causing road accidents from a collected data set. We find associations among road accidents and predict the type of accidents for existing as well as new roads. We make use of association and classification rules to discover the patterns among road accidents and to predict road accidents for new roads.
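
    The frequent-pattern and association-rule idea described in this abstract can be sketched as follows. This is a minimal brute-force illustration, not the authors' method and not a full Apriori implementation; the records, attribute names, and function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Invented toy accident records; each record is a set of attribute values.
accidents = [
    {"wet_road", "night", "collision"},
    {"wet_road", "night", "collision"},
    {"dry_road", "day", "rollover"},
    {"wet_road", "day", "collision"},
    {"dry_road", "night", "collision"},
]


def frequent_itemsets(records, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items.

    Support is the fraction of records containing every item in the set.
    All candidate itemsets are enumerated directly (no Apriori pruning).
    """
    n = len(records)
    counts = Counter()
    for record in records:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(record), size):
                counts[frozenset(combo)] += 1
    return {items: c / n for items, c in counts.items() if c / n >= min_support}


def confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    matching = [r for r in records if antecedent <= r]
    if not matching:
        return 0.0
    return sum(1 for r in matching if consequent <= r) / len(matching)


itemsets = frequent_itemsets(accidents, min_support=0.4)
rule_conf = confidence(accidents, {"wet_road"}, {"collision"})
```

    In this toy data the pair wet_road and collision appears in three of five records, and every wet-road record is a collision, so the rule from wet_road to collision has confidence 1.0. Real accident data would need pruning (for example Apriori or FP-growth) to keep the candidate enumeration tractable.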