
    Unsupervised machine learning application to perform a systematic review and meta-analysis in medical research.

    When synthesizing information from multiple sources and comparing them statistically, particularly in medical research, several statistical tools are available; the most common are the systematic review and the meta-analysis. These techniques allow the effectiveness or success of a group of studies to be compared. However, if the information to be compared is incomplete or mismatched between two or more studies, the comparison becomes an arduous task. In parallel, machine learning methodologies have proven to be a reliable resource: such software is developed to classify several variables and to learn from previous experience to improve the classification. In this paper, we use unsupervised machine learning methodologies to describe a simple yet effective algorithm that, given a dataset with missing data, completes the data, leading to a more complete systematic review and meta-analysis capable of presenting a final effectiveness or success rating between studies. Our method is first validated in a movie ranking database scenario and then used in a real-life systematic review and meta-analysis of scientific papers on obesity prevention, where 66.6% of the outcomes are missing.

    DETECTING APPLICATION ANOMALIES: MACHINE LEARNING APPROACH

    In the modern era, the world relies heavily on software technology. As software applications have come into high demand, security concerns have followed. Application security has become one of the chief concerns, since companies have to protect their systems from vulnerabilities. Other security categories include mobile or end-point security, operating system security and network security. All of these categories are intended to protect users and clients from malicious intent and hackers. Application security has become a prime requirement. Application security risks pose a direct threat to the business. Attackers take advantage of application vulnerabilities to compromise software application security. Once a flaw has been found and access to private data established, an attacker is able to exploit the vulnerability to facilitate cyber crimes. Cyber crimes target the confidentiality of data and the availability and integrity of resources (“What is Application Security?” 2019). Overall, more than 13% of the reviewed sites were compromised through web application security vulnerabilities, which have not been eliminated even by traditional security methodologies (Application Security Vulnerability, 2014). To resolve these common security issues, several detection, remediation and prevention techniques are to be used, including defensive programming, sophisticated input validation, dynamic checks, and static source code analysis. In this paper, a runtime environment framework is introduced. This research study reviewed a few publications, each of which considered a different approach to the issue. In the proposed framework, machine learning is used to train on and predict the output. First, a sample Java code is executed on various CPU cores and the generated output files are collected. These output files are then used to train the machine learning model. The machine learning results are then compared with the actual output to reach a decision.

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Over the past 20 years, the Internet has seen exponential growth in traffic, users, services and applications. Currently, it is estimated that the Internet is used every day by more than 3.6 billion users, who generate 20 TB of traffic per second. Such a huge amount of data challenges network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies, machine learning and big data approaches have emerged to build systems that aim to automatically extract information from the raw data that network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how, by following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN and the Apache Spark big data framework. The results show that, although mathematical, statistical, and set theory tools can be used to characterize a problem, machine learning methodologies are very useful for discovering hidden information in the raw data. Using the DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology that groups caches serving YouTube traffic into edge-nodes, and later, using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month-long traces, I will pinpoint sudden changes in the usage of the YouTube edge-nodes, changes that also impair the end users’ Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes in the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. Using a two-week dataset, I will show how, over this period, the Internauts continue discovering new websites. Moreover, I will show that it is hard to build a reliable profiler using only DNS information. Instead, by exploiting mathematical and statistical tools, I will show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used for both stateless and stateful services; that A-CDNs are quite popular, as more than 50% of web users contact an A-CDN every day; and that stateful services can benefit from A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream, an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. Using BGPStream in real-time mode, I will show how I detected a Multiple Origin AS (MOAS) event and how I studied the propagation of the black-holing community, showing the effect of this community in the network. Then, using BGPStream in historical mode and the Apache Spark big data framework over 16 years of data, I will show different results, such as the continuous growth of IPv4 prefixes and the growth of MOAS events over time. All these studies aim to show that monitoring is a fundamental task in different scenarios, and in particular to highlight the importance of machine learning and big data methodologies.
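
    The abstract above names DBSCAN as its clustering algorithm. As a rough illustration of how density-based clustering of this kind groups points, here is a minimal pure-Python sketch. It is not the YouLighter or SeLINA implementation; the function names, the `eps` and `min_pts` values, and the sample points are all assumptions chosen for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels

# Hypothetical sample data: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

    Unlike methods that need the number of clusters up front, this approach only needs a neighborhood radius and a density threshold, and it marks isolated points as noise rather than forcing them into a group.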

    Indoor plant phenotyping platforms and trait identification: research progress and prospects

    Plant phenomics has been developing rapidly in recent years as a research field that is progressing towards integration, scalability, multi-perceptivity and high-throughput analysis. By combining remote sensing, Internet of Things (IoT), robotics, computer vision, and artificial intelligence techniques such as machine learning and deep learning, the relevant research methodologies, biological applications and theoretical foundations of this research domain have been advancing rapidly. This article first introduces the current trends in plant phenomics and its related progress in China and worldwide. Then, it focuses on the characteristics of indoor phenotyping and the phenotypic traits that are suitable for indoor experiments, including yield, quality, and stress-related traits such as drought, cold and heat resistance, salt stress, heavy metals, and pests. By connecting key phenotypic traits with important biological questions in yield production, crop quality and stress-related tolerance, we associate indoor phenotyping hardware with relevant biological applications and their plant model systems, for which a range of indoor phenotyping devices and platforms are listed and categorised according to their throughput, sensor integration, platform size, and applications. Additionally, this article introduces existing data management solutions and analysis software packages that are representative of phenotypic analysis, for example the ISA-Tab and MIAPPE ontology standards for capturing metadata in plant phenotyping experiments, PHIS and CropSight for managing complicated datasets, and the Python or MATLAB programming languages for automated image analysis based on libraries such as OpenCV, Scikit-Image, and the MATLAB Image Processing Toolbox. Finally, because of the importance of extracting meaningful information from big phenotyping datasets, this article pays extra attention to the future development of plant phenomics in China, with suggestions and recommendations for the integration of multi-scale phenotyping data to increase confidence in research outcomes, the cultivation of cross-disciplinary researchers to lead next-generation plant research, and collaboration between academia and industry to enable world-leading research activities in the near future.

    Development of portable air quality sensor network based on IoT devices

    Air pollution has been one of the major items on the global agenda in recent years. With rising awareness among citizens, it is of great importance to measure data related to air pollution so that interested parties can make informed decisions. Networks composed of IoT devices have been one of the tools researchers rely on. Recent years have seen rapid progress in sensor technology, which in turn has expanded the market for low-cost sensors, giving citizens opportunities to measure various physical properties with affordable and portable sensors. Countless organizations have deployed wireless sensor networks (WSN) involving IoT devices and budget-friendly sensing hardware. Statistical Analysis of Networks and Systems (SANS) is a research group of the Computer Architecture Department at the Polytechnic University of Catalonia, which has launched several campaigns using WSN composed of Captor devices. Researchers then use relevant machine learning techniques to extract more meaningful information from otherwise flawed data. A platform using a technology stack composed of Captor devices and machine learning techniques has gone through several stages and is still being improved. This thesis discusses the latest iteration of this platform by introducing the characteristics of its hardware and software, as well as the machine learning methodologies used. By reviewing and comparing the older iterations of Captor and similar platforms used by other researchers, this thesis hopes to serve as a reference for the current and future development of WSN (with a focus on air quality), where innovations are constantly needed to improve their capabilities. The result of the thesis is an autonomous IoT device, Captor4b, that is self-sufficient for at least one and a half months; its autonomy can be further tuned in software by adjusting the duty cycles of the Raspberry Pi and the Arduino Nano separately.

    A foundation for reliable spatial proteomics data analysis.

    Quantitative mass-spectrometry-based spatial proteomics involves elaborate, expensive, and time-consuming experimental procedures, and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches for establishing high-quality proteome-wide datasets. However, data analysis is as critical as data production for reliable and insightful biological interpretation, and no consistent and robust solutions have been offered to the community so far. Here, we introduce the requirements for rigorous spatial proteomics data analysis, as well as the statistical machine learning methodologies needed to address them, including supervised and semi-supervised machine learning, clustering, and novelty detection. We present freely available software solutions that implement innovative state-of-the-art analysis pipelines and illustrate the use of these tools through several case studies involving multiple organisms, experimental designs, mass spectrometry platforms, and quantitation techniques. We also propose sound analysis strategies for identifying dynamic changes in subcellular localization by comparing and contrasting data describing different biological conditions. We conclude by discussing future needs and developments in spatial proteomics data analysis. […].G., C.M.M., and M.F. were supported by the European Union 7th Framework Program (PRIME-XS Project, Grant No. 262067). L.M.B. was supported by a BBSRC Tools and Resources Development Fund (Award No. BB/K00137X/1). T.B. was supported by the Proteomics French Infrastructure (ProFI, ANR-10-INBS-08). A.C. was supported by BBSRC Grant No. BB/D526088/1. A.J.G. was supported by BBSRC Grant No. BB/E024777/ and a generous gift from King Abdullah University for Science and Technology, Saudi Arabia. D.J.N.H. was supported by a BBSRC CASE studentship (BB/I016147/1).

    Using a Machine Learning Approach to Implement and Evaluate Product Line Features

    Bike-sharing systems are a means of smart transportation in urban environments with the benefit of a positive impact on urban mobility. In this paper we are interested in studying and modeling the behavior of features that permit the end user to access, with her/his web browser, the status of the Bike-Sharing system. In particular, we address features able to make a prediction on the system state. We propose to use a machine learning approach to analyze usage patterns and learn computational models of such features from logs of system usage. On the one hand, machine learning methodologies provide a powerful and general means to implement a wide choice of predictive features. On the other hand, trained machine learning models are provided with a measure of predictive performance that can be used as a metric to assess the cost-performance trade-off of the feature. This provides a principled way to assess the runtime behavior of different components before putting them into operation. Comment: In Proceedings WWV 2015, arXiv:1508.0338

    Data clustering procedures: a general review

    In the age of data science, the clustering of various types of objects (e.g., documents, genes, customers) has become a key activity and many high-quality computer implementations are provided for this purpose by many general software packages. Clustering consists of grouping a set of objects in such a way that objects which are similar to one another according to some metric belong to the same group, named a cluster. It is one of the most valuable and used tasks of exploratory data mining and can be applied to a wide variety of fields. Research on the problem of clustering tends to be fragmented across pattern recognition, database, data mining, and machine learning communities. This work discusses the common techniques that are used in cluster analysis. These methodologies will be applied to data analysis in the framework of polymer processing. A. Manuela Gonçalves was partially financed by Portuguese Funds through FCT (Fundação para a Ciência e a Tecnologia) within the Projects UIDB/00013/2020 and UIDP/00013/2020 of CMAT-UM. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 734205 – H2020-MSCA-RISE-2016.
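
    The definition of clustering given in the abstract above (similar objects, under some metric, end up in the same group) can be made concrete with a small example. The sketch below uses k-means (Lloyd's algorithm) with Euclidean distance; the abstract does not say which techniques the review covers, so the choice of k-means, the function name `kmeans`, the initialisation from the first `k` points, and the sample data are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm. Centroids start at the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0),
          (2.0, 1.0), (9.0, 10.0), (10.0, 9.0)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
print(centroids)
```

    Swapping `dist` for another distance function changes what "similar" means, which is the sense in which the grouping depends on the chosen metric.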