Automating biomedical data science through tree-based pipeline optimization
Over the past decade, data science and machine learning have grown from a
mysterious art form to a staple tool across a variety of fields in academia,
business, and government. In this paper, we introduce the concept of tree-based
pipeline optimization for automating one of the most tedious parts of machine
learning---pipeline design. We implement a Tree-based Pipeline Optimization
Tool (TPOT) and demonstrate its effectiveness on a series of simulated and
real-world genetic data sets. In particular, we show that TPOT can build
machine learning pipelines that achieve competitive classification accuracy and
discover novel pipeline operators---such as synthetic feature
constructors---that significantly improve classification accuracy on these data
sets. We also highlight the current challenges to pipeline optimization, such
as the tendency to produce pipelines that overfit the data, and suggest future
research paths to overcome these challenges. As such, this work represents an
early step toward fully automating machine learning pipeline design.
Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceeding
Proof-of-Concept Application - Annual Report Year 2
This document first introduces Application Layer Networks and then presents the catallactic resource allocation model and its integration into the middleware architecture of the developed prototype. Furthermore, use cases for the service models employed in such scenarios are presented, both as general application scenarios and as two detailed cases: query services and data mining services. The work concludes by describing the middleware implementation and evaluation, as well as future work in this area. --Grid Computing
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The rapidly growing amount of stored data has spurred researchers to seek
methods for taking full advantage of it. Most of these methods face a
response-time problem because of the sheer size of the data. Most proposed
solutions favour materialization. However, such a solution cannot deliver
Real-Time answers. In this paper we propose a framework that illustrates the
barriers to, and suggested solutions for, achieving Real-Time OLAP answers,
which are heavily used in decision support systems and data warehouses.
State-of-the-art on research and applications of machine learning in the building life cycle
Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research over the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of the building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimation, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies has been adopted broadly by the building industry, due to common challenges including (1) a lack of large-scale labeled data to train and validate the model, (2) a lack of model transferability, which prevents a model trained on one data-rich building from being used in another building with limited data, (3) a lack of strong justification of the costs and benefits of deploying machine learning, and (4) performance that might not be reliable and robust for the stated goals, as a method might work for some buildings but not generalize to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, and can inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science.
ANTIDS: Self-Organized Ant-based Clustering Model for Intrusion Detection System
Security of computers and the networks that connect them is becoming
increasingly significant. Computer security is defined as the protection
of computing systems against threats to confidentiality, integrity, and
availability. There are two types of intruders: the external intruders who are
unauthorized users of the machines they attack, and internal intruders, who
have permission to access the system with some restrictions. Because it is
increasingly unlikely that a system administrator can recognize an attack and
manually intervene to stop it, there is growing recognition that ID systems
have much to gain from following the basic principles behind the behavior of
complex natural systems, namely self-organization, which allows for a truly
distributed and collective perception of this phenomenon. With that aim in
mind, the present work presents a
self-organized ant colony based intrusion detection system (ANTIDS) to detect
intrusions in a network infrastructure. Its performance is compared with that of
conventional soft computing paradigms such as Decision Trees, Support Vector
Machines and Linear Genetic Programming to model fast, online and efficient
intrusion detection systems.
Comment: 13 pages, 3 figures, Swarm Intelligence and Patterns (SIP) - special
track at WSTST 2005, Muroran, JAPA