1,547 research outputs found
Revisiting the Sequential Symbolic Regression Genetic Programming
Sequential Symbolic Regression (SSR) is a technique that recursively induces functions over the error of the current solution, concatenating them in an attempt to reduce the error of the resulting model. As a proof of concept, the method was previously evaluated on one-dimensional problems and compared with canonical Genetic Programming (GP) and Geometric Semantic Genetic Programming (GSGP). In this paper we revisit SSR, exploring the method's behaviour in higher-dimensional, larger and more heterogeneous datasets. We discuss the difficulties arising from the application of the method to more complex problems, e.g., overfitting, along with suggestions to overcome them. An experimental analysis was conducted comparing SSR to GP and GSGP, showing that SSR solutions are smaller than those generated by GSGP with similar performance, and more accurate than those generated by canonical GP.
State-of-the-art on research and applications of machine learning in the building life cycle
Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research for the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of the building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimation, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies were adopted broadly by the building industry, due to common challenges including (1) lack of large-scale labeled data to train and validate the model, (2) lack of model transferability, which limits a model trained with one data-rich building from being used in another building with limited data, (3) lack of strong justification of the costs and benefits of deploying machine learning, and (4) performance that might not be reliable and robust for the stated goals, as the method might work for some buildings but could not be generalized to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science.
Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning
In the field of quantitative trading, it is common practice to transform raw
historical stock data into indicative signals for the market trend. Such
signals are called alpha factors. Alphas in formula forms are more
interpretable and thus favored by practitioners concerned with risk. In
practice, a set of formulaic alphas is often used together for better modeling
precision, so we need to find synergistic formulaic alpha sets that work well
together. However, most traditional alpha generators mine alphas one by one
separately, overlooking the fact that the alphas would be combined later. In
this paper, we propose a new alpha-mining framework that prioritizes mining a
synergistic set of alphas, i.e., it directly uses the performance of the
downstream combination model to optimize the alpha generator. Our framework
also leverages the strong exploratory capabilities of reinforcement
learning (RL) to better explore the vast search space of formulaic alphas. The
contribution to the combination models' performance is assigned to be the
return used in the RL process, driving the alpha generator to find better
alphas that improve upon the current set. Experimental evaluations on
real-world stock market data demonstrate both the effectiveness and the
efficiency of our framework for stock trend forecasting. The investment
simulation results show that our framework is able to achieve higher returns
compared to previous approaches.
Comment: Accepted by KDD '23, ADS trac
A Framework Based on Symbolic Regression Coupled with eXtended Physics-Informed Neural Networks for Gray-Box Learning of Equations of Motion from Data
We propose a framework and an algorithm to uncover the unknown parts of
nonlinear equations directly from data. The framework is based on eXtended
Physics-Informed Neural Networks (X-PINNs) with domain decomposition in space-time,
but we augment the original X-PINN method by imposing flux continuity across
the domain interfaces. The well-known Allen-Cahn equation is used to
demonstrate the approach. The Frobenius matrix norm is used to evaluate the
accuracy of the X-PINN predictions and the results show excellent performance.
In addition, symbolic regression is employed to determine the closed form of
the unknown part of the equation from the data, and the results confirm the
accuracy of the X-PINNs based approach. To test the framework in a situation
resembling real-world data, random noise is added to the datasets to mimic
scenarios such as the presence of thermal noise or instrument errors. The
results show that the framework is stable against a significant amount of noise.
As the final part, we determine the minimal amount of data required for
training the neural network. The framework is able to predict the correct form
and coefficients of the underlying dynamical equation when at least 50% of the data
is used for training.
ReFixar: Multi-version Reasoning for Automated Repair of Regression Errors
Software programs evolve naturally in response to ever-changing customer needs and a fast-paced market. Software evolution, however, often introduces regression bugs, which unduly break previously working functionalities of the software. To repair regression bugs, one needs to know when and where a bug emerged, e.g., the bug-inducing code changes, to narrow down the search space. Unfortunately, existing state-of-the-art automated program repair (APR) techniques have not yet fully exploited this information, rendering them less efficient and effective at navigating a potentially large search space containing many plausible but incorrect solutions. In this work, we revisit APR on repairing regression errors in Java programs. We empirically show that existing state-of-the-art APR techniques do not perform well on regression bugs due to their algorithm design and lack of knowledge of bug-inducing changes. We subsequently present ReFixar, a novel repair technique that leverages software evolution history to generate high-quality patches for Java regression bugs. The key novelty that empowers ReFixar to more efficiently and effectively traverse the search space is two-fold: (1) a systematic way for multi-version reasoning to capture how software evolves through its history, and (2) a novel search algorithm over a set of generic repair templates, derived from the principle of incorrectness logic and informed by both past bug fixes and their bug-inducing code changes; this enables ReFixar to achieve a balance of both genericity and specificity, i.e., generic common fix patterns of bugs and their specific contexts. We compare ReFixar against state-of-the-art APR techniques on a data set of 51 real regression bugs from 28 large real-world programs. Experiments show that ReFixar outperforms the best baseline by a large margin, i.e., ReFixar correctly fixes 24 bugs while the best baseline correctly fixes only 9.
Past Before Future: A Comprehensive Review on Software Defined Networks Road Map
Software Defined Networking (SDN) is a paradigm that moves the network switch's control plane (routing protocols) out of the switch and leaves only the data plane (user traffic) inside the switch. Since the control plane has been decoupled from hardware and given to a logically centralized software application called a controller, network devices become simple packet forwarding devices that can be programmed via open interfaces. SDN's concepts of decoupled control logic and programmable networks provide a range of benefits for the management process and have gained significant attention from both academia and industry. Since the SDN field is growing very fast, it is an active research area. This review paper discusses the state of the art in SDN, with a historic perspective of the field, by describing the SDN paradigm, architecture and deployments in detail.
Discovering Causal Relations and Equations from Data
Physics is a field of science that has traditionally used the scientific
method to answer questions about why natural phenomena occur and to make
testable models that explain the phenomena. Discovering equations, laws and
principles that are invariant, robust and causal explanations of the world has
been fundamental in physical sciences throughout the centuries. Discoveries
emerge from observing the world and, when possible, performing interventional
studies in the system under study. With the advent of big data and the use of
data-driven methods, causal and equation discovery fields have grown and made
progress in computer science, physics, statistics, philosophy, and many applied
fields. All these domains are intertwined and can be used to discover causal
relations, physical laws, and equations from observational data. This paper
reviews the concepts, methods, and relevant works on causal and equation
discovery in the broad field of Physics and outlines the most important
challenges and promising future lines of research. We also provide a taxonomy
for observational causal and equation discovery, point out connections, and
showcase a complete set of case studies in Earth and climate sciences, fluid
dynamics and mechanics, and the neurosciences. This review demonstrates that
discovering fundamental laws and causal relations by observing natural
phenomena is being revolutionised with the efficient exploitation of
observational data, modern machine learning algorithms and the interaction with
domain knowledge. Exciting times are ahead with many challenges and
opportunities to improve our understanding of complex systems.
Comment: 137 page