Current software is designed to support a large spectrum of features to meet various users’ needs. However, each end-user only requires a small set of the features to perform
required tasks, rendering the software bloated. The bloated code not only hinders optimal execution, but also leads to a larger attack surface. Code debloating technique, which aims
to remove the unneeded features’ code, has been proposed to reduce the bloated software’s attack surface. However, there is a fundamental gap between the features needed by a user and the implementation, so it is challenging to identify the code that only supports the needed features
Previous works ask end-users to provide a set of sample inputs to demonstrate how they will use the software and generate a debloated version of the software that only contains the code triggered from running the sample inputs. Unfortunately, software debloated by this approach only supports running given inputs, presenting an unusable notion of debloating: if the debloated software only needs to support a fixed set of inputs, the debloating process
is as simple as synthesizing a map from the input to the observed output. We call this Over-debloating Problem. This dissertation focuses on removing software’s unneeded code while providing high robustness for debloated software to run more inputs sharing the same functionalities with the given inputs, with approaches either using heuristics, feature-code
map, or code partitioning. Our evaluation shows that our systems can debloat large-scale and complex software (I.e., Chromium) by removing 25% ~ 70% code and run the debloated software without affecting the original functionality.Ph.D