I am a first-year PhD student in systems and computer security at UCSC.
A large computer system has many different parameters. A parameter is a setting that affects the computer’s activity in some way. For example, clock rate determines how much computation the central processing unit (CPU) can do in one second. If we raise the clock rate, we can do more in the same amount of time, at the cost of drawing more power. Conversely, if we lower the clock rate, we save power but need more time to complete a computational task. A large system can have thousands of parameters like clock rate, and changing one parameter can have adverse or beneficial effects on other parameters.
Currently, companies hire domain experts, people who have used a system for many years and know what it can do, to manually tune their systems and squeeze every last bit of performance out of their hardware. This is an expensive and time-consuming venture: these experts are few and far between, and even then performance gains are not guaranteed. On a surface level, what these experts do is no different from lowering or raising a series of levers on a mechanical system to see how it responds. Performance tuning can therefore be framed as an optimization problem: what is the optimal set of parameter values, and how can we reach it in a reasonable amount of time?
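To make the lever analogy concrete, here is a minimal sketch of a greedy search that nudges one parameter at a time and keeps a change only when the measured score improves. It is an illustration only, not the method used in my research: the parameter names (`cache_mb`, `threads`), the bounds, and the `toy_throughput` function are all hypothetical stand-ins for measurements on a real system.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, int]

def hill_climb(
    measure: Callable[[Params], float],
    start: Params,
    steps: Dict[str, int],
    bounds: Dict[str, Tuple[int, int]],
    max_rounds: int = 100,
) -> Tuple[Params, float]:
    """Raise or lower one parameter at a time, keeping only improvements."""
    current = dict(start)
    best = measure(current)
    for _ in range(max_rounds):
        improved = False
        for name, step in steps.items():
            lo, hi = bounds[name]
            for delta in (step, -step):
                candidate = dict(current)
                # Clamp the nudged value to the allowed range.
                candidate[name] = min(hi, max(lo, current[name] + delta))
                score = measure(candidate)
                if score > best:
                    current, best = candidate, score
                    improved = True
                    break  # keep this move, go on to the next parameter
        if not improved:
            break  # no single nudge helps any more
    return current, best

def toy_throughput(p: Params) -> float:
    # Hypothetical response surface with its peak at cache_mb=64, threads=8.
    return 100.0 - 0.01 * (p["cache_mb"] - 64) ** 2 - 1.5 * (p["threads"] - 8) ** 2

best_params, best_score = hill_climb(
    toy_throughput,
    start={"cache_mb": 16, "threads": 2},
    steps={"cache_mb": 8, "threads": 1},
    bounds={"cache_mb": (8, 256), "threads": (1, 32)},
)
print(best_params, best_score)
```

On this toy surface the two parameters do not interact, so the greedy search reaches the peak. In a real system, where changing one parameter affects others and each measurement is slow and noisy, a search like this can stall at a poor setting or take too long to finish.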
Machine learning is an attractive way to solve such a problem. My research centers on taking machine learning tools and applying them to the problem of finding that optimal set of parameters. In particular, we use deep reinforcement learning, which allows the system itself to decide which parameters to modify in order to improve overall performance. Recently, I helped create a prototype system that applies machine learning to tuning a large file storage server, and demonstrated that this technique could increase file throughput by up to 45%. In future work, I plan to use the same ideas to improve other aspects of computer systems, such as security, speed, and storage use.