Examples of the Power of Bayesian Black Box Optimization
1. 10 Evaluations of a Sample Black Box Function
I realize at this point you’re probably thinking, “Why all the background information? Just get to the method already!” The good news is that Gaussian Process Regression makes up most of the Bayesian Optimization method. Once you understand it, the rest of the method follows quickly.
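Gaussian Process Regression carries most of the weight, so a small pure-Python preview of it may help. This is a minimal sketch under a few assumptions:

- The GP has a zero mean and a squared-exponential (RBF) kernel.
- A tiny noise term is added to the diagonal.
- The function names (`rbf_kernel`, `cholesky`, `forward_sub`, `back_sub_transpose`, `gp_posterior`) and the default hyperparameters are hypothetical choices for illustration. They are not the author's GitHub implementation.

It applies the standard posterior formulas. The mean is k*ᵀ K⁻¹ y, and the variance is k(x, x) minus k*ᵀ K⁻¹ k*, where K is the kernel matrix of the evaluated points and k* holds the kernel values between a new input and those points.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(matrix):
    """Lower-triangular L with L L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def back_sub_transpose(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at each test input."""
    # Kernel matrix of the evaluated points, plus a small noise term on the diagonal.
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = back_sub_transpose(L, forward_sub(L, y_train))  # K^-1 y
    means, variances = [], []
    for x in x_test:
        k_star = [rbf_kernel(x, a) for a in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_sub(L, k_star)  # L^-1 k*, so v.v == k*^T K^-1 k*
        variances.append(max(rbf_kernel(x, x) - sum(vi * vi for vi in v), 0.0))
    return means, variances

# Usage: three evaluated points of sin(x), then predictions at two new inputs.
x_train = [-2.0, 0.0, 1.5]
y_train = [math.sin(x) for x in x_train]
means, variances = gp_posterior(x_train, y_train, [0.0, 3.0])
```

The predicted variance is close to zero at an input we have already evaluated and grows as we move away from the data. Bayesian Optimization relies on exactly this uncertainty when it decides where to evaluate next.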
Sidenote: I won’t explain everything about Gaussian Processes here. Even though they are really cool, you don’t need the full theory to understand Bayesian Optimization. However, if you do want to learn more about them, I recommend watching these helpful videos, or these helpful videos from UBC. If you want some code that uses both constrained and unconstrained Gaussian Processes, as well as some of my initial implementations of Bayesian Optimization, look here.
Suppose you have a black box function that we’ve already evaluated at three points, like this:
*In Bayesian Optimization we often randomly pick two points to start with.
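Picking random starting points takes only a few lines of Python. This is a minimal sketch under three assumptions:

- The search space is a single interval [lower, upper].
- Starting points are drawn uniformly at random.
- The objective `black_box` and the helper `initial_samples` are hypothetical stand-ins for illustration, not part of any library.

```python
import random

def black_box(x):
    # Hypothetical objective: in practice we could only call it, never read it.
    return -(x - 2.0) ** 2 + 3.0

def initial_samples(func, lower, upper, n_points=2, seed=None):
    """Evaluate func at n_points inputs drawn uniformly from [lower, upper]."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    xs = [rng.uniform(lower, upper) for _ in range(n_points)]
    return [(x, func(x)) for x in xs]

# Usage: two random starting evaluations on [-5, 5].
samples = initial_samples(black_box, -5.0, 5.0, n_points=2, seed=0)
```

Seeding the generator is optional. It only makes the choice of starting points repeatable between runs.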
Earlier, we went over what black box optimization was, some of the inherent problems with it, and some of the conventional methods of black box optimization. In this post, we’ll be starting with Gaussian Processes, a fundamental part of Bayesian Optimization. If you aren’t familiar with black box optimization yet, then I strongly recommend looking at the previous post in this series.
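Bayesian Optimization takes its name from Bayesian statistics. It keeps a probabilistic belief about the unknown function and updates that belief with each new evaluation. The standard way to represent that belief, the Gaussian Process, is built from Gaussian Distributions, so we first need to know what Gaussian Distributions are. If you’ve ever taken an IQ test, your score is reported on a curve of this shape, also known as a “Bell Curve” or “Normal Distribution”: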
My goal here is to provide complete, thorough explanations and implementations of algorithms useful in Machine Learning and Artificial Intelligence research and development. If you don’t care about the explanations, or already understand the material, you can skip straight to my (hopefully) well-commented code on the GitHub page. With that said, let’s begin!
(Credit to Wikipedia)
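The bell curve is described by the normal probability density function:

p(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π))

Here μ is the mean and σ is the standard deviation. Below is a minimal Python sketch of that formula. The name `normal_pdf` is a hypothetical helper. The IQ-style parameters (mean 100, standard deviation 15) are an assumption chosen to match how many IQ tests are scaled. Python's standard library also provides `statistics.NormalDist`, which the sketch uses as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a Normal(mean, std**2) distribution at x."""
    z = (x - mean) / std  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Assumed IQ-style scale: mean 100, standard deviation 15.
peak = normal_pdf(100, mean=100, std=15)
one_sd_above = normal_pdf(115, mean=100, std=15)

# Cross-check against the standard library's implementation.
assert math.isclose(one_sd_above, NormalDist(mu=100, sigma=15).pdf(115))
```

The curve is tallest at the mean and symmetric around it, so a score of 85 is exactly as likely as a score of 115.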
A black box function is exactly what this picture suggests. It is no different from a normal function, except that we don’t know what it does inside. For instance, suppose we had a function like this:
f(x) = x²
We could look at it and immediately see that it squares any input (raises it to the power 2). However, suppose this were a black box function:
We would have no idea what operation(s) it performs on its input(s). All we could do is feed in inputs and observe the outputs, and that is what makes it a black box function.
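The distinction can also be shown in code. This is a minimal Python sketch that models a black box as a callable we may query but are not allowed to inspect. Every query is logged, because evaluations of a real black box are often expensive. The `BlackBox` class and its `history` attribute are hypothetical names for illustration. They are not part of any library or of the author's code.

```python
class BlackBox:
    """Wraps a function so callers can only query it; every evaluation is logged."""

    def __init__(self, func):
        self._func = func  # hidden rule: an optimizer should never read this
        self.history = []  # (input, output) pairs observed so far

    def __call__(self, x):
        y = self._func(x)
        self.history.append((x, y))
        return y

# Usage: the rule inside is x ** 2, but a caller only sees inputs and outputs.
f = BlackBox(lambda x: x ** 2)
f(3)
f(-2)
```

After these two calls, `f.history` holds `[(3, 9), (-2, 4)]`. Observed pairs like these are all that an optimizer such as Bayesian Optimization has to work with.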