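Gradient descent searches for the values of w and b that minimize the cost function. Recall that the gradient vector points in the direction of steepest increase, so its negative points in the direction of steepest decrease. Gradient descent starts from an initial point on the cost surface and repeatedly takes a step in that steepest downhill direction, moving opposite to the gradient each time.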
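As a minimal sketch of that procedure, the snippet below runs gradient descent on a tiny linear regression problem. The text does not specify the model or the cost, so the linear model `w * x + b`, the mean squared error cost, the learning rate `lr`, the step count, and the toy data are all assumptions made for illustration.

```python
def gradient_descent(xs, ys, lr=0.1, steps=1000):
    """Fit y ~ w * x + b by gradient descent on a mean squared error cost.

    Assumed cost: J(w, b) = (1 / (2n)) * sum((w * x + b - y) ** 2).
    """
    w, b = 0.0, 0.0  # assumed starting point on the cost surface
    n = len(xs)
    for _ in range(steps):
        # Prediction errors at the current (w, b).
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to w and b.
        dw = sum(e * x for e, x in zip(errors, xs)) / n
        db = sum(errors) / n
        # Step opposite to the gradient, i.e. downhill.
        w -= lr * dw
        b -= lr * db
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(w, b)
```

On this toy data the returned values should approach w = 2 and b = 1. The learning rate matters: if it is too large the updates can overshoot and diverge, and if it is too small convergence becomes slow.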