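Used to scale features whose values are too large or too small relative to one another; such differences in range cause the Gradient Descent algorithm to converge more slowly. One straightforward way is to divide each feature by the largest value that feature takes, so that all features fall in a comparable range.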
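A minimal sketch of the divide-by-the-largest-value approach, in plain Python. The function name `scale_by_max` and the sample data (house size and number of bedrooms) are hypothetical, chosen only for illustration. Using the largest absolute value, and guarding against all-zero columns, are assumptions added here so that the sketch also handles negative or constant features.

```python
def scale_by_max(rows):
    """Divide each feature (column) by its largest absolute value.

    `rows` is a list of samples; each sample is a list of feature values.
    """
    n_features = len(rows[0])
    # Largest absolute value observed in each column.
    max_abs = [max(abs(row[j]) for row in rows) for j in range(n_features)]
    # An all-zero column is left as zeros to avoid dividing by zero.
    return [
        [row[j] / max_abs[j] if max_abs[j] != 0 else 0.0
         for j in range(n_features)]
        for row in rows
    ]


# Hypothetical data: [house size in square feet, number of bedrooms].
houses = [[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0]]
scaled = scale_by_max(houses)
```

After scaling, every value lies between -1 and 1, so no single feature dominates the size of the gradient step.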