What gradient descent means
Gradient descent is the optimization algorithm that most machine learning models use to learn. A model can have anywhere from thousands to billions of internal numbers, called weights, and their right values are not known in advance. Gradient descent finds good values by repeatedly measuring how wrong the model is and then nudging each weight a little in the direction that reduces the error.
The "error" is measured by a loss function, which boils the model's performance down to a single number: how far its predictions are from the correct answers. The "gradient" collects the slope of the loss with respect to every weight. It points in the direction in which the loss grows fastest, so moving each weight the opposite way makes the loss smaller. "Descent" means you keep taking these small downhill steps until the loss stops going down.
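Here is a minimal sketch of that loop in plain Python. It assumes a one-feature linear model y = w * x + b with mean squared error as the loss. The function names, learning rate, step count, and toy data are illustrative choices, not taken from any particular library. Each step computes the gradient on the whole dataset and moves both weights against it.

```python
# Minimal full-batch gradient descent for a one-feature linear model y = w * x + b.
# Assumptions (illustrative only): mean squared error loss, made-up toy data.

def mse_loss(w, b, xs, ys):
    """Average squared difference between predictions and targets."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def mse_gradient(w, b, xs, ys):
    """Partial derivatives of the MSE loss with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Start from an arbitrary guess and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = mse_gradient(w, b, xs, ys)
        # The gradient points uphill, so subtract it to move downhill.
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


if __name__ == "__main__":
    # Toy, noiseless data generated from y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]
    w, b = gradient_descent(xs, ys)
    print(f"w={w:.3f}, b={b:.3f}, loss={mse_loss(w, b, xs, ys):.6f}")
```

Run as a script, it should recover values very close to w = 2 and b = 1, the line the toy data was generated from.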
In plain words
Imagine you are blindfolded on a hillside and want to reach the lowest point. You feel the ground around your feet, notice which way slopes down most steeply, and take a step that way. Then you feel again and step again. Gradient descent does exactly this with a model's error: feel the slope, step downhill, repeat, until you are at the bottom of the valley.
Why it matters
- It is how learning actually happens. Training a neural network, fine-tuning an LLM, and fitting a simple regression can all be done with gradient descent, and for large models it is essentially the only practical option.
- The step size is a real decision. The learning rate controls how big each step is. Too big and the model overshoots the valley; too small and training takes forever.
- It scales. Variants like stochastic gradient descent (SGD) estimate the slope from a small random batch of examples at each step instead of the whole dataset, which is what makes training on huge datasets practical.
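To make the mini-batch idea concrete, here is a minimal sketch of mini-batch SGD for a one-feature linear model y = w * x + b with a mean-squared-error loss. The name `minibatch_sgd`, the batch size, learning rate, epoch count, and the noiseless toy data are all illustrative assumptions. Each update looks at only a few examples, so every step is cheaper but noisier than one computed on the full dataset.

```python
import random


def minibatch_sgd(xs, ys, learning_rate=0.02, batch_size=4, epochs=1000, seed=0):
    """Fit y = w * x + b by mini-batch SGD on the mean squared error.

    Hypothetical helper for illustration: each update uses the gradient
    of the loss on one small batch rather than on the whole dataset.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random order on every pass over the data
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the MSE estimated from this batch only.
            errors = [w * xs[i] + b - ys[i] for i in batch]
            dw = sum(2 * e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            db = sum(2 * e for e in errors) / len(batch)
            w -= learning_rate * dw
            b -= learning_rate * db
    return w, b


if __name__ == "__main__":
    # Toy, noiseless data generated from y = 2x + 1.
    xs = [i * 0.5 for i in range(10)]
    ys = [2 * x + 1 for x in xs]
    w, b = minibatch_sgd(xs, ys)
    print(f"w={w:.3f}, b={b:.3f}")
```

Because this toy data has no noise, every batch agrees on the same best fit and the weights settle on it. On real, noisy data, a fixed learning rate typically leaves the weights fluctuating near the minimum rather than settling exactly.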
Common pitfalls
- Learning rate set wrong. This is the most common cause of training that diverges or crawls. It is usually the first thing to tune.
- Getting stuck in a local minimum. The algorithm finds a low point, but not the lowest one. In practice, momentum and modern optimizers like Adam can help the model roll through shallow dips.
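- Bad or unscaled data. If features are on wildly different scales, the loss surface becomes a long, narrow valley: the gradient no longer points toward the minimum, steps zigzag, and a learning rate that suits one weight is wrong for the others. Normalize your inputs first, for example to zero mean and unit variance.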
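The learning-rate pitfall is easy to see on a toy loss. This sketch minimizes the hypothetical one-weight loss L(w) = (w - 3)^2, whose gradient is 2(w - 3), with three illustrative learning rates. For this loss, each step multiplies the distance to the minimum by (1 - 2 * learning_rate). A rate above 1 makes that factor larger than 1 in magnitude, so the weight diverges. A tiny rate shrinks the distance only slightly per step.

```python
# Toy one-weight loss L(w) = (w - 3)^2 with its minimum at w = 3 (illustrative only).

def loss_gradient(w):
    """Derivative of (w - 3)^2 with respect to w."""
    return 2 * (w - 3)


def run(learning_rate, steps=50, w=0.0):
    """Plain gradient descent from a fixed starting point; returns the final weight."""
    for _ in range(steps):
        w -= learning_rate * loss_gradient(w)
    return w


if __name__ == "__main__":
    for lr in (0.001, 0.1, 1.1):
        w = run(lr)
        print(f"learning_rate={lr}: final w={w:.4f}, distance to minimum={abs(w - 3):.4g}")
```

With 50 steps, the rate 0.001 has only crawled to about w = 0.29, the rate 0.1 ends very close to 3, and the rate 1.1 overshoots further on every step and ends tens of thousands of units away.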
