TCP Congestion Control – Ultra Deep Explanation from First Principles to Advanced Algorithms
TCP congestion control is one of the most important and most misunderstood topics in computer networks. It is a major reason the Internet works at global scale: without it, senders would keep overloading shared links until the network collapsed under its own retransmitted traffic.
This post is written as a complete learning resource. If you understand everything here, you will not need any other article or video.
Part 1: First Principles – What Is Congestion Really?
Congestion happens when the amount of data injected into the network exceeds what the network can forward.
Key insight:
- Congestion is rarely caused by one sender alone
- Congestion is an emergent network behavior
- Routers have finite buffers
- Links have finite bandwidth
When buffers fill up:
- Packets get queued
- Delay increases
- Eventually packets are dropped
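The buffer behavior above can be sketched with a toy model. This is a minimal illustration, not a router implementation: the function name `simulate_queue`, the tick-based timing, and the fixed arrival and service rates are all assumptions made for the example.

```python
def simulate_queue(arrivals_per_tick, service_per_tick, buffer_size, ticks):
    """Toy model of a router buffer (hypothetical, fixed rates per tick).

    Returns (queue_lengths, total_dropped), where queue_lengths[i] is the
    number of packets still queued at the end of tick i.
    """
    queue = 0
    dropped = 0
    lengths = []
    for _ in range(ticks):
        queue += arrivals_per_tick
        if queue > buffer_size:
            # Buffer overflow: everything beyond the buffer is dropped.
            dropped += queue - buffer_size
            queue = buffer_size
        # The outgoing link forwards a fixed number of packets per tick.
        queue = max(0, queue - service_per_tick)
        lengths.append(queue)
    return lengths, dropped


lengths, dropped = simulate_queue(12, 10, 20, 10)
print(lengths)  # queue (and therefore delay) grows first
print(dropped)  # drops start only once the buffer is full
```

With arrivals slightly above the service rate, the queue grows for several ticks before any packet is dropped, which is why delay rises before loss appears.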
Packet loss is not the problem — it is the signal.
Part 2: Why Congestion Control Is TCP’s Responsibility
Routers do not coordinate traffic. Each sender must regulate itself.
If TCP sent data as fast as possible:
- Routers would drop packets
- Senders would retransmit
- Retransmissions would create more congestion
- The network would enter congestion collapse
TCP congestion control exists to prevent this collapse.
Part 3: Flow Control vs Congestion Control (Critical Distinction)
| Flow Control | Congestion Control |
|---|---|
| Protects the receiver | Protects the network |
| Uses receiver window (rwnd) | Uses congestion window (cwnd) |
| Local decision | Global effect |
TCP sends data based on:
min(rwnd, cwnd)
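As a rough sketch of how that limit is applied, the sender can only put more data on the wire if the smaller of the two windows is not already filled by unacknowledged data. The function name `sendable_bytes` and the byte values are illustrative assumptions.

```python
def sendable_bytes(rwnd, cwnd, bytes_in_flight):
    """How many more bytes the sender may transmit right now.

    rwnd: window advertised by the receiver (flow control)
    cwnd: congestion window kept by the sender (congestion control)
    bytes_in_flight: data sent but not yet acknowledged
    """
    effective_window = min(rwnd, cwnd)
    return max(0, effective_window - bytes_in_flight)


# The network is the bottleneck here: cwnd is smaller than rwnd.
print(sendable_bytes(rwnd=65535, cwnd=14600, bytes_in_flight=10000))
# The receiver is the bottleneck here: rwnd is smaller than cwnd.
print(sendable_bytes(rwnd=4000, cwnd=14600, bytes_in_flight=10000))
```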
Part 4: Core Variables That Control TCP Behavior
1️⃣ Congestion Window (CWND)
CWND limits how much unacknowledged data TCP can have in the network.
- Usually described in units of MSS (Maximum Segment Size); RFC 5681 defines it in bytes
- Dynamic and adaptive
Think of CWND as TCP’s belief about available bandwidth.
2️⃣ Slow Start Threshold (ssthresh)
ssthresh divides TCP behavior into two modes:
- Aggressive probing
- Cautious probing
- CWND < ssthresh → Slow Start
- CWND ≥ ssthresh → Congestion Avoidance
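The mode selection is a single comparison. A minimal sketch, with the hypothetical helper name `growth_phase` and windows expressed in MSS:

```python
def growth_phase(cwnd, ssthresh):
    """Pick the growth mode from the two core variables."""
    if cwnd < ssthresh:
        return "slow_start"
    return "congestion_avoidance"


print(growth_phase(cwnd=4, ssthresh=16))
print(growth_phase(cwnd=16, ssthresh=16))
```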
Part 5: Slow Start – Why It Is Fast but Named Slow
Purpose of Slow Start
At connection start, TCP knows nothing about network capacity. Starting fast would be reckless.
Slow Start solves this by:
- Starting conservatively
- Increasing rapidly if no congestion is detected
Slow Start Mechanics (Deep)
Initial CWND is small: 1 MSS in the original design, 2 to 4 MSS under RFC 5681, and 10 MSS in many modern stacks (RFC 6928).
For every ACK received:
- CWND increases by 1 MSS
Because a window of CWND segments produces up to CWND ACKs per RTT, CWND roughly doubles every RTT.
This is exponential growth.
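The doubling follows directly from the per-ACK rule. The sketch below counts CWND in MSS and assumes an idealized path: one ACK per segment (no delayed ACKs), no loss, and no ssthresh limit. The function name `slow_start_rounds` is made up for the example.

```python
def slow_start_rounds(initial_cwnd_mss, rounds):
    """Return CWND (in MSS) at the start of each RTT during Slow Start."""
    cwnd = initial_cwnd_mss
    history = [cwnd]
    for _ in range(rounds):
        acks_this_rtt = cwnd  # every segment in the window is ACKed once
        for _ in range(acks_this_rtt):
            cwnd += 1         # +1 MSS per ACK
        history.append(cwnd)
    return history


print(slow_start_rounds(1, 4))  # [1, 2, 4, 8, 16]
```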
Why Slow Start Must End
Exponential growth is unstable. Eventually it will overshoot network capacity.
Slow Start ends when:
- CWND reaches ssthresh
- Packet loss occurs
Part 6: Congestion Avoidance – Stability over Speed
Why TCP Switches Behavior
Once TCP approaches network capacity, aggressive growth would cause loss.
Congestion Avoidance introduces caution.
Additive Increase Explained Intuitively
In Congestion Avoidance:
- CWND increases by ~1 MSS per RTT
- Growth is linear
This allows TCP to gently probe for extra bandwidth.
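One common way to get "about 1 MSS per RTT" is to add a small fraction of an MSS on every ACK, as in the per-ACK formula given in RFC 5681. The sketch below keeps CWND in bytes and assumes one ACK per full-sized segment; the function names and the MSS value of 1460 bytes are illustrative assumptions.

```python
MSS = 1460  # bytes; a typical value on Ethernet paths, assumed here


def on_ack_congestion_avoidance(cwnd_bytes, mss=MSS):
    """Per-ACK additive increase: cwnd += MSS * MSS / cwnd."""
    return cwnd_bytes + mss * mss / cwnd_bytes


def one_rtt_congestion_avoidance(cwnd_bytes, mss=MSS):
    """Apply the per-ACK rule once for each segment in the current window."""
    acks = int(cwnd_bytes // mss)
    for _ in range(acks):
        cwnd_bytes = on_ack_congestion_avoidance(cwnd_bytes, mss)
    return cwnd_bytes


start = 10 * MSS
growth = one_rtt_congestion_avoidance(start) - start
print(growth / MSS)  # a little under 1 MSS gained in one RTT
```

The gain is slightly below 1 MSS because CWND grows during the round, so each later ACK adds a bit less than the one before it.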
Part 7: Detecting Congestion – How TCP Knows Something Is Wrong
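Classic TCP has no explicit congestion signal from the network, so it infers congestion indirectly. (Explicit Congestion Notification, ECN, is an optional extension that lets routers mark packets instead of dropping them.)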
Two Signals
- Timeout → severe congestion
- 3 Duplicate ACKs → mild congestion
Duplicate ACKs mean later segments are still reaching the receiver, so the path is still delivering data and probably only one segment is missing.
Part 8: Fast Retransmit – Acting Before Timeout
When TCP receives 3 duplicate ACKs:
- It assumes a packet was lost
- Retransmits immediately
This avoids waiting for a long timeout.
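The trigger can be sketched as a small counter on the sender. This is a simplified illustration: the class name `DupAckDetector` is hypothetical, and a real stack also checks that the ACK carries no data and no window update before counting it as a duplicate.

```python
DUP_ACK_THRESHOLD = 3


class DupAckDetector:
    """Counts repeated ACK numbers and signals when to fast-retransmit."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no):
        """Return the sequence number to retransmit, or None."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                # The receiver keeps asking for ack_no: resend that segment.
                return ack_no
        else:
            # New data was acknowledged: reset the counter.
            self.last_ack = ack_no
            self.dup_count = 0
        return None


detector = DupAckDetector()
for ack in [1000, 2000, 2000, 2000, 2000, 3000]:
    print(ack, detector.on_ack(ack))
```

Note that three duplicates means four identical ACKs in total: the original plus three repeats.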
Part 9: Fast Recovery – Do Not Start from Zero
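TCP Tahoe, the earliest variant with congestion control, reset CWND to 1 MSS and restarted Slow Start after every loss, even one detected by duplicate ACKs. This was inefficient.
Fast Recovery, introduced with TCP Reno, improves this: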
- ssthresh = cwnd / 2
- cwnd is cut to roughly ssthresh instead of being reset to 1 MSS
- Once the lost segment is acknowledged, TCP continues in Congestion Avoidance
This assumes the network is congested but still usable.
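The difference between the two loss signals can be sketched as follows. This is a simplified Reno-style reaction under stated assumptions: windows are counted in MSS, CWND stands in for the amount of data in flight, the 2 MSS floor on ssthresh follows RFC 5681, and the temporary window inflation during recovery is left out. The function name `on_loss` and the event strings are hypothetical.

```python
def on_loss(cwnd, ssthresh, event):
    """Return the new (cwnd, ssthresh) in MSS after a loss signal."""
    if event == "triple_dup_ack":
        # Mild congestion: halve and keep going (Fast Recovery).
        ssthresh = max(cwnd // 2, 2)
        cwnd = ssthresh
    elif event == "timeout":
        # Severe congestion: halve the threshold and restart Slow Start.
        ssthresh = max(cwnd // 2, 2)
        cwnd = 1
    else:
        raise ValueError(f"unknown loss event: {event}")
    return cwnd, ssthresh


print(on_loss(20, 64, "triple_dup_ack"))  # (10, 10)
print(on_loss(20, 64, "timeout"))         # (1, 10)
```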
Part 10: AIMD – The Mathematical Heart of TCP
Additive Increase
Slow, steady increase ensures fairness.
Multiplicative Decrease
Sharp reduction prevents collapse.
Together they create the famous sawtooth pattern.
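A toy trace makes the sawtooth visible. The sketch assumes a single flow, a fixed capacity in MSS, one update per RTT, and a loss whenever CWND exceeds that capacity; the function name `aimd_trace` is made up for the example.

```python
def aimd_trace(capacity_mss, rounds, start=1):
    """Return CWND (in MSS) per RTT under simple AIMD."""
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity_mss:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease
        else:
            cwnd += 1                 # additive increase
    return trace


print(aimd_trace(capacity_mss=8, rounds=12, start=6))
# [6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4, 5]
```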
Part 11: Why AIMD Converges to Fairness
If two TCP flows share a link:
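- Both increase by the same amount, so the gap between them stays constant
- Both decrease by the same factor, so each loss halves the gap
Over time, they converge toward equal bandwidth, provided they have similar RTTs and see the same loss events.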
This is a rare example of fairness emerging without coordination.
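The convergence can be checked with a small simulation. It is a sketch under strong assumptions: both flows have the same RTT, both see a loss in the same round whenever their combined window exceeds the link capacity, and windows are real numbers in MSS. The function name `two_flow_aimd` is hypothetical.

```python
def two_flow_aimd(x, y, capacity, rounds):
    """Run synchronized AIMD for two flows and return their final windows."""
    for _ in range(rounds):
        if x + y > capacity:
            x, y = x / 2, y / 2  # both halve: the gap halves too
        else:
            x, y = x + 1, y + 1  # both add 1: the gap is unchanged
    return x, y


x, y = two_flow_aimd(1.0, 40.0, capacity=50, rounds=200)
print(abs(x - y))  # far smaller than the initial gap of 39
```

Flows with different RTTs do not fit this model, because the flow with the shorter RTT increases its window more often and ends up with a larger share.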
Part 12: Reading the TCP Congestion Graph Like a Pro
- Steep curve → Slow Start
- Straight line → Congestion Avoidance
- Sudden drop → Multiplicative Decrease
If you can explain this graph verbally, you fully understand TCP congestion control.
Part 13: Real-World Importance
- Internet stability
- Cloud performance
- Video streaming quality
- Distributed systems reliability
Nearly every large-scale system depends on TCP behaving correctly.
Part 14: Interview Gold – Questions That Test Real Understanding
- Why does TCP halve CWND?
- Why exponential then linear growth?
- Why are duplicate ACKs better than timeouts?
- How does AIMD ensure fairness?
Final Mental Model (Remember This)
TCP is not trying to be fast. TCP is trying to be:
- Fair
- Stable
- Adaptive
Speed is a consequence, not the goal.
If you truly understand TCP congestion control, you understand the Internet itself 🚀
