Deep Dive

Response Time Monitoring: Beyond Up/Down

March 2026 · 6 min read · by the PingSentry team

Most uptime monitoring tells you one thing: is the site up or down? That's useful, but it's an incomplete picture. A page that responds in 8 seconds is technically "up" by any binary measure — but for your users, it might as well be down.

Response time monitoring fills this gap. It tells you not just whether your service is reachable, but how fast it is — and more importantly, how that changes over time.

Why response time matters

The research on page speed and user behavior is consistent and stark: slower pages mean more abandonment, lower conversion, and worse engagement.

But those findings measure perceived load time in a browser. Response time monitoring measures a simpler upstream metric: how long the server takes to return a response. That doesn't capture full page load (JavaScript execution, asset loading, rendering), but it's a reliable proxy for server-side health and a leading indicator of problems.

What normal looks like

Here's a rough guide to HTTP response times and what they indicate:

| Response time | Status | What it usually means |
| --- | --- | --- |
| < 200ms | Excellent | Well-optimized, likely cached or edge-served |
| 200ms – 800ms | Good | Normal for dynamic, database-backed responses |
| 800ms – 2s | Acceptable | Worth investigating; could indicate slow queries |
| 2s – 5s | Degraded | Users will notice; likely a backend performance issue |
| > 5s | Critical | Functionally broken for most users; investigate immediately |

These are rough benchmarks. The right threshold depends on your application. A search API should respond in under 200ms. A complex report generation endpoint might legitimately take 2–3 seconds. The baseline that matters is your own historical average, not a universal standard.
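As a rough sketch, the bands in the table can be turned into a tiny classifier. The function name is illustrative, and the cutoffs are the article's rough benchmarks, not universal standards:

```python
def classify_response_time(ms: float) -> str:
    """Map a response time in milliseconds to the status bands above.

    Cutoffs mirror the benchmark table; tune them to your own baseline.
    """
    if ms < 200:
        return "excellent"
    if ms < 800:
        return "good"
    if ms < 2000:
        return "acceptable"
    if ms < 5000:
        return "degraded"
    return "critical"
```

In practice you'd replace these fixed cutoffs with thresholds derived from your own historical average, as discussed above.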

Trends matter more than snapshots

A single response time reading tells you almost nothing. What's valuable is the trend over days and weeks.

A service that was averaging 220ms three months ago and is now averaging 890ms hasn't crossed any threshold — it's still technically "fast" — but something has changed. Maybe your database has grown and queries are slower. Maybe a new feature is making expensive API calls on every request. Maybe a background index needs rebuilding.

This kind of gradual degradation is invisible unless you're watching trends. And it almost always precedes an actual outage: response times creep up, then spike, then the service starts returning errors or timing out entirely.

Think of response time as a leading indicator. An outage is a lagging indicator — it tells you something already went wrong. Rising response times are the warning sign that something is about to go wrong. Catching the trend gives you time to act before users are affected.
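A drift check like this can be sketched by comparing a recent window of readings against a historical baseline. The 1.5x ratio below is an arbitrary illustration, not a recommendation:

```python
from statistics import mean

def is_degrading(baseline_samples: list[float],
                 recent_samples: list[float],
                 ratio: float = 1.5) -> bool:
    """True if the recent mean has drifted past `ratio` times the baseline mean.

    Illustrative sketch: catches gradual degradation even when every
    individual reading is still "fast" in absolute terms.
    """
    return mean(recent_samples) > ratio * mean(baseline_samples)

# A service averaging ~220ms three months ago and ~890ms now is well
# past a 1.5x drift, even though no reading looks alarming on its own.
baseline = [210.0, 225.0, 215.0, 230.0]   # ms, three months ago
recent = [870.0, 910.0, 880.0, 900.0]     # ms, this week
assert is_degrading(baseline, recent)
```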

What to track

Average response time

The mean response time over a window (last hour, last 24 hours, last 7 days). Useful for trend analysis and SLA reporting. Sensitive to outliers — a handful of very slow requests can pull the average up significantly.

Median (p50) response time

The middle value — 50% of requests are faster, 50% are slower. More robust than mean for understanding typical user experience.

p95 / p99 response time

95th or 99th percentile — 95% (or 99%) of requests are faster than this value. This is the metric that tells you about the tail: the slow requests that affect a small but real percentage of users. High p95 with a good median often points to a specific slow path (a missing database index, an inefficient query on a particular data shape, a cold cache).

Min and max

Useful for spotting anomalies. A max response time of 45 seconds in a window where the average is 300ms means something specific and unusual happened — not general degradation.
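All of the metrics above can be computed from a single window of raw samples. A minimal sketch, using the nearest-rank percentile method (real monitoring systems often interpolate instead):

```python
import math
from statistics import mean, median

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value >= p% of the samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Compute the window statistics discussed above."""
    return {
        "avg": mean(samples_ms),
        "p50": median(samples_ms),
        "p95": percentile(samples_ms, 95),
        "min": min(samples_ms),
        "max": max(samples_ms),
    }

# 99 fast requests plus one 45-second outlier: the max flags the anomaly,
# the average gets pulled up noticeably, the median barely moves.
samples = [300.0] * 99 + [45_000.0]
stats = summarize(samples)
```

Note how the outlier distorts the average (747ms here) while the median stays at 300ms, which is exactly why the median is the better gauge of typical experience.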

Common causes of response time degradation

When you see response times trending up, these are the most common culprits: database queries slowing down as tables grow, a missing or unused index, new code making expensive external API calls on every request, a cold or undersized cache, and resource saturation (CPU, memory, or connection pools) on the host.

Setting up response time alerts

The challenge with response time alerting is avoiding noise. A single slow response is often just natural variance. Here's a practical approach:

  1. Establish your baseline — Run monitors for 1–2 weeks before setting thresholds. Look at your 7-day average and use 3–4x that as your alert threshold.
  2. Alert on sustained degradation, not spikes — Require response time to exceed threshold for 3+ consecutive checks before alerting. This eliminates most transient noise.
  3. Use different thresholds per endpoint — Your homepage and your heaviest data export endpoint will have very different normal response times.
  4. Route response time alerts differently than outage alerts — A slow endpoint needs investigation, not a 2am page. Send these to a Slack channel, not PagerDuty.
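The sustained-degradation rule in step 2 can be sketched in a few lines. The function name and the 3-check default are illustrative:

```python
def should_alert(response_times_ms: list[float],
                 threshold_ms: float,
                 consecutive: int = 3) -> bool:
    """True only if the last `consecutive` checks all exceeded the threshold.

    A single spike in the window is ignored; sustained slowness fires.
    """
    if len(response_times_ms) < consecutive:
        return False
    return all(t > threshold_ms for t in response_times_ms[-consecutive:])

# One transient spike is ignored; three sustained slow checks alert.
assert not should_alert([250.0, 2600.0, 240.0, 260.0], threshold_ms=1000.0)
assert should_alert([250.0, 2600.0, 2700.0, 2800.0], threshold_ms=1000.0)
```

Per step 3, you'd call this with a different `threshold_ms` per endpoint rather than one global value.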

Response time data is some of the most actionable infrastructure data you can collect. It tells you when things are getting worse before they're broken, and it gives you the evidence you need to justify optimization work and infrastructure investment.

Track response times over time

PingSentry records response time for every check and shows you trends, so you catch degradation before it becomes an outage.

Start Free — No Credit Card