
Go Context timeouts can be harmful


You should probably avoid context.WithTimeout and context.WithDeadline in code that makes network calls. Here is why.


Using context for cancellation

Typically, context.Context is used to cancel operations like this:

package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	ctx := context.Background()
	ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()

	select {
	case <-ctx.Done():
		fmt.Println(ctx.Err())
		fmt.Println("cancelling...")
	}
}

Later, you can use such a context with, for example, a Redis client:

import "github.com/go-redis/redis/v8"

rdb := redis.NewClient(...)

ctx := context.Background()
ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
defer cancel()

val, err := rdb.Get(ctx, "redis-key").Result()

At first glance, the code above works fine. But what happens when the rdb.Get operation exceeds the 3-second timeout?

Context deadline exceeded

When the context deadline is exceeded, go-redis and most other database clients (including database/sql) must do the following:

  1. Close the connection, because it can't be safely reused.
  2. Open a new connection.
  3. Perform a TLS handshake on the new connection (if TLS is enabled).
  4. Optionally, pass authentication checks, for example, using the Redis AUTH command.
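To see why step 1 is unavoidable, here is a minimal sketch that uses a toy line-based server on a local TCP port (not Redis, and not how go-redis is implemented). The names slowServer and staleReplyDemo are made up for this illustration. The client gives up on the first request after a short deadline, reuses the connection for a second request, and then reads the late reply to the first request as if it were the answer to the second.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

// slowServer is a toy server for this sketch: it answers every line it
// receives, but only after the given delay.
func slowServer(ln net.Listener, delay time.Duration) {
	conn, err := ln.Accept()
	if err != nil {
		return
	}
	defer conn.Close()
	sc := bufio.NewScanner(conn)
	for sc.Scan() {
		time.Sleep(delay)
		fmt.Fprintf(conn, "reply to %s\n", sc.Text())
	}
}

// staleReplyDemo reports whether the first read timed out and which reply
// the client received after sending the second request on the same connection.
func staleReplyDemo() (timedOut bool, reply string, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, "", err
	}
	defer ln.Close()
	go slowServer(ln, 200*time.Millisecond)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, "", err
	}
	defer conn.Close()
	rd := bufio.NewReader(conn)

	// First request: the deadline is shorter than the server delay,
	// so the read fails while the reply is still on its way.
	fmt.Fprintln(conn, "req1")
	conn.SetReadDeadline(time.Now().Add(50 * time.Millisecond))
	_, readErr := rd.ReadString('\n')
	timedOut = errors.Is(readErr, os.ErrDeadlineExceeded)

	// Second request on the SAME connection: the next line in the
	// stream is still the late reply to req1.
	fmt.Fprintln(conn, "req2")
	conn.SetReadDeadline(time.Now().Add(2 * time.Second))
	line, err := rd.ReadString('\n')
	return timedOut, strings.TrimSpace(line), err
}

func main() {
	timedOut, reply, err := staleReplyDemo()
	fmt.Println("first read timed out:", timedOut)
	fmt.Println("reply received for req2:", reply, err)
}
```

Once a reply can arrive late, the client can no longer tell which request a reply belongs to, so discarding the connection is the only safe choice.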

Effectively, your application no longer benefits from the connection pool, which makes each operation slower and greatly increases the chance of exceeding the timeout again. The result can be disastrous.

Technically, this problem is not caused by context.Context: using small deadlines with net.Conn has the same problem. But context.Context imposes a single timeout on all operations, so each individual operation effectively gets an unpredictable timeout that depends on how long the previous operations took.
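The shrinking budget is easy to observe. The following sketch is only an illustration: remainingBudgets and demoBudgets are hypothetical helpers, and time.Sleep stands in for a network call. It creates one context with a single timeout and records how much time is left when each sequential operation starts.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// remainingBudgets runs n sequential "operations" under one shared context
// timeout and records how much time each one has left when it starts.
func remainingBudgets(total, step time.Duration, n int) []time.Duration {
	ctx, cancel := context.WithTimeout(context.Background(), total)
	defer cancel()

	budgets := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		deadline, _ := ctx.Deadline()
		budgets = append(budgets, time.Until(deadline))
		time.Sleep(step) // stand-in for a network call
	}
	return budgets
}

// demoBudgets uses arbitrary numbers chosen for this sketch.
func demoBudgets() []time.Duration {
	return remainingBudgets(300*time.Millisecond, 50*time.Millisecond, 3)
}

func main() {
	for i, b := range demoBudgets() {
		fmt.Printf("operation %d starts with about %v left\n",
			i+1, b.Round(10*time.Millisecond))
	}
}
```

Each operation starts with less time than the one before it, so the last call in a chain is the most likely to hit the deadline even when it is not slow itself.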

What to do instead?

Your first option is to use fixed net.Conn deadlines:

var cn net.Conn
cn.SetDeadline(time.Now().Add(3 * time.Second))

With go-redis, you can use ReadTimeout and WriteTimeout options which control net.Conn deadlines:

rdb := redis.NewClient(&redis.Options{
    ReadTimeout:  3 * time.Second,
    WriteTimeout: 3 * time.Second,
})
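For clients that do not expose such options, the same idea can be sketched with a small wrapper that sets a fresh deadline before every read and write. The deadlineConn type and demoFixedDeadlines function below are hypothetical names for this illustration, not part of go-redis or the standard library, and net.Pipe stands in for a real network connection.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// deadlineConn is a hypothetical wrapper that gives every Read and Write
// its own fixed timeout, independent of earlier operations.
type deadlineConn struct {
	net.Conn
	timeout time.Duration
}

func (c *deadlineConn) Read(p []byte) (int, error) {
	if err := c.Conn.SetReadDeadline(time.Now().Add(c.timeout)); err != nil {
		return 0, err
	}
	return c.Conn.Read(p)
}

func (c *deadlineConn) Write(p []byte) (int, error) {
	if err := c.Conn.SetWriteDeadline(time.Now().Add(c.timeout)); err != nil {
		return 0, err
	}
	return c.Conn.Write(p)
}

// demoFixedDeadlines reads two messages that each arrive after 80ms.
// The total (about 160ms) is longer than the 150ms timeout, but every
// read starts with a fresh deadline, so both are expected to succeed.
func demoFixedDeadlines() ([]string, error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	go func() {
		for _, msg := range []string{"one", "two"} {
			time.Sleep(80 * time.Millisecond)
			server.Write([]byte(msg))
		}
	}()

	cn := &deadlineConn{Conn: client, timeout: 150 * time.Millisecond}
	buf := make([]byte, 16)
	var replies []string
	for i := 0; i < 2; i++ {
		n, err := cn.Read(buf)
		if err != nil {
			return replies, err
		}
		replies = append(replies, string(buf[:n]))
	}
	return replies, nil
}

func main() {
	fmt.Println(demoFixedDeadlines())
}
```

With a single context.WithTimeout of 150ms around both reads, the second read would be expected to fail, because the first one would already have used up more than half of the budget.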

Alternatively, you can use a separate context timeout for each operation:

ctx := context.Background()
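ctx1, cancel1 := context.WithTimeout(ctx, time.Second)
op1(ctx1)
cancel1() // release the timer as soon as op1 returns
ctx2, cancel2 := context.WithTimeout(ctx, time.Second)
op2(ctx2)
cancel2()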
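If you have many operations, a small helper keeps the per-operation timeout in one place. This is a sketch under assumptions: runWithTimeout, sleepOp, and demoPerOpTimeouts are hypothetical names, and sleepOp only simulates a network call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runWithTimeout runs op with its own timeout derived from parent,
// so time spent in earlier operations does not reduce its budget.
func runWithTimeout(parent context.Context, d time.Duration, op func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()
	return op(ctx)
}

// sleepOp simulates a network call that takes d unless ctx ends first.
func sleepOp(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demoPerOpTimeouts runs a slow operation followed by a fast one,
// each with its own 50ms timeout.
func demoPerOpTimeouts() (firstTimedOut, secondOK bool) {
	parent := context.Background()
	err1 := runWithTimeout(parent, 50*time.Millisecond, sleepOp(200*time.Millisecond))
	err2 := runWithTimeout(parent, 50*time.Millisecond, sleepOp(10*time.Millisecond))
	return errors.Is(err1, context.DeadlineExceeded), err2 == nil
}

func main() {
	firstTimedOut, secondOK := demoPerOpTimeouts()
	fmt.Println("first operation timed out:", firstTimedOut)
	fmt.Println("second operation succeeded:", secondOK)
}
```

The second operation still gets its full timeout even though the first one ran into its deadline.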

Also avoid timeouts shorter than 1 second, because they have the same problem. If you must meet an SLA, you can generate a response in time but let the operation continue in the background:

func handler(w http.ResponseWriter, req *http.Request) {
	// Process asynchronously in a goroutine.
	ch := process(req)

	select {
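	case res := <-ch:
		// success: the result arrived in time
		fmt.Fprintln(w, res)
	case <-time.After(time.Second):
		// unknown result: respond now, processing continues in the background
		w.WriteHeader(http.StatusAccepted)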
	}
}
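A runnable variant of this pattern might look like the sketch below. Everything here is an assumption made for illustration: process is a stand-in that only sleeps, newHandler and statusFor are hypothetical helpers, the 100ms limit is arbitrary, and replying with 202 Accepted on timeout is just one possible choice. The result channel is buffered so that the background goroutine can finish and exit even when the handler has already responded.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// process is a hypothetical stand-in for the real work. The channel has
// capacity 1, so the send never blocks if nobody is listening any more.
func process(delay time.Duration) <-chan string {
	ch := make(chan string, 1)
	go func() {
		time.Sleep(delay) // simulated slow backend call
		ch <- "done"
	}()
	return ch
}

// newHandler returns a handler that always answers within limit.
func newHandler(delay, limit time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, req *http.Request) {
		select {
		case res := <-process(delay):
			fmt.Fprintln(w, res) // the result arrived in time
		case <-time.After(limit):
			// Result unknown: answer now, the work keeps running.
			w.WriteHeader(http.StatusAccepted)
		}
	}
}

// statusFor sends one in-memory request through the handler and
// reports the HTTP status code it produced.
func statusFor(delayMillis int) int {
	h := newHandler(time.Duration(delayMillis)*time.Millisecond, 100*time.Millisecond)
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println("fast work:", statusFor(10))
	fmt.Println("slow work:", statusFor(300))
}
```

The client always gets an answer within the limit, and no connection is torn down just because the work took longer than expected.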