
Common Nginx Log Analysis

1. Status is 502, but the service reports nothing to Sentry

Gateway log content

{ "@timestamp": "2025-07-11T03:00:41+00:00", "remote_addr": "10.141.178.206", "x-forward-for": "10.141.178.206", "request_id": "d9056f532aae93ac10d8a28295e02ca3", "remote_user": "", "bytes_sent": 349, "request_time": 50.187, "status": 502, "vhost": "sg-api-jiyin.hannto.com", "request_proto": "HTTP/1.1", "path": "/v1/c/res/gen/ks3_upload_url/","args": "bucket_name=hannto-jiyin-photo&model=perilla&key_name=xx.png", "request_query": "bucket_name=xx-xx-photo&model=xxx&key_name=xxx.png", "request_length": 799, "duration": 50.187, "method": "GET", "http_referrer": "", "http_user_agent": "Dart/3.4 (dart:io)", "latency": "33.110, 16.013, 1.064", "http_log_request_id": "", "app_id": "", "proxy_upstream_name": "srv-core-pro-srv-core-uwsgi-80", "upstream": "10.141.176.8:8000, 10.141.176.105:8000, 10.141.176.221:8000", "upstream_status": "502, 502, 502", "upstream_connect_time": "0.000, 0.000, 0.000", "upstream_header_time": "-, -, -", "upstream_response_time": "33.110, 16.013, 1.064", "upstream_response_length": "0, 0, 0", "upstream_cache_status": "", "cluster": "xx-xx"}

2025/07/11 03:00:41 [error] 3406#3406: *23406057 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.141.178.206, server: sg-api-xx.xx.com, request: "GET /v1/c/res/gen/ks3_upload_url/?bucket_name=xx-xx-photo&model=xx&key_name=xx.png HTTP/1.1", upstream: "http://10.141.176.221:8000/v1/c/res/gen/ks3_upload_url/?bucket_name=xx-xx-photo&model=xx&key_name=xx.png", host: "sg-api-xx.xx.com"

Request path analysis

  1. Initial part of the request

    2025/07/11 03:00:40 [error] recv() failed (104: Connection reset by peer)

    Meaning: the peer (the backend) reset the connection while NGINX was waiting for it to return the response header.
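To make the error line easier to work with, it can be split into the failing call, the errno, and the phase NGINX was in. The following is a minimal sketch, assuming the error log keeps the layout shown above; `ERROR_RE` and `parse_error_line` are hypothetical names chosen for illustration, not part of NGINX.

```python
import re

# Pulls out the log level, the failing system call, the errno with its
# message, and (when present) the phase NGINX was in when the call failed.
ERROR_RE = re.compile(
    r"\[(?P<level>\w+)\].*?"
    r"(?P<call>\w+)\(\) failed \((?P<errno>\d+): (?P<reason>[^)]+)\)"
    r"(?: while (?P<phase>[^,]+))?"
)


def parse_error_line(line):
    """Return the matched parts as a dict, or None if the line does not match."""
    match = ERROR_RE.search(line)
    return match.groupdict() if match else None


# Shortened copy of the error log line from this article.
sample = (
    "2025/07/11 03:00:41 [error] 3406#3406: *23406057 recv() failed "
    "(104: Connection reset by peer) while reading response header "
    "from upstream, client: 10.141.178.206"
)
info = parse_error_line(sample)
print(info["call"], info["errno"], info["reason"], "|", info["phase"])
```

The `phase` field is the useful part here: "reading response header from upstream" tells you the TCP connection was established and the failure happened before any response header arrived.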

  2. Request metadata
"status": 502,
"request_time": 50.187,
"latency": "33.110, 16.013, 1.064",
"upstream": "10.141.176.8:8000, 10.141.176.105:8000, 10.141.176.221:8000",
"upstream_status": "502, 502, 502"
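| Field | Meaning |
| --- | --- |
| request_time | Total request time: 50.187 s |
| latency | Time spent on each upstream attempt (3 attempts), listed in attempt order: the 1st attempt failed after 33.110 s, the 2nd after 16.013 s, the 3rd after 1.064 s. Together they add up to the 50.187 s request_time |
| upstream_status | Every upstream attempt is recorded as 502: NGINX got no valid response from any backend and produced the 502 itself |
| recv() failed (104) | While NGINX was waiting for the backend's response, the backend side reset the connection (typically a crashed process, an OOM kill, a GC stall, or a timeout) |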
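The per-attempt reading of these fields can be checked mechanically. Below is a minimal sketch, assuming the access log is one JSON object per line with the field names shown above; `split_attempts` is a hypothetical helper, and the sample line is trimmed to the fields it uses.

```python
import json


def split_attempts(entry):
    """Pair up the comma-separated upstream fields, one dict per attempt."""
    fields = ("upstream", "upstream_status", "upstream_response_time")
    columns = [[value.strip() for value in entry[name].split(",")] for name in fields]
    return [dict(zip(fields, row)) for row in zip(*columns)]


# Trimmed copy of the access log entry from this article.
line = """{"status": 502, "request_time": 50.187,
 "upstream": "10.141.176.8:8000, 10.141.176.105:8000, 10.141.176.221:8000",
 "upstream_status": "502, 502, 502",
 "upstream_response_time": "33.110, 16.013, 1.064"}"""

entry = json.loads(line)
attempts = split_attempts(entry)
total = sum(float(a["upstream_response_time"]) for a in attempts)

for number, attempt in enumerate(attempts, start=1):
    print(number, attempt["upstream"], attempt["upstream_status"],
          attempt["upstream_response_time"])
print("sum of attempts:", round(total, 3), "request_time:", entry["request_time"])
```

The three attempt times add up to the request_time, which suggests that essentially the whole 50 seconds was spent waiting on backends that never answered, not inside NGINX itself.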

Why Sentry reported nothing

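  • The request never reached your application logic (e.g. the Flask, FastAPI, or Django handler function was never executed); it was cut off at a layer in front of the application.

  • The application may have only just started accepting the connection, and had not finished initializing, when the connection was torn down.

  • What the backend sent back was a TCP-level disconnect, not an HTTP 500 or an exception stack trace, so Sentry had nothing to capture.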
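Because such failures are invisible to Sentry, the access log is the place to spot them. The sketch below is one possible check, assuming the field names from the log entry above: it flags entries in which no upstream attempt ever produced a response header. `no_http_response` is a hypothetical helper, and the second sample entry is made up for contrast.

```python
def no_http_response(entry):
    """True if no upstream attempt returned a response header or any body bytes."""
    header_times = [t.strip() for t in entry["upstream_header_time"].split(",")]
    lengths = [n.strip() for n in entry["upstream_response_length"].split(",")]
    return all(t == "-" for t in header_times) and all(n == "0" for n in lengths)


# Values taken from the 502 entry in this article.
reset_entry = {
    "status": 502,
    "upstream_header_time": "-, -, -",
    "upstream_response_length": "0, 0, 0",
}

# Hypothetical entry: the application answered with an HTTP 500 of its own,
# the kind of failure an in-process error reporter has a chance to see.
app_error_entry = {
    "status": 500,
    "upstream_header_time": "0.120",
    "upstream_response_length": "512",
}

print(no_http_response(reset_entry))
print(no_http_response(app_error_entry))
```

An entry that passes this check never received an HTTP response from the backend, so looking for it in application-level error reports is unlikely to help; process, container, and kernel logs are the better next stop.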

What are the possible root causes?

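| Category | Likelihood | Explanation |
| --- | --- | --- |
| 🧠 Application hang | ✅ Common | Stalls in the program logic: infinite loops, slow Redis writes, slow external API calls |
| 🔥 OOM | ✅ Needs checking | The process is killed outright, so its connections are reset (RST) |
| 🧱 Timeout | | The program timed out before it could finish handling the request |
| 💥 Service restart | | A deploy or rolling update was in progress and interrupted the request |
| 📶 Network interruption | ❓ Less likely | NIC problems or link problems between nodes |
| 🧊 GC pause | ✅ Possible in both Java and Python | The backend cannot respond in time and the connection is reset |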
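One rough way to narrow these candidates down is to count failed attempts per backend address across many log entries. The sketch below assumes the same JSON field names as the entry above; `failures_by_upstream` is a hypothetical helper, and the result is a heuristic, not a diagnosis.

```python
from collections import Counter


def failures_by_upstream(entries):
    """Count 502 attempts per upstream address across access log entries."""
    counts = Counter()
    for entry in entries:
        addresses = [a.strip() for a in entry["upstream"].split(",")]
        statuses = [s.strip() for s in entry["upstream_status"].split(",")]
        for address, status in zip(addresses, statuses):
            if status == "502":
                counts[address] += 1
    return counts


# The single entry analysed in this article; in practice you would feed in
# every 502 entry from the time window under investigation.
entries = [
    {
        "upstream": "10.141.176.8:8000, 10.141.176.105:8000, 10.141.176.221:8000",
        "upstream_status": "502, 502, 502",
    }
]

for address, count in failures_by_upstream(entries).most_common():
    print(address, count)
```

Failures concentrated on one address point toward a node-local problem (that host's memory, NIC, or a restarting instance). Failures spread evenly across backends, as in this entry, fit better with an application-level cause such as a slow dependency shared by all instances.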
This post is licensed under CC BY 4.0 by the author.