Analyzing Common Nginx Logs
1. Status is 502 and the service reports nothing to Sentry
Gateway log entries
{ "@timestamp": "2025-07-11T03:00:41+00:00", "remote_addr": "10.141.178.206", "x-forward-for": "10.141.178.206", "request_id": "d9056f532aae93ac10d8a28295e02ca3", "remote_user": "", "bytes_sent": 349, "request_time": 50.187, "status": 502, "vhost": "sg-api-jiyin.hannto.com", "request_proto": "HTTP/1.1", "path": "/v1/c/res/gen/ks3_upload_url/","args": "bucket_name=hannto-jiyin-photo&model=perilla&key_name=xx.png", "request_query": "bucket_name=xx-xx-photo&model=xxx&key_name=xxx.png", "request_length": 799, "duration": 50.187, "method": "GET", "http_referrer": "", "http_user_agent": "Dart/3.4 (dart:io)", "latency": "33.110, 16.013, 1.064", "http_log_request_id": "", "app_id": "", "proxy_upstream_name": "srv-core-pro-srv-core-uwsgi-80", "upstream": "10.141.176.8:8000, 10.141.176.105:8000, 10.141.176.221:8000", "upstream_status": "502, 502, 502", "upstream_connect_time": "0.000, 0.000, 0.000", "upstream_header_time": "-, -, -", "upstream_response_time": "33.110, 16.013, 1.064", "upstream_response_length": "0, 0, 0", "upstream_cache_status": "", "cluster": "xx-xx"}
2025/07/11 03:00:41 [error] 3406#3406: *23406057 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.141.178.206, server: sg-api-xx.xx.com, request: "GET /v1/c/res/gen/ks3_upload_url/?bucket_name=xx-xx-photo&model=xx&key_name=xx.png HTTP/1.1", upstream: "http://10.141.176.221:8000/v1/c/res/gen/ks3_upload_url/?bucket_name=xx-xx-photo&model=xx&key_name=xx.png", host: "sg-api-xx.xx.com"
Request path analysis
- Initial part of the request
2025/07/11 03:00:41 [error] recv() failed (104: Connection reset by peer)
This means that while NGINX was waiting for the backend to return the response header, the peer (the backend) reset the connection.
- Request metadata
"status": 502,
"request_time": 50.187,
"latency": "33.110, 16.013, 1.064",
"upstream": "10.141.176.8:8000, 10.141.176.105:8000, 10.141.176.221:8000",
"upstream_status": "502, 502, 502"
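Field | Meaning |
---|---|
request_time | Total request time: 50.187 s |
latency | Time spent on each upstream attempt (3 in total), listed in attempt order: the 1st attempt failed after 33.110 s, the 2nd after 16.013 s, and the 3rd after 1.064 s. Together they add up to the 50.187 s total |
upstream_status | Every upstream attempt is recorded as 502. Because upstream_header_time is "-" for each attempt, no backend actually sent a response; the 502 is generated inside the gateway |
recv() failed (104) | While NGINX was waiting for the backend response, the peer reset the connection (typical causes: the process crashed, was OOM-killed, stalled in GC, or hit a timeout) |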
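To make the per-attempt reading of these fields concrete, here is a minimal sketch that splits the comma-separated upstream fields into one record per attempt and checks that the attempt times add up to `request_time`. The field names and values are copied from the log entry above; the helper name `upstream_attempts` is only illustrative.

```python
import math


def upstream_attempts(entry: dict) -> list[dict]:
    """Split the comma-separated upstream_* fields into one record per attempt."""

    def split(field: str) -> list[str]:
        return [part.strip() for part in entry[field].split(",")]

    rows = zip(
        split("upstream"),
        split("upstream_status"),
        split("upstream_header_time"),
        split("upstream_response_time"),
    )
    return [
        {
            "attempt": number,
            "upstream": addr,
            "status": status,
            # "-" means Nginx never received a response header from this backend
            "header_received": header_time != "-",
            "seconds": float(seconds),
        }
        for number, (addr, status, header_time, seconds) in enumerate(rows, start=1)
    ]


# Subset of the fields from the gateway log entry above
entry = {
    "request_time": 50.187,
    "upstream": "10.141.176.8:8000, 10.141.176.105:8000, 10.141.176.221:8000",
    "upstream_status": "502, 502, 502",
    "upstream_header_time": "-, -, -",
    "upstream_response_time": "33.110, 16.013, 1.064",
}

attempts = upstream_attempts(entry)
total = sum(a["seconds"] for a in attempts)

for a in attempts:
    print(a)
print("attempts add up to request_time:", math.isclose(total, entry["request_time"]))
```

The last attempt in the list goes to 10.141.176.221:8000, which is the upstream named in the error log line, so the fields appear to be in attempt order.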
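Why did Sentry not report anything?
- The request was dropped at the middleware layer before it ever reached your application logic (e.g. the Flask, FastAPI, or Django handler never ran).
- The application may have only just started accepting connections and had not finished initializing when the connection was dropped.
- What the backend produced was a TCP-level disconnect, not an HTTP 500 or an exception stack trace, so Sentry had nothing to capture.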
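The difference between a TCP-level reset and an HTTP error can be reproduced locally. The sketch below is not the service from this incident. It is a toy server that accepts a connection, reads the request, and then closes the socket with `SO_LINGER` set to a zero timeout, which on Linux makes `close()` send an RST instead of a normal FIN. The function names are made up for the demo, and the `struct.pack("ii", ...)` layout assumes Linux.

```python
import socket
import struct
import threading


def serve_one_reset(listener: socket.socket) -> None:
    """Accept one connection, read the request, then abort it with a TCP RST."""
    conn, _ = listener.accept()
    conn.recv(4096)  # the request arrived, but no handler will ever answer it
    # l_onoff=1, l_linger=0: close() discards the connection and sends RST
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def request_outcome() -> str:
    """Send one HTTP request to the toy server and describe what the client sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(target=serve_one_reset, args=(listener,))
    worker.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    try:
        client.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
        data = client.recv(4096)
        return "got response bytes" if data else "closed cleanly"
    except ConnectionResetError:
        # Same condition Nginx logs as "recv() failed (104: Connection reset by peer)"
        return "connection reset by peer"
    finally:
        client.close()
        worker.join()
        listener.close()


outcome = request_outcome()
print(outcome)
```

The client gets no status line and no body, only an error from `recv()`. Nginx is in the same position as this client, and from the application's point of view no exception was raised inside a request handler, so an error tracker hooked into the handler has nothing to send.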
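What are the possible root causes?
Category | Likelihood | Explanation |
---|---|---|
🧠 Application hang | ✅ Common | The program logic stalls: an infinite loop, slow Redis writes, or a slow external API |
🔥 OOM | ✅ Needs checking | The process is killed outright, and its connections are reset (RST) |
🧱 Timeout | ✅ | The program hits a timeout before it finishes handling the request |
💥 Service restart | ✅ | A deploy or rolling update is in progress and interrupts in-flight requests |
📶 Network interruption | ❓ Less likely | NIC problems or problems on the link between nodes |
🧊 GC pause | ✅ Possible in both Java and Python | The service cannot respond in time and the connection gets reset |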
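To narrow down which cause applies, it helps to collect every request with the same signature so that the timestamps and upstream addresses can be compared against deploy times or OOM kill records. The following is a rough sketch, assuming the gateway writes one JSON object per line with the field names shown in the log entry above. The second sample line is invented purely to show a non-matching entry.

```python
import json
from typing import Iterable, Iterator


def reset_before_headers(entry: dict) -> bool:
    """True when the gateway answered 502 and no attempt ever returned headers."""
    statuses = [s.strip() for s in entry.get("upstream_status", "").split(",")]
    header_times = [t.strip() for t in entry.get("upstream_header_time", "").split(",")]
    return (
        entry.get("status") == 502
        and all(s == "502" for s in statuses)
        and all(t == "-" for t in header_times)
    )


def find_candidates(lines: Iterable[str]) -> Iterator[tuple]:
    """Yield (timestamp, request_id, upstream) for each matching log line."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON, e.g. error log lines
        if reset_before_headers(entry):
            yield entry["@timestamp"], entry["request_id"], entry["upstream"]


sample_lines = [
    # Fields taken from the gateway log entry above
    '{"@timestamp": "2025-07-11T03:00:41+00:00", '
    '"request_id": "d9056f532aae93ac10d8a28295e02ca3", "status": 502, '
    '"upstream": "10.141.176.8:8000, 10.141.176.105:8000, 10.141.176.221:8000", '
    '"upstream_status": "502, 502, 502", "upstream_header_time": "-, -, -"}',
    # Hypothetical healthy request, included only for contrast
    '{"@timestamp": "2025-07-11T03:00:42+00:00", "request_id": "example", '
    '"status": 200, "upstream": "10.141.176.8:8000", '
    '"upstream_status": "200", "upstream_header_time": "0.012"}',
]

candidates = list(find_candidates(sample_lines))
for row in candidates:
    print(row)
```

If the matching requests cluster around a rollout or keep hitting the same backend address, that points toward a restart or a node-specific problem rather than a slow code path.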
This post is licensed under CC BY 4.0 by the author.