Troubleshooting why systemctl fails to start openresty

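Recently, to relieve the ever-growing call pressure on our API, I tried adding application-layer load balancing to schedule read/write traffic and internal/external traffic. That calls for OpenResty. The platform this time is CentOS 7, which already uses systemctl as its service management tool, and I downloaded the source and built it myself, so I ran into a few problems that are worth writing down.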

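First of all, installing OpenResty was uneventful: following the getting-started guide on the official site, the build completed without a hitch.

Download the latest version,

$ cd ~/
$ wget https://openresty.org/download/openresty-1.11.2.1.tar.gz

Extract it,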

$ tar -zxvf openresty-1.11.2.1.tar.gz

Install the required dependencies,

$ sudo yum install readline-devel pcre-devel openssl-devel gcc

Configure,

$ cd openresty-1.11.2.1
$ ./configure --prefix=/usr/local/openresty \
    --with-pcre-jit \
    --with-ipv6 \
    --without-http_redis2_module \
    --with-http_iconv_module \
    -j2

Build & install,

$ gmake
$ gmake install

Create the user and group for openresty,

$ sudo useradd -d /var/lib/nginx nginx -s /sbin/nologin
$ sudo groupadd www-data
$ sudo usermod -aG www-data nginx

Change the ownership of the logs and html directories under the openresty install directory,

$ sudo chown -R nginx:www-data /usr/local/openresty/nginx/{logs,html}

Write the openresty.service file,

$ cat /usr/lib/systemd/system/openresty.service
[Unit]
Description=OpenResty is a dynamic web platform based on NGINX and LuaJIT.
Documentation=http://openresty.org/en/
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/run/openresty.pid
ExecStartPre=/usr/bin/rm -f /run/openresty.pid
ExecStartPre=/usr/local/openresty/nginx/sbin/nginx -t -c /usr/local/openresty/nginx/conf/nginx.conf
ExecStart=/usr/local/openresty/nginx/sbin/nginx -c /usr/local/openresty/nginx/conf/nginx.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
KillSignal=SIGQUIT
TimeoutStartSec=10
TimeoutStopSec=5
KillMode=process
PrivateTmp=true
Restart=on-failure
RestartSec=30s

[Install]
WantedBy=multi-user.target

This file is partly based on the service file of the nginx that yum installs on CentOS 7, with automatic restart on process failure added.

If everything had gone smoothly, that would have been the end of it, and I would have had nothing to write about.

Let's start the service and see,

$ sudo systemctl start openresty.service

After a long wait, I got this message,

$ sudo systemctl start openresty.service
Job for openresty.service failed because a timeout was exceeded. See "systemctl status openresty.service" and "journalctl -xe" for details.

What? A timeout error ended the start-up. Following the hint, let's see what error information there is,

$ sudo systemctl status openresty.service
● openresty.service - OpenResty is a dynamic web platform based on NGINX and LuaJIT.
   Loaded: loaded (/usr/lib/systemd/system/openresty.service; disabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Sat 2016-09-24 15:32:06 CST; 1min 10s ago
     Docs: http://openresty.org/en/
  Process: 25521 ExecStart=/usr/local/openresty/nginx/sbin/nginx -c /usr/local/openresty/nginx/conf/nginx.conf (code=exited, status=0/SUCCESS)
  Process: 25516 ExecStartPre=/usr/local/openresty/nginx/sbin/nginx -t -c /usr/local/openresty/nginx/conf/nginx.conf (code=exited, status=0/SUCCESS)
  Process: 25513 ExecStartPre=/usr/bin/rm -f /run/openresty.pid (code=exited, status=0/SUCCESS)
 Main PID: 23647 (code=exited, status=0/SUCCESS)

Sep 24 15:30:36 foo systemd[1]: Starting OpenResty is a dynamic web platform based on NGINX and LuaJIT....
Sep 24 15:30:36 foo nginx[25516]: nginx: the configuration file /usr/local/openresty/nginx/conf/nginx.conf syntax is ok
Sep 24 15:30:36 foo nginx[25516]: nginx: configuration file /usr/local/openresty/nginx/conf/nginx.conf test is successful
Sep 24 15:30:36 foo systemd[1]: PID file /run/openresty.pid not readable (yet?) after start.
Sep 24 15:32:06 foo systemd[1]: openresty.service start operation timed out. Terminating.
Sep 24 15:32:06 foo systemd[1]: Failed to start OpenResty is a dynamic web platform based on NGINX and LuaJIT..
Sep 24 15:32:06 foo systemd[1]: Unit openresty.service entered failed state.
Sep 24 15:32:06 foo systemd[1]: openresty.service failed.

Still at a loss. Let's look at journalctl -xe,

$ sudo journalctl -xe
-- Unit user-0.slice has begun shutting down.
Sep 24 14:35:19 foo systemd[1]: openresty.service start operation timed out. Terminating.
Sep 24 14:35:19 foo systemd[1]: Failed to start OpenResty is a dynamic web platform based on NGINX and LuaJIT..
-- Subject: Unit openresty.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit openresty.service has failed.
--
-- The result is failed.
Sep 24 14:35:19 foo systemd[1]: Unit openresty.service entered failed state.
Sep 24 14:35:19 foo systemd[1]: openresty.service failed.
Sep 24 14:35:19 foo polkitd[1080]: Unregistered Authentication Agent for unix-process:23722:139633287 (system bus name :1.52565, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus
lines 4854-4904/4904 (END)

I asked Google about the messages above, but nothing useful seemed to turn up.

Disappointed, I checked openresty's own error log, but it was empty. While I was at it, I also looked at /var/log/messages to see whether anything could be found there.

$ tail -30 /var/log/messages|less
Sep 24 15:35:04 foo systemd: Cannot add dependency job for unit firewalld.service, ignoring: Unit firewalld.service is masked.
Sep 24 15:35:04 foo systemd: Starting OpenResty is a dynamic web platform based on NGINX and LuaJIT....
Sep 24 15:35:04 foo nginx: nginx: the configuration file /usr/local/openresty/nginx/conf/nginx.conf syntax is ok
Sep 24 15:35:04 foo nginx: nginx: configuration file /usr/local/openresty/nginx/conf/nginx.conf test is successful
Sep 24 15:35:04 foo systemd: PID file /run/openresty.pid not readable (yet?) after start.
Sep 24 15:35:12 foo systemd: Removed slice user-0.slice.
Sep 24 15:35:12 foo systemd: Stopping user-0.slice.
Sep 24 15:36:01 foo systemd: Created slice user-0.slice.
Sep 24 15:36:01 foo systemd: Starting user-0.slice.
Sep 24 15:36:01 foo systemd: Started Session 26276 of user root.
Sep 24 15:36:01 foo systemd: Starting Session 26276 of user root.
Sep 24 15:36:09 foo systemd: Removed slice user-0.slice.
Sep 24 15:36:09 foo systemd: Stopping user-0.slice.
Sep 24 15:36:34 foo systemd: openresty.service start operation timed out. Terminating.
Sep 24 15:36:34 foo systemd: Failed to start OpenResty is a dynamic web platform based on NGINX and LuaJIT..
Sep 24 15:36:34 foo systemd: Unit openresty.service entered failed state.

One log line caught my eye,

Sep 24 15:35:04 foo systemd: PID file /run/openresty.pid not readable (yet?) after start.

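I had a hunch that this was the key to the door. The start-up error earlier said the start timed out, while this line says the PID file cannot be read. What does that tell us?

Run the start command once more while watching /run/openresty.pid: sure enough, the file is never created. At this point a bold guess is possible. systemd runs ExecStartPre, ExecStartPre and ExecStart in order, as the service file describes, to start openresty, and then uses the specified PID file to watch for a process with that PID. Here the PID file does not even exist, so the wait can never succeed; systemd never detects the nginx process and can only report a timeout.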

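Checking openresty's default nginx.conf, the pid directive is commented out, so nginx writes its PID file to its built-in default location and naturally no /run/openresty.pid ever exists. That is why attempt after attempt to start the service ended in a timeout.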
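The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of the original setup: the function names (`unit_pidfile`, `nginx_pidfile`, `pid_paths_match`) are made up for illustration, and it assumes the unit file has a single `PIDFile=` line and that nginx.conf has at most one top-level `pid` directive.

```shell
#!/bin/sh
# Sketch: compare the PIDFile= path in a systemd unit with the pid directive
# in nginx.conf. All function names here are illustrative assumptions.

# Print the value of PIDFile= from a unit file (empty if absent).
unit_pidfile() {
    sed -n 's/^PIDFile=//p' "$1" | head -n 1
}

# Print the path of an active (uncommented) pid directive in nginx.conf.
# Prints nothing when the directive is missing or commented out.
nginx_pidfile() {
    sed -n 's/^[[:space:]]*pid[[:space:]][[:space:]]*\([^;]*\);.*/\1/p' "$1" | head -n 1
}

# Return 0 when both paths are set and identical, 1 otherwise.
pid_paths_match() {
    unit_pid=$(unit_pidfile "$1")
    nginx_pid=$(nginx_pidfile "$2")
    [ -n "$unit_pid" ] && [ "$unit_pid" = "$nginx_pid" ]
}

# Example call with the paths used in this post:
#   pid_paths_match /usr/lib/systemd/system/openresty.service \
#       /usr/local/openresty/nginx/conf/nginx.conf || echo "PID paths differ"
```

The comparison is purely textual, so a relative path in nginx.conf (which nginx resolves against its prefix) is reported as a mismatch even if it happens to resolve to the same file.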

Fixing this is simple: uncomment the pid directive, change it to match the PIDFile in openresty.service, and start the service again. It comes up instantly.
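For reference, the edit can also be scripted. This is only a sketch under assumptions: `set_nginx_pid` is a hypothetical helper, it assumes the config contains a single `pid ...;` line (commented out or not) as the default nginx.conf does, and the pid path must not contain `|` or `&` because it is substituted into a sed expression.

```shell
#!/bin/sh
# Sketch: enable the pid directive in an nginx.conf and point it at a given
# path. set_nginx_pid is a hypothetical helper name, not an nginx tool.
# Usage: set_nginx_pid <nginx.conf> <pid-path>
set_nginx_pid() {
    conf=$1
    pid_path=$2
    tmp=$(mktemp) || return 1
    # Rewrite the "pid ...;" line whether or not it is commented out,
    # then copy the result back so the file keeps its owner and mode.
    sed "s|^[[:space:]]*#\{0,1\}[[:space:]]*pid[[:space:]].*;|pid        ${pid_path};|" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Called as root with `/usr/local/openresty/nginx/conf/nginx.conf` and `/run/openresty.pid`, this should leave a line reading `pid        /run/openresty.pid;`, after which the service can be started again.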

While I was at it, I also tried out the restart-after-crash behaviour. Come on, let's give it a try,

$ pkill nginx

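All the nginx processes are gone, and the world is at peace.

Going back to the openresty.service configuration above, recall these two settings,

Restart=on-failure
RestartSec=30s

Sure enough, after waiting 30s, the nginx process was magically started again.

[End]

PS: Having finished the whole post and looking back, the PID error had in fact already appeared in journalctl -xe; unfortunately it did not catch my attention.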
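A small filter might have surfaced that line sooner. The sketch below is an illustration only: `pid_file_warnings` is a made-up name, and it simply keeps the lines containing systemd's "PID file ... not readable" message from whatever log text it reads on stdin.

```shell
#!/bin/sh
# Sketch: keep only systemd's "PID file ... not readable" lines from log
# text read on stdin. pid_file_warnings is an illustrative name.
pid_file_warnings() {
    grep 'PID file .* not readable'
}

# Example (needs a systemd host):
#   journalctl -u openresty.service --no-pager | pid_file_warnings
```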

© 2021, XZD