OpenResty Lua Scripting in Practice: Anti-Crawler Rate Limiting by IP
2022-12-02
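## Preface
Following on from the previous post, [Installing OpenResty via yum on CentOS 7 and 8](https://zzzmh.cn/post/ed75b440720b11eda7050242ac110003),
this post covers the most important part of `OpenResty`: `Lua` scripting.
The hands-on goal is a simple anti-crawler script that limits how often each IP can make requests.
To be clear up front: there are plenty of examples of this kind of script online, and a quick `Ctrl + C` would get the feature working. What I want, though, is to actually learn `Lua` syntax so that I can freely implement whatever I need later.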
<br><br>
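## Tinkering
To avoid crashing the production environment, everything in this post is written and tested in a local Docker container.
<br>
**Installing OpenResty with Docker**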
```shell
docker run -d -p 8000:80 \
  -e "TZ=Asia/Shanghai" \
  -m 200M --oom-kill-disable --memory-swap=-1 \
  --name openresty \
  openresty/openresty
```
<br>
Open [localhost:8000](http://localhost:8000/).
If you see the page below, the container started successfully.
![](/api/file/getImage?fileId=634f63dbda740500130155d9)
<br>
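The next problem: editing files inside the OpenResty container is inconvenient. My approach is to copy the container's directory out to the host, then recreate the container with that directory mounted back in.
(The host here is a Linux virtual machine.)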
```shell
# Copy the directory out of the container, then stop and remove the container
docker cp openresty:/usr/local/openresty /home/docker/
sudo chmod -R 777 /home/docker/openresty
docker stop openresty
docker rm openresty
# Recreate the container with the host directory mounted inside; link redis and mysql while we're at it
docker run -d -p 8000:80 \
  -e "TZ=Asia/Shanghai" \
  -m 200M --oom-kill-disable --memory-swap=-1 \
  -v /home/docker/openresty:/usr/local/openresty \
  --link mysql \
  --link redis \
  --name openresty \
  openresty/openresty
```
<br>
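At this point I ran into another small problem: editing `nginx.conf` had no effect. It turned out that the container ships a `default.conf`, which has to be deleted first.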
```shell
docker exec -it openresty bash
cd /etc/nginx/conf.d/
rm -f default.conf
```
<br>
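From here on you can work in the `/home/docker/openresty/` directory and then open [localhost:8000](http://localhost:8000/) in a browser to check the result. Some changes only take effect after a restart.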
<br>
Next, locate the nginx configuration file. I trimmed it down a little:
`/home/docker/openresty/nginx/conf/nginx.conf`
```shell
user root;
worker_processes 8;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type text/html;
    keepalive_timeout 65;
    include /etc/nginx/conf.d/*.conf;
    server {
        listen 80;
        server_name localhost;
        location / {
            root html;
            index index.html index.htm;
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}
```
<br>
After editing, the configuration has to be reloaded before it takes effect:
```shell
# Enter the container
docker exec -it openresty bash
# Check that the configuration file is valid
openresty -t
# If there are no errors, reload the configuration
openresty -s reload
```
<br>
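Start with a hello world to prove that everything works.
Create a `lua` folder under the nginx directory,
then create a Lua file `hello.lua` inside it, and point `location /` in `nginx.conf` at it: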
```shell
server {
    location / {
        # Run the Lua script file
        content_by_lua_file lua/hello.lua;
        # root html;
        # index index.html index.htm;
    }
    ...
}
```
<br>
`nginx/lua/hello.lua`
```lua
ngx.say('Hello Lua!');
```
<br>
Finally, reload the configuration inside the container:
```shell
docker exec -it openresty bash
openresty -s reload
```
Open [localhost:8000](http://localhost:8000/)
and you should see `Hello Lua!`
<br><br>
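Now we can jump straight to the topic of this post: implementing the anti-crawler script.
The idea is shown in the diagram below.
(The example uses a limit of 20 requests per 60 seconds, although the test script below is set to 200; any time window and count will do.)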
![](/api/file/getImage?fileId=634f7ad3da740500130155e7)
<br><br>
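Before bringing Redis into it, the counting idea can be tried out in plain Lua. The sketch below is only an illustration under assumptions: the names `new_limiter` and `allow` are made up for this example, a Lua table stands in for Redis, and the current time is passed in by hand so the window expiry is easy to follow.

```lua
-- Fixed-window counter kept in a plain Lua table (a stand-in for Redis).
-- new_limiter and allow are illustrative names, not part of OpenResty.
local function new_limiter(limit, window)
    local entries = {}  -- key -> { count = n, expires_at = t }

    -- Returns true if the request is allowed, false if it should be rejected.
    -- `now` is a timestamp in seconds supplied by the caller.
    local function allow(key, now)
        local e = entries[key]
        if e == nil or now >= e.expires_at then
            -- First request of a new window: count it and start the timer.
            entries[key] = { count = 1, expires_at = now + window }
            return true
        end
        if e.count >= limit then
            return false  -- the limit is already used up in this window
        end
        e.count = e.count + 1
        return true
    end

    return allow
end

local allow = new_limiter(3, 60)   -- 3 requests per 60 seconds
print(allow("10.0.2.2", 0))        -- true
print(allow("10.0.2.2", 1))        -- true
print(allow("10.0.2.2", 2))        -- true
print(allow("10.0.2.2", 3))        -- false (4th request inside the window)
print(allow("10.0.2.2", 60))       -- true  (window expired, counter restarts)
```

In the real script the table is replaced by Redis: the counter lives in a key and `EXPIRE` plays the role of `expires_at`.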
`nginx.conf`
```shell
server {
    location / {
        # Run the Lua script file
        content_by_lua_file lua/access_limit.lua;
        # root html;
        # index index.html index.htm;
    }
    ...
}
```
`access_limit.lua`
```lua
local redis_iresty = require "resty.redis_iresty"
local redis = redis_iresty:new()
-- One counter per client IP; the key is the MD5 of the address
local key = ngx.md5(ngx.var.remote_addr)
local count = redis:get(key)
if count then
    -- Once the count has reached the per-window limit, return a 500 error (200 requests per 60 seconds for testing)
    if tonumber(count) >= 200 then
        ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
    else
        -- Still within the limit: just add 1 to the count
        redis:incr(key)
    end
else
    -- First visit: set the count to 1
    redis:set(key, 1)
    -- Expire after 60 seconds
    redis:expire(key, 60)
end
-- Print key, value and ttl on the test page
ngx.say(key..'<br>'..redis:get(key)..'<br>'..redis:ttl(key))
```
<br><br>
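**Note!**
`local redis_iresty = require "resty.redis_iresty"`
This line does not work out of the box, because the library does not exist by default. It is a wrapper that another author wrote and packaged, and it has to be added to the `resty` folder beforehand.
See [A second-level wrapper for the Redis interface (gitbooks)](https://moonbingbing.gitbooks.io/openresty-best-practices/content/redis/out_package.html)
The steps are:
in the directory `openresty/lualib/resty/`,
create a new file `redis_iresty.lua`
with the content below. Note: change the Redis IP address and port in it to match your own setup.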
```lua
local redis_c = require "resty.redis"
local ok, new_tab = pcall(require, "table.new")
if not ok or type(new_tab) ~= "function" then
    new_tab = function (narr, nrec) return {} end
end
local _M = new_tab(0, 155)
_M._VERSION = '0.01'
local commands = {
"append", "auth", "bgrewriteaof",
"bgsave", "bitcount", "bitop",
"blpop", "brpop",
"brpoplpush", "client", "config",
"dbsize",
"debug", "decr", "decrby",
"del", "discard", "dump",
"echo",
"eval", "exec", "exists",
"expire", "expireat", "flushall",
"flushdb", "get", "getbit",
"getrange", "getset", "hdel",
"hexists", "hget", "hgetall",
"hincrby", "hincrbyfloat", "hkeys",
"hlen",
"hmget", "hmset", "hscan",
"hset",
"hsetnx", "hvals", "incr",
"incrby", "incrbyfloat", "info",
"keys",
"lastsave", "lindex", "linsert",
"llen", "lpop", "lpush",
"lpushx", "lrange", "lrem",
"lset", "ltrim", "mget",
"migrate",
"monitor", "move", "mset",
"msetnx", "multi", "object",
"persist", "pexpire", "pexpireat",
"ping", "psetex", "psubscribe",
"pttl",
"publish", --[[ "punsubscribe", ]] "pubsub",
"quit",
"randomkey", "rename", "renamenx",
"restore",
"rpop", "rpoplpush", "rpush",
"rpushx", "sadd", "save",
"scan", "scard", "script",
"sdiff", "sdiffstore",
"select", "set", "setbit",
"setex", "setnx", "setrange",
"shutdown", "sinter", "sinterstore",
"sismember", "slaveof", "slowlog",
"smembers", "smove", "sort",
"spop", "srandmember", "srem",
"sscan",
"strlen", --[[ "subscribe", ]] "sunion",
"sunionstore", "sync", "time",
"ttl",
"type", --[[ "unsubscribe", ]] "unwatch",
"watch", "zadd", "zcard",
"zcount", "zincrby", "zinterstore",
"zrange", "zrangebyscore", "zrank",
"zrem", "zremrangebyrank", "zremrangebyscore",
"zrevrange", "zrevrangebyscore", "zrevrank",
"zscan",
"zscore", "zunionstore", "evalsha"
}
local mt = { __index = _M }
local function is_redis_null( res )
    if type(res) == "table" then
        for k,v in pairs(res) do
            if v ~= ngx.null then
                return false
            end
        end
        return true
    elseif res == ngx.null then
        return true
    elseif res == nil then
        return true
    end
    return false
end
-- Change this to the address and port of your own Redis instance
function _M.connect_mod( self, redis )
    redis:set_timeout(self.timeout)
    return redis:connect("172.0.0.1", 6379)
end
function _M.set_keepalive_mod( redis )
    -- put it into the connection pool of size 1000, with 60 seconds max idle time
    return redis:set_keepalive(60000, 1000)
end
function _M.init_pipeline( self )
    self._reqs = {}
end
function _M.commit_pipeline( self )
    local reqs = self._reqs
    if nil == reqs or 0 == #reqs then
        return {}, "no pipeline"
    else
        self._reqs = nil
    end
    local redis, err = redis_c:new()
    if not redis then
        return nil, err
    end
    local ok, err = self:connect_mod(redis)
    if not ok then
        return {}, err
    end
    redis:init_pipeline()
    for _, vals in ipairs(reqs) do
        local fun = redis[vals[1]]
        table.remove(vals , 1)
        fun(redis, unpack(vals))
    end
    local results, err = redis:commit_pipeline()
    if not results or err then
        return {}, err
    end
    if is_redis_null(results) then
        results = {}
        ngx.log(ngx.WARN, "is null")
    end
    -- table.remove (results , 1)
    self.set_keepalive_mod(redis)
    for i,value in ipairs(results) do
        if is_redis_null(value) then
            results[i] = nil
        end
    end
    return results, err
end
function _M.subscribe( self, channel )
    local redis, err = redis_c:new()
    if not redis then
        return nil, err
    end
    local ok, err = self:connect_mod(redis)
    if not ok or err then
        return nil, err
    end
    local res, err = redis:subscribe(channel)
    if not res then
        return nil, err
    end
    res, err = redis:read_reply()
    if not res then
        return nil, err
    end
    redis:unsubscribe(channel)
    self.set_keepalive_mod(redis)
    return res, err
end
local function do_command(self, cmd, ... )
    if self._reqs then
        table.insert(self._reqs, {cmd, ...})
        return
    end
    local redis, err = redis_c:new()
    if not redis then
        return nil, err
    end
    local ok, err = self:connect_mod(redis)
    if not ok or err then
        return nil, err
    end
    local fun = redis[cmd]
    local result, err = fun(redis, ...)
    if not result or err then
        -- ngx.log(ngx.ERR, "pipeline result:", result, " err:", err)
        return nil, err
    end
    if is_redis_null(result) then
        result = nil
    end
    self.set_keepalive_mod(redis)
    return result, err
end
for i = 1, #commands do
    local cmd = commands[i]
    _M[cmd] =
        function (self, ...)
            return do_command(self, cmd, ...)
        end
end
function _M.new(self, opts)
    opts = opts or {}
    local timeout = (opts.timeout and opts.timeout * 1000) or 1000
    local db_index= opts.db_index or 0
    return setmetatable({
        timeout = timeout,
        db_index = db_index,
        _reqs = nil }, mt)
end
return _M
```
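The wrapper above also exposes `init_pipeline` and `commit_pipeline`, which the scripts in this post never use. As a rough sketch of how they could be called, the helper below queues an `INCR` and a `TTL` and sends them over a single connection. The name `incr_with_ttl` is hypothetical and not part of the wrapper, and the behaviour described in the comments is read off the wrapper code listed above rather than tested against a live Redis.

```lua
-- Hypothetical helper (not part of the wrapper): `red` is an object created
-- with redis_iresty:new(), or anything exposing the same four methods.
local function incr_with_ttl(red, key)
    red:init_pipeline()   -- start queueing commands instead of sending them
    red:incr(key)         -- queued; returns nothing while the pipeline is open
    red:ttl(key)          -- queued
    local results, err = red:commit_pipeline()  -- one connection for both commands
    if err then
        return nil, nil, err
    end
    -- Replies come back in the order the commands were queued.
    return results[1], results[2]
end

-- Runs inside OpenResty only (the `ngx` global does not exist in plain Lua).
if ngx then
    local redis_iresty = require "resty.redis_iresty"
    local red = redis_iresty:new({ timeout = 1 })  -- seconds; the wrapper converts to ms
    local count, ttl, err = incr_with_ttl(red, ngx.md5(ngx.var.remote_addr))
    if err then
        ngx.log(ngx.ERR, "redis pipeline failed: ", err)
    else
        ngx.say(count, '<br>', ttl)
    end
end
```

Without the pipeline, every wrapper call such as `red:get(key)` takes its own connection from the pool, so batching mainly pays off when a script issues several commands per request.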
<br>
It goes without saying that a reload is needed here: every time you change `nginx.conf` or a `lua` file it references, run `openresty -s reload`.
<br>
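Next, open [localhost:8000](http://localhost:8000/) to see the final result:
The first 200 requests return status 200 and show key + value + ttl.
From the 201st request on, the response is a status 500 error page.
After that, every request returns 500
until the one-minute timer runs out.
The next request then returns 200 again and shows key + value + ttl.
Done.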
<br><br>
<br><br>
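Two small problems I found afterwards
1 The value is 1 on the first visit, but every visit after that adds 2 until the window ends. This one was well hidden; it only showed up in the OpenResty logs.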
```shell
# Command to view the logs
docker logs openresty
# Log output
10.0.2.2 - - [20/Oct/2022:14:18:27 +0800] "GET / HTTP/1.1" 200 55 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
10.0.2.2 - - [20/Oct/2022:14:18:27 +0800] "GET /favicon.ico HTTP/1.1" 200 55 "http://localhost:8000/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
10.0.2.2 - - [20/Oct/2022:14:18:28 +0800] "GET / HTTP/1.1" 200 55 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
10.0.2.2 - - [20/Oct/2022:14:18:29 +0800] "GET /favicon.ico HTTP/1.1" 200 55 "http://localhost:8000/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
```
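As you can see, every visit produces two log entries: besides the request for the root path, the browser also requests the `favicon.ico` page icon. Since `location /` in our nginx configuration accepts every request, the script is triggered twice in quick succession.
**The final, not-touching-it-again version**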
`nginx.conf`
```shell
server {
    location / {
        # Run the Lua script file
        content_by_lua_file lua/access_limit.lua;
        # root html;
        # index index.html index.htm;
    }
    location /favicon.ico {
        return 404;
    }
    ...
}
```
`access_limit.lua`
```lua
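local redis_iresty = require "resty.redis_iresty"
local redis = redis_iresty:new()
-- One counter per client IP; the key is the MD5 of the address
local key = ngx.md5(ngx.var.remote_addr)
local count = redis:get(key)
if count then
    -- Once the count has reached the per-window limit, return a 500 error (200 requests per 60 seconds for testing)
    if tonumber(count) >= 200 then
        ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
    else
        -- Still within the limit: just add 1 to the count
        redis:incr(key)
    end
else
    -- First visit: INCR creates the key with a count of 1
    redis:incr(key)
    -- Expire after 60 seconds
    redis:expire(key, 60)
end
-- Print key, value and ttl on the test page
ngx.say(key..'<br>'..redis:get(key)..'<br>'..redis:ttl(key))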
```
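One thing the script above leaves open: it reads the counter with `GET` and increments it in a separate call, so two requests arriving at the same moment can both see the same value. A common alternative, sketched here as an assumption rather than as what this post runs, is to call `INCR` first and decide based on its return value. The helper name `over_limit` is made up, and `red` can be the `redis_iresty` object or any object whose `incr` and `expire` return a value, or `nil` plus an error.

```lua
-- Hypothetical helper: returns true when `key` has exceeded `limit` requests
-- in the current window, false when it has not, or nil plus an error message.
local function over_limit(red, key, limit, window)
    -- INCR creates a missing key with value 1, so counting and
    -- first-visit detection happen in a single Redis command.
    local count, err = red:incr(key)
    if not count then
        return nil, err
    end
    if tonumber(count) == 1 then
        -- First hit of this window: start the expiry timer.
        -- (If this call fails the key keeps no TTL; a production
        -- version would need to deal with that case.)
        local ok, expire_err = red:expire(key, window)
        if not ok then
            return nil, expire_err
        end
    end
    return tonumber(count) > limit
end

-- Runs inside OpenResty only (the `ngx` global does not exist in plain Lua).
if ngx then
    local redis_iresty = require "resty.redis_iresty"
    local red = redis_iresty:new()
    local blocked, err = over_limit(red, ngx.md5(ngx.var.remote_addr), 200, 60)
    if err then
        -- Redis unavailable: log it and let the request through.
        ngx.log(ngx.ERR, "rate limit check failed: ", err)
    elseif blocked then
        -- Same 500 response as the script above; 429 Too Many Requests
        -- would be the more conventional status for rate limiting.
        return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
    end
end
```

With a limit of 200, the first 200 requests in a window pass and the 201st is rejected, because the comparison runs on the value after the increment.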
<br>
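2 Docker installed under VirtualBox on Win 11 turns out to run a system booted from a CD image, with the disk merely mounted. Data stored under the home directory disappears on reboot, which explains why the install only takes 1.8 GB. Earlier in this post I mapped the Docker data to `/home/docker`, and later found that it vanished after a reboot for exactly this reason. I have since switched the mapping to `/mnt/sda1/var/lib/docker/data`, where `data` is a newly created folder, and the data now survives reboots. However, because of a startup-ordering issue, OpenResty cannot start automatically at boot here and still needs a manual `start`.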
<br>
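While I'm at it, I miss the days of developing on Linux, where this kind of nonsense doesn't happen: on the same hardware everything runs 2 to 3 times faster and does exactly what you tell it to.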
<br><br>
## END
References
[OpenResty full course (bilibili)](https://www.bilibili.com/video/BV1nU4y1x7Lt)
[Distributed systems: rate limiting and anti-crawling with OpenResty + Lua + Redis (csdn)](https://blog.csdn.net/qq_24000367/article/details/125536798)