基于阿里云函数实现弹幕文件解析接口

前言

一直苦于看非B站视频不能下载弹幕,遂上网搜索下载非B站视频弹幕的方法,后找到一篇博客较为满意。但为了更方便的下载弹幕,而不用本地跑Python代码。萌生了将这份代码做成HTTP触发器的念头,以后只需要使用浏览器,直接调用接口,就可下载弹幕文件。方便多了!如同直接访问comment.bilibili.com,可以查看并下载B站弹幕。
此篇文章代码主要参考这篇博客——主流视频网站弹幕下载
要是不想自己部署,也可以直接用我做好的接口
访问http://fc.home999.cc

使用

首先,在阿里云函数中创建一个带HTTP触发器的函数。
也可以在应用中心——新建应用——Python简单示例应用中创建,这是一个官方自带的Hello World模板。
删除原有文件后,在在线编辑页中创建以下两个文件index.html index.py
如下图:
在线编辑页面
其中
index.html用于显示不调用接口时默认的主页面。
index.py用于从视频源网站上爬取弹幕信息。将爬取结果转换为B站的弹幕XML格式后存入OSS中,并重定向到OSS上去下载。(这样做的目的是为了省流量费,香港阿里云一个月5G以下流量不收钱。自己部署也可以直接返回XML的内容)
附代码如下:

代码中的save2oss函数需要自行修改为自己的oss bucket。

index.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
# -*- coding: utf-8 -*-
from random import randint
from urllib.parse import unquote,quote,parse_qs
import requests
import urllib.request
import json
import time as t
import datetime
import zlib
import re
import xml.etree.ElementTree as ET
import oss2

headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE',
}


def get_response_iqiyi(url):
req = urllib.request.Request(url)
req.add_header("User-Agent",
"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36")
response = urllib.request.urlopen(req).read()
return response


def get_response(url):
response = requests.get(url)
response.encoding = 'utf-8'
return response.text


def judgeIllegalChar(str):
illegal = False # 标志是否有非法XML字符
for char in ["<", ">", "&", "\u0000", "\b"]:
if char in str:
illegal = True
break
return illegal


def make_response_head():
return '<?xml version="1.0" encoding="UTF-8"?>\n<i>\n'


def make_response_foot():
return '</i>'


def make_response_body(timepoint, content, ct=1, size=20, color=16777215,
unixtime=int(t.mktime(datetime.datetime.now().timetuple())), uid=0):
return '\t<d p="{},{},{},{},{},0,{},26732601000067074">{}</d>\n'.format(timepoint, ct, size, color, unixtime, uid,
content)
# 第一个参数是弹幕出现的时间 以秒数为单位。
# 第二个参数是弹幕的模式1..3 滚动弹幕 4底端弹幕 5顶端弹幕 6.逆向弹幕 7精准定位 8高级弹幕
# 第三个参数是字号, 12非常小,16特小,18小,25中,36大,45很大,64特别大
# 第四个参数是字体的颜色 以HTML颜色的十位数为准
# 第五个参数是Unix格式的时间戳。基准时间为 1970-1-1 08:00:00
# 第六个参数是弹幕池 0普通池 1字幕池 2特殊池 【目前特殊池为高级弹幕专用】
# 第七个参数是发送者的ID,用于“屏蔽此弹幕的发送者”功能
# 第八个参数是弹幕在弹幕数据库中rowID 用于“历史弹幕”功能。


def mgtv(url):
cid = url.split('/')[4]
vid = url.split('/')[5].split('?')[0].strip('.html')
title = re.search(r'partName:"(.*?)",', get_response(url))
if title is not None:
title = title.group(1)
else:
title = 'Unknow'
contents = set()
ret = make_response_head()

time = 0
total = 0 # 弹幕总数
cnt = 0 # 筛选弹幕数
while True:
result = get_response('https://galaxy.bz.mgtv.com/rdbarrage?vid=' + vid + '&cid=' + cid + '&time=' + str(time))
danmu = json.loads(result)
if danmu['data']['items'] == None:
break
for j in danmu['data']['items']:
total += 1
if judgeIllegalChar(j['content']):
continue
timepoint = j['time'] / 1000 # 弹幕发送时间
uid = j['uid'] # 发送者uid
content = j['content'] # 弹幕内容
if content not in contents:
cnt += 1
contents.add(content)
ret += make_response_body(timepoint=timepoint, content=content, uid=uid)
time = danmu['data']['next']
ret += make_response_foot()
print("Download {} danmakus, Select {} danmakus\nfinish.".format(total, cnt))
return [title, ret]


def tencentvideo(url):
# url = 'https://v.qq.com/x/cover/a8oeend1e9gfdzs/f0031nbupkq.html' # 视频的url
video_info = json.loads(
str([s for s in get_response(url).split('\n') if 'VIDEO_INFO' in str(s)]).strip('[\'var VIDEO_INFO = ').strip(
'\']'))
duration = video_info['duration']
title = video_info['title']
vid = video_info['vid']
targetid = json.loads(get_response('http://bullet.video.qq.com/fcgi-bin/target/regist?otype=json&vid=' + vid).strip(
'QZOutputJson=').strip(';'))['targetid']
contents = set()
ret = make_response_head()

total = 0 # 弹幕总数
cnt = 0 # 筛选弹幕数
for i in range(int(duration) // 30 + 1):
timestamp = i * 30
danmu = json.loads(
get_response('http://mfm.video.qq.com/danmu?timestamp=' + str(timestamp) + '&target_id=' + targetid),
strict=False)
for j in danmu['comments']:
total += 1
if judgeIllegalChar(j['content']):
continue
timepoint = j['timepoint'] # 弹幕发送时间
ct = 1 # 弹幕样式
size = 20 # 字体大小
# 获取颜色
if "color" in j["content_style"]:
content_style = json.loads(j["content_style"])
color = int(content_style["color"], 16)
else:
color = 16777215
unixtime = int(t.mktime(datetime.datetime.now().timetuple())) # unix时间戳
content = j['content'] # 弹幕内容
if content not in contents:
cnt += 1
contents.add(content)
ret += '\t<d p="{},{},{},{},{},0,0,26732601000067074">{}</d>\n'.format(timepoint, ct, size, color,
unixtime, content)
ret += make_response_foot()
print("Download {} danmakus, Select {} danmakus\nfinish.".format(total, cnt))
return [title, ret]


def youku(url):
# url = 'https://v.youku.com/v_show/(%id%).html' # 视频的url'
res = get_response(url)
title = re.search(r'<title>(.*)</title>', res).group(1).split('—')[0]
iid = re.search(r'videoId: \'(\d*)\'', res).group(1)
duration = float(re.search(r'seconds: \'(.*)\',', res).group(1))
contents = set()

total = 0 # 弹幕总数
cnt = 0 # 筛选弹幕数
ret = make_response_head()
for mat in range(int(duration) // 60 + 1):
# req = urllib.request.Request('https://service.danmu.youku.com/list?mat=' + str(mat) + '&ct=1001&iid=' + iid)
# req.add_header("User-Agent",
# "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36")
# response = urllib.request.urlopen(req)
response = get_response('https://service.danmu.youku.com/list?mat=' + str(mat) + '&ct=1001&iid=' + iid)
danmu = json.loads(response)
for i in range(len(danmu["result"])):
total += 1
if judgeIllegalChar(danmu["result"][i]["content"]):
continue
playat = danmu["result"][i]["playat"] / 1000 # 弹幕发送时间
# 获取颜色
if "color" in danmu["result"][i]["propertis"]:
propertis = json.loads(danmu["result"][i]["propertis"])
color = propertis["color"]
else:
color = 16777215
content = danmu["result"][i]["content"] # 弹幕内容
if content not in contents:
contents.add(content)
cnt += 1
ret += make_response_body(timepoint=playat, color=color, content=content)
ret += make_response_foot()
print("Download {} danmakus, Select {} danmakus\nfinish.".format(total, cnt))
return [title, ret]


def iqiyi(url):
# url = 'https://www.iqiyi.com/(%id%).html' # 视频的url'
ret = get_response(url)
return ['iqiyi',ret]
page_info = json.loads(re.search(r"page-info='(.*)' :video-info=", get_response(url)).group(1))
duration_str = page_info['duration'].split(':')
duration = 0
for i in range(len(duration_str) - 1):
duration = (duration + int(duration_str[i])) * 60
duration = duration + int(duration_str[-1])
title = page_info['tvName']
albumid = page_info['albumId']
tvid = page_info['tvId']
categoryid = page_info['cid']
page = duration // (60 * 5) + 1
contents = set()

total = 0 # 弹幕总数
cnt = 0 # 筛选弹幕数
ret = make_response_head()
for i in range(duration // (60 * 5) + 1):
dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
b = dec.decompress(get_response_iqiyi(
'http://cmts.iqiyi.com/bullet/' + str(tvid)[-4:-2] + '/' + str(tvid)[-2:] + '/' + str(tvid) + '_300_' + str(
i + 1) + '.z?rn=0.' + ''.join(["%s" % randint(0, 9) for num in range(0,
16)]) + '&business=danmu&is_iqiyi=true&is_video_page=true&tvid=' + str(
tvid) + '&albumid=' + str(albumid) + '&categoryid=' + str(categoryid) + '&qypid=01010021010000000000'))
root = ET.fromstring(b.decode("utf-8"))
for bulletInfo in root.iter('bulletInfo'):
total += 1
timepoint = bulletInfo[3].text # 弹幕发送时间
color = int(bulletInfo[5].text, 16) # 颜色
content = bulletInfo[1].text # 弹幕内容
size = bulletInfo[4].text
if content not in contents:
cnt += 1
contents.add(content)
ret += make_response_body(timepoint=timepoint, color=color, content=content, size=size)
ret += make_response_foot()
print("Download {} danmakus, Select {} danmakus\nfinish.".format(total, cnt))
return [title, ret]


def save2oss(title, xml, download):
url = "XML/" + title + ".xml"
auth = oss2.Auth('******', '******')
endpoint_internal = 'oss-cn-******-internal.aliyuncs.com'
endpoint = 'oss-cn-******.aliyuncs.com'
bucketName = '****'
bucket = oss2.Bucket(auth, endpoint, bucketName)
headers = {}
if (download):
headers['Content-Type']='application/force-download'
else:
headers['Content-Type']='application/xml'

result = bucket.put_object(url, xml,headers=headers)
print('Upload URL:{}\tHTTP status: {}'.format(url, result.status))
#bucket = oss2.Bucket(auth, endpoint, bucketName)
return unquote(bucket.sign_url('GET', url, 120),'utf-8')


def bilibili(url):
text = get_response(url)
keyStr = re.findall(r'"cid":[\d]*', text) # B站有两种寻址方式,第二种多一些
if not keyStr: # 若列表为空,则等于“False”
keyStr = re.findall(r'cid=[\d]*', text)
key = eval(keyStr[0].split('=')[1])
else:
key = eval(keyStr[0].split(':')[1])
commentUrl = 'https://comment.bilibili.com/' + str(key) + '.xml' # 弹幕存储地址
return commentUrl


def build_response(url,download):
if url.find('mgtv.com') >= 0:
[title, ret] = mgtv(url)
elif url.find('qq.com') >= 0:
[title, ret] = tencentvideo(url)
elif url.find('youku.com') >= 0:
[title, ret] = youku(url)
elif url.find('iqiyi.com') >= 0:
[title, ret] = iqiyi(url)
elif url.find('bilibili.com') >= 0:
return bilibili(url)
else:
return None
return save2oss(title, ret, download)


def handler(environ, start_response):
if 'QUERY_STRING' not in environ:
status = '200 OK'
response_headers = [('Content-type', 'text/html; charset=UTF-8')]
with open("index.html", "r", encoding="utf-8") as f:
ret = f.read()
else:
query_string = environ['QUERY_STRING']
params = parse_qs(query_string)
download = params.get('download',['off'])[0]
download = (download == 'on')
url = unquote(params['url'][0], 'utf-8')
returl = build_response(url,download)
if returl is not None:
status = '302 Found'
response_headers = [('Location', quote(returl,safe='/:?&='))]
ret = ''
else:
status = '200 OK'
response_headers = [('Content-type', 'text/html; charset=UTF-8')]
ret = "不支持的视频网址"
start_response(status, response_headers)
return [ret.encode('utf8')]

#build_response("https://www.iqiyi.com/v_19rr1lm35o.html")

代码中http://fc.home999.cc/需要替换为自己OSS触发器的地址或者自己绑定的用户域名

index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
<!DOCTYPE html>
<html lang="zh-CN">

<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>弹幕文件解析接口</title>
<!-- jQuery (Bootstrap 的所有 JavaScript 插件都依赖 jQuery,所以必须放在前边) -->
<script src="https://cdn.staticfile.org/jquery/3.4.1/jquery.min.js"></script>
<!-- bootstrap -->
<link rel="stylesheet" href="https://cdn.staticfile.org/twitter-bootstrap/3.4.1/css/bootstrap.min.css">
<script src="https://cdn.staticfile.org/twitter-bootstrap/3.4.1/js/bootstrap.min.js"></script>
</head>

<body>
<div class="container">
<div class="row text-center">
<div class="page-header">
<h1>
主流视频网站弹幕文件解析接口
</h1>
</div>
</div>
<div class="row">
<p>
这是一个弹幕文件解析接口!输入你要解析的视频地址,即可获得B站弹幕形式的XML文件。<br/>
通过使用<a href="http://www.dandanplay.com/">弹弹Play播放器</a>
或者<a href='https://tiansh.github.io/us-danmaku/bilibili/'>bilibili ASS 弹幕在线转换项目</a>
转换为普通字幕文件,即可在本地播放器中播放。
</p>
<p>
使用方法:在当前页面添加一个查询字符串url<br/>
目前支持芒果TV,腾讯视频,优酷视频,爱奇艺视频,哔哩哔哩。
例子<br/>
http://fc.home999.cc/?url=https://www.mgtv.com/b/336727/8087768.html<br/>
http://fc.home999.cc/?url=https://v.qq.com/x/cover/a8oeend1e9gfdzs/f0031nbupkq.html<br/>
http://fc.home999.cc/?url=https://v.youku.com/v_show/id_XNDYyMDM2NDMyOA==.html<br />
http://fc.home999.cc/?url=https://www.iqiyi.com/v_19rr1lm35o.html<br />
http://fc.home999.cc/?url=https://www.bilibili.com/video/av170001
</p>
</div>
<div class="row">
<p>在下方直接输入视频网址,点击提交按钮也可解析。</p>
<form class="form-horizontal">
<div class="form-group">
<label class="col-sm-1 control-label">视频网址</label>
<div class="col-sm-5">
<input type="text" class="form-control" placeholder="URL" name="url">
</div>
</div>
<div class="form-group">
<div class="col-sm-offset-1 col-sm-5">
<div class="checkbox">
<label>
<input type="checkbox" name="download" checked='checked'> 强制下载
</label>
</div>
</div>
</div>
<div class="form-group">
<div class="col-sm-offset-1 col-sm-5">
<button type="submit" class="btn btn-primary">提交</button>
</div>
</div>
</form>

</div>
<hr />
<footer class="footer">
<div class="container">
Powered by <strong>Aliyun FC</strong>
<span class="post-meta-divider">|</span>
Reference blog:<a
href="https://lxmymjr.github.io/contents/%E4%B8%BB%E6%B5%81%E8%A7%86%E9%A2%91%E7%BD%91%E7%AB%99%E5%BC%B9%E5%B9%95%E4%B8%8B%E8%BD%BD">主流视频网站弹幕下载</a>
</div>
</footer>
</div>

</body>
<script>

</script>

</html>