
A charset of gbk declared inside the HTML is not parsed correctly

Backlog
Opened this issue 2018-08-02 11:19

Class: DownloadWorker
private String getCharsetFromBodyStr(final String bodyStr) {
	if (K.isBlank(bodyStr)) {
		return null;
	}

	String html = bodyStr.trim().toLowerCase();
	// Step 1: grab the <meta ...> fragment that contains "charset=".
	String s1 = K.findOneByRegex(html, "(?=<meta ).*charset=.[^/]*");
	if (K.isBlank(s1)) {
		return null;
	}

	// Step 2: isolate "charset=<name>"; the match stops at the first ; / " or '.
	String s2 = K.findOneByRegex(s1, "(?=charset\\=).[^;/\"']*");
	if (K.isBlank(s2)) {
		return null;
	}
	String charsetName = s2.replace("charset=", "");
	return K.getCharsetName(charsetName);
}

HTML encoding:

String s1 = K.findOneByRegex(html, "(?=<meta ).*charset=.[^/]*");
s1: <meta charset="gbk"

String s2 = K.findOneByRegex(s1, "(?=charset\\=).[^;/\"']*");
s2: charset=
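
The result `s2: charset=` follows from the second pattern: after the lookahead, `.` consumes `c` and the character class `[^;/"']*` stops at the opening quote of `"gbk"`, so the value is never captured and `charsetName` ends up empty. One possible fix, sketched below with plain `java.util.regex` instead of the `K` helpers, is to accept an optional quote after `charset=` and capture the name in a group. The class name `CharsetSniffer` and method name `charsetFromHtml` are hypothetical, and the real method would presumably still pass the result through `K.getCharsetName`.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original DownloadWorker class.
final class CharsetSniffer {

    // Finds charset=VALUE inside a single <meta ...> tag.
    // VALUE may be bare (charset=gbk) or quoted (charset="gbk" / charset='gbk').
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*?charset\\s*=\\s*[\"']?\\s*([^\\s;/\"'>]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name in lower case, or null if none is found.
    static String charsetFromHtml(String html) {
        if (html == null || html.trim().isEmpty()) {
            return null;
        }
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : null;
    }
}
```

With this pattern both the HTML5 short form `<meta charset="gbk">` and the older `content="text/html; charset=gbk"` form yield `gbk`, because the quote is consumed before the capture group starts rather than ending the match.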

6937 created the task