z77z / TouTiaoCrawler

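This repository does not declare an open-source license file (LICENSE); before using it, check the project description and the upstream dependencies of its code.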
z77z committed on 2016-12-27 10:00 . Upload index.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>A Java crawler built with Spring and MyBatis: scraping funny animated images from Toutiao (今日头条)</title>
<link rel="stylesheet" href="https://stackedit.io/res-min/themes/base.css" />
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
</head>
<body><div class="container"><h2 id="java-springmybatis整合实现爬虫之今日头条搞笑动态图片爬取详细">A Java crawler built with Spring and MyBatis: scraping funny animated images from Toutiao (今日头条), in detail</h2>
<hr>
<h2 id="一此爬虫介绍">1. About this crawler</h2>
<blockquote>
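<p>Toutiao (今日头条) is itself essentially a crawler: it collects images and text from major websites, aggregates them, and pushes them to users. The animated images in particular are a lot of fun. A quick search shows that most crawlers for it are written in Python. My background is Java web development and I am not very comfortable with regular expressions, so I wanted to see whether I could write one with tools I already know. This crawler is built on Spring and MyBatis and stores the crawled data in a MySQL database. It uses jsoup to work with HTML nodes (which neatly avoids regular expressions) and extract the links of the animated images in each article, determines the image format from the "Content-Type" response header, and then saves each image locally. Text can be crawled too, for example jokes, with only small changes to this code. The crawler is only meant as a starting point; there are plenty of more interesting things left for you to build, haha.</p>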
</blockquote>
<h2 id="二技术选型">2. Technology stack</h2>
<blockquote>
<ol>
<li>Core language: Java;</li>
<li>Core framework: Spring;</li>
<li>Persistence framework: MyBatis;</li>
<li>Database connection pool: Alibaba Druid;</li>
<li>Logging: Log4j;</li>
<li>Dependency (jar) management: Maven; and so on.</li>
</ol>
</blockquote>
<h2 id="三找规律划重点">3. Finding the pattern</h2>
<blockquote>
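<p>Open the Toutiao home page, click the Funny section, and press F12. Scroll down to load the next page and you will see that the data is fetched through an ajax request to an API, as shown below:</p>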
</blockquote>
<p><img src="http://img.blog.csdn.net/20161226223513868?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcXFfMjA5NTQ5NTk=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="Screenshot of the ajax feed request in the browser developer tools" title=""></p>
<blockquote>
<p><strong>This is the JSON response; the parameter names and values are self-explanatory.</strong></p>
</blockquote>
<p><img src="http://img.blog.csdn.net/20161226223904416?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcXFfMjA5NTQ5NTk=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="Screenshot of the JSON response" title=""></p>
<blockquote>
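<p>Since it is an ajax request, the problem is easy to solve. After a good deal of searching and experimenting, I found that the first three parameters of the ajax request never change. The category parameter selects the module to request; this example requests the Funny module, so its value is funny. The max_behot_time and max_behot_time_tmp parameters are timestamps: 0 on the first request, and afterwards the value found under next in the JSON response. The as and cp values are generated by a piece of JavaScript and are really just an obfuscated timestamp. The js code is posted further down.</p>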
</blockquote>
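<blockquote>
<p>To make the parameter scheme concrete, here is a minimal Java sketch of how the request URL could be assembled, including a direct port of the <code>getParam</code> script that appears later in this post. The class name <code>FeedUrlSketch</code> and its methods are hypothetical and not part of the project. The port assumes that <code>md5(t)</code> in the script hashes the decimal string of the timestamp; if the site's script differs, the values will not match.</p>
</blockquote>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for illustration; not part of the original project.
class FeedUrlSketch {

    // Base URL copied from the FUNNY constant used in this post.
    static final String FEED = "http://www.toutiao.com/api/pc/feed/?utm_source=toutiao&widen=1";

    // Upper-case hex MD5 of a string (assumed equivalent of md5(t).toString().toUpperCase()).
    static String md5HexUpper(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02X", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("MD5 is not available", ex);
        }
    }

    // Returns {as, cp} for a timestamp in seconds, following getParam() step by step.
    static String[] asCp(long epochSeconds) {
        String e = Long.toHexString(epochSeconds).toUpperCase();
        if (e.length() != 8) {
            // Fixed fallback pair hard-coded in the script.
            return new String[] {"479BB4B7254C150", "7E0AC8874BB0985"};
        }
        String i = md5HexUpper(Long.toString(epochSeconds));
        String n = i.substring(0, 5);              // first five hash characters
        String o = i.substring(i.length() - 5);    // last five hash characters
        StringBuilder a = new StringBuilder();
        StringBuilder r = new StringBuilder();
        for (int s = 0; s < 5; s++) {
            a.append(n.charAt(s)).append(e.charAt(s));      // hash char, then hex char
            r.append(e.charAt(s + 3)).append(o.charAt(s));  // hex char, then hash char
        }
        return new String[] {"A1" + a + e.substring(5), e.substring(0, 3) + r + "E1"};
    }

    // Appends the paging timestamp, as/cp and category to the base URL.
    static String buildUrl(long maxBehotTime, String category, long epochSeconds) {
        String[] p = asCp(epochSeconds);
        return FEED + "&max_behot_time=" + maxBehotTime
                + "&max_behot_time_tmp=" + maxBehotTime
                + "&as=" + p[0] + "&cp=" + p[1]
                + "&category=" + category;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        // First request: max_behot_time is 0; later requests use next.max_behot_time.
        System.out.println(buildUrl(0, "funny", now));
    }
}
```

<blockquote>
<p>Computing the values in Java would remove the dependency on a JavaScript engine, at the cost of having to keep the port in sync with the script by hand.</p>
</blockquote>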
<h2 id="四开始搭框架撸代码">4. Setting up the project and writing the code</h2>
<blockquote>
<p>Once set up, the project has the file structure shown below. If anything is unfamiliar, look it up yourself, haha.</p>
</blockquote>
<p><img src="http://img.blog.csdn.net/20161226225657064?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcXFfMjA5NTQ5NTk=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="Screenshot of the project file structure" title=""></p>
<blockquote>
<p><strong>Without further ado, here is the core code:</strong></p>
</blockquote>
<pre class="prettyprint"><code class=" hljs java"><span class="hljs-keyword">package</span> io.z77z.main;
<span class="hljs-keyword">import</span> io.z77z.dao.FunnyMapper;
<span class="hljs-keyword">import</span> io.z77z.entity.Funny;
<span class="hljs-keyword">import</span> java.io.BufferedInputStream;
<span class="hljs-keyword">import</span> java.io.BufferedReader;
<span class="hljs-keyword">import</span> java.io.FileOutputStream;
<span class="hljs-keyword">import</span> java.io.FileReader;
<span class="hljs-keyword">import</span> java.io.IOException;
<span class="hljs-keyword">import</span> java.io.InputStreamReader;
<span class="hljs-keyword">import</span> java.net.HttpURLConnection;
<span class="hljs-keyword">import</span> java.net.URL;
<span class="hljs-keyword">import</span> java.util.Date;
<span class="hljs-keyword">import</span> java.util.UUID;
<span class="hljs-keyword">import</span> javax.script.Invocable;
<span class="hljs-keyword">import</span> javax.script.ScriptEngine;
<span class="hljs-keyword">import</span> javax.script.ScriptEngineManager;
<span class="hljs-keyword">import</span> org.jsoup.Connection;
<span class="hljs-keyword">import</span> org.jsoup.Jsoup;
<span class="hljs-keyword">import</span> org.jsoup.nodes.Document;
<span class="hljs-keyword">import</span> org.jsoup.nodes.Element;
<span class="hljs-keyword">import</span> org.jsoup.select.Elements;
<span class="hljs-keyword">import</span> org.springframework.context.ApplicationContext;
<span class="hljs-keyword">import</span> org.springframework.context.support.ClassPathXmlApplicationContext;
<span class="hljs-keyword">import</span> com.alibaba.fastjson.JSON;
<span class="hljs-keyword">import</span> com.alibaba.fastjson.JSONArray;
<span class="hljs-keyword">import</span> com.alibaba.fastjson.JSONObject;
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TouTiaoCrawler</span> {</span>
<span class="hljs-comment">// API endpoint of the feed (used here for the funny section)</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">final</span> String FUNNY = <span class="hljs-string">"http://www.toutiao.com/api/pc/feed/?utm_source=toutiao&amp;widen=1"</span>;
<span class="hljs-comment">// Toutiao home page URL</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">final</span> String TOUTIAO = <span class="hljs-string">"http://www.toutiao.com"</span>;
<span class="hljs-comment">// Create the Spring context; only "spring-mybatis.xml" is loaded explicitly here</span>
<span class="hljs-keyword">static</span> ApplicationContext ac = <span class="hljs-keyword">new</span> ClassPathXmlApplicationContext(
<span class="hljs-string">"spring-mybatis.xml"</span>);
<span class="hljs-comment">// Look up the funnyMapper bean in the Spring container by its id</span>
<span class="hljs-keyword">static</span> FunnyMapper funnyMapper = (FunnyMapper) ac.getBean(<span class="hljs-string">"funnyMapper"</span>);
<span class="hljs-comment">// Number of API requests made so far</span>
<span class="hljs-keyword">private</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">int</span> refreshCount = <span class="hljs-number">0</span>;
<span class="hljs-comment">// Timestamp of the next page to request</span>
<span class="hljs-keyword">private</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">long</span> time = <span class="hljs-number">0</span>;
<span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">main</span>(String[] args) {
System.out.println(<span class="hljs-string">"----------Starting work!-----------------"</span>);
<span class="hljs-keyword">while</span> (<span class="hljs-keyword">true</span>) {
crawler(time);
}
}
<span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">crawler</span>(<span class="hljs-keyword">long</span> hottime) {<span class="hljs-comment">// hottime: timestamp of the feed page to fetch</span>
refreshCount++;
System.out.println(<span class="hljs-string">"----------Refresh #"</span> + refreshCount + <span class="hljs-string">", request timestamp: "</span>
+ hottime + <span class="hljs-string">"----------"</span>);
String url = FUNNY + <span class="hljs-string">"&amp;max_behot_time="</span> + hottime
+ <span class="hljs-string">"&amp;max_behot_time_tmp="</span> + hottime;
JSONObject param = getUrlParam(); <span class="hljs-comment">// as and cp values computed by the js code</span>
<span class="hljs-keyword">if</span> (param == <span class="hljs-keyword">null</span>) {
<span class="hljs-comment">// getUrlParam() returns null when the script fails; stop with a clear message</span>
<span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(<span class="hljs-string">"Could not compute the as/cp parameters from toutiao.js"</span>);
}
<span class="hljs-comment">// Module (category) to request</span>
<span class="hljs-comment">/*
* __all__: recommended, news_hot: trending, funny: funny
*/</span>
String module = <span class="hljs-string">"funny"</span>;
url += <span class="hljs-string">"&amp;as="</span> + param.get(<span class="hljs-string">"as"</span>) + <span class="hljs-string">"&amp;cp="</span> + param.get(<span class="hljs-string">"cp"</span>)
+ <span class="hljs-string">"&amp;category="</span> + module;
JSONObject json = <span class="hljs-keyword">null</span>;
<span class="hljs-keyword">try</span> {
json = getReturnJson(url);<span class="hljs-comment">// fetch and parse the JSON response</span>
} <span class="hljs-keyword">catch</span> (Exception e) {
e.printStackTrace();
}
<span class="hljs-keyword">if</span> (json != <span class="hljs-keyword">null</span>) {
<span class="hljs-comment">// "next" or "data" may be missing from the response, so guard against null</span>
JSONObject next = json.getJSONObject(<span class="hljs-string">"next"</span>);
<span class="hljs-keyword">if</span> (next != <span class="hljs-keyword">null</span>) {
time = next.getLongValue(<span class="hljs-string">"max_behot_time"</span>);
}
JSONArray data = json.getJSONArray(<span class="hljs-string">"data"</span>);
<span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; data != <span class="hljs-keyword">null</span> &amp;&amp; i &lt; data.size(); i++) {
<span class="hljs-keyword">try</span> {
JSONObject obj = (JSONObject) data.get(i);
<span class="hljs-comment">// getString avoids a ClassCastException if group_id is not stored as a String</span>
String groupId = obj.getString(<span class="hljs-string">"group_id"</span>);
<span class="hljs-comment">// Skip articles that have already been crawled</span>
<span class="hljs-keyword">if</span> (funnyMapper.selectByGroupId(groupId) != <span class="hljs-keyword">null</span>) {
System.out
.println(<span class="hljs-string">"----------Article already crawled!-----------------"</span>);
<span class="hljs-keyword">continue</span>;
}
<span class="hljs-comment">// Fetch the article page as a Document</span>
String url1 = TOUTIAO + <span class="hljs-string">"/a"</span> + groupId;
Document document = getArticleInfo(url1);
<span class="hljs-keyword">if</span> (document == <span class="hljs-keyword">null</span>) {
<span class="hljs-comment">// getArticleInfo has already logged the failure</span>
<span class="hljs-keyword">continue</span>;
}
System.out.println(<span class="hljs-string">"----------Fetched article: "</span> + url1
+ <span class="hljs-string">"-----------------"</span>);
<span class="hljs-comment">// Store the page HTML in the record as well</span>
obj.put(<span class="hljs-string">"document"</span>, document.toString());
<span class="hljs-comment">// Convert the JSON object into a Java entity</span>
Funny funny = JSON.parseObject(obj.toString(), Funny.class);
<span class="hljs-comment">// Save the record to the database</span>
funny.setBehotTime(<span class="hljs-keyword">new</span> Date());
funnyMapper.insertSelective(funny);
} <span class="hljs-keyword">catch</span> (Exception e) {
e.printStackTrace();
}
}
} <span class="hljs-keyword">else</span> {
System.out.println(<span class="hljs-string">"----------The API returned no JSON----------"</span>);
}
}
<span class="hljs-comment">// Call the API and return the response parsed as JSON</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> JSONObject <span class="hljs-title">getReturnJson</span>(String url) {
<span class="hljs-keyword">try</span> {
URL httpUrl = <span class="hljs-keyword">new</span> URL(url);
StringBuilder content = <span class="hljs-keyword">new</span> StringBuilder();
<span class="hljs-comment">// try-with-resources closes the reader even if reading fails</span>
<span class="hljs-keyword">try</span> (BufferedReader in = <span class="hljs-keyword">new</span> BufferedReader(<span class="hljs-keyword">new</span> InputStreamReader(
httpUrl.openStream(), <span class="hljs-string">"UTF-8"</span>))) {
String line;
<span class="hljs-keyword">while</span> ((line = in.readLine()) != <span class="hljs-keyword">null</span>) {
content.append(line);
}
}
<span class="hljs-keyword">return</span> JSONObject.parseObject(content.toString());
} <span class="hljs-keyword">catch</span> (Exception e) {
System.err.println(<span class="hljs-string">"Request failed: "</span> + url);
e.printStackTrace();
}
<span class="hljs-keyword">return</span> <span class="hljs-keyword">null</span>;
}
<span class="hljs-comment">// Fetch the article page as a Document and download its images</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> Document <span class="hljs-title">getArticleInfo</span>(String url) {
<span class="hljs-keyword">try</span> {
Connection connect = Jsoup.connect(url);
Document document;
document = connect.get();
Elements article = document.getElementsByClass(<span class="hljs-string">"article-content"</span>);
<span class="hljs-keyword">if</span> (article.size() &gt; <span class="hljs-number">0</span>) {
Elements a = article.get(<span class="hljs-number">0</span>).getElementsByTag(<span class="hljs-string">"img"</span>);
<span class="hljs-keyword">if</span> (a.size() &gt; <span class="hljs-number">0</span>) {
<span class="hljs-keyword">for</span> (Element e : a) {
String url2 = e.attr(<span class="hljs-string">"src"</span>);
<span class="hljs-comment">// Download the image referenced by the img tag to local disk</span>
saveToFile(url2);
}
}
}
<span class="hljs-keyword">return</span> document;
} <span class="hljs-keyword">catch</span> (IOException e) {
System.err.println(<span class="hljs-string">"Failed to fetch article page: "</span> + url + <span class="hljs-string">" reason: "</span> + e.getMessage());
<span class="hljs-keyword">return</span> <span class="hljs-keyword">null</span>;
}
}
<span class="hljs-comment">// Run the js to obtain the as and cp parameter values</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> JSONObject <span class="hljs-title">getUrlParam</span>() {
JSONObject jsonObject = <span class="hljs-keyword">null</span>;
FileReader reader = <span class="hljs-keyword">null</span>;
<span class="hljs-keyword">try</span> {
ScriptEngineManager manager = <span class="hljs-keyword">new</span> ScriptEngineManager();
<span class="hljs-comment">// Note: the JDK's bundled JavaScript engine (Nashorn) was removed in Java 15; on newer JDKs this lookup returns null unless a script engine is on the classpath</span>
ScriptEngine engine = manager.getEngineByName(<span class="hljs-string">"javascript"</span>);
String jsFileName = <span class="hljs-string">"toutiao.js"</span>; <span class="hljs-comment">// js file to load</span>
reader = <span class="hljs-keyword">new</span> FileReader(jsFileName); <span class="hljs-comment">// open the script file</span>
engine.eval(reader); <span class="hljs-comment">// evaluate the script</span>
<span class="hljs-keyword">if</span> (engine <span class="hljs-keyword">instanceof</span> Invocable) {
Invocable invoke = (Invocable) engine;
Object obj = invoke.invokeFunction(<span class="hljs-string">"getParam"</span>);
jsonObject = JSONObject.parseObject(obj != <span class="hljs-keyword">null</span> ? obj
.toString() : <span class="hljs-keyword">null</span>);
}
} <span class="hljs-keyword">catch</span> (Exception e) {
e.printStackTrace();
} <span class="hljs-keyword">finally</span> {
<span class="hljs-keyword">try</span> {
<span class="hljs-keyword">if</span> (reader != <span class="hljs-keyword">null</span>) {
reader.close();
}
} <span class="hljs-keyword">catch</span> (IOException e) {
e.printStackTrace();
}
}
<span class="hljs-keyword">return</span> jsonObject;
}
<span class="hljs-comment">// Download the image at the given URL and save it locally</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">saveToFile</span>(String destUrl) {
FileOutputStream fos = <span class="hljs-keyword">null</span>;
BufferedInputStream bis = <span class="hljs-keyword">null</span>;
HttpURLConnection httpUrl = <span class="hljs-keyword">null</span>;
URL url = <span class="hljs-keyword">null</span>;
String uuid = UUID.randomUUID().toString();
String fileAddress = <span class="hljs-string">"d:\\imag/"</span> + uuid;<span class="hljs-comment">// local path prefix for the saved file</span>
<span class="hljs-keyword">int</span> BUFFER_SIZE = <span class="hljs-number">1024</span>;
<span class="hljs-keyword">byte</span>[] buf = <span class="hljs-keyword">new</span> <span class="hljs-keyword">byte</span>[BUFFER_SIZE];
<span class="hljs-keyword">int</span> size = <span class="hljs-number">0</span>;
<span class="hljs-keyword">try</span> {
url = <span class="hljs-keyword">new</span> URL(destUrl);
httpUrl = (HttpURLConnection) url.openConnection();
httpUrl.connect();
String Type = httpUrl.getHeaderField(<span class="hljs-string">"Content-Type"</span>);
<span class="hljs-comment">// The header may be absent or carry parameters, so check for null and match by prefix</span>
<span class="hljs-keyword">if</span> (Type == <span class="hljs-keyword">null</span>) {
System.err.println(<span class="hljs-string">"Missing Content-Type header"</span>);
<span class="hljs-keyword">return</span>;
} <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (Type.startsWith(<span class="hljs-string">"image/gif"</span>)) {
fileAddress += <span class="hljs-string">".gif"</span>;
} <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (Type.startsWith(<span class="hljs-string">"image/png"</span>)) {
fileAddress += <span class="hljs-string">".png"</span>;
} <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (Type.startsWith(<span class="hljs-string">"image/jpeg"</span>)) {
fileAddress += <span class="hljs-string">".jpg"</span>;
} <span class="hljs-keyword">else</span> {
System.err.println(<span class="hljs-string">"Unknown image format"</span>);
<span class="hljs-keyword">return</span>;
}
bis = <span class="hljs-keyword">new</span> BufferedInputStream(httpUrl.getInputStream());
fos = <span class="hljs-keyword">new</span> FileOutputStream(fileAddress);
<span class="hljs-keyword">while</span> ((size = bis.read(buf)) != -<span class="hljs-number">1</span>) {
fos.write(buf, <span class="hljs-number">0</span>, size);
}
fos.flush();
System.out.println(<span class="hljs-string">"Image saved! Path: "</span> + fileAddress);
} <span class="hljs-keyword">catch</span> (IOException e) {
e.printStackTrace();
} <span class="hljs-keyword">catch</span> (ClassCastException e) {
e.printStackTrace();
} <span class="hljs-keyword">finally</span> {
<span class="hljs-comment">// Any of these may still be null if an earlier step failed or returned early</span>
<span class="hljs-keyword">try</span> {
<span class="hljs-keyword">if</span> (fos != <span class="hljs-keyword">null</span>) {
fos.close();
}
<span class="hljs-keyword">if</span> (bis != <span class="hljs-keyword">null</span>) {
bis.close();
}
} <span class="hljs-keyword">catch</span> (IOException e) {
e.printStackTrace();
}
<span class="hljs-keyword">if</span> (httpUrl != <span class="hljs-keyword">null</span>) {
httpUrl.disconnect();
}
}
}
}
</code></pre>
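<blockquote>
<p>The Content-Type check in <code>saveToFile</code> can also be isolated into a small function, which makes it easy to test without any network access. The following is only a sketch: the class <code>ImageTypeSketch</code> and the method <code>extensionFor</code> are hypothetical names, and the handling of header parameters and letter case is an assumption about what servers may send, not something observed in the original post.</p>
</blockquote>

```java
import java.util.Locale;

// Hypothetical helper class for illustration; not part of the original project.
class ImageTypeSketch {

    // Returns ".gif", ".png" or ".jpg", or null when the type is missing or unsupported.
    static String extensionFor(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Drop parameters such as "; charset=binary" and normalise the case.
        int semicolon = contentType.indexOf(';');
        String mime = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        switch (mime) {
            case "image/gif":
                return ".gif";
            case "image/png":
                return ".png";
            case "image/jpeg":
                return ".jpg";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(extensionFor("image/gif"));
        System.out.println(extensionFor("IMAGE/JPEG; charset=binary"));
        System.out.println(extensionFor("text/html"));
    }
}
```

<blockquote>
<p>A caller would append the returned extension to the target file name and skip the download when the result is null.</p>
</blockquote>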
<blockquote>
<p><strong>The js code that generates the as and cp parameters</strong></p>
</blockquote>
<pre class="prettyprint"><code class=" hljs javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getParam</span><span class="hljs-params">()</span>{</span>
<span class="hljs-keyword">var</span> asas;
<span class="hljs-keyword">var</span> cpcp;
<span class="hljs-keyword">var</span> t = <span class="hljs-built_in">Math</span>.floor((<span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>).getTime() / <span class="hljs-number">1e3</span>)
, e = t.toString(<span class="hljs-number">16</span>).toUpperCase()
, i = md5(t).toString().toUpperCase();
<span class="hljs-keyword">if</span> (<span class="hljs-number">8</span> != e.length){
asas = <span class="hljs-string">"479BB4B7254C150"</span>;
cpcp = <span class="hljs-string">"7E0AC8874BB0985"</span>;
}<span class="hljs-keyword">else</span>{
<span class="hljs-keyword">for</span> (<span class="hljs-keyword">var</span> n = i.slice(<span class="hljs-number">0</span>, <span class="hljs-number">5</span>), o = i.slice(-<span class="hljs-number">5</span>), a = <span class="hljs-string">""</span>, s = <span class="hljs-number">0</span>; <span class="hljs-number">5</span> &gt; s; s++){
a += n[s] + e[s];
}
<span class="hljs-keyword">for</span> (<span class="hljs-keyword">var</span> r = <span class="hljs-string">""</span>, c = <span class="hljs-number">0</span>; <span class="hljs-number">5</span> &gt; c; c++){
r += e[c + <span class="hljs-number">3</span>] + o[c];
}
asas = <span class="hljs-string">"A1"</span> + a + e.slice(-<span class="hljs-number">3</span>);
cpcp= e.slice(<span class="hljs-number">0</span>, <span class="hljs-number">3</span>) + r + <span class="hljs-string">"E1"</span>;
}
<span class="hljs-keyword">return</span> <span class="hljs-string">'{"as":"'</span>+asas+<span class="hljs-string">'","cp":"'</span>+cpcp+<span class="hljs-string">'"}'</span>;
}
!<span class="hljs-function"><span class="hljs-keyword">function</span><span class="hljs-params">(e)</span> {</span>
<span class="hljs-pi"> "use strict"</span>;
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">t</span><span class="hljs-params">(e, t)</span> {</span>
<span class="hljs-keyword">var</span> n = (<span class="hljs-number">65535</span> &amp; e) + (<span class="hljs-number">65535</span> &amp; t)
, r = (e &gt;&gt; <span class="hljs-number">16</span>) + (t &gt;&gt; <span class="hljs-number">16</span>) + (n &gt;&gt; <span class="hljs-number">16</span>);
<span class="hljs-keyword">return</span> r &lt;&lt; <span class="hljs-number">16</span> | <span class="hljs-number">65535</span> &amp; n
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">n</span><span class="hljs-params">(e, t)</span> {</span>
<span class="hljs-keyword">return</span> e &lt;&lt; t | e &gt;&gt;&gt; <span class="hljs-number">32</span> - t
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">r</span><span class="hljs-params">(e, r, o, i, a, u)</span> {</span>
<span class="hljs-keyword">return</span> t(n(t(t(r, e), t(i, u)), a), o)
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">o</span><span class="hljs-params">(e, t, n, o, i, a, u)</span> {</span>
<span class="hljs-keyword">return</span> r(t &amp; n | ~t &amp; o, e, t, i, a, u)
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">i</span><span class="hljs-params">(e, t, n, o, i, a, u)</span> {</span>
<span class="hljs-keyword">return</span> r(t &amp; o | n &amp; ~o, e, t, i, a, u)
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">a</span><span class="hljs-params">(e, t, n, o, i, a, u)</span> {</span>
<span class="hljs-keyword">return</span> r(t ^ n ^ o, e, t, i, a, u)
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">u</span><span class="hljs-params">(e, t, n, o, i, a, u)</span> {</span>
<span class="hljs-keyword">return</span> r(n ^ (t | ~o), e, t, i, a, u)
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">s</span><span class="hljs-params">(e, n)</span> {</span>
e[n &gt;&gt; <span class="hljs-number">5</span>] |= <span class="hljs-number">128</span> &lt;&lt; n % <span class="hljs-number">32</span>,
e[(n + <span class="hljs-number">64</span> &gt;&gt;&gt; <span class="hljs-number">9</span> &lt;&lt; <span class="hljs-number">4</span>) + <span class="hljs-number">14</span>] = n;
<span class="hljs-keyword">var</span> r, s, c, l, f, p = <span class="hljs-number">1732584193</span>, d = -<span class="hljs-number">271733879</span>, h = -<span class="hljs-number">1732584194</span>, m = <span class="hljs-number">271733878</span>;
<span class="hljs-keyword">for</span> (r = <span class="hljs-number">0</span>; r &lt; e.length; r += <span class="hljs-number">16</span>)
s = p,
c = d,
l = h,
f = m,
p = o(p, d, h, m, e[r], <span class="hljs-number">7</span>, -<span class="hljs-number">680876936</span>),
m = o(m, p, d, h, e[r + <span class="hljs-number">1</span>], <span class="hljs-number">12</span>, -<span class="hljs-number">389564586</span>),
h = o(h, m, p, d, e[r + <span class="hljs-number">2</span>], <span class="hljs-number">17</span>, <span class="hljs-number">606105819</span>),
d = o(d, h, m, p, e[r + <span class="hljs-number">3</span>], <span class="hljs-number">22</span>, -<span class="hljs-number">1044525330</span>),
p = o(p, d, h, m, e[r + <span class="hljs-number">4</span>], <span class="hljs-number">7</span>, -<span class="hljs-number">176418897</span>),
m = o(m, p, d, h, e[r + <span class="hljs-number">5</span>], <span class="hljs-number">12</span>, <span class="hljs-number">1200080426</span>),
h = o(h, m, p, d, e[r + <span class="hljs-number">6</span>], <span class="hljs-number">17</span>, -<span class="hljs-number">1473231341</span>),
d = o(d, h, m, p, e[r + <span class="hljs-number">7</span>], <span class="hljs-number">22</span>, -<span class="hljs-number">45705983</span>),
p = o(p, d, h, m, e[r + <span class="hljs-number">8</span>], <span class="hljs-number">7</span>, <span class="hljs-number">1770035416</span>),
m = o(m, p, d, h, e[r + <span class="hljs-number">9</span>], <span class="hljs-number">12</span>, -<span class="hljs-number">1958414417</span>),
h = o(h, m, p, d, e[r + <span class="hljs-number">10</span>], <span class="hljs-number">17</span>, -<span class="hljs-number">42063</span>),
d = o(d, h, m, p, e[r + <span class="hljs-number">11</span>], <span class="hljs-number">22</span>, -<span class="hljs-number">1990404162</span>),
p = o(p, d, h, m, e[r + <span class="hljs-number">12</span>], <span class="hljs-number">7</span>, <span class="hljs-number">1804603682</span>),
m = o(m, p, d, h, e[r + <span class="hljs-number">13</span>], <span class="hljs-number">12</span>, -<span class="hljs-number">40341101</span>),
h = o(h, m, p, d, e[r + <span class="hljs-number">14</span>], <span class="hljs-number">17</span>, -<span class="hljs-number">1502002290</span>),
d = o(d, h, m, p, e[r + <span class="hljs-number">15</span>], <span class="hljs-number">22</span>, <span class="hljs-number">1236535329</span>),
p = i(p, d, h, m, e[r + <span class="hljs-number">1</span>], <span class="hljs-number">5</span>, -<span class="hljs-number">165796510</span>),
m = i(m, p, d, h, e[r + <span class="hljs-number">6</span>], <span class="hljs-number">9</span>, -<span class="hljs-number">1069501632</span>),
h = i(h, m, p, d, e[r + <span class="hljs-number">11</span>], <span class="hljs-number">14</span>, <span class="hljs-number">643717713</span>),
d = i(d, h, m, p, e[r], <span class="hljs-number">20</span>, -<span class="hljs-number">373897302</span>),
p = i(p, d, h, m, e[r + <span class="hljs-number">5</span>], <span class="hljs-number">5</span>, -<span class="hljs-number">701558691</span>),
m = i(m, p, d, h, e[r + <span class="hljs-number">10</span>], <span class="hljs-number">9</span>, <span class="hljs-number">38016083</span>),
h = i(h, m, p, d, e[r + <span class="hljs-number">15</span>], <span class="hljs-number">14</span>, -<span class="hljs-number">660478335</span>),
d = i(d, h, m, p, e[r + <span class="hljs-number">4</span>], <span class="hljs-number">20</span>, -<span class="hljs-number">405537848</span>),
p = i(p, d, h, m, e[r + <span class="hljs-number">9</span>], <span class="hljs-number">5</span>, <span class="hljs-number">568446438</span>),
m = i(m, p, d, h, e[r + <span class="hljs-number">14</span>], <span class="hljs-number">9</span>, -<span class="hljs-number">1019803690</span>),
h = i(h, m, p, d, e[r + <span class="hljs-number">3</span>], <span class="hljs-number">14</span>, -<span class="hljs-number">187363961</span>),
d = i(d, h, m, p, e[r + <span class="hljs-number">8</span>], <span class="hljs-number">20</span>, <span class="hljs-number">1163531501</span>),
p = i(p, d, h, m, e[r + <span class="hljs-number">13</span>], <span class="hljs-number">5</span>, -<span class="hljs-number">1444681467</span>),
m = i(m, p, d, h, e[r + <span class="hljs-number">2</span>], <span class="hljs-number">9</span>, -<span class="hljs-number">51403784</span>),
h = i(h, m, p, d, e[r + <span class="hljs-number">7</span>], <span class="hljs-number">14</span>, <span class="hljs-number">1735328473</span>),
d = i(d, h, m, p, e[r + <span class="hljs-number">12</span>], <span class="hljs-number">20</span>, -<span class="hljs-number">1926607734</span>),
p = a(p, d, h, m, e[r + <span class="hljs-number">5</span>], <span class="hljs-number">4</span>, -<span class="hljs-number">378558</span>),
m = a(m, p, d, h, e[r + <span class="hljs-number">8</span>], <span class="hljs-number">11</span>, -<span class="hljs-number">2022574463</span>),
h = a(h, m, p, d, e[r + <span class="hljs-number">11</span>], <span class="hljs-number">16</span>, <span class="hljs-number">1839030562</span>),
d = a(d, h, m, p, e[r + <span class="hljs-number">14</span>], <span class="hljs-number">23</span>, -<span class="hljs-number">35309556</span>),
p = a(p, d, h, m, e[r + <span class="hljs-number">1</span>], <span class="hljs-number">4</span>, -<span class="hljs-number">1530992060</span>),
m = a(m, p, d, h, e[r + <span class="hljs-number">4</span>], <span class="hljs-number">11</span>, <span class="hljs-number">1272893353</span>),
h = a(h, m, p, d, e[r + <span class="hljs-number">7</span>], <span class="hljs-number">16</span>, -<span class="hljs-number">155497632</span>),
d = a(d, h, m, p, e[r + <span class="hljs-number">10</span>], <span class="hljs-number">23</span>, -<span class="hljs-number">1094730640</span>),
p = a(p, d, h, m, e[r + <span class="hljs-number">13</span>], <span class="hljs-number">4</span>, <span class="hljs-number">681279174</span>),
m = a(m, p, d, h, e[r], <span class="hljs-number">11</span>, -<span class="hljs-number">358537222</span>),
h = a(h, m, p, d, e[r + <span class="hljs-number">3</span>], <span class="hljs-number">16</span>, -<span class="hljs-number">722521979</span>),
d = a(d, h, m, p, e[r + <span class="hljs-number">6</span>], <span class="hljs-number">23</span>, <span class="hljs-number">76029189</span>),
p = a(p, d, h, m, e[r + <span class="hljs-number">9</span>], <span class="hljs-number">4</span>, -<span class="hljs-number">640364487</span>),
m = a(m, p, d, h, e[r + <span class="hljs-number">12</span>], <span class="hljs-number">11</span>, -<span class="hljs-number">421815835</span>),
h = a(h, m, p, d, e[r + <span class="hljs-number">15</span>], <span class="hljs-number">16</span>, <span class="hljs-number">530742520</span>),
d = a(d, h, m, p, e[r + <span class="hljs-number">2</span>], <span class="hljs-number">23</span>, -<span class="hljs-number">995338651</span>),
p = u(p, d, h, m, e[r], <span class="hljs-number">6</span>, -<span class="hljs-number">198630844</span>),
m = u(m, p, d, h, e[r + <span class="hljs-number">7</span>], <span class="hljs-number">10</span>, <span class="hljs-number">1126891415</span>),
h = u(h, m, p, d, e[r + <span class="hljs-number">14</span>], <span class="hljs-number">15</span>, -<span class="hljs-number">1416354905</span>),
d = u(d, h, m, p, e[r + <span class="hljs-number">5</span>], <span class="hljs-number">21</span>, -<span class="hljs-number">57434055</span>),
p = u(p, d, h, m, e[r + <span class="hljs-number">12</span>], <span class="hljs-number">6</span>, <span class="hljs-number">1700485571</span>),
m = u(m, p, d, h, e[r + <span class="hljs-number">3</span>], <span class="hljs-number">10</span>, -<span class="hljs-number">1894986606</span>),
h = u(h, m, p, d, e[r + <span class="hljs-number">10</span>], <span class="hljs-number">15</span>, -<span class="hljs-number">1051523</span>),
d = u(d, h, m, p, e[r + <span class="hljs-number">1</span>], <span class="hljs-number">21</span>, -<span class="hljs-number">2054922799</span>),
p = u(p, d, h, m, e[r + <span class="hljs-number">8</span>], <span class="hljs-number">6</span>, <span class="hljs-number">1873313359</span>),
m = u(m, p, d, h, e[r + <span class="hljs-number">15</span>], <span class="hljs-number">10</span>, -<span class="hljs-number">30611744</span>),
h = u(h, m, p, d, e[r + <span class="hljs-number">6</span>], <span class="hljs-number">15</span>, -<span class="hljs-number">1560198380</span>),
d = u(d, h, m, p, e[r + <span class="hljs-number">13</span>], <span class="hljs-number">21</span>, <span class="hljs-number">1309151649</span>),
p = u(p, d, h, m, e[r + <span class="hljs-number">4</span>], <span class="hljs-number">6</span>, -<span class="hljs-number">145523070</span>),
m = u(m, p, d, h, e[r + <span class="hljs-number">11</span>], <span class="hljs-number">10</span>, -<span class="hljs-number">1120210379</span>),
h = u(h, m, p, d, e[r + <span class="hljs-number">2</span>], <span class="hljs-number">15</span>, <span class="hljs-number">718787259</span>),
d = u(d, h, m, p, e[r + <span class="hljs-number">9</span>], <span class="hljs-number">21</span>, -<span class="hljs-number">343485551</span>),
p = t(p, s),
d = t(d, c),
h = t(h, l),
m = t(m, f);
<span class="hljs-keyword">return</span> [p, d, h, m]
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">c</span><span class="hljs-params">(e)</span> {</span>
<span class="hljs-keyword">var</span> t, n = <span class="hljs-string">""</span>;
<span class="hljs-keyword">for</span> (t = <span class="hljs-number">0</span>; t &lt; <span class="hljs-number">32</span> * e.length; t += <span class="hljs-number">8</span>)
n += <span class="hljs-built_in">String</span>.fromCharCode(e[t &gt;&gt; <span class="hljs-number">5</span>] &gt;&gt;&gt; t % <span class="hljs-number">32</span> &amp; <span class="hljs-number">255</span>);
<span class="hljs-keyword">return</span> n
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">l</span><span class="hljs-params">(e)</span> {</span>
<span class="hljs-keyword">var</span> t, n = [];
<span class="hljs-keyword">for</span> (n[(e.length &gt;&gt; <span class="hljs-number">2</span>) - <span class="hljs-number">1</span>] = <span class="hljs-keyword">void</span> <span class="hljs-number">0</span>,
t = <span class="hljs-number">0</span>; t &lt; n.length; t += <span class="hljs-number">1</span>)
n[t] = <span class="hljs-number">0</span>;
<span class="hljs-keyword">for</span> (t = <span class="hljs-number">0</span>; t &lt; <span class="hljs-number">8</span> * e.length; t += <span class="hljs-number">8</span>)
n[t &gt;&gt; <span class="hljs-number">5</span>] |= (<span class="hljs-number">255</span> &amp; e.charCodeAt(t / <span class="hljs-number">8</span>)) &lt;&lt; t % <span class="hljs-number">32</span>;
<span class="hljs-keyword">return</span> n
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">f</span><span class="hljs-params">(e)</span> {</span>
<span class="hljs-keyword">return</span> c(s(l(e), <span class="hljs-number">8</span> * e.length))
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">p</span><span class="hljs-params">(e, t)</span> {</span>
<span class="hljs-keyword">var</span> n, r, o = l(e), i = [], a = [];
<span class="hljs-keyword">for</span> (i[<span class="hljs-number">15</span>] = a[<span class="hljs-number">15</span>] = <span class="hljs-keyword">void</span> <span class="hljs-number">0</span>,
o.length &gt; <span class="hljs-number">16</span> &amp;&amp; (o = s(o, <span class="hljs-number">8</span> * e.length)),
n = <span class="hljs-number">0</span>; <span class="hljs-number">16</span> &gt; n; n += <span class="hljs-number">1</span>)
i[n] = <span class="hljs-number">909522486</span> ^ o[n],
a[n] = <span class="hljs-number">1549556828</span> ^ o[n];
<span class="hljs-keyword">return</span> r = s(i.concat(l(t)), <span class="hljs-number">512</span> + <span class="hljs-number">8</span> * t.length),
c(s(a.concat(r), <span class="hljs-number">640</span>))
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">d</span><span class="hljs-params">(e)</span> {</span>
<span class="hljs-keyword">var</span> t, n, r = <span class="hljs-string">"0123456789abcdef"</span>, o = <span class="hljs-string">""</span>;
<span class="hljs-keyword">for</span> (n = <span class="hljs-number">0</span>; n &lt; e.length; n += <span class="hljs-number">1</span>)
t = e.charCodeAt(n),
o += r.charAt(t &gt;&gt;&gt; <span class="hljs-number">4</span> &amp; <span class="hljs-number">15</span>) + r.charAt(<span class="hljs-number">15</span> &amp; t);
<span class="hljs-keyword">return</span> o
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">h</span><span class="hljs-params">(e)</span> {</span>
<span class="hljs-keyword">return</span> <span class="hljs-built_in">unescape</span>(<span class="hljs-built_in">encodeURIComponent</span>(e))
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">m</span><span class="hljs-params">(e)</span> {</span>
<span class="hljs-keyword">return</span> f(h(e))
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">g</span><span class="hljs-params">(e)</span> {</span>
<span class="hljs-keyword">return</span> d(m(e))
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">v</span><span class="hljs-params">(e, t)</span> {</span>
<span class="hljs-keyword">return</span> p(h(e), h(t))
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">y</span><span class="hljs-params">(e, t)</span> {</span>
<span class="hljs-keyword">return</span> d(v(e, t))
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">b</span><span class="hljs-params">(e, t, n)</span> {</span>
<span class="hljs-keyword">return</span> t ? n ? v(t, e) : y(t, e) : n ? m(e) : g(e)
}
<span class="hljs-string">"function"</span> == <span class="hljs-keyword">typeof</span> define &amp;&amp; define.amd ? define(<span class="hljs-string">"static/js/lib/md5"</span>, [<span class="hljs-string">"require"</span>], <span class="hljs-function"><span class="hljs-keyword">function</span><span class="hljs-params">()</span> {</span>
<span class="hljs-keyword">return</span> b
}) : <span class="hljs-string">"object"</span> == <span class="hljs-keyword">typeof</span> module &amp;&amp; module.exports ? module.exports = b : e.md5 = b
}(<span class="hljs-keyword">this</span>)</code></pre>
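<p>The minified script above is an ordinary MD5 library: <code>b(e, t, n)</code> returns the hex MD5 of the UTF-8 string <code>e</code>, or an HMAC-MD5 when a key <code>t</code> is supplied. If the same digest is needed on the Java side, the JDK already provides it. The following is only a minimal sketch; the class name <code>Md5Sketch</code> and its method names are hypothetical and not part of the original project.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper (not from the original project): JDK equivalents of the JS md5 library.
final class Md5Sketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Same job as the JS function d(e): bytes to lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >>> 4) & 15]).append(HEX[b & 15]);
        }
        return sb.toString();
    }

    // Corresponds to calling the JS md5 function with only a string: hex MD5 of the UTF-8 bytes.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return toHex(md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Corresponds to calling the JS md5 function with a string and a non-empty key: hex HMAC-MD5.
    static String hmacMd5Hex(String key, String text) {
        try {
            Mac mac = Mac.getInstance("HmacMD5");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacMD5"));
            return toHex(mac.doFinal(text.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacMD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```

<p>Using <code>MessageDigest</code> avoids porting the bit-twiddling JavaScript by hand; only the hex encoding has to be written.</p>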
<h2 id="五最后">5. Final notes</h2>
<blockquote>
<p>I also found that Toutiao has a simplified version of the site. After looking into it, this version appears to be easier to crawl.</p>
</blockquote>
<p><img src="http://img.blog.csdn.net/20161226230850348?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcXFfMjA5NTQ5NTk=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="Screenshot" title=""></p>
<blockquote>
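<p>Pages are addressed as <code>p</code> followed by the page number. The article links can be read directly from each page and then crawled, so there is no need to obtain article URLs from the JSON response or to pass any restricting parameters. This project needs only minor changes to support it.</p>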
</blockquote>
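<p>The "p + page number" scheme can be sketched roughly as follows. This is only an illustration under stated assumptions: the base URL <code>http://example.invalid/</code> is a placeholder (the real address is only visible in the screenshots), and the class and method names are hypothetical. To stay free of dependencies the sketch pulls <code>href</code> values out with a small regular expression; in the actual project, jsoup (for example <code>Jsoup.parse(html).select("a[href]")</code>) would be the natural choice.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: build "p + page number" URLs and collect links from one page of HTML.
final class SimplePageSketch {
    // Matches the href attribute of an <a ...> tag (double-quoted values only).
    private static final Pattern LINK =
            Pattern.compile("<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // baseUrl is an assumed placeholder and must end with "/".
    static String pageUrl(String baseUrl, int page) {
        return baseUrl + "p" + page;
    }

    // Returns every href found in the given HTML, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Placeholder host; replace with the real simplified-site address.
        for (int page = 1; page <= 3; page++) {
            System.out.println(pageUrl("http://example.invalid/", page));
        }
        String sample = "<a href=\"/a1/\">one</a><a class=\"t\" href=\"/a2/\">two</a>";
        System.out.println(extractLinks(sample));
    }
}
```

<p>Each extracted link would then be fed to the existing article-download logic, replacing the step that reads article URLs from the JSON response.</p>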
<p><img src="http://img.blog.csdn.net/20161226230958084?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcXFfMjA5NTQ5NTk=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="Screenshot" title=""></p>
<p><img src="http://img.blog.csdn.net/20161226231623684?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcXFfMjA5NTQ5NTk=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="Screenshot" title=""></p>
<h2 id="六just-do-it">6. JUST DO IT</h2>
<blockquote>
<p>。。。。。。。。。。。。。。。。。。。。。。</p>
</blockquote></div></body>
</html>