imooc Log Analysis Project
- Input format
218.75.35.226 - - [11/05/2017:08:07:35 +0800] "POST /api3/getadv HTTP/1.1" 200 407 "http://www.imooc.com/article/17891" "-" cid=0&timestamp=1455254555&uid=5844555
- Output format: "date url traffic ip"
2017-05-11 08:07:35 http://www.imooc.com/article/17891 407 218.75.35.226
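The format conversion above can be sketched in a few lines of Python. This is only an illustration of the mapping from the raw log line to "date url traffic ip", not the project's own code; the names `LOG_PATTERN` and `format_line` and the tab separator are assumptions made for the example.

```python
import re
from datetime import datetime

# Layout taken from the sample line:
# ip - - [time] "request" status bytes "referer" ...
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"[^"]*" (?P<status>\d{3}) (?P<traffic>\d+) "(?P<url>[^"]*)"'
)


def format_line(line, sep="\t"):
    """Convert one raw log line to 'date url traffic ip'.

    Returns None when the line does not match the expected layout.
    The separator is an assumption; the README does not state it.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # The sample timestamp reads day/month/year:HH:MM:SS plus a UTC offset.
    parsed = datetime.strptime(match.group("time"), "%d/%m/%Y:%H:%M:%S %z")
    date = parsed.strftime("%Y-%m-%d %H:%M:%S")
    return sep.join([date, match.group("url"), match.group("traffic"), match.group("ip")])
```

Returning None for lines that do not match lets the caller filter out malformed records before they are written.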
- Input format: "date url traffic ip"
2017-05-11 08:07:35 http://www.imooc.com/article/17891 407 218.75.35.226
- Output format: "url courseType courseId traffic ip city time day"
http://www.imooc.com/article/17891 article 17891 407 218.75.35.226 北京 08:07:35 2017-05-11
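As a rough sketch of the cleaning step above, the following Python splits the URL path into courseType and courseId and splits the date into day and time. It is an illustration only, not the project's implementation. `clean_line` and `lookup_city` are hypothetical names, the tab separator is an assumption, and `lookup_city` is a placeholder for the real IP lookup backed by ipDatabase.csv and ipRegion.xlsx.

```python
from urllib.parse import urlparse


def lookup_city(ip):
    """Hypothetical placeholder for the IP-to-city lookup (not the real ipdatabase API)."""
    return "unknown"


def clean_line(line, city_of=lookup_city, sep="\t"):
    """Convert 'date url traffic ip' to 'url courseType courseId traffic ip city time day'.

    Returns None when the line or its URL does not have the expected shape.
    """
    fields = line.split(sep)
    if len(fields) != 4:
        return None
    date, url, traffic, ip = fields
    # A path such as /article/17891 yields a course type and a numeric id.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    course_type, course_id = parts[0], parts[1]
    # '2017-05-11 08:07:35' splits into day and time at the first space.
    day, _, time = date.partition(" ")
    return sep.join([url, course_type, course_id, traffic, ip, city_of(ip), time, day])
```

Passing the lookup as the `city_of` argument keeps the parsing logic testable without the IP database files.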
(1) Package
- File → Project Structure → Artifacts → "+" → JAR → From modules with dependencies → Main Class: CleanYarn → OK
- Build → Build Artifacts → Build
(2) Upload Imooc_SparkSQL.jar, ipDatabase.csv, ipRegion.xlsx, and the formatted log files to /home/hadoop/imooc/ on the Linux host
(3) Start Hadoop and upload the formatted log files to /imooc/input/ on HDFS
(4) Set the Hadoop configuration path
- Option 1: run export HADOOP_CONF_DIR=/home/hadoop/apps/hadoop/etc/hadoop
- Option 2: add this line to spark/conf/spark-env.sh: export YARN_CONF_DIR=/home/hadoop/apps/hadoop/etc/hadoop
(5) Submit the job
[hadoop@mini1 spark]$ bin/spark-submit \
--class main.CleanYarn \
--name CleanYarn \
--master yarn \
--executor-memory 1G \
--num-executors 1 \
--files /home/hadoop/imooc/ipDatabase.csv,/home/hadoop/imooc/ipRegion.xlsx \
/home/hadoop/imooc/Imooc_SparkSQL.jar \
hdfs://mini1:9000/imooc/input/* \
hdfs://mini1:9000/imooc/clean
(6) Check the result on HDFS: the job should have created the clean directory and its files
(1) Package
- Build → Build Artifacts → Edit → Main Class: TopNYarn → OK
- Build → Build Artifacts → Rebuild
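(2) Delete the old JAR and upload the new one to /home/hadoop/imooc/ on the Linux host
(3) Start MySQL on the Linux host and create the database and the required tables
(4) Submit the job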
[hadoop@mini1 spark]$ bin/spark-submit \
--class main.TopNYarn \
--name TopNYarn \
--master yarn \
--executor-memory 1G \
--num-executors 1 \
/home/hadoop/imooc/Imooc_SparkSQL.jar \
hdfs://mini1:9000/imooc/clean 2017-05-11
(5) Check the result in MySQL on the Linux host: the job should have inserted rows into the tables
- YARN web UI: mini1:8088
- HDFS web UI: mini1:50070
D:\ipdatabase>mvn install:install-file ^
-Dfile=D:\ipdatabase\target\ipdatabase-1.0-SNAPSHOT.jar ^
-DgroupId=com.ggstar ^
-DartifactId=ipdatabase ^
-Dversion=1.0 ^
-Dpackaging=jar
<dependency>
    <groupId>com.ggstar</groupId>
    <artifactId>ipdatabase</artifactId>
    <version>1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.14</version>
</dependency>
mysql> create database imooc;
mysql> use imooc;
mysql> create table day_top (
day varchar(10) not null,
courseId bigint(10) not null,
times bigint(10) not null,
primary key (day,courseId)
);
mysql> create table city_top(
day varchar(10) not null,
courseId bigint(10) not null,
city varchar(10) not null,
times bigint(10) not null,
timesRank int not null,
primary key (day,courseId,city)
);
mysql> create table traffic_top(
day varchar(10) not null,
courseId bigint(10) not null,
traffics bigint(10) not null,
primary key (day, courseId)
);
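The README does not show how TopNYarn fills these tables, so the following Python is only a guess at the aggregations, inferred from the column names: visit counts per course (day_top), visit counts ranked within each city (city_top), and summed traffic per course (traffic_top). The function names, the record layout (dicts with day, courseId, city, traffic keys), and the default n=3 are all assumptions for illustration. The real job may also filter by courseType, which this sketch does not do.

```python
from collections import Counter, defaultdict


def day_top(records, day):
    """Visits per course on one day, most visited first: (day, courseId, times)."""
    counts = Counter(r["courseId"] for r in records if r["day"] == day)
    return [(day, course_id, times) for course_id, times in counts.most_common()]


def traffic_top(records, day):
    """Total traffic per course on one day, largest first: (day, courseId, traffics)."""
    totals = defaultdict(int)
    for r in records:
        if r["day"] == day:
            totals[r["courseId"]] += r["traffic"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(day, course_id, traffics) for course_id, traffics in ranked]


def city_top(records, day, n=3):
    """Top n courses within each city: (day, courseId, city, times, timesRank)."""
    counts = Counter((r["city"], r["courseId"]) for r in records if r["day"] == day)
    by_city = defaultdict(list)
    for (city, course_id), times in counts.items():
        by_city[city].append((course_id, times))
    rows = []
    for city, courses in by_city.items():
        # Rank courses inside one city by visit count, highest first.
        courses.sort(key=lambda item: item[1], reverse=True)
        for rank, (course_id, times) in enumerate(courses[:n], start=1):
            rows.append((day, course_id, city, times, rank))
    return rows
```

Each returned tuple lines up with the columns of the matching table, so the rows could be passed to a parameterized INSERT statement.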