
AllenBric / streaming-edu360

forked from 轩少 / streaming-edu360 

Big Data Hands-on Training Series -- Mastering Spark with Real-World Cases -- Streaming ETL Project


Nginx logs are collected into Kafka, then consumed by Spark Streaming and written to HDFS.

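  • 1. Log collection with Logstash
  • 2. Buffering in Kafka
  • 3. Consumption with Spark Streaming; the ETL step parses each line with grok regular expressions
  • 4. Storage in HDFS
  • 5. A Hive external table points at the HDFS path, partitioned by day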
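Steps 3 to 5 boil down to a per-line function: parse a log line, drop it if it is malformed, and decide which day directory it belongs to. The sketch below shows that logic in plain Scala, outside Spark. It is a minimal sketch assuming nginx's default `combined` log format; the regex stands in for the project's actual grok pattern, and `AccessLog`, `NginxEtl`, `partitionDir` and the `dt=` directory naming are hypothetical names chosen for illustration.

```scala
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

// Hypothetical record for one parsed access-log line; field names are illustrative.
final case class AccessLog(ip: String, time: ZonedDateTime, method: String,
                           path: String, status: Int, bytes: Long)

object NginxEtl {
  // Assumed nginx "combined" format:
  // $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
  // This regex is a stand-in for the project's grok pattern, which may differ.
  private val Combined =
    """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)(?: .*)?$""".r

  // $time_local looks like 10/Oct/2000:13:55:36 -0700
  private val TimeLocal =
    DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH)

  // Returns None for lines that do not match or carry an unparsable timestamp,
  // so a single bad line is skipped instead of failing the whole batch.
  def parse(line: String): Option[AccessLog] = line match {
    case Combined(ip, ts, method, path, status, bytes) =>
      Try(ZonedDateTime.parse(ts, TimeLocal)).toOption.map { t =>
        AccessLog(ip, t, method, path, status.toInt,
          if (bytes == "-") 0L else bytes.toLong)
      }
    case _ => None
  }

  // Day-partition directory, e.g. /data/nginx_log/dt=2000-10-10 (root is hypothetical).
  // The day is taken in the offset written in the log line, not converted to UTC.
  def partitionDir(root: String, rec: AccessLog): String =
    s"$root/dt=${rec.time.toLocalDate}"
}
```

Inside the streaming job, a function like `parse` would typically be applied to each Kafka message value (for example in a `flatMap`), with records grouped by `partitionDir` before writing. Writing one `dt=yyyy-MM-dd` directory per day is what lets a day-partitioned Hive external table sit on top of the output; each new directory still has to be registered with Hive, for example via `ALTER TABLE ... ADD PARTITION` or `MSCK REPAIR TABLE`.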

References

1. Ctrip's hangout

2. Online grok regex debugger
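The ETL step relies on grok, which names reusable regex fragments and writes them as `%{PATTERN:field}`. To make that mechanism concrete, here is a toy expander in Scala that turns such references into Java named groups and extracts the fields from a line. It is a hypothetical sketch, not grok's real implementation: `MiniGrok` and its four-entry pattern table are assumptions, and real grok libraries ship far larger and more precise pattern sets.

```scala
import scala.util.matching.Regex

object MiniGrok {
  // A tiny, hypothetical subset of a grok pattern library; real grok ships many
  // more patterns, and its definitions are stricter than these simplified ones.
  val patterns: Map[String, String] = Map(
    "IP"       -> """\d{1,3}(?:\.\d{1,3}){3}""",
    "WORD"     -> """\w+""",
    "NUMBER"   -> """\d+""",
    "NOTSPACE" -> """\S+"""
  )

  // Matches %{NAME} or %{NAME:field}. Field names are limited to letters and digits
  // because java.util.regex named groups allow nothing else.
  private val Ref = """%\{(\w+)(?::([A-Za-z][A-Za-z0-9]*))?\}""".r

  // Expand each reference into a named group (when a field is given) or a
  // non-capturing group. An unknown pattern name throws NoSuchElementException.
  def compile(grok: String): Regex =
    Ref.replaceAllIn(grok, (m: Regex.Match) => {
      val body = patterns(m.group(1))
      val group = Option(m.group(2)) match {
        case Some(field) => s"(?<$field>$body)"
        case None        => s"(?:$body)"
      }
      Regex.quoteReplacement(group) // keep the backslashes in body literal
    }).r

  // Match the whole line; return field -> captured text, or None if it does not match.
  def extract(grok: String, line: String): Option[Map[String, String]] = {
    val fields  = Ref.findAllMatchIn(grok).flatMap(m => Option(m.group(2))).toList
    val matcher = compile(grok).pattern.matcher(line)
    if (matcher.matches()) Some(fields.map(f => f -> matcher.group(f)).toMap) else None
  }
}
```

For example, `MiniGrok.extract("%{IP:client} %{WORD:method} %{NOTSPACE:path} %{NUMBER:status}", "10.0.0.1 GET /index.html 200")` yields a map with `client`, `method`, `path` and `status`. Repeating a field name in one pattern would make Java reject the compiled regex, so this sketch assumes each field appears once.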


Related commands

1. Build: sbt clean assembly

2. Submit: spark-submit --queue root.bigdata.streaming --class "cn.edu360.streaming.NginxLogToHive" --name "edu360streaming3" --executor-cores 1 --num-executors 20 --master yarn-cluster streaming-edu360-assembly-1.0.jar NginxLogToHive

3. Kill the job: yarn application -kill applicationId

4. Inspect the consumer offsets stored in ZooKeeper: kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper zkhost --topic topicName --group groupId

5. Manually update an offset stored in ZooKeeper (from zkCli.sh): set /consumers/groupId/offsets/topicName/partition value
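Commands 4 and 5 both work on the ZooKeeper layout of Kafka's old ZooKeeper-based consumer, where each partition's committed offset lives in its own znode. The helper below spells out that path, the matching `set` command, and the per-partition lag that ConsumerOffsetChecker reports. It is a hypothetical sketch, not part of the project; `ZkOffsets` and its methods are illustrative names, and it assumes the job stores its offsets under `/consumers` exactly as in command 5.

```scala
object ZkOffsets {
  // Znode layout of the old ZooKeeper-based Kafka consumer, as used in command 5:
  // /consumers/<group>/offsets/<topic>/<partition>, holding the offset as text.
  def offsetPath(group: String, topic: String, partition: Int): String =
    s"/consumers/$group/offsets/$topic/$partition"

  // The zkCli.sh command that overwrites one partition's committed offset.
  def setCommand(group: String, topic: String, partition: Int, offset: Long): String =
    s"set ${offsetPath(group, topic, partition)} $offset"

  // Per-partition lag as shown by ConsumerOffsetChecker: logSize minus committed offset.
  def lag(logSize: Long, committed: Long): Long = logSize - committed
}
```

For instance, `ZkOffsets.setCommand("g1", "nginx", 3, 1000L)` produces `set /consumers/g1/offsets/nginx/3 1000`. If the streaming job commits offsets to ZooKeeper after each batch, it is safer to kill it first (command 3); otherwise its next commit could overwrite the manually set value.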



About

Language: Scala
License: Apache-2.0

Clone URLs:
https://git.oschina.net/AllenBric/streaming-edu360.git
git@git.oschina.net:AllenBric/streaming-edu360.git