轩少 / streaming-edu360

README.md

Big Data Hands-On Training Series -- Mastering Spark Through Practical Cases -- StreamingETL Project


Nginx logs are collected into Kafka; Spark Streaming consumes them and writes the results to HDFS.

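  • 1. Collect the logs with Logstash.
  • 2. Buffer them in Kafka.
  • 3. Consume them with Spark Streaming; the ETL step parses each line with grok regular expressions.
  • 4. Store the output in HDFS.
  • 5. A Hive external table points to the HDFS path and is partitioned by day.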
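
Steps 3 to 5 can be sketched in plain Scala as follows. This is only an illustration, not the project's actual code: the log layout (Nginx's default `combined` format), the `NginxEvent` fields, the base path `/data/nginx`, the table name `nginx_log`, and the partition column `dt` are all assumptions made for the example, and a hand-written regular expression stands in for the grok pattern.

```scala
import java.time.{LocalDate, OffsetDateTime}
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object NginxEtlSketch {
  // Hypothetical record type; the README does not show the real schema.
  final case class NginxEvent(
      remoteAddr: String,
      time: OffsetDateTime,
      request: String,
      status: Int,
      bodyBytes: Long,
      referer: String,
      userAgent: String)

  // Regex for Nginx's default "combined" log format (assumed layout):
  // addr - user [time] "request" status bytes "referer" "user-agent"
  private val Combined =
    """^(\S+) \S+ (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+) "([^"]*)" "([^"]*)"$""".r

  // Layout of $time_local, e.g. 10/Oct/2017:13:55:36 +0800
  private val TimeFormat =
    DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH)

  // Day format used for the partition value (yyyy-MM-dd).
  private val DayFormat = DateTimeFormatter.ISO_LOCAL_DATE

  /** Step 3: parse one raw line; malformed lines yield None instead of failing. */
  def parse(line: String): Option[NginxEvent] = line match {
    case Combined(addr, _, ts, req, status, bytes, ref, ua) =>
      Try(NginxEvent(addr, OffsetDateTime.parse(ts, TimeFormat), req,
        status.toInt, bytes.toLong, ref, ua)).toOption
    case _ => None
  }

  /** Step 4: the per-day HDFS directory an event would be written to. */
  def partitionDir(basePath: String, day: LocalDate): String =
    s"$basePath/dt=${day.format(DayFormat)}"

  /** Step 5: HiveQL that registers one day's directory as a partition. */
  def addPartitionSql(table: String, basePath: String, day: LocalDate): String =
    s"ALTER TABLE $table ADD IF NOT EXISTS PARTITION (dt='${day.format(DayFormat)}') " +
      s"LOCATION '${partitionDir(basePath, day)}'"

  def main(args: Array[String]): Unit = {
    // Made-up sample line in the combined format.
    val line = "203.0.113.7 - - [10/Oct/2017:13:55:36 +0800] " +
      "\"GET /index.html HTTP/1.1\" 200 612 \"-\" \"curl/7.47.0\""
    parse(line).foreach { event =>
      val day = event.time.toLocalDate
      println(partitionDir("/data/nginx", day))
      println(addPartitionSql("nginx_log", "/data/nginx", day))
    }
  }
}
```

Returning `Option` from `parse` lets a streaming job drop or divert unparseable lines rather than fail the whole batch. Whether the real job handles bad lines this way is not stated in the README.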

References

1. Ctrip hangout

2. Online grok regex tool


Related commands

1. Build: sbt clean assembly

2. Submit: spark-submit --queue root.bigdata.streaming --class "cn.edu360.streaming.NginxLogToHive" --name "edu360streaming3" --executor-cores 1 --num-executors 20 --master yarn-cluster streaming-edu360-assembly-1.0.jar NginxLogToHive

3. Kill the job: yarn application -kill applicationId

4. Inspect the offsets stored in ZooKeeper: kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper zkhost --topic topicName --group groupId

5. Manually update an offset in ZooKeeper: set /consumers/groupId/offsets/topicName/partition value
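
When several partitions need resetting, the `set` commands from item 5 can be generated rather than typed by hand. The sketch below only builds the command strings for pasting into the ZooKeeper shell and does not connect to ZooKeeper itself. The group name `edu360-group`, the topic `nginx-log`, and the offsets are hypothetical values chosen for the example.

```scala
object ZkOffsetCommands {
  /** Znode path from item 5: /consumers/groupId/offsets/topicName/partition */
  def offsetPath(groupId: String, topic: String, partition: Int): String =
    s"/consumers/$groupId/offsets/$topic/$partition"

  /** The `set` command that writes `offset` to that znode. */
  def setCommand(groupId: String, topic: String, partition: Int, offset: Long): String = {
    require(partition >= 0, "partition must be non-negative")
    require(offset >= 0L, "offset must be non-negative")
    s"set ${offsetPath(groupId, topic, partition)} $offset"
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical target offsets, keyed by partition number.
    val resets = Map(0 -> 1200L, 1 -> 1187L, 2 -> 1215L)
    resets.toSeq.sortBy(_._1).foreach { case (partition, offset) =>
      println(setCommand("edu360-group", "nginx-log", partition, offset))
    }
  }
}
```

It is usually safest to stop the streaming job before changing offsets, so that a running consumer does not overwrite the new values.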


About

Scala
Apache-2.0
https://git.oschina.net/wangzhixuan/streaming-edu360.git
git@git.oschina.net:wangzhixuan/streaming-edu360.git