Score
0
Watch 85 Star 230 Fork 62

GVP科学大数据开源社区 / PiFlowScalaBSD-2-Clause

Join us
Explore and code with more than 2 million developers,Free private repositories !:)
Sign up
混合型科学大数据流水线系统,包含丰富的处理器组件,提供Shell、DSL、Web配置界面、任务调度、任务监控等功能 spread retract

https://github.com/cas-bigdatalab/piflow

Clone or download
Loading...
README.md


GitHub releases GitHub stars GitHub forks GitHub downloads GitHub issues GitHub license

πFlow is an easy to use, powerful big data pipeline system.

Table of Contents

Features

  • Easy to use
    • provide a WYSIWYG web interface to configure data flow
    • monitor data flow status
    • check the logs of data flow
    • provide checkpoints
  • Strong scalability:
    • Support customized development of data processing components
  • Superior performance
    • based on distributed computing engine Spark
  • Powerful
    • 100+ data processing components available
    • include spark、mllib、hadoop、hive、hbase、solr、redis、memcache、elasticSearch、jdbc、mongodb、http、ftp、xml、csv、json,etc.

Architecture

Requirements

  • JDK 1.8 or newer
  • Apache Maven 3.1.0 or newer
  • Git Client (used during build process by 'bower' plugin)
  • Spark-2.1.0
  • Hadoop-2.6.0
  • Hive-1.2.1

Getting Started

To Build: mvn clean package -Dmaven.test.skip=true

      [INFO] Replacing original artifact with shaded artifact.
      [INFO] Replacing /opt/project/piflow/piflow-server/target/piflow-server-0.9.jar with /opt/project/piflow/piflow-server/target/piflow-server-0.9-shaded.jar
      [INFO] ------------------------------------------------------------------------
      [INFO] Reactor Summary:
      [INFO] 
      [INFO] piflow-project ..................................... SUCCESS [  4.602 s]
      [INFO] piflow-core ........................................ SUCCESS [ 56.533 s]
      [INFO] piflow-bundle ...................................... SUCCESS [02:15 min]
      [INFO] piflow-server ...................................... SUCCESS [03:01 min]
      [INFO] ------------------------------------------------------------------------
      [INFO] BUILD SUCCESS
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time: 06:18 min
      [INFO] Finished at: 2018-12-24T16:54:16+08:00
      [INFO] Final Memory: 41M/812M
      [INFO] ------------------------------------------------------------------------

To Run Piflow Server:

  • run piflow server on intellij:

    • edit config.properties
    • build piflow to generate piflow-server.jar
    • main class is cn.piflow.api.Main(remember to set SPARK_HOME)
  • run piflow server by release version:

  • how to configure config.properties

    #server ip and port
    server.ip=10.0.86.191
    server.port=8002
    h2.port=50002
    
    #spark and yarn config
    spark.master=yarn
    spark.deploy.mode=cluster
    yarn.resourcemanager.hostname=10.0.86.191
    yarn.resourcemanager.address=10.0.86.191:8032
    yarn.access.namenode=hdfs://10.0.86.191:9000
    yarn.stagingDir=hdfs://10.0.86.191:9000/tmp/
    yarn.jars=hdfs://10.0.86.191:9000/user/spark/share/lib/*.jar
    yarn.url=http://10.0.86.191:8088/ws/v1/cluster/apps/
    
    #hive config
    hive.metastore.uris=thrift://10.0.86.191:9083
    
    #piflow-server.jar path
    piflow.bundle=/opt/piflowServer/piflow-server-0.9.jar
    
    #checkpoint hdfs path
    checkpoint.path=hdfs://10.0.86.89:9000/piflow/checkpoints/
    
    #debug path
    debug.path=hdfs://10.0.88.191:9000/piflow/debug/
    
    #yarn url
    yarn.url=http://10.0.86.191:8088/ws/v1/cluster/apps/
    
    #the count of data shown in log
    data.show=10
    
    #h2 db port
    h2.port=50002

To Run Piflow Web:

To Use:

  • command line
    • flow config example

      {
        "flow":{
        "name":"test",
        "uuid":"1234",
        "checkpoint":"Merge",
        "stops":[
        {
          "uuid":"1111",
          "name":"XmlParser",
          "bundle":"cn.piflow.bundle.xml.XmlParser",
          "properties":{
              "xmlpath":"hdfs://10.0.86.89:9000/xjzhu/dblp.mini.xml",
              "rowTag":"phdthesis"
          }
        },
        {
          "uuid":"2222",
          "name":"SelectField",
          "bundle":"cn.piflow.bundle.common.SelectField",
          "properties":{
              "schema":"title,author,pages"
          }
      
        },
        {
          "uuid":"3333",
          "name":"PutHiveStreaming",
          "bundle":"cn.piflow.bundle.hive.PutHiveStreaming",
          "properties":{
              "database":"sparktest",
              "table":"dblp_phdthesis"
          }
        },
        {
          "uuid":"4444",
          "name":"CsvParser",
          "bundle":"cn.piflow.bundle.csv.CsvParser",
          "properties":{
              "csvPath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis.csv",
              "header":"false",
              "delimiter":",",
              "schema":"title,author,pages"
          }
        },
        {
          "uuid":"555",
          "name":"Merge",
          "bundle":"cn.piflow.bundle.common.Merge",
          "properties":{
            "inports":"data1,data2"
          }
        },
        {
          "uuid":"666",
          "name":"Fork",
          "bundle":"cn.piflow.bundle.common.Fork",
          "properties":{
            "outports":"out1,out2,out3"
          }
        },
        {
          "uuid":"777",
          "name":"JsonSave",
          "bundle":"cn.piflow.bundle.json.JsonSave",
          "properties":{
            "jsonSavePath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis.json"
          }
        },
        {
          "uuid":"888",
          "name":"CsvSave",
          "bundle":"cn.piflow.bundle.csv.CsvSave",
          "properties":{
            "csvSavePath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis_result.csv",
            "header":"true",
            "delimiter":","
          }
        }
      ],
      "paths":[
        {
          "from":"XmlParser",
          "outport":"",
          "inport":"",
          "to":"SelectField"
        },
        {
          "from":"SelectField",
          "outport":"",
          "inport":"data1",
          "to":"Merge"
        },
        {
          "from":"CsvParser",
          "outport":"",
          "inport":"data2",
          "to":"Merge"
        },
        {
          "from":"Merge",
          "outport":"",
          "inport":"",
          "to":"Fork"
        },
        {
          "from":"Fork",
          "outport":"out1",
          "inport":"",
          "to":"PutHiveStreaming"
        },
        {
          "from":"Fork",
          "outport":"out2",
          "inport":"",
          "to":"JsonSave"
        },
        {
          "from":"Fork",
          "outport":"out3",
          "inport":"",
          "to":"CsvSave"
        }
      ]

      } }

    • curl -0 -X POST http://10.0.86.191:8002/flow/start -H "Content-type: application/json" -d 'this is your flow json'

  • piflow web: try with "http://piflow.ml/piflow-web", user/password: admin/admin

Welcome to join PiFlow User Group! Contact:吴老师 18910263390 wzs@cnic.cn

Comments ( 9 )

Sign in for post a comment

Scala
1
https://git.oschina.net/opensci/piflow.git
git@git.oschina.net:opensci/piflow.git
opensci
piflow
PiFlow
master

Help Search

191139_cd20d5fd_5186603 191143_ebef6f8d_5186603