12 Star 88 Fork 30

奇虎360 / XLearning

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
configure.md 9.93 KB
一键复制 编辑 原始数据 按行查看 历史
jiarunying 提交于 2019-05-14 19:23 . update the version of the doc

Configuration

中文文档

The default value of each configuration can be modified by setting the corresponding properties in the $XLEARNING_HOME/conf/xlearning-site.xml at the XLearning client or the parameter of --conf when submitting the application.

Application Configuration

Property Name Default Meaning
xlearning.am.memory 1024MB amount of memory to use for the AM process
xlearning.am.cores 1 number of cores to use for the AM process
xlearning.worker.num 1 number of worker containers to use for the application
xlearning.worker.memory 1024MB amount of memory to use for the worker process
xlearning.worker.cores 1 number of cores to use for the worker process
xlearning.chief.worker.memory 1024 amount of memory for chief worker,especially for the index 0 worker of the TensorFlow application, default as the setting of the worker memory.
xlearning.evaluator.worker.memory 1024 amount of memory for evaluator worker, especially for the TensorFlow Estimator application, default as the setting of the worker memory.
xlearning.ps.num 0 number of ps containers to use for the application
xlearning.ps.memory 1024MB amount of memory to use for the ps process
xlearning.ps.cores 1 number of cores to use for the ps process
xlearning.app.queue DEFAULT the queue which application submitted to
xlearning.app.priority 3 the priority of the application, divided into level 0 to 5, corresponding to DEFAULT, VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH
xlearning.input.strategy DOWNLOAD loading strategy of input file, including DOWNLOAD, STREAM, PLACEHOLDER
xlearning.inputfile.rename false whether to rename the download file in the DOWNLOAD strategy of input file
xlearning.stream.epoch 1 the number of the input file loading in the STREAM strategy of input file
xlearning.input.stream.shuffle false whether to shuffle the input splits in the STREAM strategy of input file
xlearning.inputformat.class org.apache.hadoop.mapred.TextInputFormat.class which inputformat implementation to use in the STREAM strategy of input file
xlearning.inputformat.cache false whether cache the inputformat file to local when the stream epoch longer than 1
xlearning.inputformat.cachefile.name inputformatCache.gz the local cache file name for inputformat
xlearning.inputformat.cachesize.limit 100*1024 the limit size of the local cache file (in MB)
xlearning.output.local.dir output If the local output path is not specified, the local directory of the output file is the default value.
xlearning.output.strategy UPLOAD loading strategy of output file, including DOWNLOAD, STREAM
xlearning.outputformat.class TextMultiOutputFormat.class which outputformat implementation to use in the STREAM strategy of output file
xlearning.interresult.dir /interResult_ specify the HDFS subdirectory that the intermediate output file upload to
xlearning.interresult.upload.timeout 30 * 60 * 1000 upload timeout to save the intermediate output (in milliseconds)
xlearning.interresult.save.inc false increment upload the intermediate output file, default not (upload all output file each time)
xlearning.tf.evaluator false whether to set the last worker as evaluator of the distributed TensorFlow job type for the estimator api
xlearning.tf.distribution.strategy false whether use the distribution strategy API for the TensorFlow, default as false

Board Service Configuration

Property Name Default Meaning
xlearning.tf.board.enable true If set to false, Board service is not necessary
xlearning.tf.board.worker.index 0 the index of the worker which start the service of Board
xlearning.tf.board.log.dir eventLog the directory saving TensorBoard event log
xlearning.tf.board.history.dir /tmp/XLearning/eventLog specify the HDFS path which the TensorBoard event log upload to
xlearning.tf.board.reload.interval 1 how often the backend should load more data of event log (in seconds) for tensorboard
xlearning.board.modelpb "" model proto in ONNX format for VisualDL
xlearning.board.cache.timeout 20 memory cache timeout duration in seconds for VisualDL
xlearning.tf.board.path tensorboard the path of the tensorboard
xlearning.board.path visualDL the path of the visualDL

System Configuration

Property Name Default Meaning
xlearning.container.extra.java.opts "" A string of extra JVM options to pass to ApplicationMaster to launch container
xlearning.allocate.interval 1000ms interval between the AM get the container assigned state from RM
xlearning.status.update.interval 1000ms interval between the AM report the state to RM
xlearning.task.timeout 5 * 60 * 1000 communication timeout between the AM and container (in milliseconds)
xlearning.task.timeout.check.interval 3 * 1000 how often the AM check the timeout of the container (in milliseconds)
xlearning.localresource.timeout 5 * 60 * 1000 set the timeout of the download the localResources (in milliseconds)
xlearning.messages.len.max 1000 Maximum size (in bytes) of message queue
xlearning.execute.node.limit 200 Maximum number of nodes that application use
xlearning.staging.dir /tmp/XLearning/staging HDFS directory that application local resources upload to
xlearning.cleanup.enable true whether delete the resources after the application finished
xlearning.container.maxFailures.rate 0.5 maximum percentage of the failure containers
xlearning.download.file.retry 3 Maximum number of retries for the input file download when the strategy of input file is DOWNLOAD
xlearning.download.file.thread.nums 10 number of download threads of the input file in the strategy of DOWNLOAD
xlearning.upload.output.thread.nums 10 number of upload threads of the output file in the strategy of UPLOAD
xlearning.container.heartbeat.interval 10 * 1000 interval between each container to the AM (in milliseconds)
xlearning.container.heartbeat.retry 3 Maximum number of retries for the container send the heartbeat to the AM
xlearning.container.update.appstatus.interval 3 * 1000 how often the containers get the state of the application process (in milliseconds)
xlearning.container.auto.create.output.dir true If set to true, the containers create the local output path automatically
xlearning.log.pull.interval 10000 interval between the client get the log output of the AM (in milliseconds)
xlearning.user.classpath.first true whether user job jar should be the first one on class path or not.
xlearning.worker.mem.autoscale 0.5 automatic memory scale ratio of worker when application retry after failed.
xlearning.ps.mem.autoscale 0.2 automatic memory scale ratio of ps when application retry after failed.
xlearning.app.max.attempts 1 the number of application attempts, default not retry after failed.
xlearning.report.container.status true whether the client report the status of the container.
xlearning.env.maxlength 102400 the maximum length of environment variable when container execute the user program.
xlearning.am.env.[EnvironmentVariableName] (none) Add the environment variable specified by EnvironmentVariableName to the AM process. The user can specify multiple of these to set multiple environment variables.
xlearning.container.env.[EnvironmentVariableName] (none) Add the environment variable specified by EnvironmentVariableName to the Container process. The user can specify multiple of these to set multiple environment variables.
xlearning.am.nodeLabelExpression (none) A YARN node label expression that restricts the set of nodes AM will be scheduled on.
xlearning.worker.nodeLabelExpression (none) A YARN node label expression that restricts the set of nodes Worker will be scheduled on.
xlearning.ps.nodeLabelExpression (none) A YARN node label expression that restricts the set of nodes PS will be scheduled on.

History Configuration

Property Name Default Meaning
xlearning.history.log.dir /tmp/XLearning/history the HDFS directory that saves the history log
xlearning.history.log.delete-monitor-time-interval 24 * 60 * 60 * 1000 set the time interval by which the application history logs will be checked to clean (in milliseconds)
xlearning.history.log.max-age-ms 24 * 60 * 60 * 1000 how long the history log can be saved (in milliseconds)
xlearning.history.port 10021 port for the history service
xlearning.history.address 0.0.0.0:10021 address for the history service
xlearning.history.webapp.port 19886 port for the history http web service
xlearning.history.webapp.address 0.0.0.0:19886 address for the history http web service
xlearning.history.webapp.https.port 19885 port for the history https web service
xlearning.history.webapp.https.address 0.0.0.0:19885 address for the history https web service

MPI Configuration

Property Name Default Meaning
xlearning.mpi.install.dir /usr/local/openmpi the installation path of the openmpi
xlearning.mpi.extra.ld.library.path (none) the extra library path that openmpi need
xlearning.mpi.container.update.status.retry 3 the retry times for the container status update

Docker Configuration

Property Name Default Meaning
xlearning.container.type yarn container running type
xlearning.docker.registry.host (none) docker register host
xlearning.docker.registry.port (none) docker register port
xlearning.docker.image (none) docker image name
xlearning.docker.worker.dir /work the work dir of the docker container
Java
1
https://gitee.com/qihoo360/XLearning.git
git@gitee.com:qihoo360/XLearning.git
qihoo360
XLearning
XLearning
master

搜索帮助