
PaddlePaddle / PaddleRec


TypeError: 'NoneType' object is not iterable

Open · Created on 2020-10-15 16:44

PaddleRec/models/rank/dnn/data/get_slot_data.py

The error message is as follows:
```
cat train_data/part-0 | python get_slot_data.py > slot_train_data/part-0
Traceback (most recent call last):
  File "get_slot_data.py", line 70, in <module>
    d.run_from_stdin()
  File "./miniconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/incubate/data_generator/__init__.py", line 128, in run_from_stdin
    for user_parsed_line in line_iter():
TypeError: 'NoneType' object is not iterable
```

Could you please take a look? Thanks.
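For context on where the `None` comes from: judging from the traceback, `run_from_stdin` calls `generate_sample(line)` once per stdin line and then iterates over the result of calling whatever it returns. Below is a minimal sketch of that contract (simplified and partly assumed; this is not the actual paddle or PaddleRec source):

```python
# Minimal sketch (assumed, simplified) of the DataGenerator contract implied
# by the traceback; NOT the actual paddle/fluid/incubate source.
import sys

class DataGeneratorSketch(object):
    def generate_sample(self, line):
        # Must return a *generator function*; run_from_stdin calls it per line.
        def reader():
            fields = line.rstrip('\n').split('\t')
            yield [('label', [fields[0]])]  # hypothetical parsed sample
        return reader

    def run_from_stdin(self):
        for line in sys.stdin:
            line_iter = self.generate_sample(line)
            # If reader() returns None instead of yielding (i.e. it is a
            # plain function rather than a generator function), this loop
            # raises: TypeError: 'NoneType' object is not iterable
            for user_parsed_line in line_iter():
                print(user_parsed_line)

if __name__ == '__main__':
    DataGeneratorSketch().run_from_stdin()
```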

Comments (3)

xmuyong created the task
xmuyong set the linked repository to PaddlePaddle/PaddleRec
xmuyong edited the description

Could you provide the yaml file you used to run the model? We'll try to reproduce it.

I copied everything directly from gitee. Let me go through it in detail:

1. Contents of ./PaddleRec/models/rank/dnn/config.yaml:

```yaml
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# workspace
workspace: "models/rank/dnn"

# list of dataset
dataset:
- name: dataloader_train # name of dataset to distinguish different datasets
  batch_size: 2
  type: DataLoader # or QueueDataset
  data_path: "{workspace}/data/sample_data/train"
  sparse_slots: "click 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26"
  dense_slots: "dense_var:13"
- name: dataset_train # name of dataset to distinguish different datasets
  batch_size: 2
  type: QueueDataset # or DataLoader
  data_path: "{workspace}/data/sample_data/train"
  sparse_slots: "click 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26"
  dense_slots: "dense_var:13"
- name: dataset_infer # name
  batch_size: 2
  type: DataLoader # or QueueDataset
  data_path: "{workspace}/data/sample_data/train"
  sparse_slots: "click 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26"
  dense_slots: "dense_var:13"

# hyper parameters of user-defined network
hyper_parameters:
  # optimizer config
  optimizer:
    class: Adam
    learning_rate: 0.001
    strategy: async
  # user-defined <key, value> pairs
  sparse_inputs_slots: 27
  sparse_feature_number: 1000001
  sparse_feature_dim: 9
  dense_input_dim: 13
  fc_sizes: [512, 256, 128, 32]
  distributed_embedding: 0

# select runner by name
mode: [single_cpu_train, single_cpu_infer]
# config of each runner.
# runner is a kind of paddle training class, which wraps the train/infer process.
runner:
- name: single_cpu_train
  class: train
  # num of epochs
  epochs: 4
  # device to run training or infer
  device: cpu
  save_checkpoint_interval: 2 # save model interval of epochs
  save_inference_interval: 4 # save inference
  save_checkpoint_path: "increment_dnn" # save checkpoint path
  save_inference_path: "inference" # save inference path
  save_inference_feed_varnames: [] # feed vars of save inference
  save_inference_fetch_varnames: [] # fetch vars of save inference
  print_interval: 10
  phases: [phase1]
- name: single_cpu_infer
  class: infer
  # num of epochs
  epochs: 1
  # device to run training or infer
  device: cpu
  init_model_path: "increment_dnn" # load model path
  phases: [phase2]
- name: ps_cluster
  class: cluster_train
  epochs: 2
  device: cpu
  fleet_mode: ps
  save_checkpoint_interval: 1 # save model interval of epochs
  save_checkpoint_path: "increment_dnn" # save checkpoint path
  init_model_path: "" # load model path
  print_interval: 1
  phases: [phase1]
- name: online_learning_cluster
  class: cluster_train
  runner_class_path: "{workspace}/online_learning_runner.py"
  epochs: 2
  device: cpu
  fleet_mode: ps
  save_checkpoint_interval: 1 # save model interval of epochs
  save_checkpoint_path: "increment_dnn" # save checkpoint path
  init_model_path: "" # load model path
  print_interval: 1
  phases: [phase1]
- name: collective_cluster
  class: cluster_train
  epochs: 2
  device: gpu
  fleet_mode: collective
  save_checkpoint_interval: 1 # save model interval of epochs
  save_checkpoint_path: "increment_dnn" # save checkpoint path
  init_model_path: "" # load model path
  print_interval: 1
  phases: [phase1]
- name: single_multi_gpu_train
  class: train
  # num of epochs
  epochs: 1
  # device to run training or infer
  device: gpu
  selected_gpus: "0,1" # select multiple GPUs for training
  save_checkpoint_interval: 1 # save model interval of epochs
  save_inference_interval: 4 # save inference
  save_step_interval: 1
  save_checkpoint_path: "increment_dnn" # save checkpoint path
  save_inference_path: "inference" # save inference path
  save_step_path: "step_save"
  save_inference_feed_varnames: [] # feed vars of save inference
  save_inference_fetch_varnames: [] # fetch vars of save inference
  print_interval: 1
  phases: [phase1]

# runner will run all the phase in each epoch
phase:
- name: phase1
  model: "{workspace}/model.py" # user-defined model
  dataset_name: dataloader_train # select dataset by name
  thread_num: 1
- name: phase2
  model: "{workspace}/model.py" # user-defined model
  dataset_name: dataset_infer # select dataset by name
  thread_num: 1
```
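For reference, with these sparse_slots/dense_slots settings the converted slot files are expected to hold space-separated slot:value tokens per line: one click label, 13 dense_var values, and tokens for whichever of the sparse slots 1-26 are present. A small sketch that validates a line against that expectation (the sample line and the exact token format are assumptions for illustration, not taken from the repo):

```python
# Hypothetical sanity check, not part of PaddleRec: verify that a converted
# line only contains slots declared in config.yaml, with the right number
# of dense values.
sparse_slots = ("click 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 "
                "17 18 19 20 21 22 23 24 25 26").split()
dense_slots = {"dense_var": 13}  # dense_var carries 13 float values
allowed = set(sparse_slots) | set(dense_slots)

def check_line(line):
    counts = {}
    for token in line.split():
        slot, _, value = token.partition(":")
        assert slot in allowed, "unknown slot: %s" % slot
        counts[slot] = counts.get(slot, 0) + 1
    for slot, dim in dense_slots.items():
        assert counts.get(slot, 0) == dim, "expected %d %s values" % (dim, slot)

# A made-up example line in the expected format:
check_line("click:0 dense_var:0.05 " + "dense_var:0.0 " * 12 + "1:737395 2:210498")
```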

2. My reproduction steps:

2.1. cd ./PaddleRec/models/rank/dnn/data/

2.2. sh run.sh, where run.sh contains:

```sh
sh download.sh

mkdir slot_train_data_full
for i in `ls ./train_data_full`
do
    cat train_data_full/$i | python get_slot_data.py > slot_train_data_full/$i
done

mkdir slot_test_data_full
for i in `ls ./test_data_full`
do
    cat test_data_full/$i | python get_slot_data.py > slot_test_data_full/$i
done

mkdir slot_train_data
for i in `ls ./train_data`
do
    cat train_data/$i | python get_slot_data.py > slot_train_data/$i
done

mkdir slot_test_data
for i in `ls ./test_data`
do
    cat test_data/$i | python get_slot_data.py > slot_test_data/$i
done
```
2.3. The following command fails. It is one of the commands in run.sh, so run.sh cannot complete and the training data cannot be prepared (a probe sketch for narrowing down bad input lines follows these steps):

cat train_data/part-0 | python get_slot_data.py > slot_train_data/part-0

Error message:

```
cat train_data/part-0 | python get_slot_data.py > slot_train_data/part-0
Traceback (most recent call last):
  File "get_slot_data.py", line 70, in <module>
    d.run_from_stdin()
  File "./miniconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/incubate/data_generator/__init__.py", line 128, in run_from_stdin
    for user_parsed_line in line_iter():
TypeError: 'NoneType' object is not iterable
```
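When one of these conversion pipes fails, it can help to pin down which input line trips get_slot_data.py. Below is a hypothetical probe (not part of the repo) that checks each line of a shard against the Criteo layout suggested by the config; the field count of 40 (1 label + 13 dense + 26 categorical, tab-separated) is an assumption:

```python
# Hypothetical debugging aid, not part of PaddleRec: report the first line of
# a data shard whose tab-separated field count deviates from the assumed
# Criteo layout (1 label + 13 dense + 26 categorical = 40 fields).
import sys

def probe(path, expected_fields=40):
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            fields = line.rstrip('\n').split('\t')
            if len(fields) != expected_fields:
                print("line %d has %d fields, expected %d"
                      % (lineno, len(fields), expected_fields))
                return
    print("all lines in %s look well-formed" % path)

if __name__ == '__main__':
    probe(sys.argv[1])  # e.g. python probe.py train_data/part-0
```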

Hello. In run.sh, I see that processing train_data_full and test_data_full raised no errors; those two are the full training and test datasets. The train_data that errors is a small dataset that download.sh cuts out of train_data_full. You can simply copy part-0 and part-1 over from the full dataset and use them.
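A minimal sketch of that workaround (assuming the working directory is PaddleRec/models/rank/dnn/data/ and the full download has finished):

```python
# Sketch of the suggested workaround: reuse the full-dataset shards as the
# small training set. Paths assume we are in PaddleRec/models/rank/dnn/data/.
import shutil

for part in ["part-0", "part-1"]:
    shutil.copy("train_data_full/" + part, "train_data/" + part)
```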
