
PaddlePaddle / PaddleDetection


Model training results

Backlog
Opened this issue 2022-05-19 12:24

After a model has finished training, where can I see the parameters (metrics) used to judge the model?
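
One place such metrics show up is the COCO-style summary that is printed to the training log during evaluation (an example is quoted in a comment further down this thread). As a rough sketch, those lines can be parsed into rows for easier comparison between runs. The function name `parse_coco_summary` is hypothetical, and the regular expression assumes the flattened spacing seen in the pasted log. In pycocotools, a value of -1.000 means no ground truth fell into that bucket (for example, no medium or large objects).

```python
import re

# Matches one line of the COCO evaluation summary as it appears in the log.
LINE_RE = re.compile(
    r"Average (Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?\d+\.\d+)"
)

def parse_coco_summary(log_text):
    """Return (metric, IoU range, area, maxDets, value) tuples from a log."""
    rows = []
    for m in LINE_RE.finditer(log_text):
        _, kind, iou, area, max_dets, value = m.groups()
        rows.append((kind, iou, area, int(max_dets), float(value)))
    return rows

# Three lines copied from the evaluation output quoted later in this issue.
sample = """\
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.021
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
"""
rows = parse_coco_summary(sample)
for row in rows:
    print(row)
```

The first row (AP at IoU=0.50:0.95, all areas, maxDets=100) is the figure the log later reports as "Best test bbox ap".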

Comments (4)

Vivian created the task

Experts, please help take a look.
The model config I am using: visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml
The following error appears right after training starts:
loading annotations into memory...
Done (t=0.19s)
creating index...
index created!
W0911 16:54:24.747865 8921 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0911 16:54:24.752705 8921 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[09/11 16:54:26] ppdet.utils.download INFO: Downloading ppyoloe_crn_l_300e_coco.pdparams from https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
100%|█████████████████████████████████| 211693/211693 [00:21<00:00, 9961.79KB/s]
[09/11 16:54:49] ppdet.utils.checkpoint INFO: ['yolo_head.anchor_points', 'yolo_head.stride_tensor'] in pretrained weight is not used in the model, and its will not be loaded
[09/11 16:54:49] ppdet.utils.checkpoint INFO: The shape [80] in pretrained weight yolo_head.pred_cls.0.bias is unmatched with the shape [3] in model yolo_head.pred_cls.0.bias. And the weight yolo_head.pred_cls.0.bias will not be loaded
[09/11 16:54:49] ppdet.utils.checkpoint INFO: The shape [80, 768, 3, 3] in pretrained weight yolo_head.pred_cls.0.weight is unmatched with the shape [3, 768, 3, 3] in model yolo_head.pred_cls.0.weight. And the weight yolo_head.pred_cls.0.weight will not be loaded
[09/11 16:54:49] ppdet.utils.checkpoint INFO: The shape [80] in pretrained weight yolo_head.pred_cls.1.bias is unmatched with the shape [3] in model yolo_head.pred_cls.1.bias. And the weight yolo_head.pred_cls.1.bias will not be loaded
[09/11 16:54:49] ppdet.utils.checkpoint INFO: The shape [80, 384, 3, 3] in pretrained weight yolo_head.pred_cls.1.weight is unmatched with the shape [3, 384, 3, 3] in model yolo_head.pred_cls.1.weight. And the weight yolo_head.pred_cls.1.weight will not be loaded
[09/11 16:54:49] ppdet.utils.checkpoint INFO: The shape [80] in pretrained weight yolo_head.pred_cls.2.bias is unmatched with the shape [3] in model yolo_head.pred_cls.2.bias. And the weight yolo_head.pred_cls.2.bias will not be loaded
[09/11 16:54:49] ppdet.utils.checkpoint INFO: The shape [80, 192, 3, 3] in pretrained weight yolo_head.pred_cls.2.weight is unmatched with the shape [3, 192, 3, 3] in model yolo_head.pred_cls.2.weight. And the weight yolo_head.pred_cls.2.weight will not be loaded
[09/11 16:54:49] ppdet.utils.checkpoint INFO: Finish loading model weights: /home/aistudio/.cache/paddle/weights/ppyoloe_crn_l_300e_coco.pdparams
[09/11 16:54:51] ppdet.engine INFO: Epoch: [0] [ 0/5564] learning_rate: 0.000000 loss: 5.018298 loss_cls: 1.015438 loss_iou: 0.968017 loss_dfl: 3.165634 loss_l1: 5.039592 eta: 4 days, 12:22:53 batch_cost: 2.3375 data_cost: 0.0005 ips: 0.8556 images/s
Found inf or nan, current scale is: 1024.0, decrease to: 1024.0*0.5
Found inf or nan, current scale is: 512.0, decrease to: 512.0*0.5
Found inf or nan, current scale is: 256.0, decrease to: 256.0*0.5
Found inf or nan, current scale is: 128.0, decrease to: 128.0*0.5
Found inf or nan, current scale is: 64.0, decrease to: 64.0*0.5
Found inf or nan, current scale is: 32.0, decrease to: 32.0*0.5
Found inf or nan, current scale is: 16.0, decrease to: 16.0*0.5
Found inf or nan, current scale is: 8.0, decrease to: 8.0*0.5
Found inf or nan, current scale is: 4.0, decrease to: 4.0*0.5
Found inf or nan, current scale is: 2.0, decrease to: 2.0*0.5
Found inf or nan, current scale is: 1.0, decrease to: 1.0*0.5
Found inf or nan, current scale is: 0.5, decrease to: 0.5*0.5
Found inf or nan, current scale is: 0.25, decrease to: 0.25*0.5
Found inf or nan, current scale is: 0.125, decrease to: 0.125*0.5
Found inf or nan, current scale is: 0.0625, decrease to: 0.0625*0.5
Found inf or nan, current scale is: 0.03125, decrease to: 0.03125*0.5
Found inf or nan, current scale is: 0.015625, decrease to: 0.015625*0.5
Found inf or nan, current scale is: 0.0078125, decrease to: 0.0078125*0.5
Found inf or nan, current scale is: 0.00390625, decrease to: 0.00390625*0.5
[09/11 16:55:52] ppdet.engine INFO: Epoch: [0] [ 100/5564] learning_rate: 0.000001 loss: 4.645183 loss_cls: 1.564811 loss_iou: 0.728613 loss_dfl: 2.748396 loss_l1: 3.467703 eta: 1 day, 0:28:35 batch_cost: 0.5101 data_cost: 0.0003 ips: 3.9207 images/s
Found inf or nan, current scale is: 0.001953125, decrease to: 0.001953125*0.5
Found inf or nan, current scale is: 0.0009765625, decrease to: 0.0009765625*0.5
Found inf or nan, current scale is: 0.00048828125, decrease to: 0.00048828125*0.5
Found inf or nan, current scale is: 0.000244140625, decrease to: 0.000244140625*0.5
Found inf or nan, current scale is: 0.0001220703125, decrease to: 0.0001220703125*0.5
Found inf or nan, current scale is: 6.103515625e-05, decrease to: 6.103515625e-05*0.5
Found inf or nan, current scale is: 3.0517578125e-05, decrease to: 3.0517578125e-05*0.5
Found inf or nan, current scale is: 1.52587890625e-05, decrease to: 1.52587890625e-05*0.5
Found inf or nan, current scale is: 7.62939453125e-06, decrease to: 7.62939453125e-06*0.5
Found inf or nan, current scale is: 3.814697265625e-06, decrease to: 3.814697265625e-06*0.5
Found inf or nan, current scale is: 1.9073486328125e-06, decrease to: 1.9073486328125e-06*0.5
Found inf or nan, current scale is: 9.5367431640625e-07, decrease to: 9.5367431640625e-07*0.5
Found inf or nan, current scale is: 4.76837158203125e-07, decrease to: 4.76837158203125e-07*0.5
Found inf or nan, current scale is: 2.384185791015625e-07, decrease to: 2.384185791015625e-07*0.5
Found inf or nan, current scale is: 1.1920928955078125e-07, decrease to: 1.1920928955078125e-07*0.5
Found inf or nan, current scale is: 5.960464477539063e-08, decrease to: 5.960464477539063e-08*0.5
Found inf or nan, current scale is: 2.9802322387695312e-08, decrease to: 2.9802322387695312e-08*0.5
Found inf or nan, current scale is: 1.4901161193847656e-08, decrease to: 1.4901161193847656e-08*0.5
Found inf or nan, current scale is: 7.450580596923828e-09, decrease to: 7.450580596923828e-09*0.5
Found inf or nan, current scale is: 3.725290298461914e-09, decrease to: 3.725290298461914e-09*0.5
Found inf or nan, current scale is: 1.862645149230957e-09, decrease to: 1.862645149230957e-09*0.5
Found inf or nan, current scale is: 9.313225746154785e-10, decrease to: 9.313225746154785e-10*0.5
Found inf or nan, current scale is: 4.656612873077393e-10, decrease to: 4.656612873077393e-10*0.5
Found inf or nan, current scale is: 2.3283064365386963e-10, decrease to: 2.3283064365386963e-10*0.5
Found inf or nan, current scale is: 1.1641532182693481e-10, decrease to: 1.1641532182693481e-10*0.5
Found inf or nan, current scale is: 5.820766091346741e-11, decrease to: 5.820766091346741e-11*0.5
Found inf or nan, current scale is: 2.9103830456733704e-11, decrease to: 2.9103830456733704e-11*0.5
Found inf or nan, current scale is: 1.4551915228366852e-11, decrease to: 1.4551915228366852e-11*0.5
Found inf or nan, current scale is: 7.275957614183426e-12, decrease to: 7.275957614183426e-12*0.5
Found inf or nan, current scale is: 3.637978807091713e-12, decrease to: 3.637978807091713e-12*0.5
Found inf or nan, current scale is: 1.8189894035458565e-12, decrease to: 1.8189894035458565e-12*0.5
Found inf or nan, current scale is: 9.094947017729282e-13, decrease to: 9.094947017729282e-13*0.5
Found inf or nan, current scale is: 4.547473508864641e-13, decrease to: 4.547473508864641e-13*0.5
Found inf or nan, current scale is: 2.2737367544323206e-13, decrease to: 2.2737367544323206e-13*0.5
Found inf or nan, current scale is: 1.1368683772161603e-13, decrease to: 1.1368683772161603e-13*0.5
Found inf or nan, current scale is: 5.684341886080802e-14, decrease to: 5.684341886080802e-14*0.5
Found inf or nan, current scale is: 2.842170943040401e-14, decrease to: 2.842170943040401e-14*0.5
[09/11 16:56:52] ppdet.engine INFO: Epoch: [0] [ 200/5564] learning_rate: 0.000001 loss: nan loss_cls: nan loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 eta: 23:59:42 batch_cost: 0.5079 data_cost: 0.0003 ips: 3.9374 images/s
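
The `Found inf or nan` messages in the log above come from dynamic loss scaling in mixed-precision training: each message reports the current scale and halves it. The sketch below is one way to read such a trace. The helper names `parse_scales` and `count_halvings` are hypothetical, and the parser relies only on the `current scale is:` part of each message. When the scale keeps halving far below 1.0, as it does here (from 1024.0 down to roughly 2.8e-14), a smaller scale is evidently not curing the overflow, which suggests, without proving it, that the inf/nan arises in the forward pass or the input data rather than from float16 gradient overflow.

```python
import re

# Captures the number that follows "current scale is:" up to the comma.
SCALE_RE = re.compile(r"current scale is: ([0-9.eE+-]+),")

def parse_scales(log_text):
    """Return the loss-scale values reported in 'Found inf or nan' lines."""
    return [float(m.group(1)) for m in SCALE_RE.finditer(log_text)]

def count_halvings(scales):
    """Count consecutive reports where the scale is half the previous one."""
    return sum(1 for prev, cur in zip(scales, scales[1:]) if cur == prev * 0.5)

# Sample lines; the '*' in 'decrease to: X*0.5' is an assumption about the
# original message format and is not used by the parser.
sample = """\
Found inf or nan, current scale is: 1024.0, decrease to: 1024.0*0.5
Found inf or nan, current scale is: 512.0, decrease to: 512.0*0.5
Found inf or nan, current scale is: 6.103515625e-05, decrease to: 6.103515625e-05*0.5
"""
scales = parse_scales(sample)
print(scales, count_halvings(scales))
```

Running `parse_scales` over a full training log gives a quick count of how many overflow events occurred before the loss turned into `nan`.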

Using the model config visdrone/ppyoloe_plus_crn_l_largesize_80e_visdrone.yml fails in the same way. The error output is as follows:
loading annotations into memory...
Done (t=0.20s)
creating index...
index created!
W0911 11:41:50.710125 4053 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0911 11:41:50.714862 4053 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[09/11 11:41:53] ppdet.utils.checkpoint INFO: The shape [80] in pretrained weight yolo_head.pred_cls.0.bias is unmatched with the shape [3] in model yolo_head.pred_cls.0.bias. And the weight yolo_head.pred_cls.0.bias will not be loaded
[09/11 11:41:53] ppdet.utils.checkpoint INFO: The shape [80, 768, 3, 3] in pretrained weight yolo_head.pred_cls.0.weight is unmatched with the shape [3, 768, 3, 3] in model yolo_head.pred_cls.0.weight. And the weight yolo_head.pred_cls.0.weight will not be loaded
[09/11 11:41:53] ppdet.utils.checkpoint INFO: The shape [80] in pretrained weight yolo_head.pred_cls.1.bias is unmatched with the shape [3] in model yolo_head.pred_cls.1.bias. And the weight yolo_head.pred_cls.1.bias will not be loaded
[09/11 11:41:53] ppdet.utils.checkpoint INFO: The shape [80, 384, 3, 3] in pretrained weight yolo_head.pred_cls.1.weight is unmatched with the shape [3, 384, 3, 3] in model yolo_head.pred_cls.1.weight. And the weight yolo_head.pred_cls.1.weight will not be loaded
[09/11 11:41:53] ppdet.utils.checkpoint INFO: The shape [80] in pretrained weight yolo_head.pred_cls.2.bias is unmatched with the shape [3] in model yolo_head.pred_cls.2.bias. And the weight yolo_head.pred_cls.2.bias will not be loaded
[09/11 11:41:53] ppdet.utils.checkpoint INFO: The shape [80, 192, 3, 3] in pretrained weight yolo_head.pred_cls.2.weight is unmatched with the shape [3, 192, 3, 3] in model yolo_head.pred_cls.2.weight. And the weight yolo_head.pred_cls.2.weight will not be loaded
[09/11 11:41:53] ppdet.utils.checkpoint INFO: Finish loading model weights: models/ppyoloe_plus_crn_l_80e_coco.pdparams
[09/11 11:41:55] ppdet.engine INFO: Epoch: [0] [ 0/5564] learning_rate: 0.000000 loss: 5.066185 loss_cls: 1.017579 loss_iou: 0.979015 loss_dfl: 3.202135 loss_l1: 6.590621 eta: 4 days, 4:15:48 batch_cost: 2.1624 data_cost: 0.0044 ips: 0.9249 images/s
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion p_in_data[idx] >= 0 && p_in_data[idx] < depth failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [24276], but received [140098075152656].
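
The assertion above fires when a value handed to the one-hot kernel lies outside `[0, depth)`. A minimal pure-Python sketch of that precondition follows. `check_one_hot_indices` is a hypothetical helper, not part of Paddle, and only mirrors the condition quoted in the error message. The received value 140098075152656 is far outside any plausible index range, so it is more likely the product of an invalid computation upstream than a real label.

```python
def check_one_hot_indices(indices, depth):
    """Return (position, value) pairs that violate 0 <= value < depth."""
    return [(i, v) for i, v in enumerate(indices) if not (0 <= v < depth)]

# depth and the offending value are copied from the error message above;
# the other indices are made-up valid examples.
bad = check_one_hot_indices([0, 17, 140098075152656, 24275], depth=24276)
print(bad)
```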

Could someone please help look into the problems above?

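If I switch the model config to /ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml, training also fails after two rounds.
If the cause were dirty data, the error should not appear only after two rounds. Or is it caused by the loss function or the gradients not being set up properly?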
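
One way to test the dirty-data hypothesis is to scan the annotation file before training. The sketch below assumes a COCO-style JSON (the `loading annotations into memory...` lines in the logs suggest that format) with `categories` and `annotations` entries whose `bbox` is `[x, y, width, height]`. The function name `find_suspect_annotations` and the inline sample data are hypothetical. Note that a bad sample can surface late in training, because shuffling and random augmentation change which samples and crops are seen in each epoch.

```python
import json
import math

def find_suspect_annotations(coco):
    """Yield (annotation id, reason) for entries that commonly break training."""
    valid_ids = {c["id"] for c in coco.get("categories", [])}
    for ann in coco.get("annotations", []):
        bbox = ann.get("bbox", [])
        if ann.get("category_id") not in valid_ids:
            yield ann.get("id"), "unknown category_id"
        elif len(bbox) != 4 or any(not math.isfinite(v) for v in bbox):
            yield ann.get("id"), "malformed bbox"
        elif bbox[2] <= 0 or bbox[3] <= 0:
            yield ann.get("id"), "non-positive width or height"

# Tiny in-memory example; a real check would load the annotation file
# with json.load instead.
sample = json.loads("""
{"categories": [{"id": 1}, {"id": 2}, {"id": 3}],
 "annotations": [
   {"id": 10, "category_id": 1, "bbox": [5, 5, 20, 30]},
   {"id": 11, "category_id": 7, "bbox": [5, 5, 20, 30]},
   {"id": 12, "category_id": 2, "bbox": [5, 5, 0, 30]}
 ]}
""")
suspects = list(find_suspect_annotations(sample))
print(suspects)
```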

DONE (t=2.77s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.021
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.007
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.019
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.019
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
[09/12 04:03:31] ppdet.engine INFO: Total sample number: 2170, averge FPS: 15.359691011400209
[09/12 04:03:31] ppdet.engine INFO: Best test bbox ap is 0.000.
[09/12 04:03:41] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_x_80e_coco
[09/12 04:03:42] ppdet.engine INFO: Epoch: [1] [ 0/1084] learning_rate: 0.000020 loss: 4.214671 loss_cls: 1.757411 loss_iou: 0.617268 loss_dfl: 1.700031 loss_l1: 1.388732 eta: 3:51:55 batch_cost: 0.4325 data_cost: 0.0008 ips: 18.4964 images/s
[09/12 04:04:39] ppdet.engine INFO: Epoch: [1] [ 100/1084] learning_rate: 0.000020 loss: 4.059474 loss_cls: 1.690021 loss_iou: 0.598096 loss_dfl: 1.689508 loss_l1: 1.351786 eta: 3:51:54 batch_cost: 0.4590 data_cost: 0.0008 ips: 17.4296 images/s
[09/12 04:05:35] ppdet.engine INFO: Epoch: [1] [ 200/1084] learning_rate: 0.000020 loss: 4.161784 loss_cls: 1.699200 loss_iou: 0.602575 loss_dfl: 1.713835 loss_l1: 1.350305 eta: 3:51:11 batch_cost: 0.4448 data_cost: 0.0005 ips: 17.9872 images/s
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but recieved nan.
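
The BCE assertion above requires every input to lie in the interval [0, 1]. NaN fails every ordered comparison, so once a NaN reaches the classification loss this check fails whatever the range of the data. A minimal pure-Python illustration follows. The helper `in_unit_interval` is hypothetical and only mirrors the condition quoted in the error message.

```python
import math

def in_unit_interval(x):
    """Mirror of the asserted condition: (x >= 0) and (x <= 1)."""
    return (x >= 0.0) and (x <= 1.0)

for value in (0.0, 0.37, 1.0, 1.5, float("nan")):
    # NaN is reported as outside the interval even though it is not "too large".
    print(value, in_unit_interval(value), math.isnan(value))
```

The assertion is therefore a symptom: the NaN was produced earlier in the step, and the loss kernel is simply the first place that rejects it.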
