MaskRCNN OPs stream optimization
Merged
PR types
Performance optimization
PR changes
OPs
Describe
- GetLengthLoD and GPUDistFpnProposalsHelper should run on the context stream
- Remove two unnecessary context waits (no data is sent between host and device)
- sub_lod_data can be copied with one batched memcpy, which avoids synchronizing multiple times
- There is a ~1% e2e performance gain on trt-fp16/maskrcnn inference
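
The batched-copy idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CopyLodBatched`, `CopyFn`, and the `num_copies` counter are hypothetical names, and the copy callback is a stand-in for whatever host-to-device transfer the operator actually issues on the context stream. The point is only that packing all per-level offsets into one staging buffer turns N small copies (and N potential synchronizations) into one.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-in for a host-to-device copy. In a real operator this
// would be an asynchronous copy issued on the context stream; here it is
// injected so the sketch runs without a GPU.
using CopyFn =
    std::function<void(void* dst, const void* src, std::size_t bytes)>;

// Packs every per-level offset vector into one contiguous staging buffer and
// issues a single copy for all of them. Returns the start index of each level
// inside the packed buffer so callers can locate their slice afterwards.
std::vector<std::size_t> CopyLodBatched(
    const std::vector<std::vector<std::size_t>>& sub_lods,
    std::vector<std::size_t>* device_buf,  // stand-in for device memory
    const CopyFn& copy,
    int* num_copies) {                     // counts copies actually issued
  std::vector<std::size_t> staging;
  std::vector<std::size_t> starts;
  for (const auto& lod : sub_lods) {
    starts.push_back(staging.size());
    staging.insert(staging.end(), lod.begin(), lod.end());
  }
  device_buf->resize(staging.size());
  if (!staging.empty()) {
    // One transfer for all levels instead of one transfer per level.
    copy(device_buf->data(), staging.data(),
         staging.size() * sizeof(std::size_t));
    ++*num_copies;
  }
  return starts;
}
```

Calling it with two levels and a `std::memcpy`-based callback issues exactly one copy; each level is then read from `device_buf` starting at the returned index for that level.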