rainrime / parellel-learn

Superpixel Algorithm (SLIC) Optimization

Dependencies

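  • Tested only on Linux (x64) with gcc; no other dependencies are currently required.
    • Note: because libmvec is used, Glibc >= 2.22 is required.
  • Code in folders other than SLIC may need tbb, openblas, eigen, etc.; none of these are needed for now.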
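
The Glibc >= 2.22 requirement can be checked at run time before launching the program on an unfamiliar machine. The following is a minimal sketch, not part of this repository: the helper names version_at_least and runtime_glibc_version are hypothetical, and gnu_get_libc_version() is the glibc call used to obtain the version string.

```cpp
#include <cstdio>
#include <string>
#if defined(__GLIBC__)
#include <gnu/libc-version.h>
#endif

// Hypothetical helper: true when a "major.minor" version string is at least
// min_major.min_minor. Strings that cannot be parsed are treated as too old.
bool version_at_least(const std::string& version, int min_major, int min_minor) {
    int major = 0;
    int minor = 0;
    if (std::sscanf(version.c_str(), "%d.%d", &major, &minor) < 2) {
        return false;
    }
    return major > min_major || (major == min_major && minor >= min_minor);
}

// Hypothetical helper: the glibc version the process is running against,
// or an empty string when the program was not built against glibc.
std::string runtime_glibc_version() {
#if defined(__GLIBC__)
    return gnu_get_libc_version();
#else
    return "";
#endif
}
```

Calling version_at_least(runtime_glibc_version(), 2, 22) then tells whether the libmvec requirement is met on the current machine.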

Build and Run

The original program is in SLIC_BK; a Makefile has been added.

# All commands below start from the project root directory
cd SLIC_BK
make clean && make
./SLIC.out

The optimized program is in the SLIC directory. The macro THE_THREAD_NUMS, set with #define in utils.h, determines the number of threads; adjust it to suit your machine.

cd SLIC
make clean && make
./SLIC.out
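
As an illustration of how a compile-time macro such as THE_THREAD_NUMS can fix the size of an OpenMP thread team, here is a minimal sketch. It is not taken from utils.h: the placeholder value 8 and the function parallel_sum are assumptions made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Assumption: utils.h defines the thread count in roughly this way. The value 8
// is only a placeholder for this sketch.
#ifndef THE_THREAD_NUMS
#define THE_THREAD_NUMS 8
#endif

// Hypothetical example: sums a vector with a fixed-size OpenMP team.
// When compiled without -fopenmp the pragma is ignored and the loop runs
// serially, producing the same result.
long long parallel_sum(const std::vector<int>& values) {
    long long total = 0;
    const long long n = static_cast<long long>(values.size());
    #pragma omp parallel for num_threads(THE_THREAD_NUMS) reduction(+ : total)
    for (long long i = 0; i < n; ++i) {
        total += values[static_cast<std::size_t>(i)];
    }
    return total;
}
```

Because the count is a macro, changing it means recompiling, which matches the make clean && make step above.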

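On the supercomputing cluster the program is run as follows, on a single node only; some memory-access optimization was done for the NUMA architecture.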

export OMP_PLACES=cores
srun -p amd_256 -N 1 -t 10 ./SLIC.out
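
The README does not say which NUMA optimization was applied. One common technique that pairs naturally with OMP_PLACES=cores is first-touch initialization: on Linux a memory page is usually placed on the NUMA node of the thread that first writes to it, so initializing a buffer with the same static schedule as the later compute loops tends to keep each thread's data local. The sketch below only illustrates that idea, under that assumption; the function names are hypothetical and none of this is confirmed to match the code in SLIC.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: allocates n floats without touching them, then writes
// them for the first time in parallel so that each page is first touched by
// the thread that will later process it.
std::unique_ptr<float[]> alloc_first_touch(std::size_t n, float value) {
    std::unique_ptr<float[]> data(new float[n]);  // new[] leaves floats uninitialized
    float* raw = data.get();
    const long long count = static_cast<long long>(n);
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < count; ++i) {
        raw[i] = value;
    }
    return data;
}

// Hypothetical compute loop: uses the same static schedule, so each thread
// works on the index range it initialized.
void scale_in_place(float* data, std::size_t n, float factor) {
    const long long count = static_cast<long long>(n);
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < count; ++i) {
        data[i] *= factor;
    }
}
```

The benefit depends on threads staying on their cores, which is what pinning through OMP_PLACES is meant to encourage.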

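Note:

  • If the platform's default Glibc is older than 2.22, you will need to work out how to link and run the program yourself. During testing I linked statically on my own machine (WSL Ubuntu 20.04 LTS) and uploaded only the executable to run.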

Results

Run on a machine with AMD EPYC 7452 32-Core Processor (2 sockets), i.e. two sockets of 32 cores each, 64 cores and 64 threads in total.

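The original program takes about 5700 ms. With 32 threads the run time is about 78 ms, an overall speedup of about 73; with 62 threads it is about 57 ms, an overall speedup of about 100. (Note: this speedup does not come from parallelism alone. In fact, once several single-threaded optimizations had been applied, the parallel speedup itself was not impressive.)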
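
The speedup figures above are plain ratios of run times. A minimal sketch of the arithmetic, with a hypothetical helper named speedup:

```cpp
#include <cassert>

// Hypothetical helper: ratio of a baseline run time to an optimized run time,
// both in milliseconds.
double speedup(double baseline_ms, double optimized_ms) {
    assert(baseline_ms > 0.0 && optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}
```

With the numbers reported here, speedup(5700, 78) is about 73 and speedup(5700, 57) is 100. The same helper applied to the two optimized runs, speedup(78, 57), gives about 1.37 for roughly 1.94 times as many threads (32 to 62).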

With 62 threads and the environment variable OMP_PLACES=cores set, the output is as follows:

$ export OMP_PLACES=cores
$ chmod +x ./SLIC.out 
$ srun -p amd_256 -N 1 -t 10 ./SLIC.out
srun: job 555293 queued and waiting for resources
srun: job 555293 has been allocated resources
width = 2599, height = 3898
sz = 10130902
Initial time = 0 ms
Conversion time = 20 ms
DeleteEdges and Get_Seeds time = 0 ms
numk = 196
Dist iter time=4(4) ms 
Dist iter time=6(2) ms 
Dist iter time=8(2) ms 
Dist iter time=10(2) ms 
Dist iter time=12(2) ms 
Dist iter time=14(2) ms 
Dist iter time=16(2) ms 
Dist iter time=18(2) ms 
Dist iter time=20(2) ms 
Dist iter time=22(2) ms 
Computing time=28 ms
STEP = 227
Segmentation time = 29 ms
EC1 time=0 ms
EC2 time=3 ms
EC3 time=0 ms
EC4 time=1 ms
EnforceLabelConnectivity time = 6 ms
Computing time=57 ms
There are 0 points' labels are different from original file.
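
The log above reports per-stage times and a count of labels that differ from the original program's output. The sketch below shows one way such figures could be produced. It is an assumption, not the repository's code: the names count_label_mismatches and time_ms are hypothetical, and std::chrono::steady_clock is used here as a generic timer.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts positions where two label maps disagree.
// If the sizes differ, every surplus element counts as a mismatch.
std::size_t count_label_mismatches(const std::vector<int>& labels,
                                   const std::vector<int>& reference) {
    const std::size_t common = std::min(labels.size(), reference.size());
    std::size_t diff = std::max(labels.size(), reference.size()) - common;
    for (std::size_t i = 0; i < common; ++i) {
        if (labels[i] != reference[i]) {
            ++diff;
        }
    }
    return diff;
}

// Hypothetical helper: runs fn once and returns the elapsed wall-clock time
// in whole milliseconds.
template <typename Fn>
long long time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A mismatch count of 0 against the reference output is what confirms that an optimization step did not change the segmentation result.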

Output of the original program:

$ srun -p amd_256 -N 1 -t 10 ./SLIC.out
srun: job 438538 queued and waiting for resources
srun: job 438538 has been allocated resources
Computing time=5780 ms
There are 0 points' labels are different from original file.

Output at one intermediate stage of the optimization:

$ srun -p amd_256 -N 1 -t 10 ./SLIC.out
srun: job 431514 queued and waiting for resources
srun: job 431514 has been allocated resources
width = 2599, height = 3898
sz = 10130902
Initial time = 3 ms
Conversion time = 80 ms
DeleteEdges and Get_Seeds time = 17 ms
numk = 196
Dist iter time=18(18) ms        Dist iter time=0(0) ms
Dist iter time=28(10) ms        Dist iter time=0(0) ms
Dist iter time=38(10) ms        Dist iter time=0(0) ms
Dist iter time=48(10) ms        Dist iter time=0(0) ms
Dist iter time=56(8) ms         Dist iter time=0(0) ms
Dist iter time=66(10) ms        Dist iter time=0(0) ms
Dist iter time=75(9) ms         Dist iter time=0(0) ms
Dist iter time=77(2) ms         Dist iter time=0(0) ms
Dist iter time=79(2) ms         Dist iter time=0(0) ms
Dist iter time=81(2) ms         Dist iter time=0(0) ms
STEP = 227
Segmentation time = 194 ms
EnforceLabelConnectivity time = 125 ms
Computing time=424 ms
There are 0 points' labels are different from original file.

BSD 3-Clause License

Copyright (c) 2021, rainrime
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

About

Learning parallel programming.
https://gitee.com/rainrime/parallel-learn.git
git@gitee.com:rainrime/parallel-learn.git