【Title】
Hash join detail optimizations
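【Implementation】:
1. Remove a redundant CHECK_FOR_INTERRUPTS call that was missed in the earlier executor changes; it has a slight performance cost.
2. ExecHashGetHashValue computed the hash twice; switch to the more efficient murmurhash32. (agg/hashagg has a similar problem, and PG also uses murmurhash32 there, but changing it alters the row order of several test results, so it is left unchanged for now.)

3. Merge the PG optimization that uses compiler builtin instructions. It is only used when the hash table is expanded, so it probably has little effect on performance here, but it seems worth merging as infrastructure that later optimizations may rely on.
It may also help bitmap operations and GiST-index code, but this has not been specifically tested.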
commit 02a6a54ecd6632f974b1b4eebfb2373363431084
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Feb 15 23:22:27 2019 -0500

Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT.

Test for the compiler builtins __builtin_clz, __builtin_ctz, and
__builtin_popcount, and make use of these in preference to
handwritten C code if they're available. Create src/port
infrastructure for "leftmost one", "rightmost one", and "popcount"
so as to centralize these decisions.

On x86_64, __builtin_popcount generally won't make use of the POPCNT
opcode because that's not universally supported yet. Provide code
that checks CPUID and then calls POPCNT via asm() if available.
This requires indirecting through a function pointer, which is
an annoying amount of overhead for a one-instruction operation,
but it's probably not worth working harder than this for our
current use-cases.

I'm not sure we've found all the existing places that could profit
from this new infrastructure; but we at least touched all the
ones that used copied-and-pasted versions of the bitmapset.c code,
and got rid of multiple copies of the associated constant arrays.

While at it, replace c-compiler.m4's one-per-builtin-function
macros with a single one that can handle all the cases we need
to worry about so far. Also, because I'm paranoid, make those
checks into AC_LINK checks rather than just AC_COMPILE; the
former coding failed to verify that libgcc has support for the
builtin, in cases where it's not inline code.

David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane

Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
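
The "leftmost one", "rightmost one", and "popcount" helpers described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `sketch_*` names are hypothetical, the `__GNUC__`/`__clang__` check stands in for the configure-time tests the commit describes, and the CPUID-based POPCNT dispatch is omitted. `sketch_next_pow2_32` is an assumed example of how a hash-table resize might use such a helper.

```c
#include <stdint.h>

/* Hypothetical helper: position (0..31) of the most significant set bit.
 * The caller must pass a non-zero word. */
static inline int
sketch_leftmost_one_pos32(uint32_t word)
{
#if defined(__GNUC__) || defined(__clang__)
    return 31 - __builtin_clz(word);
#else
    int pos = 0;

    while (word >>= 1)          /* portable fallback: shift until empty */
        pos++;
    return pos;
#endif
}

/* Hypothetical helper: position (0..31) of the least significant set bit.
 * The caller must pass a non-zero word. */
static inline int
sketch_rightmost_one_pos32(uint32_t word)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctz(word);
#else
    int pos = 0;

    while ((word & 1) == 0)
    {
        word >>= 1;
        pos++;
    }
    return pos;
#endif
}

/* Hypothetical helper: number of set bits in the word. */
static inline int
sketch_popcount32(uint32_t word)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcount(word);
#else
    int n = 0;

    while (word)
    {
        word &= word - 1;       /* clear the lowest set bit */
        n++;
    }
    return n;
#endif
}

/* Assumed usage: smallest power of two >= n, valid for 1 <= n <= 2^31,
 * e.g. when choosing a new bucket count during a hash-table resize. */
static inline uint32_t
sketch_next_pow2_32(uint32_t n)
{
    if (n <= 1)
        return 1;
    return (uint32_t) 1 << (sketch_leftmost_one_pos32(n - 1) + 1);
}
```

With the builtins available, each helper typically compiles down to one or two instructions, which is why centralizing them is attractive even when the immediate call site (hash expansion) is not hot.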

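For item 2 above, the following is a minimal sketch of a murmurhash32-style mixer. It assumes that murmurhash32 here refers to the 32-bit MurmurHash3 finalizer, as in PostgreSQL; the name `murmurhash32_sketch` is hypothetical, and the sketch does not reproduce the actual change to ExecHashGetHashValue.

```c
#include <stdint.h>

/* Hypothetical stand-in for murmurhash32: the 32-bit MurmurHash3
 * finalizer. It scrambles an already-combined 32-bit hash key with
 * two multiplications and three xor-shifts, with no loop and no
 * memory access. */
static inline uint32_t
murmurhash32_sketch(uint32_t data)
{
    uint32_t h = data;

    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}
```

Because every step is invertible, distinct 32-bit inputs map to distinct outputs, so the mixer only redistributes bits and adds no collisions of its own. That makes it a plausible cheap replacement for running a second general-purpose hash over the combined key.
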
【Root cause analysis】:
-- hashjoin test case
create table tbl1(id int8);
insert into tbl1(id) select generate_series(1,1000*10000);

create table tbl2(id int8);
insert into tbl2(id) select generate_series(1,1000*10000);

analyze;
vacuum;

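-- all ids are positive, so the WHERE clause rejects every joined row and nothing is returned to the client
select * from tbl1 inner join tbl2 on(tbl1.id*2=tbl2.id) where tbl1.id+tbl2.id<0;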

-- pg14 result
Time: 11414.236 ms (00:11.414)

-- og result before optimization
Time: 14589.786 ms

-- og result after optimization
Time: 12666.348 ms

tpch9: on this machine the time improved from 14589.786 ms to 12666.348 ms (about 13% less); pg14 measures 11414.236 ms.

【Modification plan】:

  1. xxx
  2. xxx

【Related issue】:

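【Developer self-verification report】:

  1. Please attach the self-verification results (text or screenshots)
  2. Can fastcheck test cases be added? If so, please add them
  3. Does this involve documentation changes? If so, update the docs repository
  4. Have upgrade, online scale-out, and other extension scenarios been considered?
  5. Have abnormal, concurrency, forward-compatibility, and performance scenarios been considered?
  6. Does this affect other modules?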