mov x23, x0 // [R601] [R232]
ldr x10, [x0,104] // [R232]
str x10, [sp,208] // SPILLcolor vreg: 602
ldr x21, [sp,184] // RELOADcolor vreg: 603
str wzr, [sp,216] // SPILLcolor vreg: 605
str wzr, [sp,440] // SPILL vreg: 606 for caller save in BB 2
str wzr, [sp,220] // SPILLcolor vreg: 607
ldr w1, [x0,96] // [R232] [R233]
cbz x3, .L.2344__500 // [R10610]
mov x0, x1 // [R622] [R230]
b .L.2344__501
.L.2344__500: //label order 2192
mov w1, wzr
bl Perl_gv_add_by_type
ldr x0, [x0,16] // [R623] [R622]
.L.2344__501: //label order 2193
ldr x10, [x0] // [R622]
A futurewei ticket has been filed against ME for this.
No longer an issue.
cmp var, 0
cset res, eq
Instead, an xor (eor) can be used, as gcc does. This is valid when var is known to be 0 or 1.
eor res, var, 1
A futurewei ticket 331 has been filed against ME for this.
PR 903 has the fix.
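The eor form matches the cmp/cset pair only for boolean inputs. A minimal C sketch of the two sequences follows; the helper names cset_eq_zero and eor_one are made up for illustration and are not part of Maple or gcc.

```c
#include <assert.h>
#include <stdint.h>

/* Models "cmp var, 0; cset res, eq": res is 1 when var == 0, else 0. */
static uint32_t cset_eq_zero(uint32_t var) {
    return var == 0 ? 1u : 0u;
}

/* Models "eor res, var, 1". It equals cset_eq_zero only when var is 0 or 1. */
static uint32_t eor_one(uint32_t var) {
    assert(var <= 1u); /* assumption: var already holds a boolean value */
    return var ^ 1u;
}
```

For any other input, for example var == 2, the eor result is 3 while the cset result is 0, so the replacement needs the value range to be known.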
brfalse @@605 (lt i32 a64 (regread ptr %403, regread a64 %30))
@@605 LOC 52 4912
futurewei bug 287
PR 912
Maple's switch code has more instructions, and its switch table uses 8-byte entries.
add w2, w0, 0 // [R10620] [R2260]
adrp x1, .LB_S_regmatch2 // [R10621]
add x1, x1, #:lo12:.LB_S_regmatch2 // [R10621] [R10621]
ldr x2, [x1,x2,LSL 3] // [R10621] [R10620] [R10623]
add x1, x1, x2 // [R10622] [R10621] [R10623]
br x1 // [R10622]
.LB_S_regmatch2:
.quad .L.2344__100 - .LB_S_regmatch2
.quad .L.2344__330 - .LB_S_regmatch2
gcc uses adr and fewer instructions. Its switch table entries are smaller (2 bytes each, scaled by 4).
ldrh w0, [x19,w1,uxtw 1]
adr x2, .Lrtx1788
add x0, x2, w0, sxth 2
br x0
.Lrtx1788:
.section .rodata
.align 0
.align 2
.L1788:
.2byte (.L1893 - .Lrtx1788) / 4
.2byte (.L1803 - .Lrtx1788) / 4
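The two table layouts can be compared with a small C model of the address computation. This is only a sketch: the function names and the plain integer addresses are assumptions for illustration, not code from either compiler.

```c
#include <stdint.h>

/* Maple-style table: each entry is a 64-bit byte offset from the table label,
   as in "ldr x2, [x1,x2,LSL 3]; add x1, x1, x2". */
static uint64_t target_from_quad(uint64_t table_base, const int64_t *table,
                                 uint32_t index) {
    return table_base + (uint64_t)table[index];
}

/* gcc-style table: each entry is a signed 16-bit offset counted in 4-byte
   units from an anchor label, as in "add x0, x2, w0, sxth 2". */
static uint64_t target_from_half(uint64_t anchor, const int16_t *table,
                                 uint32_t index) {
    return anchor + (uint64_t)((int64_t)table[index] * 4);
}
```

A 2-byte scaled entry reaches roughly 128 KiB on either side of the anchor, which is why the smaller encoding only works when all case labels are that close.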
ldr x17, [sp,504] // RELOAD vreg: 642 for caller save in BB 1473
str x17, [sp,520] // SPILL vreg: 6558 for caller save in BB 1473
mov x12, x17 // [R6558] [R642]
b .L.2344__161
Here, x12 can replace x17 and eliminate the mov.
and w1, w1, 31 // [R10647] [R10646]
sxtw x1, w1 // [R10648] [R10647]
The result of w1 & 0x1f is always in the range 0..31, so it is never negative and the sxtw is not needed.
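A short C sketch of why the extension is a no-op here. The function names are hypothetical; the model assumes the usual AArch64 rule that writing a w register zeroes the upper 32 bits of the x register.

```c
#include <stdint.h>

/* Models "and w1, w1, 31" followed by "sxtw x1, w1". */
static uint64_t and_then_sxtw(uint32_t w) {
    uint32_t masked = w & 31u;               /* 0..31, sign bit always clear */
    return (uint64_t)(int64_t)(int32_t)masked;
}

/* Models the "and" alone: the upper 32 bits of x1 are already zero. */
static uint64_t and_only(uint32_t w) {
    return (uint64_t)(w & 31u);
}
```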
ldr w13, [sp,480] // RELOAD vreg: 598 for caller save in BB 44
uxtb w0, w13 // [R2813] [R598]
The ldr followed by uxtb can be converted to a single ldrb (unsigned load byte).
Fred wants to improve icache utilization and will be working on the issue
"Improve mplme's me_bb_layout.cpp to maximize icache locality".
futurewei bug 335
ldrb w0, [x26] // [R3294] [R10799]
add x26, x26, 1 // [R3294] [R3294]
These can be combined into a single post-indexed load:
ldrb w0, [x26], 1
I am not able to get this pattern in regexec.s
This may be a general case worth looking into, for expressions such as
*++p
*p++
where p can point to any integral type. I'll investigate a bit.
FW Bug 340 with testcase created.
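For reference, a hypothetical pair of C loops containing the two shapes. This is not the testcase attached to FW Bug 340, and whether a given compiler actually selects the indexed forms for them is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* *p++ : load, then advance. Candidate for post-index "ldrb w0, [x26], 1". */
static uint32_t sum_post(const uint8_t *p, size_t n) {
    uint32_t sum = 0;
    while (n-- > 0) {
        sum += *p++;
    }
    return sum;
}

/* *++p : advance, then load. Candidate for pre-index "ldrb w0, [x26, 1]!".
   p points at the element just before the first of the n values to read. */
static uint32_t sum_pre(const uint8_t *p, size_t n) {
    uint32_t sum = 0;
    while (n-- > 0) {
        sum += *++p;
    }
    return sum;
}
```

Changing uint8_t to uint16_t, uint32_t or uint64_t gives the same shapes with ldrh and ldr and step sizes of 2, 4 and 8.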
next = scan + NEXT_OFF(scan);
The preprocessed source shows this.
next = scan + ((scan)->next_off); <- next_off is U16
Maple IR after ME is this.
@@1331 regassign u32 %33 (iread u32 <* <$regnode>> 3 (regread ptr %6397))
regassign i32 %34 (cvt i32 u16 (regread u32 %33))
regassign i32 %35 (mul i32 (regread i32 %34, constval i32 4))
regassign ptr %274 (add ptr (
regread ptr %6397,
cvt ptr i32 (regread i32 %35)))
mplcg generated this.
ldrh w0, [x13,2] // [R6597] [R233]
lsl w0, w0, 2 // [R235] [R233]
sxtw x0, w0 // [R10654] [R235]
add x0, x13, x0 // [R474] [R6597] [R10654]
gcc generated this.
ldrh w21, [x22, 2]
add x21, x22, x21, lsl 2
PR 905 fixed a bug in combining lsl -> sxtw -> add into a single add with a shifted, extended operand. Under some circumstances the lsl can set the sign bit, so the following sxtw produces a negative value. The single add instruction extends first and then shifts, which gives a different result and violates the original intent.
submitted futurewei issue 338 for ME.
See related futurewei issue 332 and PR 905.
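The difference between the two orders of operations can be checked with a small C model. The function names are made up for illustration, and the casts rely on the usual two's-complement conversion behaviour.

```c
#include <stdint.h>

/* Models "lsl w0, w0, 2; sxtw x0, w0; add x0, x13, x0":
   shift in 32 bits, then sign-extend, then add. */
static uint64_t shift_then_extend(uint64_t base, uint32_t w) {
    uint32_t shifted = w << 2;
    int64_t extended = (int64_t)(int32_t)shifted;
    return base + (uint64_t)extended;
}

/* Models the single "add x0, x13, w0, sxtw 2":
   sign-extend to 64 bits first, then shift, then add. */
static uint64_t extend_then_shift(uint64_t base, uint32_t w) {
    uint64_t extended = (uint64_t)(int64_t)(int32_t)w;
    return base + (extended << 2);
}
```

With w = 0x20000000 the first form adds 0xFFFFFFFF80000000 and the second adds 0x80000000. For a value loaded by ldrh, which is at most 0xFFFF, the shifted result cannot reach the sign bit and both forms agree, so the combined instruction is safe in the NEXT_OFF case.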
Fred has filed an issue against mplfe, I4BEL1.
mov x12, -1229782938247303442
movk x12, 0xeeef, lsl 0
if (! HAS_TEXT(next) && ! JUMPABLE(next)) {
Here HAS_TEXT() gets expanded into
! ( (PL_regkind[((next)->type)] == 31) || PL_regkind[((next)->type)] == 51 )
gcc short circuited this into
cmp w0, 31
cset w2, ne
cmp w0, 51
ccmp w2, 0, 4, ne
beq .L2562
Maple uses the conventional cmp and branch.
cmp w0, 31 // [R434]
beq .L.2344__2127
cmp w0, 51 // [R434]
beq .L.2344__2128
Maple has fewer instructions, but gcc has fewer branches.
Many of the cset instructions emitted for comparisons in the .s are useless. They come from statements such as "regassign i32 %5919 (eq i32 i32 (regread u32 %214, constval i32 31))" in the mpl produced by the FE. Reportedly the China team knows about this.
Ideally, we should be able to get something like:
cmp w0, 31
ccmp w0, 51, 4, ne // if w0 == 31, set Z directly so the beq is taken
beq .L2562
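How the Z flag flows through a cmp/ccmp pair that tests w0 == 31 || w0 == 51 can be modelled in C. This is a sketch that tracks only the Z flag, and the function names are hypothetical. With an nzcv immediate of 4, Z is forced on when the first compare has already matched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Z flag after "cmp a, b": set when a == b. */
static bool z_after_cmp(uint32_t a, uint32_t b) {
    return a == b;
}

/* Z flag after "ccmp a, b, nzcv, cond": if cond holds, compare a with b;
   otherwise take the flags from the 4-bit immediate (N=8, Z=4, C=2, V=1). */
static bool z_after_ccmp(bool cond, uint32_t a, uint32_t b, unsigned nzcv) {
    return cond ? (a == b) : ((nzcv & 4u) != 0);
}

/* "cmp w0, 31; ccmp w0, 51, 4, ne; beq": returns true when beq is taken. */
static bool branch_taken(uint32_t w0) {
    bool ne = !z_after_cmp(w0, 31);
    return z_after_ccmp(ne, w0, 51, 4);
}
```

With an immediate of 0 instead of 4, the branch would be taken only for w0 == 51, because the w0 == 31 case would leave Z clear.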
In addition, gcc inlined S_regmatch() into its only caller S_regtry()
ldrh w0, [x24, 76]
ldrh w0, [x2, x0, lsl 1]
Maple generated this.
ldrh w2, [x21,#76] // [R636] [R1383]
lsl x2, x2, #1 // [R583] [R1383]
ldrh w0, [x0,x2] // [R265] [R583] [R585]
FYE looking into this one
ubfiz x0, x0, 2, 16
to extract the u16 offset for use in address calculation.
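The effect of ubfiz can be written out in C. This is a minimal sketch under the assumption 0 < width and lsb + width <= 64; ubfiz64 is a hypothetical helper name.

```c
#include <stdint.h>

/* Models "ubfiz xd, xn, lsb, width": take the low `width` bits of xn,
   shift them left by `lsb`, and zero every other bit of the result. */
static uint64_t ubfiz64(uint64_t xn, unsigned lsb, unsigned width) {
    uint64_t mask = (width == 64) ? ~0ull : ((1ull << width) - 1);
    return (xn & mask) << lsb;
}
```

For "ubfiz x0, x0, 2, 16" this is (x0 & 0xFFFF) << 2, a zero-extended u16 scaled by 4 in one instruction.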
LOC 52 6590
regassign i32 %5753 (ge i32 u32 (regread u32 %1407, regread u32 %1353))
brfalse @Lshortcircuit.1567 (ge i32 u32 (regread u32 %1407, regread u32 %1353))
regassign i32 %5753 (ne i32 i64 (
iread i64 <* <$regexp_paren_pair>> 2 (add a64 (
iread a64 <* <$regexp>> 22 (regread ptr %377),
mul a64 (
cvt a64 i32 (regread u32 %1353),
constval a64 24))),
constval i64 -1))
@Lshortcircuit.1567 LOC 52 111115
regassign i32 %5756 (ne i32 i32 (regread i32 %5753, constval i32 0))
regassign u8 %399 (ne u32 i32 (regread i32 %5756, constval i32 0))
%5756 always has a single definition followed by a single use, as above. %5753 is an earlier compare result, so it is either 0 or 1. If %5753 is 1, then %5756 is (1 != 0) = 1; otherwise it is 0. %5756 therefore always equals %5753, and its definition is redundant.
futurewei bug 345
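The redundancy can be stated as a small C check. The function names and the parameter are assumptions for illustration; they stand in for the IR registers and are not taken from regexec.c.

```c
#include <stdint.h>

/* Stands in for %5753: the result of a compare, so always 0 or 1. */
static int32_t compare_result(int64_t value) {
    return value != -1;
}

/* Stands in for %5756 as defined in the IR: ne i32 (%5753, 0). */
static int32_t renormalized(int32_t r5753) {
    return r5753 != 0;
}
```

Since compare_result only returns 0 or 1, renormalized(compare_result(v)) equals compare_result(v) for every v, so the use of %5756 could read %5753 directly.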