Note that I am using the LLVM trunk from 9 January 2026, compiled as a debug build. The example is the following main.cpp:
int foo(int z, int x, int y) {
return z + (x * y);
}
With the following command, we can obtain the LLVM IR:
./clang --target=riscv64 -emit-llvm -S -O3 ./main.cpp -o main.ll
; ModuleID = './main.cpp'
source_filename = "./main.cpp"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64"
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define dso_local noundef signext i32 @_Z3fooiii(i32 noundef signext %z, i32 noundef signext %x, i32 noundef signext %y) local_unnamed_addr #0 {
entry:
%mul = mul nsw i32 %y, %x
%add = add nsw i32 %mul, %z
ret i32 %add
}
attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+a,+c,+i,+m,+relax,+zaamo,+zalrsc,+zca,+zmmul,-b,-d,-e,-experimental-p,-experimental-smpmpmt,-experimental-svukte,-experimental-xqccmp,-experimental-xrivosvisni,-experimental-xrivosvizip,-experimental-xsfmclic,-experimental-xsfsclic,-experimental-zalasr,-experimental-zibi,-experimental-zicfilp,-experimental-zicfiss,-experimental-zvbc32e,-experimental-zvfbfa,-experimental-zvfofp8min,-experimental-zvkgs,-experimental-zvqdotq,-f,-h,-q,-sdext,-sdtrig,-sha,-shcounterenw,-shgatpa,-shlcofideleg,-shtvala,-shvsatpa,-shvstvala,-shvstvecd,-smaia,-smcdeleg,-smcntrpmf,-smcsrind,-smctr,-smdbltrp,-smepmp,-smmpm,-smnpm,-smrnmi,-smstateen,-ssaia,-ssccfg,-ssccptr,-sscofpmf,-sscounterenw,-sscsrind,-ssctr,-ssdbltrp,-ssnpm,-sspm,-ssqosid,-ssstateen,-ssstrict,-sstc,-sstvala,-sstvecd,-ssu64xl,-supm,-svade,-svadu,-svbare,-svinval,-svnapot,-svpbmt,-svvptc,-v,-xandesbfhcvt,-xandesperf,-xandesvbfhcvt,-xandesvdot,-xandesvpackfph,-xandesvsinth,-xandesvsintload,-xcvalu,-xcvbi,-xcvbitmanip,-xcvelw,-xcvmac,-xcvmem,-xcvsimd,-xmipscbop,-xmipscmov,-xmipsexectl,-xmipslsp,-xqci,-xqcia,-xqciac,-xqcibi,-xqcibm,-xqcicli,-xqcicm,-xqcics,-xqcicsr,-xqciint,-xqciio,-xqcilb,-xqcili,-xqcilia,-xqcilo,-xqcilsm,-xqcisim,-xqcisls,-xqcisync,-xsfcease,-xsfmm128t,-xsfmm16t,-xsfmm32a16f,-xsfmm32a32f,-xsfmm32a8f,-xsfmm32a8i,-xsfmm32t,-xsfmm64a64f,-xsfmm64t,-xsfmmbase,-xsfvcp,-xsfvfbfexp16e,-xsfvfexp16e,-xsfvfexp32e,-xsfvfexpa,-xsfvfexpa64e,-xsfvfnrclipxfqf,-xsfvfwmaccqqq,-xsfvqmaccdod,-xsfvqmaccqoq,-xsifivecdiscarddlone,-xsifivecflushdlone,-xsmtvdot,-xtheadba,-xtheadbb,-xtheadbs,-xtheadcmo,-xtheadcondmov,-xtheadfmemidx,-xtheadmac,-xtheadmemidx,-xtheadmempair,-xtheadsync,-xtheadvdot,-xventanacondops,-xwchc,-za128rs,-za64rs,-zabha,-zacas,-zama16b,-zawrs,-zba,-zbb,-zbc,-zbkb,-zbkc,-zbkx,-zbs,-zcb,-zcd,-zce,-zcf,-zclsd,-zcmop,-zcmp,-zcmt,-zdinx,-zfa,-zfbfmin,-zfh,-zfhmin,-zfinx,-zhinx,-zhinxmin,-zic64b,-zicbom,-zicbop,-zicboz,-ziccamoa,-ziccamoc,-ziccif,-zicclsm,-ziccrse,-zicntr,-zicond,-zicsr,-zifencei,-zihintntl,-zihintpause,-zihpm,-zilsd,-zimop,-zk,-zkn,-zknd,-zkne,-zknh,-zkr,-zks,-zksed,-zksh,-zkt,-ztso,-zvbb,-zvbc,-zve32f,-zve32x,-zve64d,-zve64f,-zve64x,-zvfbfmin,-zvfbfwma,-zvfh,-zvfhmin,-zvkb,-zvkg,-zvkn,-zvknc,-zvkned,-zvkng,-zvknha,-zvknhb,-zvks,-zvksc,-zvksed,-zvksg,-zvksh,-zvkt,-zvl1024b,-zvl128b,-zvl16384b,-zvl2048b,-zvl256b,-zvl32768b,-zvl32b,-zvl4096b,-zvl512b,-zvl64b,-zvl65536b,-zvl8192b" }
!llvm.module.flags = !{!0, !1, !2, !4}
!llvm.ident = !{!5}
!llvm.errno.tbaa = !{!6}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 1, !"target-abi", !"lp64"}
!2 = !{i32 6, !"riscv-isa", !3}
!3 = !{!"rv64i2p1_m2p0_a2p1_c2p0_zmmul1p0_zaamo1p0_zalrsc1p0_zca1p0"}
!4 = !{i32 8, !"SmallDataLimit", i32 0}
!5 = !{!"clang version 22.0.0git ([email protected]:kper/llvm-project.git e82399dac2f1f09319243dc39d9e05c7b9b8c6d2)"}
!6 = !{!7, !7, i64 0}
!7 = !{!"int", !8, i64 0}
!8 = !{!"omnipotent char", !9, i64 0}
!9 = !{!"Simple C++ TBAA"}
Since I compiled LLVM in debug mode, my llc has the option -debug, which prints useful information. I use the following command to debug the instruction selection:
./llc -debug ./main.ll 2> debug.txt
I can omit the target triple in the invocation because the .ll file already defines it.
The last SelectionDAG before instruction selection, already legalized and optimized, is printed in debug.txt and looks like this:
Optimized legalized selection DAG: %bb.0 '_Z3fooiii:entry'
SelectionDAG has 17 nodes:
t0: ch,glue = EntryToken
t6: i64,ch = CopyFromReg t0, Register:i64 %2
t12: i64 = AssertSext t6, ValueType:ch:i32
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t10: i64 = AssertSext t4, ValueType:ch:i32
t20: i64 = mul t12, t10
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t8: i64 = AssertSext t2, ValueType:ch:i32
t23: i64 = add t20, t8
t24: i64 = sign_extend_inreg t23, ValueType:ch:i32
t18: ch,glue = CopyToReg t0, Register:i64 $x10, t24
t19: ch = RISCVISD::RET_GLUE t18, Register:i64 $x10, t18:1
How to read these lines?
First, let's print a graphical representation of this DAG with
./llc -view-isel-dags ./main.ll
Each line corresponds to a node in this graph. For example:
t23: i64 = add t20, t8
- t23 is the name of the node.
- i64 is the output type of the node.
- add is the operation of the node. These are predefined in LLVM.
- t20, t8 are the operands of the addition.
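To make the line format concrete, here is a minimal sketch of a parser that splits one line of the dump into name, result types, operation, and operands. The DagLine struct and parseDagLine function are my own hypothetical helpers for illustration, not anything LLVM provides, and the format is only inferred from the dump above.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type for illustration; not part of LLVM.
struct DagLine {
    std::string name;                   // e.g. "t23"
    std::vector<std::string> types;     // e.g. {"i64"} or {"i64", "ch"}
    std::string opcode;                 // e.g. "add"
    std::vector<std::string> operands;  // e.g. {"t20", "t8"}
};

// Removes leading and trailing spaces and tabs.
static std::string trim(const std::string &s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits a comma-separated list and trims every item.
static std::vector<std::string> splitComma(const std::string &s) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ',')) {
        item = trim(item);
        if (!item.empty()) parts.push_back(item);
    }
    return parts;
}

// Parses "NAME: TYPES = OPCODE OPERANDS"; returns nullopt for other lines.
std::optional<DagLine> parseDagLine(const std::string &line) {
    const auto colon = line.find(": ");
    const auto equals = line.find(" = ");
    if (colon == std::string::npos || equals == std::string::npos ||
        equals < colon + 2)
        return std::nullopt;

    DagLine node;
    node.name = trim(line.substr(0, colon));
    node.types = splitComma(line.substr(colon + 2, equals - (colon + 2)));

    // Everything after " = " is the opcode followed by optional operands.
    const std::string rhs = trim(line.substr(equals + 3));
    const auto space = rhs.find(' ');
    node.opcode = rhs.substr(0, space);
    if (space != std::string::npos)
        node.operands = splitComma(rhs.substr(space + 1));
    return node;
}
```

Calling parseDagLine("t23: i64 = add t20, t8") yields the name t23, the single type i64, the opcode add, and the operands t20 and t8. Lines that are not nodes, such as the "SelectionDAG has 17 nodes:" header, are rejected.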
A single node in the graph shows the same information as the text.
- The first row indicates the number of operands. In the graph below, the node has two inputs.
- The second row indicates the type of the node. LLVM IR is transformed into a SelectionDAG whose node types are predefined. In this case, the node is a multiplication node. However, backends can create their own nodes and use them instead. For example, the SelectionDAG from before has a node RISCVISD::RET_GLUE, which was created by the backend.
- The third row shows the name of the node.
- And finally, the last row represents the outputs of the node. For the multiplication node, the output is an i64 value. Of course, some nodes can return multiple values.
The edges in the graph represent dependencies.
- The black edge represents a data dependency. For example, the addition node has an edge from the 0th operand to the multiplication node. Therefore, the multiplication's computation is used as an input for the addition.
- The blue edge represents a chain dependency.
- And the red edge represents a glue dependency.
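ch stands for chain. Both ch and glue express ordering constraints rather than data flow: a chain orders nodes with side effects, while glue keeps two nodes adjacent during scheduling. The distinction between the two was mentioned here.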
Instruction Selection
The initial question of this blog post was in which order these nodes get selected. Opening the previously generated debug.txt shows the evaluation order.
ISEL: Starting selection on root node: t19: ch = RISCVISD::RET_GLUE t18, Register:i64 $x10, t18:1
ISEL: Starting pattern match
Morphed node: t19: ch = PseudoRET Register:i64 $x10, t18, t18:1
ISEL: Match complete!
This indicates that selection starts with the last instruction of the program, which is probably what "bottom up" means in the Nvidia presentation. It continues with the next nodes, which aren't that interesting.
ISEL: Starting selection on root node: t18: ch,glue = CopyToReg t0, Register:i64 $x10, t24
ISEL: Starting selection on root node: t24: i64 = sign_extend_inreg t23, ValueType:ch:i32
ISEL: Starting pattern match
Initial Opcode index to 1284951
Match failed at index 1284954
Continuing at 1284986
Skipped scope entry (due to false predicate) at index 1284997, continuing at 1285004
Creating constant: t27: i64 = TargetConstant<0>
Morphed node: t24: i64 = ADDIW t23, TargetConstant:i64<0>
ISEL: Match complete!
It then tries to match the addition t23, not the multiplication. This is the more interesting part, since it shows which node comes next.
ISEL: Starting selection on root node: t23: i64 = add t20, t8
ISEL: Starting pattern match
Initial Opcode index to 39213
Skipped scope entry (due to false predicate) at index 39223, continuing at 39255
Skipped scope entry (due to false predicate) at index 39256, continuing at 39288
Skipped scope entry (due to false predicate) at index 39289, continuing at 39321
Skipped scope entry (due to false predicate) at index 39322, continuing at 39354
Skipped scope entry (due to false predicate) at index 39355, continuing at 39458
Skipped scope entry (due to false predicate) at index 39459, continuing at 39491
Skipped scope entry (due to false predicate) at index 39492, continuing at 39524
Skipped scope entry (due to false predicate) at index 39525, continuing at 39557
Match failed at index 39221
...
Skipped scope entry (due to false predicate) at index 43164, continuing at 43176
Morphed node: t23: i64 = ADDW t20, t8
ISEL: Match complete!
It makes perfect sense that the addition is matched before the multiplication. Imagine the target had a fused multiply-add instruction: you would expect it to be checked before the individual patterns for multiplication and addition. This is ensured by ranking the instruction selection patterns and evaluating a fused multiply-add before its constituents. Therefore, the resulting order is
t19 -> t18 -> t24 -> t23 -> t20 -> t12 -> t10 -> t8 -> t6 -> t4 -> t2 -> t17 -> t5 -> t3 -> t1 -> t0.
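The property behind this order is that every node is visited before its operands. As a rough sketch, assuming the selector walks a topological ordering of the DAG backwards from the root (my reading of the debug output, not something I verified in the source), the idea can be modelled on a toy graph. ToyDag, selectionOrder, and exampleDag are hypothetical names. The sketch does not try to reproduce LLVM's exact tie-breaking between independent nodes.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy DAG: node name -> operand names. Hypothetical, not LLVM's SDNode.
using ToyDag = std::map<std::string, std::vector<std::string>>;

// Depth-first post-order: every operand is emitted before its user.
static void postOrder(const ToyDag &dag, const std::string &node,
                      std::set<std::string> &seen,
                      std::vector<std::string> &out) {
    if (!seen.insert(node).second) return;  // already visited
    const auto it = dag.find(node);
    if (it != dag.end())
        for (const auto &op : it->second) postOrder(dag, op, seen, out);
    out.push_back(node);
}

// Reverse topological order: root first, every user before its operands.
std::vector<std::string> selectionOrder(const ToyDag &dag,
                                        const std::string &root) {
    std::set<std::string> seen;
    std::vector<std::string> order;
    postOrder(dag, root, seen, order);
    std::reverse(order.begin(), order.end());
    return order;
}

// Nodes of the optimized DAG from the text; register and value-type
// operands are left out to keep the example small.
ToyDag exampleDag() {
    return {
        {"t19", {"t18"}},       {"t18", {"t0", "t24"}}, {"t24", {"t23"}},
        {"t23", {"t20", "t8"}}, {"t20", {"t12", "t10"}}, {"t12", {"t6"}},
        {"t10", {"t4"}},        {"t8", {"t2"}},          {"t6", {"t0"}},
        {"t4", {"t0"}},         {"t2", {"t0"}},          {"t0", {}},
    };
}
```

Calling selectionOrder(exampleDag(), "t19") starts at t19 and ends at t0, and the addition t23 always comes before the multiplication t20 because t20 is one of its operands.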
Selection results in the following DAG:
Selected selection DAG: %bb.0 '_Z3fooiii:entry'
SelectionDAG has 12 nodes:
t0: ch,glue = EntryToken
t6: i64,ch = CopyFromReg t0, Register:i64 %2
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t20: i64 = MULW t6, t4
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t23: i64 = ADDW t20, t2
t18: ch,glue = CopyToReg t0, Register:i64 $x10, t23
t19: ch = PseudoRET Register:i64 $x10, t18, t18:1
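The fused multiply-add remark from earlier can be illustrated with a toy matcher that tries patterns in order of decreasing complexity. This is only a sketch of the ranking idea. Expr, Pattern, selectInstr, the "MADD" instruction, and the complexity numbers are all made up for illustration, and LLVM's generated matcher tables work differently in detail.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Toy expression node; hypothetical and far simpler than LLVM's SDNode.
struct Expr {
    std::string op;                          // "add", "mul", or "reg"
    std::vector<std::shared_ptr<Expr>> ops;  // operands
};

// A pattern names an instruction, has a complexity (here: the number of
// DAG nodes it covers), and a predicate deciding whether it matches.
struct Pattern {
    std::string instr;
    int complexity;
    bool (*matches)(const Expr &);
};

static bool isAdd(const Expr &e) { return e.op == "add" && e.ops.size() == 2; }
static bool isMul(const Expr &e) { return e.op == "mul" && e.ops.size() == 2; }
// An add with a mul as one of its operands, i.e. z + (x * y).
static bool isMulAdd(const Expr &e) {
    return isAdd(e) && (isMul(*e.ops[0]) || isMul(*e.ops[1]));
}

// Tries the most complex pattern first, so a fused pattern wins over
// the patterns for its constituents.
std::string selectInstr(const Expr &root, std::vector<Pattern> patterns) {
    std::stable_sort(patterns.begin(), patterns.end(),
                     [](const Pattern &a, const Pattern &b) {
                         return a.complexity > b.complexity;
                     });
    for (const auto &p : patterns)
        if (p.matches(root)) return p.instr;
    return "<no match>";
}
```

With the patterns ADD (complexity 1), MUL (1), and the hypothetical MADD (2), the root of z + (x * y) selects MADD. Without MADD in the list, the same root falls back to ADD, and the multiplication is selected later as its own node, which is what happens in the RISC-V example above.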