Evaluation order in LLVM's SelectionDAG

I've been reading both Eli's LLVM Code Generator and Nvidia's Beginner's Guide to SelectionDAG and what striked me as odd is the sentence on slide 64: "DAG traversed in topological order “bottom up” ensures operator always selected before any operands.". It confused me because I think "bottom up" refers to the order in the original source program and not the graph itself. I would like to dedicate this blog post to the evaluation order. Consider, we have the following input program main.cpp. Note that I am using the trunk version of 9th January 2026 and compiled LLVM with the debug build.

int foo(int z, int x, int y) {
	return z + (x * y);
}

With the following command, we can obtain the LLVM IR: ./clang --target=riscv64 -emit-llvm -S -O3 ./main.cpp -o main.ll

; ModuleID = './main.cpp'
source_filename = "./main.cpp"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define dso_local noundef signext i32 @_Z3fooiii(i32 noundef signext %z, i32 noundef signext %x, i32 noundef signext %y) local_unnamed_addr #0 {
entry:
  %mul = mul nsw i32 %y, %x
  %add = add nsw i32 %mul, %z
  ret i32 %add
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+a,+c,+i,+m,+relax,+zaamo,+zalrsc,+zca,+zmmul,-b,-d,-e,-experimental-p,-experimental-smpmpmt,-experimental-svukte,-experimental-xqccmp,-experimental-xrivosvisni,-experimental-xrivosvizip,-experimental-xsfmclic,-experimental-xsfsclic,-experimental-zalasr,-experimental-zibi,-experimental-zicfilp,-experimental-zicfiss,-experimental-zvbc32e,-experimental-zvfbfa,-experimental-zvfofp8min,-experimental-zvkgs,-experimental-zvqdotq,-f,-h,-q,-sdext,-sdtrig,-sha,-shcounterenw,-shgatpa,-shlcofideleg,-shtvala,-shvsatpa,-shvstvala,-shvstvecd,-smaia,-smcdeleg,-smcntrpmf,-smcsrind,-smctr,-smdbltrp,-smepmp,-smmpm,-smnpm,-smrnmi,-smstateen,-ssaia,-ssccfg,-ssccptr,-sscofpmf,-sscounterenw,-sscsrind,-ssctr,-ssdbltrp,-ssnpm,-sspm,-ssqosid,-ssstateen,-ssstrict,-sstc,-sstvala,-sstvecd,-ssu64xl,-supm,-svade,-svadu,-svbare,-svinval,-svnapot,-svpbmt,-svvptc,-v,-xandesbfhcvt,-xandesperf,-xandesvbfhcvt,-xandesvdot,-xandesvpackfph,-xandesvsinth,-xandesvsintload,-xcvalu,-xcvbi,-xcvbitmanip,-xcvelw,-xcvmac,-xcvmem,-xcvsimd,-xmipscbop,-xmipscmov,-xmipsexectl,-xmipslsp,-xqci,-xqcia,-xqciac,-xqcibi,-xqcibm,-xqcicli,-xqcicm,-xqcics,-xqcicsr,-xqciint,-xqciio,-xqcilb,-xqcili,-xqcilia,-xqcilo,-xqcilsm,-xqcisim,-xqcisls,-xqcisync,-xsfcease,-xsfmm128t,-xsfmm16t,-xsfmm32a16f,-xsfmm32a32f,-xsfmm32a8f,-xsfmm32a8i,-xsfmm32t,-xsfmm64a64f,-xsfmm64t,-xsfmmbase,-xsfvcp,-xsfvfbfexp16e,-xsfvfexp16e,-xsfvfexp32e,-xsfvfexpa,-xsfvfexpa64e,-xsfvfnrclipxfqf,-xsfvfwmaccqqq,-xsfvqmaccdod,-xsfvqmaccqoq,-xsifivecdiscarddlone,-xsifivecflushdlone,-xsmtvdot,-xtheadba,-xtheadbb,-xtheadbs,-xtheadcmo,-xtheadcondmov,-xtheadfmemidx,-xtheadmac,-xtheadmemidx,-xtheadmempair,-xtheadsync,-xtheadvdot,-xventanacondops,-xwchc,-za128rs,-za64rs,-zabha,-zacas,-zama16b,-zawrs,-zba,-zbb,-zbc,-zbkb,-zbkc,-zbkx,-zbs,-zcb,-zcd,-zce,-zcf,-zclsd,-zcmop,-zcmp,-zcmt,-zdinx,-zfa,-zfbfmin,-zfh,-zfhmin,-zfinx,-zhinx,-zhinxmin,-zic64b,-zicbom,-zicbop,-zicboz,-ziccamoa,-ziccamoc,-ziccif,-zicclsm,-ziccrse,-zicntr,-zicond,-zicsr,-zifencei,-zihintntl,-zihintpause,-zihpm,-zilsd,-zimop,-zk,-zkn,-zknd,-zkne,-zknh,-zkr,-zks,-zksed,-zksh,-zkt,-ztso,-zvbb,-zvbc,-zve32f,-zve32x,-zve64d,-zve64f,-zve64x,-zvfbfmin,-zvfbfwma,-zvfh,-zvfhmin,-zvkb,-zvkg,-zvkn,-zvknc,-zvkned,-zvkng,-zvknha,-zvknhb,-zvks,-zvksc,-zvksed,-zvksg,-zvksh,-zvkt,-zvl1024b,-zvl128b,-zvl16384b,-zvl2048b,-zvl256b,-zvl32768b,-zvl32b,-zvl4096b,-zvl512b,-zvl64b,-zvl65536b,-zvl8192b" }

!llvm.module.flags = !{!0, !1, !2, !4}
!llvm.ident = !{!5}
!llvm.errno.tbaa = !{!6}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 1, !"target-abi", !"lp64"}
!2 = !{i32 6, !"riscv-isa", !3}
!3 = !{!"rv64i2p1_m2p0_a2p1_c2p0_zmmul1p0_zaamo1p0_zalrsc1p0_zca1p0"}
!4 = !{i32 8, !"SmallDataLimit", i32 0}
!5 = !{!"clang version 22.0.0git ([email protected]:kper/llvm-project.git e82399dac2f1f09319243dc39d9e05c7b9b8c6d2)"}
!6 = !{!7, !7, i64 0}
!7 = !{!"int", !8, i64 0}
!8 = !{!"omnipotent char", !9, i64 0}
!9 = !{!"Simple C++ TBAA"}

Since I compiled LLVM in debug mode, my llc has an option --debug which prints useful information to me. I use the following command to debug the instruction selection: ./llc -debug ./main.ll 2> debug.txt
I can omit the target triple in the invocation because the .ll already has the target triple defined.

The last selectionDAG, which was legalized and optimized, before the instruction selection is printed in debug.txt and looks like that

Optimized legalized selection DAG: %bb.0 '_Z3fooiii:entry'
SelectionDAG has 17 nodes:
  t0: ch,glue = EntryToken
            t6: i64,ch = CopyFromReg t0, Register:i64 %2
          t12: i64 = AssertSext t6, ValueType:ch:i32
            t4: i64,ch = CopyFromReg t0, Register:i64 %1
          t10: i64 = AssertSext t4, ValueType:ch:i32
        t20: i64 = mul t12, t10
          t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t8: i64 = AssertSext t2, ValueType:ch:i32
      t23: i64 = add t20, t8
    t24: i64 = sign_extend_inreg t23, ValueType:ch:i32
  t18: ch,glue = CopyToReg t0, Register:i64 $x10, t24
  t19: ch = RISCVISD::RET_GLUE t18, Register:i64 $x10, t18:1

How to read these lines?

First, let's print a graphical representation of this DAG with ./llc -view-isel-dags ./main.ll
selectionDAG

Each line corresponds to a node in this graph. For example t23: i64 = add t20, t8

t23: refers to the name of the node
i64 is the output type of the node
add is the operation of the node. These are predefined in LLVM.
t20, t8 are the operands of the addition.

A single node is showing the same information as the text.

The first row indicates the number of operands. In the graph below, the node has two inputs.
The second row indicates the type of the node. LLVM IR is transformed into a SelectionDAG and its node types are predetermined. In this case, the node is a multiplication node. However, backends can create their own nodes and use them instead. For example, the selectionDAG graph from before has a node RISCVISD::RET_GLUE, which was created by the backend.
The third row shows the name of the node.
And finally, the last row represents the outputs of the node. For the multiplication node, the output is a i64 value. Of course, some nodes can return multiple values.

The edges in the graph represent dependencies.

The black edge represents a data dependency. For example, the addition node has an edge from the 0th operand to the multiplication node. Therefore, the multiplication's computation is used as an input for the addition.
The blue edge represents a chain dependency.
And the red edge represents a glue dependency.

ch represents a chain. Both ch and glue define dependencies in the resulting assembly. The distinction between both was mentioned here.

Instruction Selection

The initial question of this blog post was in which order these nodes get selected. When opening the previously generated debug.txt then we can see the evaluation order.

ISEL: Starting selection on root node: t19: ch = RISCVISD::RET_GLUE t18, Register:i64 $x10, t18:1
ISEL: Starting pattern match
  Morphed node: t19: ch = PseudoRET Register:i64 $x10, t18, t18:1
ISEL: Match complete!

This indicates that it started with the last instruction of the program which is probably meant with "bottom up" in the Nvidia presentation. It continues with the next nodes which aren't that interesting.

ISEL: Starting selection on root node: t18: ch,glue = CopyToReg t0, Register:i64 $x10, t24

ISEL: Starting selection on root node: t24: i64 = sign_extend_inreg t23, ValueType:ch:i32
ISEL: Starting pattern match
  Initial Opcode index to 1284951
  Match failed at index 1284954
  Continuing at 1284986
  Skipped scope entry (due to false predicate) at index 1284997, continuing at 1285004
Creating constant: t27: i64 = TargetConstant<0>
  Morphed node: t24: i64 = ADDIW t23, TargetConstant:i64<0>
ISEL: Match complete!

It then tries to match the multiplication operation. This is actually more interesting since it is important to know what comes next.

ISEL: Starting selection on root node: t23: i64 = add t20, t8
ISEL: Starting pattern match
  Initial Opcode index to 39213
  Skipped scope entry (due to false predicate) at index 39223, continuing at 39255
  Skipped scope entry (due to false predicate) at index 39256, continuing at 39288
  Skipped scope entry (due to false predicate) at index 39289, continuing at 39321
  Skipped scope entry (due to false predicate) at index 39322, continuing at 39354
  Skipped scope entry (due to false predicate) at index 39355, continuing at 39458
  Skipped scope entry (due to false predicate) at index 39459, continuing at 39491
  Skipped scope entry (due to false predicate) at index 39492, continuing at 39524
  Skipped scope entry (due to false predicate) at index 39525, continuing at 39557
  Match failed at index 39221
  ... 
  Skipped scope entry (due to false predicate) at index 43164, continuing at 43176
  Morphed node: t23: i64 = ADDW t20, t8
ISEL: Match complete!

And that the addition is matched before the multiplication makes perfect sense. Imagine, you would have a fused-multiply-add instruction then you would expect that it is checked before the individual patterns for multiplication and addition. This is ensured by ranking the instruction selection patterns and evaluating a fused-multiply-add before its constituents. Therefore, the resulting order is t19 -> t18 -> t24 -> t23 -> t20 -> t12 -> t10 -> t8 -> t6 -> t4 -> t2 -> t17 -> t5 -> t3 -> t1 -> t0.

Resulting in this selection:

Selected selection DAG: %bb.0 '_Z3fooiii:entry'
SelectionDAG has 12 nodes:
  t0: ch,glue = EntryToken
        t6: i64,ch = CopyFromReg t0, Register:i64 %2
        t4: i64,ch = CopyFromReg t0, Register:i64 %1
      t20: i64 = MULW t6, t4
      t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t23: i64 = ADDW t20, t2
  t18: ch,glue = CopyToReg t0, Register:i64 $x10, t23
  t19: ch = PseudoRET Register:i64 $x10, t18, t18:1