Solution 4

6.8

You are required to draw a figure showing the pipeline and to highlight the active portions of the datapath in your figure.

lw   $10, 20($1)
sub  $11, $2, $3
and  $12, $4, $5
or   $3, $6, $7
add  $14, $8, $9
 
      IM        IF/ID  Reg       ID/EX  ALU  EX/MEM  DM        MEM/WB  Reg
lw    h(right)  h      h(right)  h      h    h       h(right)  h       h(left)
sub   h(right)  h      h(right)  h      h    h       -         h       h(left)
and   h(right)  h      h(right)  h      h    h       -         h       h(left)
or    h(right)  h      h(right)  h      h    h       -         h       h(left)
add   h(right)  h      h(right)  h      h    h       -         h       h(left)

6.15

    add  $5, $6, $7    | IF | ID | EX | ME | WB |
    lw   $6, 100($7)        | IF | ID | EX | ME | WB |
    sub  $7, $6, $8              | IF | ID | EX | ME | WB |
In a diagram like that of Figure 6.44 on page 489, there is a backward arrow from the MEM/WB pipeline register of the lw instruction to an ALU input of the sub instruction: sub needs $6 at the start of its EX stage, but lw does not have the loaded value until the end of its MEM stage. To resolve this load-use hazard, as in Figure 6.45 on page 491, a bubble is inserted:
    add  $5, $6, $7    | IF | ID | EX | ME | WB |
    lw   $6, 100($7)        | IF | ID | EX | ME | WB |
    sub  $7, $6, $8              | IF | ID |bub.| EX | ME | WB |
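
The stall rule used above can be sketched in a few lines of Python. This is only an illustrative model, not part of the textbook solution: the function name load_use_stalls and the (opcode, destination, sources) tuple layout are assumptions made for this sketch, and it assumes a five-stage pipeline with full forwarding, so the only case that needs a bubble is a lw followed immediately by an instruction that reads the loaded register.

```python
def load_use_stalls(program):
    """Return the number of bubbles inserted before each instruction.

    Assumption: five-stage pipeline with full forwarding, so a bubble is
    needed only when the previous instruction is a lw whose destination
    register is read by the current instruction.
    """
    stalls = []
    for i, (op, dest, sources) in enumerate(program):
        bubble = 0
        if i > 0:
            prev_op, prev_dest, _ = program[i - 1]
            if prev_op == "lw" and prev_dest in sources:
                bubble = 1
        stalls.append(bubble)
    return stalls


# Hypothetical encoding of the three instructions: (opcode, dest, sources)
program = [
    ("add", "$5", ["$6", "$7"]),
    ("lw", "$6", ["$7"]),
    ("sub", "$7", ["$6", "$8"]),
]

print(load_use_stalls(program))  # [0, 0, 1]
```

Only sub receives a bubble, which matches the diagram: add writes $5, which lw does not read, while sub reads $6 in the cycle right after lw enters EX.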

6.28

    slt   $1,  $8,  $9      # $1 = 1 if $8 < $9, $1 = 0 otherwise
    movn  $10, $9,  $1      # copy $9 into $10 if $1 != 0 (i.e., $8 < $9)
    movz  $10, $8,  $1      # copy $8 into $10 if $1 == 0 (i.e., $8 >= $9)
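
To check what this sequence leaves in $10, the three instructions can be modeled directly in Python. This is a rough sketch under the assumption that registers hold ordinary signed integers; the function name select_with_conditional_moves is made up for illustration.

```python
def select_with_conditional_moves(r8, r9):
    """Model the slt/movn/movz sequence on signed integer register values."""
    r1 = 1 if r8 < r9 else 0   # slt  $1, $8, $9
    r10 = None                 # $10 is not defined until one of the moves fires
    if r1 != 0:                # movn $10, $9, $1
        r10 = r9
    if r1 == 0:                # movz $10, $8, $1
        r10 = r8
    return r10


for a, b in [(3, 7), (7, 3), (5, 5), (-2, 1)]:
    print(a, b, select_with_conditional_moves(a, b))
```

Exactly one of the two moves executes for any input, so $10 always ends up holding the larger of $8 and $9, with no branch in the sequence.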

6.30

One of many possible ways of rescheduling is:
  Loop: lw    $t0, 0($s1)
        lw    $t1, -4($s1)
        addu  $t0, $t0, $s2
        addu  $t1, $t1, $s2
        sw    $t0, 0($s1)
        sw    $t1, -4($s1)
        addi  $s1, $s1, -8
        bne   $s1, $zero, Loop
Suppose that $s1 is initially 8n.
Also, assume that branch resolution is completed in the MEM stage.
The above code requires n iterations, and each iteration takes 11 cycles: 8 instructions, no stalls after lw, and three delay slots after bne. It will finish on cycle 11n.
The original code on page 513 requires 2n iterations, and each iteration takes 9 cycles, since there is a one-cycle stall after the load and three delay slots after bne. It will finish on cycle 18n.
Thus the code after loop unrolling and rescheduling is 18n/11n = 18/11 = 1.64 times as fast as the original.
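
The cycle counts above can be reproduced with a short Python calculation. This is a sketch under the same assumptions as the text (branch resolved in MEM, so three cycles are lost after each bne); the helper name loop_cycles is hypothetical, and the instruction count of 5 for the original loop is inferred from the 9 cycles per iteration stated above (9 minus one stall minus three delay slots).

```python
def loop_cycles(instructions, load_stalls, branch_penalty, iterations):
    """Total cycles for a loop: per-iteration cost times iteration count."""
    return (instructions + load_stalls + branch_penalty) * iterations


n = 100  # any positive n gives the same ratio

# Unrolled and rescheduled loop: 8 instructions, no load stall, n iterations.
unrolled = loop_cycles(instructions=8, load_stalls=0, branch_penalty=3, iterations=n)

# Original loop: 5 instructions, one load-use stall, 2n iterations.
original = loop_cycles(instructions=5, load_stalls=1, branch_penalty=3, iterations=2 * n)

speedup = original / unrolled
print(unrolled, original, round(speedup, 2))  # 1100 1800 1.64
```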