Roadmap1

Roadmap 1: Threads and Context Switch

In this roadmap, by tracing ThreadTest(), we study the following issues:

Data structures of thread;

Execution stack, what happens when a new thread gets run, what happens when a thread finishes;

SWITCH.

This roadmap is intended to assist you in reading the nachos code for the Threads assignment. It is an attempt to guide you through the code while highlighting important points. Prior to reading this document, it is a good idea to review the basic concepts of C++ (if you are not familiar with the language). For example, read "A quick introduction to C++" (nachos/c++example/c++.ps). In particular, make sure you understand how Class objects are created and deleted.

One of the first things that you will notice when you go through the nachos code is that it is broken down into many different files. The name of each file identifies its associated code. For example, all the data structures for Threads are defined in threads/thread.h and its function members are defined in threads/thread.cc. This practice of grouping greatly improves the organization of code, and therefore its readability.

You will need to view the code from the following files, all in threads/ directory, while progressing through this roadmap.

kernel.*
main.*
thread.*
switch.*
scheduler.*

The first task for you is to review the Nachos kernel header file kernel.h. Most of the data structure information for the Nachos code is contained in this file. Note that it also makes references to other header files as shown below

lib/utility.h
threads/thread.h
threads/scheduler.h
machine/interrupt.h
machine/stats.h
machine/alarm.h
machine/machine.h
filesys/filesys.h

Let’s start with the file threads/main.cc. Here are a few lines of code from this routine.


int
main(int argc, char **argv)
{
    ...
    kernel = new Kernel(argc, argv);
    kernel->Initialize();

    CallOnUserAbort(Cleanup);           // if user hits ctl-C

    kernel->ThreadSelfTest();

    ...
    return 0;
}

The kernel object is created and it runs Initialize():

threads/kernel.cc


void
Kernel::Initialize()
{
    stats = new Statistics();           // collect statistics
    interrupt = new Interrupt;          // start up interrupt handling
    scheduler = new Scheduler();        // initialize the ready queue
    alarm = new Alarm(randomSlice);     // start up time slicing
    machine = new Machine(debugUserProg);

    ...
    interrupt->Enable();
}

It initializes: stats, interrupt, scheduler, alarm, and machine.

threads/kernel.cc


void
Kernel::ThreadSelfTest()
{
    ...
    currentThread->SelfTest();   // test thread switching
    ...
}

The currentThread, which is a part of the kernel, run the thread function SelfTest();

threads/thread.cc


void
Thread::SelfTest()
{
    DEBUG(dbgThread, "Entering Thread::SelfTest");

    Thread *t = new Thread("forked thread");

    t->Fork((VoidFunctionPtr) SimpleThread, (void *) 1);
    kernel->currentThread->Yield();
    SimpleThread(0);
}

In this function, it first constructs an object of class Thread. Then it calls Fork() and passes the function SimpleThread() to the new thread (identified by 1). Now we have two threads in the nachos process. One is associated with main(), another is associated with SimpleThread(). Both threads share the UNIX nachos process address space. Then the parent thread relinquishes the cpu. When Yield() returns, the main thread also calls SimpleThread(0) (identified by 0).

threads/thread.cc


//----------------------------------------------------------------------
// SimpleThread
//      Loop 5 times, yielding the CPU to another ready thread
//      each iteration.
//
//      "which" is simply a number identifying the thread, for debugging
//      purposes.
//----------------------------------------------------------------------

static void
SimpleThread(int which)
{
    int num;

    for (num = 0; num < 5; num++) {
        cout << "*** thread " << which << " looped " << num << " times\n";
        kernel->currentThread->Yield();
    }
}

NOTE

As shown in the code above, it simply prints a message, and then calls Yield(). Before looking into Yield(), we study the structure of a thread.

threads/thread.*


class Thread {
  private:
    // NOTE: DO NOT CHANGE the order of these first two members.
    // THEY MUST be in this position for SWITCH to work.
    int* stackTop;			 // the current stack pointer
    int machineState[MachineStateSize];  // all registers except for stackTop

  public:
    ...

  private:
    // some of the private data for this class is listed above
    
    int* stack; 	 		// Bottom of the stack 
					// NULL if this is the main thread
					// (If NULL, don't deallocate stack)
    ThreadStatus status;		// ready, running or blocked
    char* name;
    ...
}

NOTE

In addition to the data members and functions, note the constructor (Thread) and destructor (~Thread) functions. Can you identify their purposes?

Note that only portions of the Thread class are shown above. Make sure you review thread.h in its entirety to understand all of its data and function members. The mental picture of the thread data structure should look something like this (only some of the data members are shown)


        |------------------|
        |  Stacktop ---->  | - pointer to top of stack
        |------------------|
        | InitialPCState   | S0  ---
        |------------------|        |
        | InitialArgState  | S1     |
        |------------------|        |
        | WhenDoneState    | S2     |
        |------------------|        |
        | StartUpPCState   | S3     |
        |------------------|        |
        |                  | S4     |----MachineState[MachineStateSize]
        |------------------|        |    -array of integers to hold
        |                  | S5     |     CPU register values for a thread
        |------------------|        |
        |                  | S6     |
        |------------------|        |
        |                  | S7     |
        |------------------|        |
        |     FPState      | FP     |
        |------------------|        |
        |     PCState      | PC ----
        |------------------|
        |      stack       |   pointer to the bottom of stack
        |------------------|
        |     status       |
        |------------------|
        |      name        |
        |------------------|

Thread::Thread(char* threadName)
{
    name = threadName;
    stackTop = NULL;
    stack = NULL;
    status = JUST_CREATED;
    ...
}

NOTE

As shown above, the thread constructor initializes an empty stack and sets the status to JUST_CREATED. (why?)

When an object of the class thread is constructed, it has an empty stack. A thread is an execution stream. Typically, a thread executes a function in a program. As in Thread::SelfTest(), the function Fork() allocates the function SimpleThread() on the stack of the new thread. Then both threads run SimpleThread().

threads/thread.cc


void 
Thread::Fork(VoidFunctionPtr func, void *arg)
{
    ...
    StackAllocate(func, arg);

    oldLevel = interrupt->SetLevel(IntOff);
    scheduler->ReadyToRun(this);	// ReadyToRun assumes that interrupts 
					// are disabled!
    (void) interrupt->SetLevel(oldLevel);
}

NOTE

Can you identify the purpose of the arguments func and arg?

"this" is a reserved word in C++, its value is a pointer to the member function's class-object argument. In this case, it is the pointer to a Thread class-object (the thread).

As shown above, Fork() makes a call to StackAllocate(). You should first review the Scheduler class data structures and function members in threads/scheduler.*. Note that before making a call to the function ReadyToRun(), the interrupts are turned off (why?). The purpose of this function is to make the thread ready to run by placing it in the ready list. Note that ReadyToRun() does not actually start running the thread, it simply changes the status of the thread to READY and appends it to the end of the Ready List. This thread will only start executing once the scheduler chooses it.

Now let's study StackAllocate(), which allocates a function on the stack of a thread.

threads/thread.cc


void
Thread::StackAllocate(VoidFunctionPtr func, int arg)
{
    stack = (int *) AllocBoundedArray(StackSize * sizeof(int));

    ...

#ifdef DECMIPS
    stackTop = stack + StackSize - 4;	// -4 to be on the safe side!
    *stack = STACK_FENCEPOST;
#endif
    ...

#ifdef PARISC
    ...
#else
    machineState[PCState] = ThreadRoot;
    machineState[StartupPCState] = ThreadBegin;
    machineState[InitialPCState] =  func;
    machineState[InitialArgState] = arg;
    machineState[WhenDonePCState] = ThreadFinish;
#endif
}

NOTE

This function initializes the execution stack of a new thread. Specifically, it initializes the program counter (PC) to point to the routine ThreadRoot. This forces execution to begin with ThreadRoot instead of the user-supplied function. (Why?) Then it assigns ThreadBegin, which actually enables interrupts. In addition, it loads the user-supplied function and its arguments if any. Finally, it assigns the function ThreadFinish(), which is really Thread::Finish() (in threads/thread.*), so upon the completion of the execution of the user-supplied function, the thread runs Thread::Finish() (What does it do?)

ThreadRoot is an assembly code. It is machine dependent. Here we use MIPS as an example. The following are the symbolic definitions.

threads/switch.s


#ifdef DECMIPS

/* Symbolic register names */
#define z       $0      /* zero register */
#define a0      $4      /* argument registers */
#define a1      $5
#define s0      $16     /* callee saved */
#define s1      $17
#define s2      $18
#define s3      $19
#define s4      $20
#define s5      $21
#define s6      $22
#define s7      $23
#define sp      $29     /* stack pointer */
#define fp      $30     /* frame pointer */
#define ra      $31     /* return address */

...
#endif  // DECMIPS

threads/switch.h


#ifdef DECMIPS

/* Registers that must be saved during a context switch.
 * These are the offsets from the beginning of the Thread object,
 * in bytes, used in switch.s
 */
#define SP 0
#define S0 4
#define S1 8
#define S2 12
#define S3 16
#define S4 20
#define S5 24
#define S6 28
#define S7 32
#define FP 36
#define PC 40

/* To fork a thread, we set up its saved register state, so that
 * when we switch to the thread, it will start running in ThreadRoot.
 *
 * The following are the initial registers we need to set up to
 * pass values into ThreadRoot (for instance, containing the procedure
 * for the thread to run).  The first set is the registers as used
 * by ThreadRoot; the second set is the locations for these initial
 * values in the Thread object -- used in Thread::AllocateStack().
 */

#define InitialPC       s0
#define InitialArg      s1
#define WhenDonePC      s2
#define StartupPC       s3

#define PCState         (PC/4-1)
#define FPState         (FP/4-1)
#define InitialPCState  (S0/4-1)
#define InitialArgState (S1/4-1)
#define WhenDonePCState (S2/4-1)
#define StartupPCState  (S3/4-1)

#endif  // DECMIPS

NOTE

SP, S0, S1, ... are the offsets (in bytes) to the beginning of the thread object. So, the stack pointer (SP) must be the first.

InitialPC, InitialArg, WhenDonePC, StartupPC are defined as the callee saved registers s0, s1, s2, s3, which are used for the pointers to the function, argument, finish routine and startup procedure respectively, so ThreadRoot can jump to these functions.

PCState, FPState, ... are the indices to the array MachineState.

After the initialization, the structure of the thread is depicted as follows


        |------------------|
        |  Stacktop        |   pointer to top of stack
        |------------------|
        | InitialPCState   |  | - pointer to the fucntion
        |------------------| 
        | InitialArgState  |  | - argument
        |------------------| 
        | WhenDoneState    |  | - ThreadFinish
        |------------------| 
        | StartUpPCState   |  | - ThreadBegin
        |------------------|
        |                  | 
        |------------------|
        |                  |
        |------------------|
        |                  |
        |------------------|
        |                  |
        |------------------|
        |     FPState      |
        |------------------|
        |     PCState      |  | - ThreadRoot
        |------------------|
        |      stack       |   pointer to the bottom of stack
        |------------------|
        |     status       |
        |------------------|
        |      name        |
        |------------------|

Now, we take a look at ThreadRoot.

threads/switch.s


        .text
        .align  2

        .globl ThreadRoot
        .ent    ThreadRoot,0
ThreadRoot:
        or      fp,z,z          # Clearing the frame pointer here
                                # makes gdb backtraces of thread stacks
                                # end here (I hope!)

        jal     StartupPC       # call startup procedure
        move    a0, InitialArg
        jal     InitialPC       # call main procedure
        jal     WhenDonePC      # when we're done, call clean up procedure

        # NEVER REACHED
        .end ThreadRoot

NOTE

ThreadRoot basically does the following three things: 1) Calls ThreadBegin that enables interrupts, 2) Calls the user supplied function, passing it the supplied argument and 3) Calls ThreadFinish to terminate the thread (clean up).

At this point you should understand

Data structures of the Thread object;

Why stackTop must be the first variable;

How a thread is set up (thread creation);

What ThreadRoot does.

Now we go back to Yield() called by SimpleThread() to study context switch.

threads/thread.*


void
Thread::Yield()
{
    Thread *nextThread;
    IntStatus oldLevel = kernel->interrupt->SetLevel(IntOff);

    ASSERT(this == kernel->currentThread);

    DEBUG(dbgThread, "Yielding thread: " << name);

    nextThread = kernel->scheduler->FindNextToRun();
    if (nextThread != NULL) {
        kernel->scheduler->ReadyToRun(this);
        kernel->scheduler->Run(nextThread, FALSE);
    }
    (void) kernel->interrupt->SetLevel(oldLevel);
}

Note

Thread::Yield() basically suspends the calling thread and selects a new one for execution. The first thing to notice is that interrupts are turned off (Why?). Next, a call is made to FindNextToRun (threads/scheduler.*) to find the next thread to run. The current thread is put back on the Ready queue by calling ReadyToRun (threads/scheduler.*). Finally, Run() is called to run the next thread.
How does currentThread relinquish the CPU and nextThread picked up by the kernel scheduler get the CPU?

threads/scheduler.*


void
Scheduler::Run(Thread *nextThread, bool finishing)
{
    Thread *oldThread = kernel->currentThread;

    ASSERT(kernel->interrupt->getLevel() == IntOff);

    if (finishing) {    // mark that we need to delete current thread
         ASSERT(toBeDestroyed == NULL);
         toBeDestroyed = oldThread;
    }

#ifdef USER_PROGRAM                     // ignore until running user programs
    ...
#endif

    oldThread->CheckOverflow();         // check if the old thread
                                        // had an undetected stack overflow

    kernel->currentThread = nextThread;  // switch to the next thread
    nextThread->setStatus(RUNNING);      // nextThread is now running

    ...

    // This is a machine-dependent assembly language routine defined
    // in switch.s.  You may have to think
    // a bit to figure out what happens after this, both from the point
    // of view of the thread and from the perspective of the "outside world".

    SWITCH(oldThread, nextThread);

    // we're back, running oldThread
    ...
}

NOTE

The first significant event happening here is that nextThread is set to the RUNNING state. Next a call is made to SWITCH() in switch.s. This function (written in assembly code) actually switches the CPU from the old Thread (oldThread) with the new one (nextThread).

threads/switch.s


        # a0 -- pointer to old Thread
        # a1 -- pointer to new Thread
        .globl SWITCH
        .ent    SWITCH,0
SWITCH:
        sw      sp, SP(a0)              # save new stack pointer
        sw      s0, S0(a0)              # save all the callee-save registers
        sw      s1, S1(a0)
        sw      s2, S2(a0)
        sw      s3, S3(a0)
        sw      s4, S4(a0)
        sw      s5, S5(a0)
        sw      s6, S6(a0)
        sw      s7, S7(a0)
        sw      fp, FP(a0)              # save frame pointer
        sw      ra, PC(a0)              # save return address

        lw      sp, SP(a1)              # load the new stack pointer
        lw      s0, S0(a1)              # load the callee-save registers
        lw      s1, S1(a1)
        lw      s2, S2(a1)
        lw      s3, S3(a1)
        lw      s4, S4(a1)
        lw      s5, S5(a1)
        lw      s6, S6(a1)
        lw      s7, S7(a1)
        lw      fp, FP(a1)
        lw      ra, PC(a1)              # load the return address

        j       ra
        .end SWITCH

In SWITCH, the first argument a0 points to the old thread (to be switched out), the second argument a1 points to the new thread (to be switched in). As it is mentioned in the code, you will probably have to do a little thinking to figure out exactly how SWITCH Works. In particular, keep the following questions in mind.

Why stackTop must be the first variable to make SWITCH work?
What address should we save for the program counter PC of the switched out thread? In other words, where do we want it to continue execution when the oldThread returns from SWITCH?
At what point can we say the Context Switch has taken place?
If the oldThread has finished, who kills it?

Finally, putting all things together,

Can you trace Thread::SelfTest(), the execution stacks, by hand?
What happens after thread 0 returns from SimpleThread(0)?
What happens after thread 1 returns from SimpleThread(1)?