For convenience, the design will be based mostly on the Am2900 family of bit-slice components rather than small- and medium-scale integrated circuits (SSI and MSI) such as simple gates, multiplexers, and registers. Specifically, the Am2903 4-bit slice was chosen because it allows expansion of the on-chip 16x4 register file and can implement 3-address operations (two source registers and one destination register). The CPU will be microprogrammed, but most arithmetic and logic instructions should require only a single microinstruction. Instructions that access memory, change the program flow, or affect the state of the CPU will use multiple microinstructions.
The design will employ a writable control store allowing the microcode to be loaded and examined by a maintenance processor. Support will also be provided for halting and single-stepping the microcode sequencer, as well as examining and modifying the state of the CPU.
RISC Architecture
Single Instruction Size
All instructions will be encoded in one 32-bit word. This simplifies instruction decoding since we always know the size of each instruction and therefore the location of the next instruction in advance. Loading an immediate value will require one instruction to load the least significant half of a register with a 16-bit value taken from the instruction word. The most significant half of the register will be filled with zeroes or the sign extension of the 16-bit immediate value. If necessary, a second instruction is used to load the most significant half of the register.
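The two-step immediate load described above can be sketched as follows. This is only an illustrative model, not the project's microcode: the function names are hypothetical, and it assumes the second instruction replaces the upper half while leaving the lower half untouched.

```python
MASK32 = 0xFFFFFFFF

def load_low(imm16, sign_extend=False):
    """Register value after loading a 16-bit immediate into the low half."""
    imm16 &= 0xFFFF
    if sign_extend and (imm16 & 0x8000):
        return 0xFFFF0000 | imm16   # upper half filled with the sign bit
    return imm16                    # upper half filled with zeroes

def load_high(reg, imm16):
    """Assumed behavior: replace the upper half, keep the lower half."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)

def load_constant(value):
    """Build a 32-bit constant, returning (register value, instructions used)."""
    value &= MASK32
    reg = load_low(value & 0xFFFF)
    if value >> 16 == 0:
        return reg, 1               # zero-filled upper half is already correct
    return load_high(reg, value >> 16), 2
```

In this model a small constant such as 0x1234 needs one instruction, while 0x12345678 needs two.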
Load-Store Architecture
Only load and store instructions will interact with memory. Most remaining instructions operate on up to three registers (e.g. Rc = Ra + Rb) and are not encumbered with complex addressing modes. This also reduces the number of data dependency hazards, since register-to-register operations are never blocked waiting for a memory operand to be fetched as part of the same instruction.
Memory Hazard Detection
A memory hazard occurs when an instruction requires a value from a register which was the destination of a prior load instruction but is still waiting for data. A special hazard register will maintain a bit for each register, with a 1 in that bit signifying the corresponding register is waiting for data. If either Ra or Rb corresponds to a bit set in the hazard register, the instruction must block until data arrives. When data arrives the corresponding bit in the hazard register is cleared.
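The hazard register can be modeled as a simple bit mask. The sketch below is a behavioral illustration only; the class and method names are hypothetical, and the real design would implement this in hardware alongside the bit slices.

```python
class HazardRegister:
    """One bit per CPU register; 1 means the register is waiting for load data."""

    def __init__(self):
        self.bits = 0

    def mark_pending(self, reg):
        # Called when a load targeting `reg` is dispatched.
        self.bits |= 1 << reg

    def data_arrived(self, reg):
        # Called when the load data for `reg` has been written.
        self.bits &= ~(1 << reg)

    def must_block(self, ra, rb):
        # An instruction blocks if either source register is still pending.
        return bool(self.bits & ((1 << ra) | (1 << rb)))
```

For example, after a load into R3 is dispatched, an instruction reading R3 blocks until `data_arrived(3)` is called, while an instruction reading only R1 and R2 proceeds.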
Load & Store Queues
All effective address calculations will occur in the ALU. Once the address is calculated, it will be dispatched to the load or store queue along with sufficient data to complete or abort the operation. For a load, the additional information is the register to be loaded. For a store, the additional information is the data to be written. For loads, a bit is set in the hazard register indicating the register is waiting for data.
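A minimal behavioral sketch of the two queues follows. The names are hypothetical, the address is assumed to have been computed by the ALU already, and memory is stood in for by a dictionary; in-order completion is also an assumption made for simplicity.

```python
from collections import deque, namedtuple

LoadEntry = namedtuple("LoadEntry", "address dest_reg")   # load: where the data goes
StoreEntry = namedtuple("StoreEntry", "address data")     # store: what to write

class MemoryQueues:
    def __init__(self):
        self.loads = deque()
        self.stores = deque()
        self.hazard_bits = 0   # one bit per register waiting for load data

    def dispatch_load(self, address, dest_reg):
        self.loads.append(LoadEntry(address & 0xFFFFFFFF, dest_reg))
        self.hazard_bits |= 1 << dest_reg          # register is now pending

    def dispatch_store(self, address, data):
        self.stores.append(StoreEntry(address & 0xFFFFFFFF, data & 0xFFFFFFFF))

    def complete_load(self, registers, memory):
        # Oldest load finishes: write the register and clear its hazard bit.
        entry = self.loads.popleft()
        registers[entry.dest_reg] = memory.get(entry.address, 0)
        self.hazard_bits &= ~(1 << entry.dest_reg)
        return entry

    def complete_store(self, memory):
        # Oldest store finishes: write the queued data to memory.
        entry = self.stores.popleft()
        memory[entry.address] = entry.data
        return entry
```

Stores never touch the hazard bits in this model, since no register is waiting on their result.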
Simplified Instruction Decoding
To simplify instruction decoding, the same bit fields in the instruction will always be used to specify the source and destination registers. This will allow those bits to be routed directly to the Am2903 bit slices.
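To illustrate why fixed fields simplify decoding, the sketch below extracts the register numbers with constant shifts and masks regardless of opcode. The field positions and widths are purely hypothetical placeholders; this document does not define the actual instruction layout.

```python
# Hypothetical layout (an assumption for illustration only):
#   bits 31-26 opcode, 25-21 Rc (destination), 20-16 Ra, 15-11 Rb,
#   bits 15-0 double as the 16-bit immediate for immediate-format instructions.

def field(word, lsb, width):
    """Extract `width` bits of `word` starting at bit `lsb`."""
    return (word >> lsb) & ((1 << width) - 1)

def decode(word):
    # The same bit positions are used for every instruction, so these
    # extractions need no knowledge of the opcode.
    return {
        "opcode": field(word, 26, 6),
        "rc": field(word, 21, 5),
        "ra": field(word, 16, 5),
        "rb": field(word, 11, 5),
        "imm16": field(word, 0, 16),
    }
```

In hardware the equivalent of `field` is just wiring: the chosen instruction bits connect straight to the register address inputs.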
Performance
Instructions that only operate on registers will be executed at a rate of one instruction per cycle assuming no memory hazards. This includes all simple arithmetic and logic instructions as well as instructions that load an immediate value into a register. Only complex arithmetic instructions (multiply and divide) and multi-bit shifts will require multiple cycles.
Memory Management
The design will implement a memory management unit (MMU) supporting virtual memory. It is not practical to implement a typical Translation Look-Aside Buffer (TLB) since existing content-addressable memory (CAM) devices are too slow. We will use a set-associative translation cache, and microcode will walk the page tables in the event of a cache miss. To improve performance, one set may be reserved for operating system (kernel) use.
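A set-associative translation cache replaces the CAM search with an indexed RAM lookup followed by a small number of tag comparisons. The sketch below shows the idea; the page size, number of entries, associativity, and all names are assumptions chosen for illustration, not parameters of this design.

```python
PAGE_BITS = 12     # assumed 4 KiB pages
NUM_INDEXES = 64   # assumed entries per bank
NUM_BANKS = 2      # assumed associativity

class TranslationCache:
    def __init__(self):
        # Each bank is a plain RAM indexed by the low bits of the page number.
        self.banks = [[None] * NUM_INDEXES for _ in range(NUM_BANKS)]

    @staticmethod
    def _split(vaddr):
        vpn = vaddr >> PAGE_BITS
        return vpn % NUM_INDEXES, vpn // NUM_INDEXES   # (index, tag)

    def lookup(self, vaddr):
        index, tag = self._split(vaddr)
        for bank in self.banks:
            entry = bank[index]
            if entry is not None and entry[0] == tag:
                offset = vaddr & ((1 << PAGE_BITS) - 1)
                return (entry[1] << PAGE_BITS) | offset
        return None   # miss: microcode would walk the page tables here

    def fill(self, vaddr, frame, bank):
        # Install a translation found by the page-table walk.
        index, tag = self._split(vaddr)
        self.banks[bank][index] = (tag, frame)
```

Two pages that map to the same index can coexist as long as they are placed in different banks.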
Instruction and Data Caches
The design will implement separate instruction and data caches. To facilitate symmetric multi-processing (SMP), each cache will perform bus snooping to maintain cache coherency.
Memory Bus
The design will implement a synchronous memory bus similar to the VAX-11/780 Synchronous Backplane Interconnect (SBI) but with all potential bus masters having equal priority. The bus will implement geographic addressing by encoding the slot number on each slot in the backplane. Each slot will be assigned a region of the address space based on the slot number.
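Geographic addressing can be illustrated by deriving each slot's address region directly from its slot number. The sketch assumes a 32-bit address space divided evenly among 16 slots; neither the slot count nor the region size is specified here, so both are placeholders.

```python
ADDRESS_BITS = 32
SLOT_BITS = 4                              # assumed: 16 backplane slots
REGION_SHIFT = ADDRESS_BITS - SLOT_BITS    # each slot gets 2**28 bytes

def slot_region(slot):
    """Return (first, last) address of the region assigned to a slot."""
    base = slot << REGION_SHIFT
    return base, base + (1 << REGION_SHIFT) - 1

def slot_for_address(address):
    """Return the slot number that should respond to an address."""
    return (address & 0xFFFFFFFF) >> REGION_SHIFT
```

With this scheme a card only needs to compare the top address bits against its hard-wired slot number to decide whether to respond.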
Direct Memory Access
The design will implement at least one DMA channel in hardware. This channel may be located on an I/O processor rather than the main CPU. The CPU may also implement block transfer or block initialize instructions in microcode.
Counter/Timer
The design will implement at least one 32-bit counter and timer to provide periodic interrupts.
Input/Output Processor
Designing peripherals is not a goal of this project as the designer already has significant experience in that area; therefore, most Input/Output (I/O) operations will be delegated to a coprocessor. This coprocessor will also double as the maintenance processor. This may be based on an existing single-board computer (SBC) such as Raspberry Pi, Arduino, or BeagleBone, or an existing reference design. In that case, the circuit board providing the memory bus adapter and other peripherals will appear as a HAT or shield to the SBC.
NXP provides reference designs for their microprocessors. These designs have existing Linux support.