Modern Computers (Circa 2007)
Computing machines are very common in a modern industrialized society.
The number of functions performed by these devices is almost endless.
Here is a partial list.
1. General numerical computation, involving both integers and real numbers.
2. Device automation and control.
3. Message switching, including routers and firewalls on the Internet.
4. Computer–generated graphics.
5. Graphics–based computer games.
6. Computer–enhanced video. (How about those extra lines superimposed on football fields?)
Computers come in two broad classes:
General purpose: these are adaptable to a wide variety of programs.
Special purpose: these are designed for one purpose only; e.g., routers.
Special–purpose computers are usually limited to high–volume markets. It is often easier to adapt a general–purpose computer to do the job.
General Purpose Computers
This course will focus on general purpose computers, also called “Stored Program Computers” or “Von Neumann Machines”.
In a stored program computer, a program and its starting data are read into the primary memory of a computer and then executed. Early computers had no memory into which programs could be stored.
The first stored program computer to be designed was the EDVAC, the work of John von Neumann (hence the name), John Mauchly, and J. Presper Eckert. The “Electronic Discrete Variable Automatic Computer” was described in a paper published on June 30, 1945, with von Neumann as the sole author.
The first stored program computer to become operational was the EDSAC (Electronic Delay Storage Automatic Computer), completed May 6, 1949.
This was developed by Maurice Wilkes of
The first stored program computer that contained all of the components of a modern computer was the MIT Whirlwind, first demonstrated on April 20, 1951. It was the first to use magnetic core memory.
Components of a Stored Program Computer
The four major components of a modern stored program computer are:
1. The Central Processing Unit (CPU)
2. The Primary Memory (also called “core memory” or “main memory”)
3. The Input / Output system
4. One or more system busses to allow the components to communicate.
Major Components Defined
The system memory (of which this computer has 512 MB) is used for transient storage of programs and data. This is accessed much like an array, with the memory address serving the function of an array index.
The Input / Output system (I/O System) is used for the computer to save data and programs and for it to accept input data and communicate output data. Technically the hard drive is an I/O device.
The Central Processing Unit (CPU) handles execution of the program.
It has four main components:
1. The ALU (Arithmetic Logic Unit), which performs all of the arithmetic and logical operations of the CPU, including logic tests for branching.
2. The Control Unit, which causes the CPU to follow the instructions found in the machine language program being executed.
3. The register file, which stores data internally in the CPU. There are user registers and special purpose registers used by the Control Unit.
4. A set of 3 internal busses to allow the CPU units to communicate.
The System Level Bus allows the top–level components to communicate.
Reality Intrudes (Part 1 of Many)
The design on the previous slide is logically correct, but IT WON’T WORK.
IT IS TOO SLOW.
Problem: A single system level bus cannot handle the load.
Modern gamers demand fast video; this requires a fast bus to the video chip.
The memory system is always a performance bottleneck. We need a dedicated memory bus in order to allow acceptable performance.
Here is a refinement of the above diagram.
This design is getting closer to reality. At least, it acknowledges two of the devices
requiring high data rates in access to the CPU.
Reality Intrudes (Part 2 of Many)
We now turn to commercial realities, specifically legacy I/O devices.
When upgrading a computer, most users do not want to buy all new I/O devices (expensive) to replace older devices that still function well.
The I/O system must provide a number of busses of different speeds, addressing capabilities, and data widths, to accommodate this variety of I/O devices.
Here we show the main I/O bus connecting the CPU to the I/O Control Hub (ICH), which is connected to two I/O busses:
one for slower (older) devices
one for faster (newer) devices.
The Memory Component
The memory stores the instructions and data for an executing program.
Memory is characterized by the smallest addressable unit:
Byte addressable: the smallest unit is an 8–bit byte.
Word addressable: the smallest unit is a word, usually 16 or 32 bits in length.
Most modern computers are byte addressable, facilitating access to character data.
Logically, computer memory should be considered as an array.
The index into this array is called the address or “memory address”.
A logical view of such a byte addressable memory might
be written in code as:
Const MemSize =
byte Memory[MemSize] // Indexed 0 … (MemSize – 1)
The CPU has two registers dedicated to handling memory.
The MAR (Memory Address Register) holds the address being accessed.
The MBR (Memory Buffer Register) holds the data being written to the memory or being read from the memory. This is sometimes called the Memory Data Register.
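The array view of memory and the roles of the MAR and MBR can be sketched in a few lines of Python. This is only an illustration, not a description of any real hardware: the class name, the helper functions, and the 256-byte memory size are all assumptions chosen to keep the example small.

```python
MEM_SIZE = 256  # assumed size, for illustration only


class ByteMemory:
    """Byte-addressable memory accessed through the MAR and the MBR."""

    def __init__(self, size=MEM_SIZE):
        self.cells = bytearray(size)  # indexed 0 .. size - 1
        self.mar = 0  # Memory Address Register: the address being accessed
        self.mbr = 0  # Memory Buffer Register: the data read or to be written

    def read(self):
        """Copy the byte at address MAR into the MBR."""
        self.mbr = self.cells[self.mar]

    def write(self):
        """Copy the low 8 bits of the MBR into the cell at address MAR."""
        self.cells[self.mar] = self.mbr & 0xFF


def store_byte(memory, address, value):
    """Hypothetical helper: set MAR and MBR, then write."""
    memory.mar = address
    memory.mbr = value
    memory.write()


def load_byte(memory, address):
    """Hypothetical helper: set MAR, read, and return the MBR."""
    memory.mar = address
    memory.read()
    return memory.mbr
```

Every access goes through the two registers: the address is placed in the MAR first, and the data then moves through the MBR.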
Multi–Level Memory
What we want is a very large memory, in which each memory element is fabricated from very fast components. But fast means expensive.
What we can afford is a very large memory, in which each memory element is fabricated from moderately fast, but inexpensive, components.
Modern computers achieve good performance from a large, moderately fast, main memory by using two levels of cache memories, called L1 and L2.
These work due to an observed property of programs, called the locality principle: a program tends to reuse addresses it accessed recently and to access addresses near them.
A typical arrangement would have a large L2 cache and a split L1 cache.
The L1 cache has an Instruction Cache and a Data Cache.
Note that the Instruction Cache (I Cache) does not write back to the L2 cache.
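To see why the locality principle matters, here is a minimal simulation of a single direct-mapped cache. It is a sketch under stated assumptions, not a model of any real L1 or L2 cache: the function name, the 8 lines of 16 bytes each, and the two access patterns are all invented for illustration.

```python
def hit_rate(addresses, num_lines=8, line_size=16):
    """Return the fraction of accesses that hit a direct-mapped cache."""
    tags = [None] * num_lines  # one stored tag per cache line
    hits = 0
    for address in addresses:
        block = address // line_size  # memory block holding this byte
        index = block % num_lines     # cache line that block maps to
        tag = block // num_lines      # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag         # miss: load the block into the cache
    return hits / len(addresses)


# Good locality: walk through 1024 consecutive byte addresses.
sequential = list(range(1024))

# Poor locality: jump 4096 bytes on every access.
scattered = [(i * 4096) % 65536 for i in range(1024)]
```

With these assumed parameters the sequential walk misses only on the first byte of each 16-byte block, while the scattered pattern maps every access to the same cache line with a different tag and so never hits.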
Organization of Primary Memory
We turn our attention again to the primary memory. When we left it, we had a linear view with an N–to–2^N decoder.
We shall study decoders in a later class. At present, it should be obvious that construction of a 32–to–4,294,967,296 decoder would be very difficult.
Memory on all modern computers is obviously built from smaller chips. Each of these chips will be constructed from a number of smaller chips.
For example, a 1 GB memory might have four 256 MB memory modules.
Each 32 MB unit within a module would be organized as eight 32 Mb chips.
Each 32 Mb chip is organized as an 8,192–by–4,096 array.
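The sizes quoted above can be checked with a little arithmetic. The sketch below assumes the usual binary meanings (1 Mb = 2^20 bits, 1 MB = 2^20 bytes, 8 bits per byte); the variable names are illustrative only.

```python
MEBI = 2 ** 20  # assumed: "M" means 2**20 here

# One chip is an 8,192-by-4,096 array of bits.
chip_bits = 8192 * 4096
chip_megabits = chip_bits // MEBI

# Eight such chips side by side supply 8 bits (one byte) per address.
group_bytes = 8 * chip_bits // 8
group_megabytes = group_bytes // MEBI

# Four 256 MB modules make up the full memory.
total_megabytes = 4 * 256

# A 32-bit address needs a decoder with 2**32 outputs.
decoder_outputs = 2 ** 32

print(chip_megabits, group_megabytes, total_megabytes, decoder_outputs)
```

The row and column counts are themselves powers of two (2^13 and 2^12), which is why the chip can be addressed with two small decoders rather than one very large one.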
Primary Memory
Also called “core memory”, “store”, or “storage”.
Beginning with the MIT Whirlwind and continuing for about 30 years, the basic technology for primary memory involved “cores” of magnetic material.
Virtual Memory
All modern computer systems use virtual memory. At various times in the course, we shall give a precise definition, but here is the common setup.
In MS–Windows, the area of the system disk that handles virtual memory is called the paging file. My system has a 768 MB paging file.
Memory Creeps onto the CPU Chip
Modern computers, such as the P4, have placed both L1 caches and the L2 cache on the CPU chip itself. Here is a picture of the P4 chip, annotated by Intel.
In older computers, the main difference between CPU registers and memory was that the registers were on the chip and memory was not. This no longer holds.
Memory on the CPU Chip (Part 2)
With two L1 caches (the I cache and the D cache) and the L2 cache on the CPU chip, we look for another difference to distinguish user registers from memory.
The main difference is historical. It has to do with the way that the assembly language program accesses the device.
There are register–specific instructions and memory–specific instructions.
A modern computer (Pentium series excepted) will have between 8 and 32 user registers. These store temporary results for computations.
The Pentium register set (EAX, EBX, ECX, and EDX) is rather unusual and would be cheerfully ignored were the Pentium not such an important design.
Modern computer architecture usually involves a series of design tradeoffs.
Question: Should we place more general–purpose registers on the CPU chip or have a larger L1 Data Cache?
Answer: Each provides about the same improvement in performance. Flip a coin or use some other criterion.
The ALU (Arithmetic Logic Unit)
The ALU performs all of the arithmetic and logical operations for the CPU.
These include the following:
Arithmetic: addition, subtraction, negation, etc.
Logical: AND, OR, NOT, Exclusive OR, etc.
This symbol has been used for the ALU since the mid–1950s.
It shows two inputs and one output.
The reason for two inputs is the fact that many operations, such as addition and logical AND, are dyadic; that is, they take two input arguments.
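The two-input, one-output shape of the ALU can be sketched as a single function. This is an illustration only: the operation names, the `alu` function, and the 16-bit word width are assumptions, not part of any particular design.

```python
import operator

WORD_MASK = 0xFFFF  # assumed 16-bit word, for illustration only

# Dyadic operations: each takes two input arguments.
DYADIC = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}


def alu(op, a, b=0):
    """Apply one ALU operation and return a result truncated to one word."""
    if op == "NOT":      # monadic: the second input is ignored
        result = ~a
    elif op == "NEG":    # monadic: two's-complement negation
        result = -a
    else:
        result = DYADIC[op](a, b)
    return result & WORD_MASK
```

The monadic operations (NOT and negation) simply ignore the second input, which is one way a two-input unit can still serve them.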
Historical Summary
Reflecting on the last 60 years of the history of computing machines, we see a development constrained by the available technology and economics.
We see a constant move towards devices with
less cost and physical size
more performance and reliability (longer time between failures).
As an example, the ENIAC seldom functioned for more than a few hours continuously before it suffered a failure.
Memory technology is a good example. We have four stages.
1. No memory (ENIAC).
2. Very unreliable memory, such as mercury delay lines and Williams tubes.
3. Very reliable memory, specifically magnetic core memory.
4. Very reliable and inexpensive memory, specifically solid state devices.
We now begin a look at the computer from a logical view.
The Fetch–Execute Cycle
This cycle is the logical basis of all stored program computers.
Instructions are stored in memory as machine language. Instructions are fetched from memory and then executed.
The common fetch cycle can be expressed in the following control sequence.
    MAR ← PC.     // The PC contains the address of the instruction.
    READ.         // Read memory at the address in the MAR; the word read goes into the MBR.
    IR ← MBR.     // Place the instruction into the IR.
This cycle is described in many different ways, most of which serve to highlight additional steps required to execute the instruction. Examples of additional steps are: Decode the Instruction, Fetch the Arguments, Store the Result, etc.
A stored program computer is often called a “von Neumann Machine” after one of the originators of the EDVAC.
The necessity of fetching every instruction from memory slows the computer; this limitation is often called the “von Neumann bottleneck”.
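The three-step fetch sequence above can be mimicked in software. The sketch below is a toy model under stated assumptions: the `Cpu` class and its field names are invented, memory is a plain Python list of words, and the final step that advances the PC is an added assumption, since the control sequence in the text does not show it.

```python
class Cpu:
    """Toy CPU that holds only the registers used by the fetch cycle."""

    def __init__(self, memory):
        self.memory = memory  # list of instruction words, indexed by address
        self.pc = 0           # Program Counter
        self.mar = 0          # Memory Address Register
        self.mbr = 0          # Memory Buffer Register
        self.ir = 0           # Instruction Register

    def fetch(self):
        self.mar = self.pc                # MAR <- PC
        self.mbr = self.memory[self.mar]  # READ: memory word goes to the MBR
        self.ir = self.mbr                # IR <- MBR
        self.pc += 1                      # assumed: point at the next instruction
```

Calling `fetch()` repeatedly walks the IR through the stored instructions in address order; decoding and executing the word in the IR would follow each fetch.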
What Is Executed? The Idea of Multilevel Machines.
In discussing the fetch–execute cycle, we claimed that each instruction is fetched and executed. We now ask about the type of instruction.
In order to answer this question more precisely, we introduce the idea of a multilevel machine and multiple levels of computer languages.
We begin this discussion by discussing three levels of languages.
High–Level Language   English–like statements      Z = X + Y
Assembly Language     Mnemonic codes               Load X
                                                   Add Y
                                                   Store Z
Machine Language      Binary numbers (here shown   0x1100
                      in hexadecimal form)         0x3101
                                                   0x2102
The machine language used in this example is the MARIE
design (CPSC 2105)
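The three hexadecimal words in the example can be taken apart and run in a few lines. This sketch assumes the MARIE word layout of a 4-bit opcode followed by a 12-bit address, and it covers only the three opcodes that appear in the example (1 = Load, 2 = Store, 3 = Add). Placing X, Y, and Z at addresses 0x100, 0x101, and 0x102 is read off the address fields of the example words; the function names are illustrative.

```python
# Only the opcodes used in the example are listed here.
MNEMONICS = {0x1: "Load", 0x2: "Store", 0x3: "Add"}


def decode(word):
    """Split a 16-bit word into (mnemonic, 12-bit address)."""
    opcode = (word >> 12) & 0xF
    address = word & 0xFFF
    return MNEMONICS[opcode], address


def run(program, memory):
    """Execute the words in order using a single accumulator."""
    accumulator = 0
    for word in program:
        name, address = decode(word)
        if name == "Load":
            accumulator = memory[address]
        elif name == "Add":
            accumulator = (accumulator + memory[address]) & 0xFFFF
        elif name == "Store":
            memory[address] = accumulator
    return memory


program = [0x1100, 0x3101, 0x2102]
data = {0x100: 7, 0x101: 5, 0x102: 0}  # X = 7, Y = 5, Z not yet computed
run(program, data)
```

After the run, the cell at 0x102 holds the sum of the cells at 0x100 and 0x101, which is the effect of the high-level statement Z = X + Y.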
The Multilevel Machine
Following Andrew Tanenbaum (1), we define a four–level machine.
Each level of the machine corresponds to a language level.
Machine   Language          Language Type
M3        L3                High level language such as C++ or Java
M2        L2                Assembly language
M1        L1                Binary machine language
M0        Control Signals   Microarchitecture level
Following Tanenbaum, we define a virtual machine as a hypothetical computer that directly executes language at its level. For example, M3 as a virtual machine directly executes high level language programs.
The student should be aware that there is another, very important, use of the term virtual machine, with an entirely different definition. We use that later.
(1) Structured Computer Organization (5th Edition) by Andrew S. Tanenbaum. ISBN 0–13–148521–0. Dr. Tanenbaum defines six levels.
Options for Executing a High Level Language Program
There are three options for executing an L3 program. Each has been tried.
Direct
Execution. This has been tried with the
This is much less flexible than the other two approaches, much more difficult to implement, and less efficient.
Translation: Translate the L3 program to a lower level language, such as L2 or L1. The lower level languages are much more based on the computer hardware, and easier to execute. For a HLL, this step is called compilation.
Interpretation: Write a program in a lower level language, either L2 or L1, that takes the L3 program as input data and causes the computer to achieve the desired effect.
Example: The JVM (Java Virtual Machine) is a virtual machine that appears to execute the Java program directly. In actual fact, the Java source code is first translated into byte code, and the JVM interprets that byte code.
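The difference between translation and interpretation can be shown with the statement Z = X + Y used earlier. This is a deliberately tiny sketch: `translate` handles only statements of the exact form `target = left + right`, and both function names and the tuple-based instruction format are assumptions made for illustration.

```python
def translate(statement):
    """Translate 'Z = X + Y' into a list of lower-level instructions."""
    target, expression = [part.strip() for part in statement.split("=")]
    left, right = [part.strip() for part in expression.split("+")]
    return [("Load", left), ("Add", right), ("Store", target)]


def interpret(instructions, variables):
    """Treat the instructions as input data and carry out their effect."""
    accumulator = 0
    for opcode, name in instructions:
        if opcode == "Load":
            accumulator = variables[name]
        elif opcode == "Add":
            accumulator += variables[name]
        elif opcode == "Store":
            variables[name] = accumulator
        else:
            raise ValueError("unknown opcode: " + opcode)
    return variables


lowered = translate("Z = X + Y")
result = interpret(lowered, {"X": 2, "Y": 3})
```

Here `translate` plays the role of the compiler, producing a lower-level program once, while `interpret` is itself a program that reads that lower-level program as data each time it runs.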
Levels from the “Bottom Up”
The lowest levels of the computer were not shown on the above diagram.
These are the digital logic level and the analog devices upon which that level is based.
The microarchitecture level, the first real level, shows all of the components of the CPU (ALU, Control Unit, internal busses, user registers, control registers), the set of control signals, as well as the method of generating these signals.
At this level, the registers are connected to the ALU to form a data path, over which the data flow: registers to ALU, then ALU back to a register.
At this level, the basic design question is how to build the control unit.
The ISA (Instruction Set Architecture), the next level up, describes the binary machine language instructions, their mnemonic representations, and the general purpose registers that can be accessed by a machine language program.
The Higher Level Language level, the top level, represents the view of the Instruction Set Architecture as seen through the compiler or interpreter for the higher level language.
Modern Design Realities
Some assumptions that drive current design practice include:
1. The fact that most programs are written in high–level compiled languages.
2. The fact that all modern compilers are designed to emit fairly simple machine language instructions, assuming a simple ISA.
3. The fact that a simpler instruction set implies a smaller control unit, thus freeing chip area for more registers and on–chip cache.
4. The fact that current CPU clock cycle times (0.25 – 0.50 nanoseconds) are much faster than memory devices, either cache or primary memory.
5. The considerable experience in writing sophisticated compilers that can handle very complex constructs and emit very efficient machine code.
NOTE: The appearance of a new memory technology with significantly enhanced performance would require a completely new design approach. This would be welcome, but quite a challenge.
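The clock cycle times in item 4 can be turned into clock rates, and into a rough count of cycles lost while waiting on memory. The sketch below is arithmetic only; the 50 nanosecond memory access time is an assumed round figure used for illustration, not a measurement, and the function names are invented.

```python
def clock_rate_ghz(cycle_time_ns):
    """A cycle time in nanoseconds corresponds to 1 / t gigahertz."""
    return 1.0 / cycle_time_ns


def cycles_waiting(memory_latency_ns, cycle_time_ns):
    """CPU clock cycles that elapse during one memory access."""
    return memory_latency_ns / cycle_time_ns


ASSUMED_MEMORY_LATENCY_NS = 50.0  # hypothetical figure, for illustration only

fast_clock = clock_rate_ghz(0.25)
slow_clock = clock_rate_ghz(0.50)
stall = cycles_waiting(ASSUMED_MEMORY_LATENCY_NS, 0.25)
```

Under that assumed latency, a CPU with a 0.25 ns cycle would sit through a couple of hundred clock cycles for a single uncached memory access, which is the gap that registers and on–chip cache are meant to hide.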