Gabriel Wu's Reading Notes
This site contains my notes on a few computer science & programming books.
List of Books
- Engineering a Compiler (3rd Edition)
- Category Theory for Programmers
- The Calculus of Computation
- Geometric Data Structures for Computer Graphics
- Patterns for Parallel Programming
- Structured Parallel Programming
About Me
I am Gabriel Wu (a.k.a. lucifer1004), a Ph.D. student majoring in electromagnetic scattering and remote sensing at Peking University, Beijing, China.
Engineering a Compiler (3rd Edition)
This book contains my notes on Engineering a Compiler (3rd Edition) by Keith D. Cooper and Linda Torczon.
Chapter 1 Overview of Compilation
- 1.1 Introduction
- 1.2 Compiler Structure
- 1.3 Overview of Translation
- 1.4 Engineering
- 1.5 Summary and Perspective
- Exercises
1.1 Introduction
A compiler is a computer program that translates other computer programs. It consists of a front end, an optional optimizer, and a back end. Connecting the front end and the back end is an intermediate representation (IR).
A source-to-source translator targets other programming languages rather than the instruction set of a computer.
AOT is the traditional approach where translation occurs in a separate step from execution.
JIT is a more recent approach where translation occurs at run time. In other words, the translation is delayed, or lazy.
A VM is a simulator for the instruction set architecture (ISA) of an idealized processor.
In practice, things can be much more complex. For example, Java programs are compiled into Java bytecode by an AOT compiler, then interpreted by a Java Virtual Machine (JVM), which usually has a JIT compiler integrated.
1.2 Compiler Structure
```mermaid
flowchart LR
    S([Source Program]) --> FE
    subgraph C[Compiler]
        FE[Front End] -->|IR| O[Optimizer]
        O -->|IR| BE[Back End]
    end
    BE --> T([Target Program])
    style S fill:#f9f,stroke-width:0
    style T fill:#9f9,stroke-width:0
```
1.3 Overview of Translation
1.3.1 The Front End
The front end contains a scanner, a parser, and an elaborator.
- The scanner converts the stream of characters from the input code into a stream of words.
- The parser fits the words from the scanner to a rule-based model of the input language's syntax, called a grammar. During the incremental parsing process, the parser may call on the elaborator to perform additional computation.
```mermaid
flowchart LR
    S([Source Program]) -->|characters| A
    subgraph FE[Front End]
        A[Scanner] -->|words| B[Parser]
        B <--> C[Elaborator]
    end
    B --> IR([IR])
    style S fill:#f9f,stroke-width:0
    style IR fill:#9f9,stroke-width:0
```
To check the syntax of the input program, the parser must compare the program's structure against the grammar that defines the language.
If the parser finds a derivation (using the rules of the grammar), this proves that the input program is syntactically correct.
1.3.2 The Optimizer
Most optimizations consist of an analysis and a transformation.
- The analysis determines whether the compiler can safely and profitably apply the transformation.
  - Data-flow analysis
  - Dependence analysis
- The transformation rewrites the code into a more efficient form.
1.3.3 The Back End
The back end solves at least three major problems.
- Instruction selection — Converting the IR into operations in the target processor’s ISA.
- Instruction scheduling — Selecting an execution order for the operations.
- Register allocation — Deciding which values should reside in registers at each point in the code.
1.4 Engineering
Compiler construction is an exercise in engineering: design and implementation under constraints.
1.5 Summary and Perspective
Exercises
Ex 1.1
Qualities I consider most important in a compiler
- As a user — correctness, safety, runtime speed & resource consumption, user-friendliness, compilation speed & resource consumption
- As a writer — correctness, safety, runtime speed & resource consumption, maintainability, compilation speed & resource consumption
Ex 1.2
I think it should be seen as interpretation rather than compilation, since the displayed graphics are a result, not a program.
Ex 1.3
- The delay should be imperceptible.
- Resources are limited in the embedded environment.
- Speed is not so important, while simplicity and clarity weigh more.
- Parallelism.
- Compatibility.
Chapter 2 Scanners
- 2.1 Introduction
- 2.2 Recognizing Words
- 2.3 Regular Expressions
- 2.4 From Regular Expression to Scanner
2.1 Introduction
Scanner construction has a strong foundation in formal language theory. The compiler writer specifies the lexical structure of the input language using a set of regular expressions.
Keywords, or reserved words, are words that match the rule for an identifier but have special meanings.
2.2 Recognizing Words
2.2.1 A Formalism for Recognizers
Formally, transition diagrams can be viewed as finite automata.
A finite automaton is a 5-tuple \((S,\Sigma,\delta,s_0,S_A)\):
- \(S\) is the finite set of states, including the error state \(s_e\).
```mermaid
flowchart LR
    E((s_e))
    E -->|any character| E
```
- \(\Sigma\) is the finite alphabet.
- \(\delta(s,c)\) is the transition function.
- \(s_0\) is the start state.
- \(S_A\) is the set of accepting states.
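To make the formalism concrete, here is a minimal Rust sketch (mine, not from the book) that simulates such an FA. The example recognizes register names like `r29`; the error state \(s_e\) is represented implicitly by missing table entries:

```rust
use std::collections::{HashMap, HashSet};

// A finite automaton (S, Sigma, delta, s0, S_A) with states numbered 0..n.
// Missing entries in `delta` stand for transitions into the error state s_e.
struct FiniteAutomaton {
    delta: HashMap<(u32, char), u32>, // transition function delta(s, c)
    s0: u32,                          // start state
    accepting: HashSet<u32>,          // S_A
}

impl FiniteAutomaton {
    fn accepts(&self, word: &str) -> bool {
        let mut state = self.s0;
        for c in word.chars() {
            match self.delta.get(&(state, c)) {
                Some(&next) => state = next,
                None => return false, // fell into the error state s_e
            }
        }
        self.accepting.contains(&state)
    }
}

fn main() {
    // Recognizer for "r" followed by one or more digits (e.g., r0, r29).
    let mut delta = HashMap::new();
    delta.insert((0, 'r'), 1);
    for d in '0'..='9' {
        delta.insert((1, d), 2);
        delta.insert((2, d), 2);
    }
    let fa = FiniteAutomaton {
        delta,
        s0: 0,
        accepting: HashSet::from([2]),
    };
    assert!(fa.accepts("r29"));
    assert!(!fa.accepts("r"));
    assert!(!fa.accepts("x1"));
}
```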
2.2.2 Recognizing More Complex Words
Question 2.2.1 Identifier FA
```mermaid
flowchart LR
    S((s_0))
    M(((s_1)))
    T1(((s_2)))
    T2(((s_3)))
    T3(((s_4)))
    T4(((s_5)))
    T5(((s_6)))
    S -->|a...z,A...Z| M
    M -->|0...9| T1
    T1 -->|0...9| T2
    T2 -->|0...9| T3
    T3 -->|0...9| T4
    T4 -->|0...9| T5
```
Question 2.2.2 PASCAL comment FA
```mermaid
flowchart LR
    S((s_0))
    M((s_1))
    T(((s_2)))
    S -->|"{"| M
    M -->|"∑-}"| M
    M -->|"}"| T
```
2.3 Regular Expressions
The set of words accepted by a finite automaton \(F\) forms a language \(L(F)\). Any language described by an RE is considered a regular language.
2.3.1 Formalizing the Notation
Three basic operations:
- Alternation: commutative.
- Concatenation
- Closure: A Kleene closure (*) denotes zero or more repetitions of the preceding expression. Closures with an upper bound for repetitions are called finite closures.
Precedence: Parentheses > Closure > Concatenation > Alternation
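For example (an illustration of mine), under these rules \(a|bc^*\) is read as \(a|(b(c^*))\); the closure binds only to \(c\), concatenation then joins \(b\) with \(c^*\), and the alternation applies last.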
2.3.2 Examples of Regular Expressions
The cost of operating an FA is proportional to the length of the input, not to the complexity of the RE or the number of states in the FA.
2.3.3 Closure Properties of REs
- REs are closed under concatenation, union, and closure.
2.4 From Regular Expression to Scanner
2.4.1 Nondeterministic Finite Automata
In Nondeterministic Finite Automata (NFA), transition functions can be multivalued and can include \(\epsilon\)-transitions.
There are two distinct models for NFAs, which differ in their behavior when the NFA must make a nondeterministic choice:
- Follow the transition that leads to an accepting state.
- Clone itself to pursue each possible transition.
2.4.2 RE to NFA: Thompson's Construction
Thompson's construction uses a simple, template-driven process to build up an NFA from smaller NFAs.
2.4.3 NFA to DFA: The Subset Construction
The subset construction is actually a Breadth-First Search (BFS).
Fixed-Point Computations
The validity of the subset construction algorithm is based on the fact that set union is both commutative and associative.
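A minimal Rust sketch of the subset construction (my own illustration, not the book's code): the BFS explores sets of NFA states, \(\epsilon\)-closures are computed with a worklist, and every discovered set becomes one DFA state:

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

// A toy NFA over `char`; epsilon-transitions are kept in a separate map.
struct Nfa {
    delta: HashMap<(usize, char), Vec<usize>>,
    epsilon: HashMap<usize, Vec<usize>>,
    start: usize,
    accepting: BTreeSet<usize>,
}

// All NFA states reachable from `states` through epsilon-transitions alone.
fn epsilon_closure(nfa: &Nfa, states: &BTreeSet<usize>) -> BTreeSet<usize> {
    let mut closure = states.clone();
    let mut worklist: Vec<usize> = states.iter().copied().collect();
    while let Some(s) = worklist.pop() {
        for &t in nfa.epsilon.get(&s).into_iter().flatten() {
            if closure.insert(t) {
                worklist.push(t);
            }
        }
    }
    closure
}

// BFS over subsets of NFA states; each discovered subset becomes a DFA state.
fn subset_construction(
    nfa: &Nfa,
    alphabet: &[char],
) -> (HashMap<(usize, char), usize>, Vec<bool>) {
    let q0 = epsilon_closure(nfa, &BTreeSet::from([nfa.start]));
    let mut ids: HashMap<BTreeSet<usize>, usize> = HashMap::from([(q0.clone(), 0)]);
    let mut dfa_delta = HashMap::new();
    let mut queue = VecDeque::from([q0]);
    while let Some(q) = queue.pop_front() {
        let q_id = ids[&q];
        for &c in alphabet {
            let moved: BTreeSet<usize> = q
                .iter()
                .flat_map(|&s| nfa.delta.get(&(s, c)).into_iter().flatten().copied())
                .collect();
            if moved.is_empty() {
                continue; // transition into the implicit error state
            }
            let t = epsilon_closure(nfa, &moved);
            let t_id = match ids.get(&t) {
                Some(&id) => id,
                None => {
                    let id = ids.len();
                    ids.insert(t.clone(), id);
                    queue.push_back(t);
                    id
                }
            };
            dfa_delta.insert((q_id, c), t_id);
        }
    }
    // A DFA state is accepting if its subset contains an accepting NFA state.
    let mut accepting = vec![false; ids.len()];
    for (set, &id) in &ids {
        accepting[id] = set.iter().any(|s| nfa.accepting.contains(s));
    }
    (dfa_delta, accepting)
}

fn main() {
    // NFA for the RE a|b, shaped like Thompson's construction would build it.
    let nfa = Nfa {
        delta: HashMap::from([((1, 'a'), vec![2]), ((3, 'b'), vec![4])]),
        epsilon: HashMap::from([(0, vec![1, 3]), (2, vec![5]), (4, vec![5])]),
        start: 0,
        accepting: BTreeSet::from([5]),
    };
    let (dfa_delta, accepting) = subset_construction(&nfa, &['a', 'b']);
    assert!(accepting[dfa_delta[&(0, 'a')]]);
    assert!(accepting[dfa_delta[&(0, 'b')]]);
}
```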
2.4.4 DFA to Minimal DFA
Category Theory for Programmers
This book contains my notes on Category Theory for Programmers by Bartosz Milewski.
Chapter 1 Category: The Essence of Composition
- 1.1 Arrows as Functions
- 1.2 Properties of Composition
- 1.3 Composition is the Essence of Programming
- 1.4 Challenges
1.1 Arrows as Functions
Arrows can be composed.
1.2 Properties of Composition
- Composition is associative.
- For every object there is a unit arrow (called identity).
1.3 Composition is the Essence of Programming
The information we need in order to compose chunks ("surface area") has to increase more slowly than the information we need in order to implement them ("volume").
Category theory is extreme in the sense that it actively discourages us from looking inside the objects. An object in category theory is an abstract nebulous entity. All you can ever know about it is how it relates to other objects — how it connects with them using arrows.
1.4 Challenges
Challenge 1.1 Implement identity
Identity function in Rust:
```rust
fn id<A>(a: A) -> A {
    a
}

fn main() {
    assert_eq!(id(1), 1);
    assert_eq!(id(1.0), 1.0);
    assert_eq!(id("1"), "1");
}
```
Challenge 1.2 Implement composition
Composition function in Rust:
```rust
macro_rules! compose {
    ( $last:expr ) => { $last };
    ( $head:expr, $($tail:expr),+ ) => {
        compose_two($head, compose!($($tail),+))
    };
}

fn compose_two<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |x| g(f(x))
}

fn main() {
    let f = |x| x + 1;
    let g = |x| x * 2;
    let h = |x| x ^ 100; // note: ^ is XOR, not exponentiation
    let fgh = compose!(f, g, h);
    assert_eq!(fgh(3), 108); // ((3 + 1) * 2) XOR 100 = 108
}
```
Challenge 1.3 Validate composition
```rust
macro_rules! compose {
    ( $last:expr ) => { $last };
    ( $head:expr, $($tail:expr),+ ) => {
        compose_two($head, compose!($($tail),+))
    };
}

fn compose_two<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |x| g(f(x))
}

fn id<A>(a: A) -> A {
    a
}

fn main() {
    let f = |x| x + 4;
    let f_then_id = compose!(f, id);
    let id_then_f = compose!(id, f);
    assert_eq!(f_then_id(5), id_then_f(5));
}
```
Challenge 1.4 World-wide web
Whether the world-wide web can be considered a category depends on how we define the arrows. The web pages are objects without doubt. If we define the arrows as links, then `A->B` and `B->C` do not necessarily imply `A->C`, because there might not be a direct link from `A` to `C`; in this case the world-wide web is not a category. However, if we define the arrows as sequences of links, then `A->B` and `B->C` do imply `A->C`, and thus the world-wide web can be a category.
See also Challenge 1.6.
Challenge 1.5 Facebook
Facebook is not a category because friendship cannot be composed: having `A->B` and `B->C` does not necessarily imply `A->C`.
Challenge 1.6 Directed graph
A directed graph is a category if every node has a self link, and for every triplet of nodes `A, B, C`, if there is an edge `A->B` and an edge `B->C`, then there must also be an edge `A->C`.
Chapter 2 Types and Functions
- 2.1 Who Needs Types
- 2.2 Types Are About Composability
- 2.3 What Are Types?
- 2.4 Why Do We Need a Mathematical Model?
- 2.5 Pure and Dirty Functions
- 2.6 Examples of Types
- 2.7 Challenges
2.1 Who Needs Types
2.2 Types Are About Composability
2.3 What Are Types?
Here, sets can be finite or infinite.
Ideally we would consider types to be sets and functions to be mathematical functions between sets. But we run into the halting problem: there are functions that never terminate. That is why we introduce \(\bot\) (bottom), which denotes a non-terminating computation. So a function with the following signature:
```rust
fn f(x: bool) -> bool
```
may return `true`, `false`, or \(\bot\).
For example, the following program:
```rust
fn f(x: bool) -> bool {
    !f(x)
}

fn main() {
    println!("{}", f(true));
}
```
compiles but never terminates.
Such functions are called partial functions, as opposed to total functions, which return valid results for every possible argument.
2.4 Why Do We Need a Mathematical Model?
The problem is that it's very hard to prove things about programs using operational semantics.
An alternative is denotational semantics, which is based on math.
Difficulty: computational effects (e.g., I/O)
Solution: mapping to monads (by Eugenio Moggi)
One of the important advantages of having a mathematical model for programming is that it’s possible to perform formal proofs of correctness of software.
2.5 Pure and Dirty Functions
A pure function is a function which is guaranteed to produce the same output every time it's called with the same input, and has no side effects.
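A small Rust illustration (mine, not from the book): the first closure is pure, while the second depends on mutable captured state and is therefore impure:

```rust
fn main() {
    // Pure: same input always gives the same output, no side effects.
    let square = |x: i32| x * x;
    assert_eq!(square(3), square(3));

    // Impure: the result depends on hidden mutable state, and each
    // call changes that state as a side effect.
    let mut counter = 0;
    let mut next = |x: i32| {
        counter += 1;
        x + counter
    };
    assert_ne!(next(3), next(3)); // 4, then 5
}
```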
2.6 Examples of Types
Empty set: Haskell `Void` (not C++ `void`, which is actually unit).
Singleton set: C++ `void`, Haskell/Rust `()`.
A function without explicit arguments does not mean it takes nothing. This is best shown in Haskell:
```haskell
f44 :: () -> Integer
f44 () = 44
```
The function `f44` above obviously takes `()`, which is the only element of the singleton unit set.
Functions that can be implemented with the same formula for any type are called parametrically polymorphic.
```rust
fn unit<T>(_x: T) {}
```
Two-element set: C++/Rust `bool`, Haskell `Bool`. Functions to these two-element sets are called predicates.
2.7 Challenges
Challenge 2.1 Higher-order memoize
```rust
use std::collections::HashMap;
use std::hash::Hash;

fn memoize<A, B, F>(f: F) -> impl FnMut(A) -> B
where
    F: Fn(A) -> B,
    A: Eq + Hash + Clone,
    B: Clone,
{
    let mut cache: HashMap<A, B> = HashMap::new();
    move |arg| {
        if let Some(result) = cache.get(&arg) {
            result.clone()
        } else {
            println!("Evaluated!");
            let result = f(arg.clone());
            cache.insert(arg, result.clone());
            result
        }
    }
}

fn main() {
    let mut f = memoize(|n| 2 * n);
    println!("{}", f(20)); // First-time call, evaluated
    println!("{}", f(20)); // Memoization used
}
```
Challenge 2.2 Memoize RNG
This cannot work: once memoized, the function would always return the same number, which defeats the purpose of a random number generator.
Challenge 2.3 Memoize RNG with seed
```rust
extern crate rand;
extern crate rand_chacha;

use rand::prelude::*;
use rand_chacha::ChaCha8Rng;

fn main() {
    let memoized_rng = |seed: u64| {
        let mut rng = ChaCha8Rng::seed_from_u64(seed);
        rng.gen::<u64>()
    };
    println!("{}", memoized_rng(0));
    println!("{}", memoized_rng(0));
    println!("{}", memoized_rng(1));
    println!("{}", memoized_rng(1));
}
```
Challenge 2.4 Which is pure
The factorial function is pure. Other functions have side effects (input, output, global state).
Challenge 2.5 Bool to Bool
```rust
fn id(x: bool) -> bool { x }
fn not(x: bool) -> bool { !x }
fn always_true(_x: bool) -> bool { true }
fn always_false(_x: bool) -> bool { false }
```
Challenge 2.6 Draw category
```mermaid
flowchart LR
    Void
    Unit
    Unit -->|id| Unit
    Bool
    Bool -->|unit| Unit
    Unit -->|true| Bool
    Unit -->|false| Bool
    Bool -->|id| Bool
    Bool -->|not| Bool
    Bool -->|true| Bool
    Bool -->|false| Bool
```
The Calculus of Computation
This book contains my notes on The Calculus of Computation by Aaron R. Bradley and Zohar Manna.
Chapter 1 Propositional Logic
- 1.1 Syntax
- 1.2 Semantics
- 1.3 Satisfiability and Validity
- 1.4 Equivalence and Implication
- 1.5 Substitution
- 1.6 Normal Forms
- 1.7 Decision Procedures for Satisfiability
- 1.8 Summary
- Exercises
1.1 Syntax
- not: \(\neg F\)
- and: \(F_1\land F_2\)
- or: \(F_1\lor F_2\)
- implication: \(F_1\implies F_2\)
- if and only if: \(F_1\iff F_2\)
Precedence from highest to lowest: \(\neg\), \(\land\), \(\lor\), \(\implies\), \(\iff\)
Associativity of binary operators: \(\land\) and \(\lor\) are left-associative, \(\implies\) and \(\iff\) are right-associative.
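A worked example of my own applying these rules:

\[\neg P\land Q\lor R\implies S\iff T \quad\text{abbreviates}\quad \Big(\big(((\neg P)\land Q)\lor R\big)\implies S\Big)\iff T\]

and, by right-associativity,

\[F_1\implies F_2\implies F_3 \quad\text{abbreviates}\quad F_1\implies(F_2\implies F_3).\]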
1.2 Semantics
An interpretation:
\[I: P\mapsto\mathrm{true}, Q\mapsto\mathrm{false},\dots\]
A truth table:
| \(F\) | \(\neg F\) |
| --- | --- |
| 0 | 1 |
| 1 | 0 |
Given that all symbols used are included in the interpretation, we can evaluate an arbitrary expression.
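For example, under the interpretation \(I\) above (with \(P\mapsto\mathrm{true}\), \(Q\mapsto\mathrm{false}\)):

\[I\vDash P\lor Q,\qquad I\nvDash P\land Q,\qquad I\nvDash P\implies Q\]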
Constants:
\[\forall I, I\vDash\top, I\nvDash\bot\]
1.3 Satisfiability and Validity
1.3.1 Truth Tables
1.3.2 Semantic Arguments
1.4 Equivalence and Implication
1.5 Substitution
1.6 Normal Forms
1.7 Decision Procedures for Satisfiability
1.7.1 Simple Decision Procedures
1.7.2 Reconsidering the Truth-Table Method
1.7.3 Conversion to an Equisatisfiable Formula in CNF
1.7.4 The Resolution Procedure
1.7.5 DPLL
1.8 Summary
Exercises
Geometric Data Structures for Computer Graphics
This book contains my notes on Geometric Data Structures for Computer Graphics by Elmar Langetepe and Gabriel Zachmann.
Chapter 1 Quadtrees and Octrees
- 1.1 Definition
- 1.2 Complexity and Construction
- 1.3 Height Field Visualization
- 1.4 Isosurface Generation
- 1.5 Ray Shooting
- 1.6 3D Octree
- 1.7 5D Octree
1.1 Definition
Binary tree -> Quadtree -> Octree.
1.2 Complexity and Construction
The depth of a quadtree for a set \(P\) of points in the plane is at most \(\log(s/c)+\frac{3}{2}\), where \(c\) is the smallest distance between any two points in \(P\) and \(s\) is the side length of the initial square.
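A sketch of the argument (my paraphrase of the de Berg et al. proof): an internal node at depth \(i\) has side length \(s/2^i\) and must contain at least two points of \(P\), which lie at most a cell diagonal apart, so

\[c\le\sqrt{2}\,\frac{s}{2^i}\iff i\le\log\frac{\sqrt{2}\,s}{c}=\log(s/c)+\frac{1}{2},\]

and the leaves sit one level below the deepest internal node, giving \(\log(s/c)+\frac{3}{2}\).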
Neighbor finding: \(\mathcal{O}(d+1)\) time, where \(d\) is the depth of the quadtree
Balancing a quadtree: \(\mathcal{O}((d + 1)n)\) time and \(\mathcal{O}(n)\) space, see de Berg et al., 2008.
1.3 Height Field Visualization
Example: Digital Elevation Model (DEM).
Height fields can be stored as Triangular Irregular Networks (TINs). TINs are efficient but not visualization-friendly.
Use quadtrees.
Problem: neighboring fine and coarse cells produce so-called T-vertices.
Solution: triangulate each node; when subdividing one triangle, also subdivide the one on the other side. This leads to a 4-8 division.
1.4 Isosurface Generation
Curvilinear grids (physical space) -> regular grids (computational space).
Octrees offer a simple way to compute isosurfaces efficiently. They help discard large parts of the volume where the isosurface is guaranteed not to be.
Each leaf holds the range of the data values at the eight corners of its cell.
If the isosurface crosses an edge of a cell, that edge will be visited exactly four times (once per adjacent cell). Use a hash table to memoize the intersection, and delete the entry after the fourth visit to keep the hash table small.
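A minimal Rust sketch of this bookkeeping (the data layout is my own guess): intersection points are cached under a canonical edge key, and an entry is evicted on its fourth visit:

```rust
use std::collections::HashMap;

// An axis-aligned grid edge, identified by its lower endpoint and axis (0..3).
type EdgeKey = ([i32; 3], u8);

struct EdgeCache {
    // Cached intersection point plus the number of visits so far.
    entries: HashMap<EdgeKey, ([f64; 3], u8)>,
}

impl EdgeCache {
    fn get_or_insert(
        &mut self,
        key: EdgeKey,
        compute: impl FnOnce() -> [f64; 3],
    ) -> [f64; 3] {
        let (point, visits) = self.entries.entry(key).or_insert_with(|| (compute(), 0));
        *visits += 1;
        let point = *point;
        if self.entries[&key].1 == 4 {
            // An interior grid edge is shared by exactly four cells, so after
            // the fourth visit the entry can never be needed again.
            self.entries.remove(&key);
        }
        point
    }
}

fn main() {
    let mut cache = EdgeCache { entries: HashMap::new() };
    let key = ([1, 2, 3], 0); // edge along the x-axis at lattice point (1,2,3)
    for _ in 0..4 {
        // `compute` runs only on the first visit; later visits hit the cache.
        cache.get_or_insert(key, || [1.5, 2.0, 3.0]);
    }
    assert!(cache.entries.is_empty()); // entry removed after the fourth visit
}
```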
1.5 Ray Shooting
Partition the universe into a regular grid to avoid checking the ray against all objects.
1.6 3D Octree
Ray shooting with octrees:
- Bottom-up: starts at the leaf that contains the origin of the ray, and tries to find the next neighbor cell.
- Top-down: starts at the root and tries to find all the nodes that are stabbed by the ray.
A top-down method:
- Parameterize the ray as: \(\vec{r}(t) = \vec{p} + t\vec{d}\)
- Determine on which side the ray enters the current cell
- Determine the first subcell to visit
Negative directions can be handled efficiently by an XOR operation.
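A minimal sketch of the trick (conventions mine, in the spirit of parametric octree traversal): mirror the ray so every direction component is non-negative, record the flipped axes in a 3-bit mask, and recover the true child index with an XOR:

```rust
// Mirror the ray about the cell center on every axis where the direction is
// negative, and return a mask with one bit per flipped axis. Child indices
// of an octree node encode the x/y/z halves in bits 0..3, so an index in the
// mirrored traversal maps back to the original tree via `index ^ mask`.
fn mirror_ray(origin: &mut [f64; 3], dir: &mut [f64; 3], center: &[f64; 3]) -> u8 {
    let mut mask = 0u8;
    for axis in 0..3 {
        if dir[axis] < 0.0 {
            origin[axis] = 2.0 * center[axis] - origin[axis];
            dir[axis] = -dir[axis];
            mask |= 1 << axis;
        }
    }
    mask
}

fn main() {
    let center = [0.5, 0.5, 0.5]; // center of the root cell
    let mut origin = [0.9, 0.1, 0.9];
    let mut dir = [-1.0, 1.0, -1.0];
    let mask = mirror_ray(&mut origin, &mut dir, &center);
    assert_eq!(mask, 0b101); // x and z axes were flipped
    // A child visited as index i in the mirrored traversal corresponds to
    // child i ^ mask in the original octree.
    assert_eq!(0b000 ^ mask, 0b101);
}
```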
1.7 5D Octree
Rays are essentially static objects, just like the geometry of the scene!
Discretize the rays based on the direction cube. There is a one-to-one mapping between direction vectors and points on all six sides of the cube.
We need six 5D octrees, one for each side. For a given side, the \(xyz\)-intervals define a box, and the remaining \(uv\) intervals define a cone.
Patterns for Parallel Programming
This book contains my notes on Patterns for Parallel Programming by Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill.
Chapter 1 A Pattern Language for Parallel Programming
- 1.1 Introduction
- 1.2 Parallel Programming
- 1.3 Design Patterns and Pattern Languages
- 1.4 A Pattern Language for Parallel Programming
1.1 Introduction
1.2 Parallel Programming
The key to parallel computing is exploitable concurrency.
Even when a parallel program is "correct", it may fail to deliver the anticipated performance improvement from exploiting concurrency. Care must be taken to ensure that the overhead incurred by managing the concurrency does not overwhelm the program runtime.
1.3 Design Patterns and Pattern Languages
1.4 A Pattern Language for Parallel Programming
```mermaid
flowchart TB
    FC[Finding Concurrency] <--> AS[Algorithm Structure]
    AS <--> SS[Supporting Structures]
    SS <--> IM[Implementation Mechanisms]
```
Chapter 2 Background and Jargon of Parallel Computing
- 2.1 Concurrency in Parallel Programs versus Operating Systems
- 2.2 Parallel Architectures: A Brief Introduction
- 2.3 Parallel Programming Environments
- 2.4 The Jargon of Parallel Computing
- 2.5 A Quantitative Look at Parallel Computation
- 2.6 Communication
- 2.7 Summary
2.1 Concurrency in Parallel Programs versus Operating Systems
2.2 Parallel Architectures: A Brief Introduction
2.2.1 Flynn's Taxonomy
Four possibilities:
- SISD (Single Instruction, Single Data)
- SIMD (Single Instruction, Multiple Data)
- MISD: mentioned for completeness
- MIMD
2.2.2 A Further Breakdown of MIMD
- Shared memory
- SMP (Symmetric Multiprocessors)
- NUMA (Non-Uniform Memory Access)
- Distributed memory
- MPP (Massively Parallel Processors)
- Clusters
- Hybrid
2.2.3 Summary
2.3 Parallel Programming Environments
Before: a long, long list...
Now:
- OpenMP: a set of extensions to sequential programming languages
- MPI: a library of routines to be called from programs written in a sequential programming language
- Hybrid: OpenMP for each node and MPI between nodes
- Built-in: Languages with built-in features to support parallel programming
2.4 The Jargon of Parallel Computing
- Task
- Unit of execution (UE)
- Processing element (PE)
- Load balancing
- Synchronization
- Synchronous versus asynchronous
- Race conditions
- Deadlocks
2.5 A Quantitative Look at Parallel Computation
A few formulae for measuring the efficiency.
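The key ones (standard definitions; notation mine): with \(P\) processing elements, total solution time \(T_{\mathrm{total}}(P)\), and serial fraction \(\gamma\),

\[S(P)=\frac{T_{\mathrm{total}}(1)}{T_{\mathrm{total}}(P)},\qquad E(P)=\frac{S(P)}{P},\qquad S(P)\le\frac{1}{\gamma+\dfrac{1-\gamma}{P}}\ \text{(Amdahl's law)}.\]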
2.6 Communication
2.6.1 Latency and Bandwidth
Fixed cost (latency) plus cost proportional to the amount of data (data / bandwidth).
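As a formula (standard cost model; notation mine), transferring a message of \(N\) bytes takes

\[T_{\mathrm{message}}(N)=\alpha+\frac{N}{\beta},\]

where \(\alpha\) is the latency and \(\beta\) the bandwidth.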
2.6.2 Overlapping Communication and Computation and Latency Hiding
Continue computation instead of waiting for the message idly.
2.7 Summary
Chapter 3 The Finding Concurrency Design Space
- 3.1 About the Design Space
- 3.2 The Task Decomposition Pattern
- 3.3 The Data Decomposition Pattern
- 3.4 The Group Tasks Pattern
- 3.5 The Order Tasks Pattern
- 3.6 The Data Sharing Pattern
- 3.7 The Design Evaluation Pattern
- 3.8 Summary
3.1 About the Design Space
Parallel programs attempt to solve bigger problems in less time by simultaneously solving different parts of the problem on different processing elements. This can only work, however, if the problem contains exploitable concurrency, that is, multiple activities or tasks that can execute at the same time.
3.1.1 Overview
```mermaid
flowchart LR
    subgraph D[Decomposition]
        direction TB
        TD[Task Decomposition]
        DD[Data Decomposition]
    end
    subgraph DA[Dependency Analysis]
        GT[Group Tasks] <--> OT[Order Tasks]
        OT <--> DS[Data Sharing]
    end
    D <--> DA
    DA <--> DE[Design Evaluation]
```
3.1.2 Using the Decomposition Patterns
3.1.3 Background for Examples
Medical Imaging
PET (Positron Emission Tomography)
- TD: Associating each trajectory with a task
- DD: Partition the body into sections
Linear Algebra
Molecular Dynamics
Pick a distance beyond which the force contribution is so small that it can be ignored.
3.2 The Task Decomposition Pattern
How can a problem be decomposed into tasks that can execute concurrently?
- A good understanding of the problem being solved.
- Which are the computationally intensive parts of the problem?
- What are the key data structures?
- How does the data unfold?
- Define the tasks that make up the problem.
- How do they imply the data decomposition?
Design principles:
- Flexibility
- Efficiency
- Simplicity
Solution: first identify as many tasks as possible and merge them later on.
Where are the tasks?
- Functional decomposition: a task for each function call.
- Loop splitting: map each iteration onto a task (see the sketch after this list).
- Decompose a large data structure into individual chunks.
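Here is a minimal sketch of loop splitting in Rust (my own, standard library only): the iterations of a summation loop are grouped into chunks, and each chunk becomes a task on its own thread:

```rust
fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    let num_tasks = 4;
    let chunk_size = data.len() / num_tasks; // assumes an even split
    let mut partial_sums = vec![0u64; num_tasks];

    // Each chunk of iterations becomes one task, run on its own thread.
    std::thread::scope(|scope| {
        for (sum, chunk) in partial_sums.iter_mut().zip(data.chunks(chunk_size)) {
            scope.spawn(move || {
                *sum = chunk.iter().sum();
            });
        }
    });

    // Combine the per-task results.
    let total: u64 = partial_sums.iter().sum();
    assert_eq!(total, 1_000 * 1_001 / 2);
}
```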
The next step will be data decomposition.
Many problems can be decomposed primarily in terms of data or primarily in terms of tasks. If a task-based decomposition avoids the need to break up and distribute complex data structures, it will be a much simpler program to write and debug. On the other hand, if memory and/or network bandwidth is a limiting factor, a decomposition that focuses on the data might be more effective.
A detailed example (Figure 3.3) is given for molecular dynamics.
3.3 The Data Decomposition Pattern
How can a problem's data be decomposed into units that can be operated on relatively independently?
A data-based decomposition is good if:
- The most computationally intensive part is organized around the manipulation of a large data structure.
- Similar operations are being applied to independent parts of the data structure.
For shared-memory environments, the data decomposition is often implied by the task decomposition. But for distributed-memory environments, the data decomposition must be done by hand.
Look at the central data structures, instead of the tasks.
- There might be some symmetry in the data structure which can be exploited.
- In the distributed-memory case, some data can be replicated on each node to save communication.
3.4 The Group Tasks Pattern
3.5 The Order Tasks Pattern
3.6 The Data Sharing Pattern
3.7 The Design Evaluation Pattern
3.8 Summary
Structured Parallel Programming
This book contains my notes on Structured Parallel Programming by Michael McCool, Arch D. Robison and James Reinders.
Chapter 1 Introduction
- 1.1 Think Parallel
- 1.2 Performance
- 1.3 Motivation: Pervasive Parallelism
- 1.4 Structured Pattern-Based Programming
- 1.5 Parallel Programming Models
- 1.6 Organization of this Book
- 1.7 Summary
All computers are now parallel. Specifically, all modern computers support parallelism in hardware through at least one parallel feature, including vector instructions, multithreaded cores, multicore processors, multiple processors, graphics engines, and parallel co-processors.
In short, parallel programming is programming.
1.1 Think Parallel
Operations need to be done simultaneously. If the original iterative form is not suitable, we need to convert it into an appropriate parallel structure.
1.2 Performance
Two problems:
- Computation may not be the bottleneck. Instead, access to memory or communication may be.
- The potential for scaling performance is constrained by the algorithm's span (longest sequential-only part, or the critical path).
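In the work-span model, the second point can be stated as a bound (standard notation, not verbatim from the book): with total work \(T_1\) and span \(T_\infty\), the speedup on \(P\) workers satisfies

\[\frac{T_1}{T_P}\le\frac{T_1}{T_\infty}.\]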
This book focuses on the shared memory machine model.
1.3 Motivation: Pervasive Parallelism
1.3.1 Hardware Trends Encouraging Parallelism
Three walls:
- Power wall
- Instruction-level parallelism (ILP) wall
- Memory wall
1.3.2 Observed Historical Trends in Parallelism
1.3.3 Need for Explicit Parallel Programming
Serial traps: the long-sustained serial illusion has built traps into our tools and ways of thinking. For example, pointers work well in serial programs but become a nightmare in parallel programs.
So we need to:
- Create appropriate tools.
- Think in parallel.
Two types of parallelism:
- Mandatory parallelism
- Optional parallelism