Gabriel Wu's Reading Notes
This site contains my notes on a few computer science & programming books.
List of Books
- Engineering a Compiler (3rd Edition)
- Category Theory for Programmers
- The Calculus of Computation
- Geometric Data Structures for Computer Graphics
- Patterns for Parallel Programming
- Structured Parallel Programming
About Me
I am Gabriel Wu (a.k.a. lucifer1004), a Ph.D. student majoring in electromagnetic scattering and remote sensing at Peking University, Beijing, China.
Engineering a Compiler (3rd Edition)
This book contains my notes on Engineering a Compiler (3rd Edition) by Keith D. Cooper and Linda Torczon.
Chapter 1 Overview of Compilation
- 1.1 Introduction
- 1.2 Compiler Structure
- 1.3 Overview of Translation
- 1.4 Engineering
- 1.5 Summary and Perspective
- Exercises
1.1 Introduction
A compiler is a computer program that translates other computer programs. It consists of a front end, an optional optimizer, and a back end. Connecting the front end and the back end is an intermediate representation (IR).
A source-to-source translator targets other programming languages rather than the instruction set of a computer.
AOT is the traditional approach where translation occurs in a separate step from execution.
JIT is a more recent approach where translation occurs at run time. In other words, the translation is delayed, or lazy.
A VM is a simulator for the instruction set architecture (ISA) of an idealized processor.
In practice, things can be much more complex. For example, Java programs are compiled into Java bytecode by an AOT compiler, then interpreted by a Java Virtual Machine (JVM), which usually has a JIT compiler integrated.
1.2 Compiler Structure
```mermaid
flowchart LR
    S([Source Program]) --> FE
    subgraph C[Compiler]
        FE[Front End] -->|IR| O[Optimizer]
        O -->|IR| BE[Back End]
    end
    BE --> T([Target Program])
    style S fill:#f9f,stroke-width:0
    style T fill:#9f9,stroke-width:0
```
1.3 Overview of Translation
1.3.1 The Front End
The front end contains a scanner, a parser, and an elaborator.
- The scanner converts the stream of characters from the input code into a stream of words.
- The parser fits the words from the scanner to a rule-based model of the input language's syntax, called a grammar. During the incremental parsing process, the parser may call on the elaborator to perform additional computation.
```mermaid
flowchart LR
    S([Source Program]) -->|characters| A
    subgraph FE[Front End]
        A[Scanner] -->|words| B[Parser]
        B <--> C[Elaborator]
    end
    B --> IR([IR])
    style S fill:#f9f,stroke-width:0
    style IR fill:#9f9,stroke-width:0
```
To check the syntax of the input program, the parser must compare the program's structure against the grammar that defines the language.
If the parser finds a derivation (using the rules of the grammar), this proves that the input program is syntactically correct.
1.3.2 The Optimizer
Most optimizations consist of an analysis and a transformation.
- The analysis determines whether the compiler can safely and profitably apply the transformation.
  - Data-flow analysis
  - Dependence analysis
- The transformation rewrites the code into a more efficient form.
1.3.3 The Back End
The back end solves at least three major problems.
- Instruction selection — Converting the IR into operations in the target processor’s ISA.
- Instruction scheduling — Selecting an execution order for the operations.
- Register allocation — Deciding which values should reside in registers at each point in the code.
1.4 Engineering
Compiler construction is an exercise in engineering: design and implementation under constraints.
1.5 Summary and Perspective
Exercises
Ex 1.1
Qualities I consider most important in a compiler
- As a user — correctness, safety, runtime speed & resource consumption, user-friendliness, compilation speed & resource consumption
- As a writer — correctness, safety, runtime speed & resource consumption, maintainability, compilation speed & resource consumption
Ex 1.2
I think it should be seen as interpretation rather than compilation, since the displayed graphics are a result, not a program.
Ex 1.3
- The delay should be imperceptible.
- Resources are limited in the embedded environment.
- Speed is not so important, while simplicity and clarity weigh more.
- Parallelism.
- Compatibility.
Chapter 2 Scanners
- 2.1 Introduction
- 2.2 Recognizing Words
- 2.3 Regular Expressions
- 2.4 From Regular Expression to Scanner
2.1 Introduction
Scanner construction has a strong foundation in formal language theory. The compiler writer specifies the lexical structure of the input language using a set of regular expressions.
Keywords, or reserved words, are words that match the rule for an identifier but have special meanings.
2.2 Recognizing Words
2.2.1 A Formalism for Recognizers
Formally, transition diagrams can be viewed as finite automata.
A finite automaton is a 5-tuple \((S,\Sigma,\delta,s_0,S_A)\):
- \(S\) is the finite set of states, including the error state \(s_e\).
```mermaid
flowchart LR
    E((s_e))
    E -->|any character| E
```
- \(\Sigma\) is the finite alphabet.
- \(\delta(s,c)\) is the transition function.
- \(s_0\) is the start state.
- \(S_A\) is the set of accepting states.
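To make the formalism concrete, here is a minimal Rust sketch (mine, not from the book) that simulates such an FA. The example recognizes register names like `r29`; the error state \(s_e\) is represented implicitly by missing table entries:

```rust
use std::collections::{HashMap, HashSet};

// A finite automaton (S, Sigma, delta, s0, S_A) with states numbered 0..n.
// Missing entries in `delta` stand for transitions into the error state s_e.
struct FiniteAutomaton {
    delta: HashMap<(u32, char), u32>, // transition function delta(s, c)
    s0: u32,                          // start state
    accepting: HashSet<u32>,          // S_A
}

impl FiniteAutomaton {
    fn accepts(&self, word: &str) -> bool {
        let mut state = self.s0;
        for c in word.chars() {
            match self.delta.get(&(state, c)) {
                Some(&next) => state = next,
                None => return false, // fell into the error state s_e
            }
        }
        self.accepting.contains(&state)
    }
}

fn main() {
    // Recognizer for "r" followed by one or more digits (e.g., r0, r29).
    let mut delta = HashMap::new();
    delta.insert((0, 'r'), 1);
    for d in '0'..='9' {
        delta.insert((1, d), 2);
        delta.insert((2, d), 2);
    }
    let fa = FiniteAutomaton {
        delta,
        s0: 0,
        accepting: HashSet::from([2]),
    };
    assert!(fa.accepts("r29"));
    assert!(!fa.accepts("r"));
    assert!(!fa.accepts("x1"));
}
```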
2.2.2 Recognizing More Complex Words
Question 2.2.1 Identifier FA
```mermaid
flowchart LR
    S((s_0))
    M(((s_1)))
    T1(((s_2)))
    T2(((s_3)))
    T3(((s_4)))
    T4(((s_5)))
    T5(((s_6)))
    S -->|a...z,A...Z| M
    M -->|0...9| T1
    T1 -->|0...9| T2
    T2 -->|0...9| T3
    T3 -->|0...9| T4
    T4 -->|0...9| T5
```
Question 2.2.2 PASCAL comment FA
```mermaid
flowchart LR
    S((s_0))
    M((s_1))
    T(((s_2)))
    S -->|"{"| M
    M -->|"∑-}"| M
    M -->|"}"| T
```
2.3 Regular Expressions
The set of words accepted by a finite automaton \(F\) forms a language \(L(F)\). Any language described by an RE is considered a regular language.
2.3.1 Formalizing the Notation
Three basic operations:
- Alternation: commutative.
- Concatenation
- Closure: A Kleene closure (*) denotes zero or more repetitions of the preceding expression. Closures with an upper bound for repetitions are called finite closures.
Precedence: Parentheses > Closure > Concatenation > Alternation
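For example (an illustration of mine), under these rules \(a|bc^*\) is read as \(a|(b(c^*))\); the closure binds only to \(c\), concatenation then joins \(b\) with \(c^*\), and the alternation applies last.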
2.3.2 Examples of Regular Expressions
The cost of operating an FA is proportional to the length of the input, not to the complexity of the RE or the number of states in the FA.
2.3.3 Closure Properties of REs
- REs are closed under concatenation, union, and closure.
2.4 From Regular Expression to Scanner
2.4.1 Nondeterministic Finite Automata
In Nondeterministic Finite Automata (NFA), transition functions can be multivalued and can include \(\epsilon\)-transitions.
There are two distinct models for NFAs, which differ in their behavior when the NFA must make a nondeterministic choice:
- Follow the transition that leads to an accepting state.
- Clone itself to pursue each possible transition.
2.4.2 RE to NFA: Thompson's Construction
Thompson's construction uses a simple, template-driven process to build up an NFA from smaller NFAs.
2.4.3 NFA to DFA: The Subset Construction
The subset construction is actually a Breadth-First Search (BFS).
Fixed-Point Computations
The validity of the subset construction algorithm is based on the fact that set union is both commutative and associative.
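A minimal Rust sketch of the subset construction (my own illustration, not the book's code): the BFS explores sets of NFA states, \(\epsilon\)-closures are computed with a worklist, and every discovered set becomes one DFA state:

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

// A toy NFA over `char`; epsilon-transitions are kept in a separate map.
struct Nfa {
    delta: HashMap<(usize, char), Vec<usize>>,
    epsilon: HashMap<usize, Vec<usize>>,
    start: usize,
    accepting: BTreeSet<usize>,
}

// All NFA states reachable from `states` through epsilon-transitions alone.
fn epsilon_closure(nfa: &Nfa, states: &BTreeSet<usize>) -> BTreeSet<usize> {
    let mut closure = states.clone();
    let mut worklist: Vec<usize> = states.iter().copied().collect();
    while let Some(s) = worklist.pop() {
        for &t in nfa.epsilon.get(&s).into_iter().flatten() {
            if closure.insert(t) {
                worklist.push(t);
            }
        }
    }
    closure
}

// BFS over subsets of NFA states; each discovered subset becomes a DFA state.
fn subset_construction(
    nfa: &Nfa,
    alphabet: &[char],
) -> (HashMap<(usize, char), usize>, Vec<bool>) {
    let q0 = epsilon_closure(nfa, &BTreeSet::from([nfa.start]));
    let mut ids: HashMap<BTreeSet<usize>, usize> = HashMap::from([(q0.clone(), 0)]);
    let mut dfa_delta = HashMap::new();
    let mut queue = VecDeque::from([q0]);
    while let Some(q) = queue.pop_front() {
        let q_id = ids[&q];
        for &c in alphabet {
            let moved: BTreeSet<usize> = q
                .iter()
                .flat_map(|&s| nfa.delta.get(&(s, c)).into_iter().flatten().copied())
                .collect();
            if moved.is_empty() {
                continue; // transition into the implicit error state
            }
            let t = epsilon_closure(nfa, &moved);
            let t_id = match ids.get(&t) {
                Some(&id) => id,
                None => {
                    let id = ids.len();
                    ids.insert(t.clone(), id);
                    queue.push_back(t);
                    id
                }
            };
            dfa_delta.insert((q_id, c), t_id);
        }
    }
    // A DFA state is accepting if its subset contains an accepting NFA state.
    let mut accepting = vec![false; ids.len()];
    for (set, &id) in &ids {
        accepting[id] = set.iter().any(|s| nfa.accepting.contains(s));
    }
    (dfa_delta, accepting)
}

fn main() {
    // NFA for the RE a|b, shaped like Thompson's construction would build it.
    let nfa = Nfa {
        delta: HashMap::from([((1, 'a'), vec![2]), ((3, 'b'), vec![4])]),
        epsilon: HashMap::from([(0, vec![1, 3]), (2, vec![5]), (4, vec![5])]),
        start: 0,
        accepting: BTreeSet::from([5]),
    };
    let (dfa_delta, accepting) = subset_construction(&nfa, &['a', 'b']);
    assert!(accepting[dfa_delta[&(0, 'a')]]);
    assert!(accepting[dfa_delta[&(0, 'b')]]);
}
```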
2.4.4 DFA to Minimal DFA
Category Theory for Programmers
This book contains my notes on Category Theory for Programmers by Bartosz Milewski.
Chapter 1 Category: The Essence of Composition
- 1.1 Arrows as Functions
- 1.2 Properties of Composition
- 1.3 Composition is the Essence of Programming
- 1.4 Challenges
1.1 Arrows as Functions
Arrows can be composed.
1.2 Properties of Composition
- Composition is associative.
- For every object there is a unit arrow (called identity).
1.3 Composition is the Essence of Programming
The information we need in order to compose chunks ("surface area") has to increase more slowly than the information we need in order to implement them ("volume").
Category theory is extreme in the sense that it actively discourages us from looking inside the objects. An object in category theory is an abstract nebulous entity. All you can ever know about it is how it relates to other objects — how it connects with them using arrows.
1.4 Challenges
Challenge 1.1 Implement identity
Identity function in Rust:
```rust
fn id<A>(a: A) -> A {
    a
}

fn main() {
    assert_eq!(id(1), 1);
    assert_eq!(id(1.0), 1.0);
    assert_eq!(id("1"), "1");
}
```
Challenge 1.2 Implement composition
Composition function in Rust:
```rust
macro_rules! compose {
    ( $last:expr ) => { $last };
    ( $head:expr, $($tail:expr),+ ) => {
        compose_two($head, compose!($($tail),+))
    };
}

fn compose_two<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |x| g(f(x))
}

fn main() {
    let f = |x| x + 1;
    let g = |x| x * 2;
    let h = |x| x ^ 100; // note: ^ is XOR, not exponentiation
    let fgh = compose!(f, g, h);
    assert_eq!(fgh(3), 108); // ((3 + 1) * 2) XOR 100 = 108
}
```
Challenge 1.3 Validate composition
```rust
macro_rules! compose {
    ( $last:expr ) => { $last };
    ( $head:expr, $($tail:expr),+ ) => {
        compose_two($head, compose!($($tail),+))
    };
}

fn compose_two<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |x| g(f(x))
}

fn id<A>(a: A) -> A {
    a
}

fn main() {
    let f = |x| x + 4;
    let f_then_id = compose!(f, id);
    let id_then_f = compose!(id, f);
    assert_eq!(f_then_id(5), id_then_f(5));
}
```
Challenge 1.4 World-wide web
Whether the world-wide web can be considered a category depends on how we define the arrows. The web pages are objects without doubt. If we define the arrows as links, then `A->B` and `B->C` do not necessarily imply `A->C`, because there might not be a direct link from `A` to `C`; in this case the world-wide web is not a category. However, if we define the arrows as sequences of links, then `A->B` and `B->C` do imply `A->C`, and thus the world-wide web can be a category.
See also Challenge 1.6.
Challenge 1.5 Facebook
Facebook is not a category because friendship cannot be composed: having `A->B` and `B->C` does not necessarily imply `A->C`.
Challenge 1.6 Directed graph
A directed graph is a category if every node has a self link, and for every triplet of nodes `A, B, C`, if there is an edge `A->B` and an edge `B->C`, then there must also be an edge `A->C`.
Chapter 2 Types and Functions
- 2.1 Who Needs Types
- 2.2 Types Are About Composability
- 2.3 What Are Types?
- 2.4 Why Do We Need a Mathematical Model?
- 2.5 Pure and Dirty Functions
- 2.6 Examples of Types
- 2.7 Challenges
2.1 Who Needs Types
2.2 Types Are About Composability
2.3 What Are Types?
Here, sets can be finite or infinite.
Ideally we would consider types to be sets and functions to be mathematical functions between sets. But we run into the halting problem: there are functions that never terminate. That is why we introduce \(\bot\) (bottom), which denotes a non-terminating computation. So a function with the following signature:
```rust
fn f(x: bool) -> bool
```
may return `true`, `false`, or \(\bot\).
For example, the following program:
```rust
fn f(x: bool) -> bool {
    !f(x)
}

fn main() {
    println!("{}", f(true));
}
```
compiles but never terminates.
Such functions are called partial functions, as opposed to total functions, which return valid results for every possible argument.
2.4 Why Do We Need a Mathematical Model?
The problem is that it's very hard to prove things about programs using operational semantics.
An alternative is denotational semantics, which is based on math.
Difficulty: computational effects (e.g., I/O)
Solution: mapping to monads (by Eugenio Moggi)
One of the important advantages of having a mathematical model for programming is that it’s possible to perform formal proofs of correctness of software.
2.5 Pure and Dirty Functions
A pure function is a function which is guaranteed to produce the same output every time it's called with the same input, and has no side effects.
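A small Rust illustration (mine, not from the book): the first closure is pure, while the second depends on mutable captured state and is therefore impure:

```rust
fn main() {
    // Pure: same input always gives the same output, no side effects.
    let square = |x: i32| x * x;
    assert_eq!(square(3), square(3));

    // Impure: the result depends on hidden mutable state, and each
    // call changes that state as a side effect.
    let mut counter = 0;
    let mut next = |x: i32| {
        counter += 1;
        x + counter
    };
    assert_ne!(next(3), next(3)); // 4, then 5
}
```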
2.6 Examples of Types
Empty set: Haskell `Void` (not C++ `void`, which is actually unit).
Singleton set: C++ `void`, Haskell/Rust `()`.
A function without explicit arguments does not mean it takes nothing. This is best shown in Haskell:
```haskell
f44 :: () -> Integer
f44 () = 44
```
The function `f44` above obviously takes `()`, which is the only element of the singleton unit set.
Functions that can be implemented with the same formula for any type are called parametrically polymorphic.
```rust
fn unit<T>(_x: T) {}
```
Two-element set: C++/Rust `bool`, Haskell `Bool`. Functions to these two-element sets are called predicates.
2.7 Challenges
Challenge 2.1 Higher-order memoize
```rust
use std::collections::HashMap;
use std::hash::Hash;

fn memoize<A, B, F>(f: F) -> impl FnMut(A) -> B
where
    F: Fn(A) -> B,
    A: Eq + Hash + Clone,
    B: Clone,
{
    let mut cache: HashMap<A, B> = HashMap::new();
    move |arg| {
        if let Some(result) = cache.get(&arg) {
            result.clone()
        } else {
            println!("Evaluated!");
            let result = f(arg.clone());
            cache.insert(arg, result.clone());
            result
        }
    }
}

fn main() {
    let mut f = memoize(|n| 2 * n);
    println!("{}", f(20)); // First-time call, evaluated
    println!("{}", f(20)); // Memoization used
}
```
Challenge 2.2 Memoize RNG
This cannot work: once memoized, the function would always return the same number, which defeats the purpose of a random number generator.
Challenge 2.3 Memoize RNG with seed
```rust
extern crate rand;
extern crate rand_chacha;

use rand::prelude::*;
use rand_chacha::ChaCha8Rng;

fn main() {
    let memoized_rng = |seed: u64| {
        let mut rng = ChaCha8Rng::seed_from_u64(seed);
        rng.gen::<u64>()
    };
    println!("{}", memoized_rng(0));
    println!("{}", memoized_rng(0));
    println!("{}", memoized_rng(1));
    println!("{}", memoized_rng(1));
}
```
Challenge 2.4 Which is pure
The factorial function is pure. Other functions have side effects (input, output, global state).
Challenge 2.5 Bool to Bool
```rust
fn id(x: bool) -> bool { x }
fn not(x: bool) -> bool { !x }
fn always_true(_x: bool) -> bool { true }
fn always_false(_x: bool) -> bool { false }
```
Challenge 2.6 Draw category
```mermaid
flowchart LR
    Void
    Unit
    Unit -->|id| Unit
    Bool
    Bool -->|unit| Unit
    Unit -->|true| Bool
    Unit -->|false| Bool
    Bool -->|id| Bool
    Bool -->|not| Bool
    Bool -->|true| Bool
    Bool -->|false| Bool
```
The Calculus of Computation
This book contains my notes on The Calculus of Computation by Aaron R. Bradley and Zohar Manna.
Chapter 1 Propositional Logic
- 1.1 Syntax
- 1.2 Semantics
- 1.3 Satisfiability and Validity
- 1.4 Equivalence and Implication
- 1.5 Substitution
- 1.6 Normal Forms
- 1.7 Decision Procedures for Satisfiability
- 1.8 Summary
- Exercises
1.1 Syntax
- not: \(\neg F\)
- and: \(F_1\land F_2\)
- or: \(F_1\lor F_2\)
- implication: \(F_1\implies F_2\)
- if and only if: \(F_1\iff F_2\)
Precedence from highest to lowest: \(\neg\), \(\land\), \(\lor\), \(\implies\), \(\iff\)
Associativity of binary operators: \(\land\) and \(\lor\) are left-associative, \(\implies\) and \(\iff\) are right-associative.
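A worked example of my own applying these rules:

\[\neg P\land Q\lor R\implies S\iff T \quad\text{abbreviates}\quad \Big(\big(((\neg P)\land Q)\lor R\big)\implies S\Big)\iff T\]

and, by right-associativity,

\[F_1\implies F_2\implies F_3 \quad\text{abbreviates}\quad F_1\implies(F_2\implies F_3).\]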
1.2 Semantics
An interpretation:
\[I: P\mapsto\mathrm{true}, Q\mapsto\mathrm{false},\dots\]
A truth table:
| \(F\) | \(\neg F\) |
| --- | --- |
| 0 | 1 |
| 1 | 0 |
Given that all symbols used are included in the interpretation, we can evaluate an arbitrary expression.
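For example, under the interpretation \(I\) above (with \(P\mapsto\mathrm{true}\), \(Q\mapsto\mathrm{false}\)):

\[I\vDash P\lor Q,\qquad I\nvDash P\land Q,\qquad I\nvDash P\implies Q\]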
Constants:
\[\forall I, I\vDash\top, I\nvDash\bot\]
1.3 Satisfiability and Validity
1.3.1 Truth Tables
1.3.2 Semantic Arguments
1.4 Equivalence and Implication
1.5 Substitution
1.6 Normal Forms
1.7 Decision Procedures for Satisfiability
1.7.1 Simple Decision Procedures
1.7.2 Reconsidering the Truth-Table Method
1.7.3 Conversion to an Equisatisfiable Formula in CNF
1.7.4 The Resolution Procedure
1.7.5 DPLL
1.8 Summary
Exercises
Geometric Data Structures for Computer Graphics
This book contains my notes on Geometric Data Structures for Computer Graphics by Elmar Langetepe and Gabriel Zachmann.
Chapter 1 Quadtrees and Octrees
- 1.1 Definition
- 1.2 Complexity and Construction
- 1.3 Height Field Visualization
- 1.4 Isosurface Generation
- 1.5 Ray Shooting
- 1.6 3D Octree
- 1.7 5D Octree
1.1 Definition
Binary tree -> Quadtree -> Octree.
1.2 Complexity and Construction
The depth of a quadtree for a set \(P\) of points in the plane is at most \(\log(s/c)+\frac{3}{2}\), where \(c\) is the smallest distance between any two points in \(P\) and \(s\) is the side length of the initial square.
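A sketch of the argument (my paraphrase of the de Berg et al. proof): an internal node at depth \(i\) has side length \(s/2^i\) and must contain at least two points of \(P\), which lie at most a cell diagonal apart, so

\[c\le\sqrt{2}\,\frac{s}{2^i}\iff i\le\log\frac{\sqrt{2}\,s}{c}=\log(s/c)+\frac{1}{2},\]

and the leaves sit one level below the deepest internal node, giving \(\log(s/c)+\frac{3}{2}\).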
Neighbor finding: \(\mathcal{O}(d+1)\) time, where \(d\) is the depth of the quadtree
Balancing a quadtree: \(\mathcal{O}((d + 1)n)\) time and \(\mathcal{O}(n)\) space, see de Berg et al., 2008.
1.3 Height Field Visualization
Example: Digital Elevation Model (DEM).
Height fields can be stored as Triangular Irregular Networks (TINs). TINs are efficient but not visualization-friendly.
Use quadtrees.
Problem: neighboring fine and coarse cells produce so-called T-vertices.
Solution: triangulate each node; when subdividing one triangle, also subdivide the one on the other side. This leads to a 4-8 division.
1.4 Isosurface Generation
Curvilinear grids (physical space) -> regular grids (computational space).
Octrees offer a simple way to compute isosurfaces efficiently. They help discard large parts of the volume where the isosurface is guaranteed not to be.
Each leaf holds the range of the data values at the eight corners of its cell.
If the isosurface crosses an edge of a cell, that edge will be visited exactly four times (once per adjacent cell). Use a hash table to memoize the intersection, and delete the entry after the fourth visit to keep the hash table small.
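A minimal Rust sketch of this bookkeeping (the data layout is my own guess): intersection points are cached under a canonical edge key, and an entry is evicted on its fourth visit:

```rust
use std::collections::HashMap;

// An axis-aligned grid edge, identified by its lower endpoint and axis (0..3).
type EdgeKey = ([i32; 3], u8);

struct EdgeCache {
    // Cached intersection point plus the number of visits so far.
    entries: HashMap<EdgeKey, ([f64; 3], u8)>,
}

impl EdgeCache {
    fn get_or_insert(
        &mut self,
        key: EdgeKey,
        compute: impl FnOnce() -> [f64; 3],
    ) -> [f64; 3] {
        let (point, visits) = self.entries.entry(key).or_insert_with(|| (compute(), 0));
        *visits += 1;
        let point = *point;
        if self.entries[&key].1 == 4 {
            // An interior grid edge is shared by exactly four cells, so after
            // the fourth visit the entry can never be needed again.
            self.entries.remove(&key);
        }
        point
    }
}

fn main() {
    let mut cache = EdgeCache { entries: HashMap::new() };
    let key = ([1, 2, 3], 0); // edge along the x-axis at lattice point (1,2,3)
    for _ in 0..4 {
        // `compute` runs only on the first visit; later visits hit the cache.
        cache.get_or_insert(key, || [1.5, 2.0, 3.0]);
    }
    assert!(cache.entries.is_empty()); // entry removed after the fourth visit
}
```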
1.5 Ray Shooting
Partition the universe into a regular grid to avoid checking the ray against all objects.
1.6 3D Octree
Ray shooting with octrees:
- Bottom-up: starts at the leaf that contains the origin of the ray, and tries to find the next neighbor cell.
- Top-down: starts at the root and tries to find all the nodes that are stabbed by the ray.
A top-down method:
- Parameterize the ray as: \(\vec{r}(t) = \vec{p} + t\vec{d}\)
- Determine on which side the ray enters the current cell
- Determine the first subcell to visit
Negative directions can be handled efficiently by an XOR operation.
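A minimal sketch of the trick (conventions mine, in the spirit of parametric octree traversal): mirror the ray so every direction component is non-negative, record the flipped axes in a 3-bit mask, and recover the true child index with an XOR:

```rust
// Mirror the ray about the cell center on every axis where the direction is
// negative, and return a mask with one bit per flipped axis. Child indices
// of an octree node encode the x/y/z halves in bits 0..3, so an index in the
// mirrored traversal maps back to the original tree via `index ^ mask`.
fn mirror_ray(origin: &mut [f64; 3], dir: &mut [f64; 3], center: &[f64; 3]) -> u8 {
    let mut mask = 0u8;
    for axis in 0..3 {
        if dir[axis] < 0.0 {
            origin[axis] = 2.0 * center[axis] - origin[axis];
            dir[axis] = -dir[axis];
            mask |= 1 << axis;
        }
    }
    mask
}

fn main() {
    let center = [0.5, 0.5, 0.5]; // center of the root cell
    let mut origin = [0.9, 0.1, 0.9];
    let mut dir = [-1.0, 1.0, -1.0];
    let mask = mirror_ray(&mut origin, &mut dir, &center);
    assert_eq!(mask, 0b101); // x and z axes were flipped
    // A child visited as index i in the mirrored traversal corresponds to
    // child i ^ mask in the original octree.
    assert_eq!(0b000 ^ mask, 0b101);
}
```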
1.7 5D Octree
Rays are essentially static objects, just like the geometry of the scene!
Discretize the rays based on the direction cube. There is a one-to-one mapping between direction vectors and points on all six sides of the cube.
We need six 5D octrees, one for each side. For a given side, the \(xyz\)-intervals define a box, and the remaining \(uv\) intervals define a cone.
Patterns for Parallel Programming
This book contains my notes on Patterns for Parallel Programming by Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill.
Chapter 1 A Pattern Language for Parallel Programming
- 1.1 Introduction
- 1.2 Parallel Programming
- 1.3 Design Patterns and Pattern Languages
- 1.4 A Pattern Language for Parallel Programming
1.1 Introduction
1.2 Parallel Programming
The key to parallel computing is exploitable concurrency.
Even when a parallel program is "correct", it may fail to deliver the anticipated performance improvement from exploiting concurrency. Care must be taken to ensure that the overhead incurred by managing the concurrency does not overwhelm the program runtime.
1.3 Design Patterns and Pattern Languages
1.4 A Pattern Language for Parallel Programming
```mermaid
flowchart TB
    FC[Finding Concurrency] <--> AS[Algorithm Structure]
    AS <--> SS[Supporting Structures]
    SS <--> IM[Implementation Mechanisms]
```
Chapter 2 Background and Jargon of Parallel Computing
- 2.1 Concurrency in Parallel Programs versus Operating Systems
- 2.2 Parallel Architectures: A Brief Introduction
- 2.3 Parallel Programming Environments
- 2.4 The Jargon of Parallel Computing
- 2.5 A Quantitative Look at Parallel Computation
- 2.6 Communication
- 2.7 Summary
2.1 Concurrency in Parallel Programs versus Operating Systems
2.2 Parallel Architectures: A Brief Introduction
2.2.1 Flynn's Taxonomy
Four possibilities:
- SISD (Single Instruction, Single Data)
- SIMD (Single Instruction, Multiple Data)
- MISD: mentioned for completeness
- MIMD
2.2.2 A Further Breakdown of MIMD
- Shared memory
- SMP (Symmetric Multiprocessors)
- NUMA (Non-Uniform Memory Access)
- Distributed memory
- MPP (Massively Parallel Processors)
- Clusters
- Hybrid
2.2.3 Summary
2.3 Parallel Programming Environments
Before: a long, long list...
Now:
- OpenMP: a set of extensions to sequential programming languages
- MPI: a library of routines to be called from programs written in a sequential programming language
- Hybrid: OpenMP for each node and MPI between nodes
- Built-in: Languages with built-in features to support parallel programming
2.4 The Jargon of Parallel Computing
- Task
- Unit of execution (UE)
- Processing element (PE)
- Load balancing
- Synchronization
- Synchronous versus asynchronous
- Race conditions
- Deadlocks
2.5 A Quantitative Look at Parallel Computation
A few formulae for measuring the efficiency.
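The key ones (standard definitions; notation mine): with \(P\) processing elements, total solution time \(T_{\mathrm{total}}(P)\), and serial fraction \(\gamma\),

\[S(P)=\frac{T_{\mathrm{total}}(1)}{T_{\mathrm{total}}(P)},\qquad E(P)=\frac{S(P)}{P},\qquad S(P)\le\frac{1}{\gamma+\dfrac{1-\gamma}{P}}\ \text{(Amdahl's law)}.\]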
2.6 Communication
2.6.1 Latency and Bandwidth
Fixed cost (latency) plus cost proportional to the amount of data (data / bandwidth).
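As a formula (standard cost model; notation mine), transferring a message of \(N\) bytes takes

\[T_{\mathrm{message}}(N)=\alpha+\frac{N}{\beta},\]

where \(\alpha\) is the latency and \(\beta\) the bandwidth.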
2.6.2 Overlapping Communication and Computation and Latency Hiding
Continue computation instead of waiting for the message idly.
2.7 Summary
Chapter 3 The Finding Concurrency Design Space
- 3.1 About the Design Space
- 3.2 The Task Decomposition Pattern
- 3.3 The Data Decomposition Pattern
- 3.4 The Group Tasks Pattern
- 3.5 The Order Tasks Pattern
- 3.6 The Data Sharing Pattern
- 3.7 The Design Evaluation Pattern
- 3.8 Summary
3.1 About the Design Space
Parallel programs attempt to solve bigger problems in less time by simultaneously solving different parts of the problem on different processing elements. This can only work, however, if the problem contains exploitable concurrency, that is, multiple activities or tasks that can execute at the same time.
3.1.1 Overview
```mermaid
flowchart LR
    subgraph D[Decomposition]
        direction TB
        TD[Task Decomposition]
        DD[Data Decomposition]
    end
    subgraph DA[Dependency Analysis]
        GT[Group Tasks] <--> OT[Order Tasks]
        OT <--> DS[Data Sharing]
    end
    D <--> DA
    DA <--> DE[Design Evaluation]
```
3.1.2 Using the Decomposition Patterns
3.1.3 Background for Examples
Medical Imaging
PET (Positron Emission Tomography)
- TD: Associating each trajectory with a task
- DD: Partition the body into sections
Linear Algebra
Molecular Dynamics
Pick a distance beyond which the force contribution is so small that it can be ignored.
3.2 The Task Decomposition Pattern
How can a problem be decomposed into tasks that can execute concurrently?
- A good understanding of the problem being solved.
- Which are the computationally intensive parts of the problem?
- What are the key data structures?
- How does the data unfold?
- Define the tasks that make up the problem.
- How do they imply the data decomposition?
Design principles:
- Flexibility
- Efficiency
- Simplicity
Solution: first identify as many tasks as possible and merge them later on.
Where are the tasks?
- Functional decomposition: a task for each function call.
- Loop splitting: map each iteration onto a task (see the sketch after this list).
- Decompose a large data structure into individual chunks.
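Here is a minimal sketch of loop splitting in Rust (my own, standard library only): the iterations of a summation loop are grouped into chunks, and each chunk becomes a task on its own thread:

```rust
fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    let num_tasks = 4;
    let chunk_size = data.len() / num_tasks; // assumes an even split
    let mut partial_sums = vec![0u64; num_tasks];

    // Each chunk of iterations becomes one task, run on its own thread.
    std::thread::scope(|scope| {
        for (sum, chunk) in partial_sums.iter_mut().zip(data.chunks(chunk_size)) {
            scope.spawn(move || {
                *sum = chunk.iter().sum();
            });
        }
    });

    // Combine the per-task results.
    let total: u64 = partial_sums.iter().sum();
    assert_eq!(total, 1_000 * 1_001 / 2);
}
```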
The next step will be data decomposition.
Many problems can be decomposed primarily in terms of data or primarily in terms of tasks. If a task-based decomposition avoids the need to break up and distribute complex data structures, it will be a much simpler program to write and debug. On the other hand, if memory and/or network bandwidth is a limiting factor, a decomposition that focuses on the data might be more effective.
A detailed example (Figure 3.3) is given for molecular dynamics.
3.3 The Data Decomposition Pattern
How can a problem's data be decomposed into units that can be operated on relatively independently?
A data-based decomposition is good if:
- The most computationally intensive part is organized around the manipulation of a large data structure.
- Similar operations are being applied to independent parts of the data structure.
For shared-memory environments, the data decomposition is often implied by the task decomposition. But for distributed-memory environments, the data decomposition must be done by hand.
Look at the central data structures, instead of the tasks.
- There might be some symmetry in the data structure which can be exploited.
- In the distributed-memory case, some data can be replicated on each node to save communication.
3.4 The Group Tasks Pattern
3.5 The Order Tasks Pattern
3.6 The Data Sharing Pattern
3.7 The Design Evaluation Pattern
3.8 Summary
Structured Parallel Programming
This book contains my notes on Structured Parallel Programming by Michael McCool, Arch D. Robison and James Reinders.
Chapter 1 Introduction
- 1.1 Think Parallel
- 1.2 Performance
- 1.3 Motivation: Pervasive Parallelism
- 1.4 Structured Pattern-Based Programming
- 1.5 Parallel Programming Models
- 1.6 Organization of this Book
- 1.7 Summary
All computers are now parallel. Specifically, all modern computers support parallelism in hardware through at least one parallel feature, including vector instructions, multithreaded cores, multicore processors, multiple processors, graphics engines, and parallel co-processors.
In short, parallel programming is programming.
1.1 Think Parallel
Operations need to be done simultaneously. If the original iterative form is not suitable, we need to convert it into an appropriate parallel structure.
1.2 Performance
Two problems:
- Computation may not be the bottleneck. Instead, access to memory or communication may be.
- The potential for scaling performance is constrained by the algorithm's span (longest sequential-only part, or the critical path).
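In the work-span model, the second point can be stated as a bound (standard notation, not verbatim from the book): with total work \(T_1\) and span \(T_\infty\), the speedup on \(P\) workers satisfies

\[\frac{T_1}{T_P}\le\frac{T_1}{T_\infty}.\]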
This book focuses on the shared memory machine model.
1.3 Motivation: Pervasive Parallelism
1.3.1 Hardware Trends Encouraging Parallelism
Three walls:
- Power wall
- Instruction-level parallelism (ILP) wall
- Memory wall
1.3.2 Observed Historical Trends in Parallelism
1.3.3 Need for Explicit Parallel Programming
Serial traps: the long-sustained serial illusion has built traps into our tools and ways of thinking. For example, pointers work well in serial programs but become a nightmare in parallel programs.
So we need to:
- Create appropriate tools.
- Think in parallel.
Two types of parallelism:
- Mandatory parallelism
- Optional parallelism