Before you continue reading, note that the capital-T Trees we talk about in this section are different than the trees from SICP. In SICP (and in the previous sections), trees are simply a fancy word for deep lists. In this section, we introduce a new concept, Trees, which are an abstract data type (ADT). These Trees must respect certain abstraction barriers. When you hear most computer scientists talk about Trees in the real-world, they are typically talking about this ADT.
As with lists and sentences, we can also store data in the Trees data structure. They are generally useful for providing hierarchy, ordering, and composition.
The name comes from the downwards branching structure, similar to real trees but upside down. A node is a point at the Tree. Each node contains a datum ("U.S.A", "California" are some datums). Notice that a node can contain another Tree. The node with "California" can be regarded as a Tree with "California" at the top. Because of this, nodes and Trees are the same thing! We generally use 'Tree' to refer to the whole structure. Another synonym for a node is subtree.
We have our own ADT to represent Trees that we will use for the rest of this lesson, but there is no official way to represent Trees. Why? This is because there are several different design choices to make when creating a Tree ADT:
Different representations of Trees will give you different limitations, features, and functionalities.
Here are the built-in constructors and selectors for Trees:
make-treetakes in two arguments, a datum and a list of its children, and creates a Tree ADT.
datumtakes in a node and returns the datum that the node stores.
childrentakes in a node and returns the list of its children.
One way to implement the Trees described above is with the following definitions:
(define (make-tree datum children) (cons datum children)) (define (datum node) (car node)) (define (children node) (cdr node))
children accepts a node as its single argument and returns its children, a list of Trees. A list of Trees is called a forest. Remember that
Trees and forests are two different data structures! In addition, you
should think of a forest as a list of Trees, but you should NOT think
of a Tree as a bunch of
To reiterate, the constructor and selectors for forests are
cdr, while the constructor and selectors for (this ADT of) Trees are
Additionally, since a leaf is a node with no children, we could use a predicate like this to check whether a node is a leaf:
(define (leaf? node) (null? (children node)))
Remember that using lists is just one way to represent Trees. We can't
assume that someone who designed the ADT would use a list. For example, if
maple is a Tree, we can't assume that
(cdr maple) will give us the children. Instead, we must respect the data abstraction and use the constructors and selectors they provide for us.
We cannot stress enough that you cannot make any assumptions on how a Tree ADT is implemented. When working on Trees, you can only use the constructors/selectors that are
provided. Since forests are implemented as a list of Trees, you can use
of a forest to find the first Tree, or
cdr to find a list of the rest of the
Something useful that we do to Trees often is to map a certain operation to it, akin to mapping over a list. We can achieve this by the following:
(define (treemap fn tree) (make-tree (fn (datum tree)) (map (lambda (t) (treemap fn t)) (children tree) )))
We apply the function to our datum, and map the function recursively on the children.
Make sure you stare at the above code until it makes sense.
Here is an alternative way to define
treemap that applies a function
throughout the tree. Observe how the overall process is recursive, but
does not directly call itself.
treemap will be (mostly) responsible for
fn to the datum of a single Tree. Who handles the forest? Well,
treemap will call a helper procedure,
forest-map, which applies
fn to all
elements in the forest.
(define (treemap fn tree) (make-tree (fn (datum tree)) (forest-map fn (children tree))))
fn to the forest? Well, a forest is just a list
of Trees, and we know that we have
treemap that handles a single Tree. So, all we
need to do is recursively call
treemap on all Trees in the forest!
(define (forest-map fn forest) (if (null? forest) '() (cons (treemap fn (car forest)) (forest-map fn (cdr forest)))))
The pattern of A calling B and B calling A is called mutual
Let's use mutual recursion to write the procedure
count-leaves, which returns the number of leaves in a tree. Let's take this step by step. Since we're using mutual recursion, that means we'll need a procedure to manage Trees,
count-leaves, and a procedure to manage forests,
Here is the code for
(define (count-leaves tree) (if (leaf? tree) 1 (count-leaves-in-forest (children tree))))
null?, return 0.
carof the forest to find how many leaves are in the first Tree of the forest.
cdrof the forest to find the number of leaves in the rest of the forest.
Here is the code for
(define (count-leaves-in-forest forest) (if (null? forest) 0 (+ (count-leaves (car forest)) (count-leaves-in-forest (cdr forest)))))
We have seen how we can store and find elements in Trees. Now, many situations that use the Tree data structure involves visiting all of our nodes and do something with all of the elements. The obvious way is to go up-down and left-right, but there are many other ways we can traverse a Tree.
Depth First Search (DFS) is when you explore a node's children before its siblings. The name comes from the fact that you are going as deep as you can in one branch before looking at other branches. The gif below demonstrates this. The numbers indicate the order in which the nodes are visited.
Note that it finishes exploring one branch before exploring other branches.
We can demonstrate this in Racket. Lets say we want to print every single node. Our Tree ADT actually follows the same structure, so our implementation of dfs is rather simple:
(define (depth-first-search tree) (print (datum tree)) (for-each depth-first-search (children tree)))
Breadth First Search (BFS) explores the siblings before its children. It's easier to imagine this as looking at the graph in 'layers'. First we look at the Tree's root, then its children, followed by its grandchildren, and so on. The gif below demonstrates this:
Implementing BFS in Racket is slightly harder because our ADT stores information in a different order than the order in which BFS traverses. One way around this is to use another data structure called a queue, which stores (in order) the nodes that are going to be checked next.
(define (breadth-first-search tree) (bfs-iter (list tree))) (define (bfs-iter queue) (if (null? queue) 'done (let ((task (car queue))) (print (datum task)) (bfs-iter (append (cdr queue) (children task))))))
Lets walk through how the code above works using the example Tree below. The arrows in the diagram indicate the parent --> child relationships.
bfs-iter is first called, the whole Tree is put into the
simplify things, let's say a tree is denoted by its root.
It dequeues node F, prints the value of node F, and recursively calls
bfs-iter with the
rest of the queue and the children of F. The rest of the queue is empty, but
the children of F is B G.
queue: B G
bfs-iter will print the node of the first tree in the queue, B and
bfs-iter with the rest of the queue, G, and the
children of B, A D.
queue: G A D
And so on until the queue is empty. Once the queue is empty, we will have printed out each node's datum exactly once.
Note that the siblings are always first in the queue and the children are entered from the back. This ensures that siblings are checked first before children.
Is one better than the other? It depends on what you are trying to do with your Trees and how you are storing elements in the Tree.
The Tree below represents things in a house. In a "House" you can find a "Kitchen" and "Cat Food". In a "Kitchen" you can find a "Drawer", "Trash Can", etc. The leaves contain food and the deeper you go, the more filling the foods are.
Here are the takeaways from this subsection: