Unit 4 - Linear Data Structures
Table of Contents
- Overview: Abstract Data Types, and the Stack
- Linear Data Structures: Queue
- Linear Data Structures: Deque
- Linear Data Structures: Linked List
4.1. Overview - The Stack
class
There are lots of different types of linear data structures: collections of data where order is important. When we put an item at the beginning of a list in Python, we expect it to stay at the beginning of the list. (That wouldn't be the case with a dictionary or a set.) Other examples of linear data structures include the stack, the queue, the deque and the linked list.
These linear data structures, which we will soon define, are each an example of an abstract data type (ADT). An ADT is defined in terms of its functions in a "language-agnostic" way. A stack, for example, we'll define first, and then we'll implement the structure in Python. Note that we can also implement an ADT in Java, in JavaScript, in C++, in BASIC, LISP, etc. The implementation is specific to the language, but the abstract data type itself is independent of that.
4.1.1. The Stack
Abstract Data Structure: Stack
A stack is a linear data structure in which items are always added to and removed from the same end. A stack can be visualized as a literal stack of items placed on a table. The first item is placed on the table, then a second item placed on top of the first, then a third placed on top of the second. When the items are removed from the stack, the "top" item is removed first (the third item, or last item placed there), then the second one, and finally the first item.
Because of this behavior, the stack is sometimes referred to as a "last-in, first-out" (LIFO) structure.
There are a number of real life examples that can be considered as a stack.
- Plastic trays stacked in a cafeteria. Last one placed on the stack is the first one picked up.
- The "back" button on your web browser. As you click on links you move "forward" in the browser, with previously viewed pages stacked in your history. Using the "back" button moves back through those pages, with the last page you visited shown first, and the first page you visited revealed last.
- "Paragraph-matching" in mathematical or programmatic expressions.
- HTML angle-bracket tags in a web document, and XML angle-bracket tags in an XML document
- My bedtime reading list (see right)
4.1.2. Formal Definition of the Stack ADT
For us to be able to use a stack, we want to be able to do five things:
- Construct a Stack
- push an item onto the top of the stack
- pop an item from the top of the stack
- peek at the top of the stack
- identify the size of the stack (the number of items in it)
With these specifications, we'll be creating a Stack
class that we can use to manipulate Stack
objects.
Here's a more complete abstract specification that we'll use in writing our stack class.
Stack()
creates a new, empty stack. There are no parameters, and it returns the empty stack object.- The
push(item)
method adds an item to the top of the stack object. It takesitem
as a parameter, and returns nothing. - The
pop()
method removes an item from the top of the stack, and returns that item as a result. - The
peek()
method returns the top item from the top of the stack, but doesn't remove it. - The
is_empty()
method returns True if the stack has no items in it, False otherwise. - The
size()
method returns the number of items currently in the stack.
4.1.3. Implementing the Stack ADT in Python
The Stack
class in Python
Based on the specification above, write the Stack
class in a file called atds.py
. Test your class by writing a program stacktester.py
that imports the Stack
class, then demonstrates the creation and manipulation of stacks.
When you think the atds.py
file with your Stack
class is ready to be evaluated by the instructor's tester, upload it to the server.
For further information on this assignment, see Activity: The Stack Class [PDF].
Once you've implemented the Stack
class in Python, it can be applied to a variety of situations.
4.1.3.1. Checking parentheses
Using parentheses to group mathematical operations into the correct order of calculation might look something like this:
(( a + 3 ) / (( c + 7 / (g + h))) / (q + 1)
As part of evaluating that expression, it will be important to know that the parentheses are balanced: that for every opening parenthesis there is a closing one, and that a parenthesis that closes match with the nearest opening one.
It is easy for us to create legal and illegal sets of parentheses, and people are pretty good at evaluating whether a set of parentheses is "well-formed" by inspection:
( ( ( ) ) ) legal () () () legal (( )(( )(( )))) legal )()()( illegal ()()( illegal ((()() illegal
How can we get the computer to inspect them and determine whether they form a legal set of parentheses or not?
Problem: Matching parentheses
Write a program called paren_checker
that imports the atds
module, creates a Stack()
object, and has a boolean function is_valid(expr)
that identifies whether a string of parentheses is legal or not.
Once you've got your paren_checker
working, create a new version of the program called bracket_checker
which checks expressions that include square brackets and curly braces as well.
4.2. Overview - The Queue
Class
Once you've got the "Last In, First Out" (LIFO) mechanisms of a stack figured out, the queue is a simple variation on that. A queue is a "First In, First Out (FIFO) structure that is familiar to anyone who has ever waited in a line.
Someone shows up first, other people show up later and get in line behind the earlier people. First in, first out.
4.2.1. The Queue as Abstract Data Type
A queue is another linear data type where order matters. As mentioned above, in a queue structure, items that are added to the sequence first are processed before those that come later. We say that items are processed at the "front" of the queue, and new items, as they are added, are placed at the "back" of the queue.
Methods for a Queue
class
Based on what we've already done for the Stack
class, what methods do we need for an abstract data type Queue
?
- The
Queue()
constructor creates a new, empty, queue. enqueue(item)
adds an item to the rear of the queue.dequeue()
removes the item from the front of the queue and returns it.peek()
identifies the item at the front of the queue but doesn't return it. (Note: this method isn't included in our book.)is_empty()
returnsTrue
if the list is empty.size()
returns the number of items in the queue.
4.2.2. Implementing the Queue ADT in Python
As we consider the Queue
class and implementing it in Python, it probably makes sense to use a list
for our queue. But where should we place the "front" of the queue in our list: at index 0, or at index [-1] (the last item in the list)?
Let's say that I place the following items into my queue, in this order: "item" , 3 , 2.78 , "Richard", True.
- If we use the
.append()
method toenqueue
items, we'll be adding them to the right side of the list. The left side of the list will be the head then, and we'lldequeue
items by popping them from index 0 with.pop(0)
.
List q: [ "item" , 3 , 2.78 , "Richard", True ] Index: 0 1 2 3 4 "Head" "Tail"
enqueue
items by using the .insert(0, item)
method (an expensive operation), and I'll dequeue
items by popping them from the end of the list with .pop()
(which pops the last item on a list, by default).List q: [ True , "Richard" , 2.78 , 3, "item" ] Index: 0 1 2 3 4 "Tail" "Head"
You can get either of these implementations to work Python, but the first one, because of the way that Python works with lists, turns out to be quite a bit more efficient as your Queue increases in size.
You can confirm this by writing two implementations of the Queue and testing them!
The Queue
class in Python
Based on the specification above, add the Queue
class to your atds.py
module. Test your class by writing a program queue_tester.py
that creates queues and manipulates them.
When you think your Queue
class is ready to be evaluated by the instructor's tester, upload this new atds.py
file to the server to replace the one that is already there.
For further information on this assignment, see Activity: The Queue Class [PDF].
4.3. Overview - The Deque
Class
The Deque
class is a "double-ended Queue," and not to be confused with the word "dequeue." It has the best of both worlds. You can:
- add to the front:
.add_front(item)
- add to the rear:
.add_rear(item)
- remove from the front:
.remove_front()
- remove from the rear:
.remove_rear()
Which side should we make the front? Index 0? or Index -1? Does it matter?
4.3.1. Implementing the Deque
class
The Deque
class in Python
Add a Python implementation of the Deque
class into your atds.py
module. There is no peek method for this class, but be sure to include .is_empty()
and .size()
methods.
4.3.2. Testing your Deque
class
Get a copy of the atds_tester.py and run it against your atds.py
to make sure your classes are working as they should be.
4.4. Overview - Linked Lists
You're already familiar with Python's list
data structure. It's a well-established implementation based on an Abstract Data Type called an unordered list. In this section we're going to be considering the details of this Abstract Data Type, and then implementing this ADT using a new class we'll be writing: the Node
class.
Attention!
The material that we'll be developing in this next sections will be critical to what we do later on in the course, so make sure you get a good grasp of what we're doing as we go.
There's a place for "fake it till you make it," but this is not it. Make sure you understand these concepts well before moving on.
4.4.1. The Unordered List ADT
Imagine that you have a list of items. Each item on your list comes before and/or after the other items on the list; it's a sequence in that sense. It's "unordered," however, because the values themselves are not in order: not in numeric order, or alphabetic order.
If you want to be able to interact with that list, and do things with the elements of that list, you'll need a number of operations:
UnorderedList()
will allow us to create a new, empty list.add(item)
will allow us to add an item to the beginning of the list. (For the moment, we're going to assume it's a unique item, and that it's not already present on the list.)remove(item)
removes the specified item from the list. This assumes that the item is already in the list.search(item)
looks for the specified item in the list, and returnsTrue
it if is there somewhere.is_empty()
returnsTrue
if the list is empty.length()
returns the number of items in the list.append(item)
adds the item to the end of the list.index(item)
returns the position of the item in the list (and assumes it is in the list)insert(pos, item)
adds a new item to the list at the specified position. This assumes that the item is not already on the list and that the position exists.pop()
removes the last item from the list and returns it.pop(pos)
removes the item at the specified position and returns it.
4.4.2. What does this unordered list look like?
The unordered list is abstract, and so it doesn't need to look like anything, but if we were to draw it, we could consider it as a series of data values connected by pointers linking them. The head
of the list is indicated by a special pointer, and from there, each element in the list points to the next element in the list, until we finally come to the last element in the list.
4.4.3. The Node
class
In order to track these different elements and their pointers, we're going to create a new class, the Node
class.
A Node
will contain two pieces of information—two instance variables:
-
data
= the information contained in the node -
next
= the pointer to the nextNode
in the unordered list
This class will also require that we write four methods that will allow us to interact with those variables: two "getters" and two "setters".
.get_data()
= returns the data of this node.get_next()
= the pointer to the next node in the unordered list.set_data()
= allows for altering the data at thisNode
.set_next()
= allows for resetting the pointer from thisNode
to a differentNode
4.4.4. Writing the Node
class
Based on what we've identified, writing the Node
class should be relatively straightforward.
Note that in this class we'll construct a new Node
with an initial piece of data, but that there is no initial next
that we point to. Make sure to set self.next
to None
explicitly.
Write the Node
class
Based on the description above, write the Node
class.
Test your class, either interactively, importing it into a small test program, or by writing a main
function written into that same program.
4.4.5. Writing the UnorderedList
class
Our List is going to consist of a series of Node
objects all linked to each other via the next
pointer. The only thing we need is something to point to the very first item—the first Node
—and that will be handled by a reference in the UnorderedList
class. Because this reference is pointing to the head of the list, we'll call it head
.
Here is the start to our class:
from atds import Node class UnorderedList(): """Maintains an unordered list via a linked series of Nodes """ def __init__(self): self.head = None
When we create a new UnorderedList
object, we can visualize that initial empty list object in this way:
We can identify that there are no items in this new list because the head
has a value of None
.
If we want to add an item to the list, the new sketch of what that looks like is given here:
We can see that the head
reference no longer has a value of None
. It has a reference to a Node
object, which has data, and a pointer to None
, which is an indication that this node is the last in the list.
Let's see what the add(item)
method for the UnorderedList
class looks like.
def add(self, new_data): """Creates a new Node based on the data, and adds it to the beginning of the list. """ temp_node = Node(new_data) temp_node.set_next(self.head) self.head = temp_node
Sketch of an UnorderedList
Draw a sketch of what an unordered list called my_data
would look like after the following code is executed:
my_data = UnorderedList() my_data.add(5) my_data.add(2) my_data.add(17)
Because of the order that we added the items into the list, and the fact that the .add()
method inserts items at the head of the list, you can see that the first item added is last on the list.
4.4.6. Traversing the List
Many of the methods in the UnorderedList
class are going to require working through the entire collection of nodes, following from one object to the next, as indicated by the next
references.
Let's see how you do that as we write the length
method.
def length(self): """Traverses the entire length of the UnorderedList to identify how many values (nodes) there are in the list. """ node_count = 0 # local variable for counting current = self.head # start at the beginning while current != None: # if we're not at the end of the list current = current.get_next() # move to the next node node_count += 1 # increment node counter return node_count
Now that we know a little about traversing...
UnorderedList
: the search()
method
Write the boolean search
method, which takes an data
value and looks through the list to identify if it's there. If the item exists on the list, return True
.
def search(self, data): """Traverses through the list to determine if the indicated data is present. """ current = self.head found = False # Assumed at the start while current != None and not found: if current.get_data() == data: found = True else: current = current.get_next() return found
4.4.7. UnorderedList Debugging Tool
Insert this method into your UnorderedList
class so that you can print a representation of your list at any time. It will help in trying to identify the current state of your list.
def __repr__(self): """Creates a representation of the list suitable for printing, debugging. """ result = "UnorderedList[" next_node = self.head while next_node != None: result += str(next_node.get_data()) + "," next_node = next_node.get_next() if result[-1] == ",": result = result[:-1] # remove trailing comma result = result + "]" return result
4.4.8. The remove()
method
Perhaps the trickiest method to write (at this point in our course) is the remove(data)
method. We need to be able to traverse the linked list, identify the node that has the given data in it, and then take it out of the list.
But how do we "take it out of the list?"
We need to take the pointer from the previous Node—the one that points to our current node—and have that point to the node just after our current node, effectively removing our current node from the list.
So we'll need to be sure that we keep track of that previous node with a variable for that purpose.
Take a look at this code to see how it all works:
def remove(self,data): """This tricky method uses a pair of variables to monitor the traversal so that we can remove the desired item from the list by taking the pointer from the previous Node and setting it to the item AFTER the one that we're removing. """ previous = None # first item in list has no previous current = self.head # start at the beginning while current != None: if current.get_data() == data: # we found the item to remove! if previous == None: # if it's the first item on the list... self.head = current.get_next() # just reset head to next node else: previous.set_next(current.get_next()) # reset previous next to this node's next return # we're done else: previous = current # if we haven't found it yet, set previous current = current.get_next() # to current and follow to the next
4.4.9. Other UnorderedList
methods
What else needs to be written?
is_empty()
returnsTrue
if the list is empty.append(item)
adds the item to the end of the list.insert(pos, item)
adds a new item to the list at the specified position. This assumes that the item is not already on the list and that the position exists.index(item)
returns the position of the item in the list (and assumes it is in the list)pop()
removes the last item from the list and returns it.pop(pos)
removes the item at the specified position and returns it.
You'll have a much better understanding of linked lists once you've worked through the details of each of these methods.
4.4.10. Additional Exercises
There are a number of modifications that would be appropriate to implement, or interesting to experiment with. Pick two of the following and modify your UnorderedList
class as needed.
- Alter the
remove
method so that it works even if the item isn't on the list. - Modify the
UnorderedList
methods so that duplicates are allowed. Which methods must be modified? - Implement a
Stack
using anUnorderededList
. - Implement a
Queue
using anUnorderededList
. - Design an experiment that compares the performance of a standard Python
list
object with a list created using yourUnorderedList
.
4.5.0. The Ordered List
Let's consider now an ordered list constructed of nodes. The ordered list is similar to an Unordered list, but all of the data values are in order.
Take a look at this description of the constructor and methods of the OrderedList
class.
The OrderedList
class
OrderedList()
creates a new ordered list that is empty. It needs no parameters and returns an empty list.add(item)
adds a new item to the list making sure that the order is preserved. It needs the item and returns nothing. Assume the item is not already in the list.remove(item)
removes the item from the list. It needs the item and modifies the list. Assume the item is present in the list.search(item)
searches for the item in the list. It needs the item and returns a boolean value.is_empty()
tests to see whether the list is empty. It needs no parameters and returns a boolean value.size()
returns the number of items in the list. It needs no parameters and returns an integer.index(item)
returns the position of item in the list. It needs the item and returns the index. Assume the item is in the list.pop()
removes and returns the last item in the list. It needs nothing and returns an item. Assume the list has at least one item.pop(pos)
removes and returns the item at position pos. It needs the position and returns the item. Assume the item is in the list.
Most of the methods that we've already written for the UnorderedList
class will work just fine here.
Which methods will need to be rewritten? How can we build an OrderedList
as items are added to it?