Unit 3 - Data Structures: Strings and Lists
Table of Contents
- Strings; Slicing
- String Methods
- String Formatting
- Lists; Looping through a list
- Applications using Lists
- Nested loops
3.0. Strings and Slicing
3.0.0. Overview
We've spent some time looking at numbers in Python (int and float values). Now we can turn our attention to two different types of data collections: strings (str) and lists (list).
In both cases, individual elements of the structure can be referred to by their index, or position, starting at position 0.
Let's get started!
3.1. Basic Strings and Slicing
The str data type
The str (string) data type is used to work with alphanumeric text.
Strings are simply a sequence of characters, and we can easily refer to them by their index, or position in the string.
Slicing
Accessing a string via its indices is called slicing. The indices are referred to by enclosing them in square brackets [ ].
Note that the indices for the characters in a string start at 0, not 1. Thus, if password = "secret", password[1] is "e", not "s".
You can slice a range of characters from a string by indicating the first position (inclusive) and the last position (exclusive) of the characters that you want to use. Thus, if password = "secret", password[4:6] is "et", the characters from position 4 to position 5 (inclusive).
If you want to slice all the way to the end of a string you can leave off the last index: password[3:] is "ret".
You can also leave off the first index if you want to start at the beginning of the string: password[:4] is "secr", the characters at position 0-3 (inclusive).
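The indexing and slicing rules above can be checked directly. This is a small sketch using the same password = "secret" example; the names used to hold the results (second_char, middle, tail, head) are made up here for illustration.

```python
password = "secret"

second_char = password[1]   # index 1 is the second character
middle = password[4:6]      # positions 4 and 5 (position 6 is excluded)
tail = password[3:]         # position 3 through the end of the string
head = password[:4]         # positions 0 through 3

print(second_char, middle, tail, head)   # e et ret secr
```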
Try these
Predict what will be produced by each of the following statements.
3.1.1. Additional string operations
In addition to slicing, there are a number of string operations that may be useful:
More string functions
Predict what will be produced by each of the following statements.
Of these operations, being able to determine the length of a string and being able to concatenate two strings are definitely among the most valuable.
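As a quick illustration of length and concatenation, plus two other common operations, here is a minimal sketch. The strings and variable names are assumptions chosen for this example.

```python
first = "Polytechnic"
second = "School"

name_length = len(first)            # number of characters in the string
full_name = first + " " + second    # concatenation with +
cheer = "Go! " * 3                  # repetition with *
has_tech = "tech" in first          # membership test, gives True or False

print(name_length, full_name, cheer, has_tech)
```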
Try this: Odd numbers
Write a short program that prints out just the odd numbers from 1 to 100: 1, 3, 5, 7, 9, ..., 99
Try this: Odd letters
Write a short program that prints out just the odd-numbered letters in "Polytechnic School": o, y, e, ....
3.1.2. ASCII, and converting strings to numbers
The American Standard Code for Information Interchange (ASCII) represents each alphanumeric character as a coded number, which is how every character that we type, or view on the screen, or keep in a file, gets represented in the memory of the computer. It's occasionally convenient for us to be able to convert from a character to its ASCII code, and back again.
String conversion functions
- The ord function returns the ASCII value associated with a character.
Example: ord(" ") returns the integer 32, the ASCII code for a space.
- The chr function returns the character associated with a value.
Example: chr(65) returns the "A" character.
- The float function returns the value of a number enclosed in quotations.
Example: float("42.3") returns the floating point number 42.3.
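The three conversions listed above can be tried in a few lines. This is only a sketch; the variable names are invented for the example, and the last line combines ord and chr to step from one letter to the next.

```python
space_code = ord(" ")             # character to ASCII code
letter = chr(65)                  # ASCII code to character
price = float("42.3")             # numeric string to float
next_letter = chr(ord("A") + 1)   # the character one code after "A"

print(space_code, letter, price, next_letter)   # 32 A 42.3 B
```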
Let's see what some of those ASCII codes look like.
Write a loop...
In the Python interpreter, write a short loop that prints out the characters associated with the numbers 32 through 126.
It's not much of a secret
Write a short program that has the user enter a string, and then prints out the ASCII code value for each character in the string.
We can use ord, chr, and float to do various conversions:
Predict what will be produced by each of the following statements.
3.2. String Methods
3.2.0. Overview
Objects like strings (values of the str type) in Python come with a set of built-in functions called methods. We can use string methods to interact with strings.
3.2.1. Methods
In a couple of weeks we're going to introduce the idea of an object. In Python, just about everything is an "object," and objects may be grouped into classes. The objects "Richard", "Fletch", and "Ms. Bush" are all objects of the class str (strings). The function main() is an object of the class function. The values 3, 17, and -204 are all objects of the class int (integers).
Later on we'll go over this in more detail, and learn how we can even design our own classes so that we can create our own objects. For now, though, it's important to realize that most of these objects have methods that allow you to do stuff with the object.
Because we're looking at strings right now, let's take a look at some string methods that you can use to manipulate the string in question.
How to use a Method
You use a method by indicating the object being used, a period (.), and the method name followed by a pair of parentheses. Depending on the method you're using, there may be arguments that you'll need to include in the parentheses.
Example: "Mississippi".count("s") uses the string method "count" to count the number of "s"s in the string "Mississippi" and gives us a value of 4.
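To see the object.method(arguments) pattern in action, here is a short sketch built on the same "Mississippi" string. The variable names are assumptions for illustration; count, upper, find, and replace are standard str methods.

```python
river = "Mississippi"

s_count = river.count("s")              # how many times "s" appears
shouted = river.upper()                 # a new, all-upper-case string
position = river.find("ss")             # index of the first "ss", or -1 if absent
replaced = river.replace("ss", "SS")    # a new string with every "ss" swapped

print(s_count, shouted, position, replaced)
```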
Here are some other string methods that you may find useful at some point.
Predict what will be produced by each of the following statements, and try them out in the Python interpreter to get some familiarity with them.
3.2.2. Practical Use of String Methods
Let's take a look at one quick use of one of the string methods that we might be able to use right away.
In writing programs, you'll have occasion to ask the user to respond to a "yes or no" question. What's one way that we can check their response?
This will work, perhaps, but what if they just enter a "Y" instead of the full word "Yes"? You might suggest that they follow the directions more carefully, but in the interests of keeping our program running more cleanly, perhaps we could do this:
But what if they type "Yes" now? "Yes" is not equal to "Y". But maybe this will work...
That looks at the first character of the string and checks to see if it's a "Y". But what if they enter a lower-case "y"?
This is the answer we want. It'll accept a broad range of responses without failing—"Yes","YES","Y","yes","y", and even "yeah"—and give our program the best chance of continuing uninterrupted.
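One way that final check might be written is sketched below. The function name is_yes is hypothetical, and the response is passed in as a parameter instead of being read with input() so that the check is easy to try on several values. The sketch assumes the response contains at least one character.

```python
def is_yes(response):
    """Return True if the response starts with an upper- or lower-case y."""
    # Take the first character, force it to lower case, compare with "y"
    return response[0].lower() == "y"

for answer in ["Yes", "YES", "Y", "yes", "y", "yeah", "no"]:
    print(answer, is_yes(answer))
```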
One other common strategy is to give the user a default value that will be automatically selected if they just hit the <Enter> key. (This default value in a text interface is usually indicated by capitalizing it.) When a string is expected and the <Enter> key is pressed, the null string ('' or "") is stored in the variable. We can check to see if the default value was entered by looking for the null string, or if the first character is a y.
Note that this statement works as intended, although a slight variation on it will not. Take a look at this version of the same condition:
It turns out that this won't work. If the user hits the <Enter> key without entering anything, keep_going has a value of "". As Python works its way across that expression, evaluating it, it first tries to examine the 0th character of that string. It can't find one, so the program fails with an IndexError.
If we check for the null string "" first, however, Python evaluates the boolean expression in order from left to right. If the left boolean check, keep_going == "", is True, Python knows it doesn't need to evaluate the right part of an or expression: once one of the expressions is True, the whole thing is True, and Python moves on without bothering to run the rest of the statement.
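The difference between the two orderings can be sketched with a pair of small functions. Both function names are hypothetical, and keep_going is passed in as a parameter here instead of coming from input().

```python
def wants_to_continue(keep_going):
    # Empty-string check comes first; "or" stops as soon as one side is True,
    # so keep_going[0] is never evaluated for an empty string.
    return keep_going == "" or keep_going[0].lower() == "y"

def wants_to_continue_unsafe(keep_going):
    # Reversed order: keep_going[0] is evaluated first,
    # which raises IndexError when keep_going is "".
    return keep_going[0].lower() == "y" or keep_going == ""

print(wants_to_continue(""))      # True (the default was accepted)
print(wants_to_continue("Yes"))   # True
print(wants_to_continue("no"))    # False
```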
Clean up your language, part 1
Write an automated censor program that reads in a line of text from the user and prints a new line where all of the "four-letter" words have been replaced by "****". Select 2-3 bad words that you want to identify and censor.
Sample interaction:
Enter a sentence, and keep it clean!
This poopy computer is a darn pain!
You wrote: This poopy computer is a darn pain!
Censored : This ****y computer is a **** pain!
See, I fixed it for you!
3.3. String Formatting
3.3.0. Overview
Sometimes, you want to get your output printed out just right, with spacing and alignment that you've specified. Let's see a few examples of how you can do that.
3.3.1. Formatting Output
Up to this point, we haven't worried very much about the format of our output—simple print statements have worked just fine.
If you do need to make your output look nice, or print data in columns, you can use Python3's formatting method, .format.
Formatting in Python2 vs Python3
While Python2 and Python3 are very similar in most ways, there are some significant differences. One of the areas of Python that has changed is the way that output may be formatted.
We don't do too much formatting of output in here, but if you find you need to and you look for formatting examples online, be sure that the examples you look at are for Python3 and not Python2.
Formatting Output
The basic way to format output in Python3 is as follows:
Thus:
Or:
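As a minimal sketch of .format in action: placeholders in curly braces are filled in by the arguments to .format, optionally by position number and with a format specification after a colon. The names and values below are made up for illustration.

```python
name = "Ada"
score = 93.4567

plain = "Hello, {}!".format(name)                     # fill in order
numbered = "{0} scored {1:.2f}".format(name, score)   # by position, 2 decimals
padded = "[{0:>6d}]".format(42)                       # right-aligned, 6 wide

print(plain)
print(numbered)
print(padded)
```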
Note that there is a much wider variety of format possibilities available than what is listed here. For example, this snippet produces a very nicely formatted table:
print("{0:>8s} | {1:>8s} ".format("Number", "Square"))
for i in range(1,11):
    print("{0:8d} | {1:8d} ".format(i,i*i))

  Number |   Square
       1 |        1
       2 |        4
       3 |        9
       4 |       16
       5 |       25
       6 |       36
       7 |       49
       8 |       64
       9 |       81
      10 |      100
See the Python3 documentation for further information on how to use print formatting.
3.3.2. Program Design Exercise
Multiple solutions
A user comes to you wanting a program that takes three numbers—a, b, and c— and prints the one that has the largest value.
Write the program.
3.3.2.0. "Compare each pair" solution
Write a series of 3 if statements: one to identify if a is the largest, one to identify if b is the largest, and one for c.
Here's a "brute force" solution that certainly finds the right answer.
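One way such a compare-each-pair solution might look is sketched here. The function name largest_compare_pairs is an assumption made for illustration, and the three values are passed in as parameters instead of being typed in by the user.

```python
def largest_compare_pairs(a, b, c):
    """Three separate if statements, one per candidate."""
    if a >= b and a >= c:
        largest = a
    if b >= a and b >= c:
        largest = b
    if c >= a and c >= b:
        largest = c
    return largest

print(largest_compare_pairs(3, 17, -204))   # 17
```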
Advantages: Easy to understand this program.
Disadvantages: What happens if we have to find the largest value of 100 numbers??!
3.3.2.1. Multibranch "decision tree" solution
Use if-else statements nested inside of an if-else statement.
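A sketch of what the nested if-else version could look like follows. The function name largest_decision_tree is hypothetical, and the values are passed in as parameters.

```python
def largest_decision_tree(a, b, c):
    """Nested if-else: first decide between a and b, then compare with c."""
    if a > b:
        if a > c:
            return a
        else:
            return c
    else:
        if b > c:
            return b
        else:
            return c

print(largest_decision_tree(3, 17, -204))   # 17
```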
Advantages: It works.
Disadvantages: Harder to understand, and doesn't allow for easy expansion. What if we have to find the largest value of 100 numbers??!
3.3.2.2. "Sequential comparisons" solution
Use a variable called max to store the largest value.
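Here is a sketch of the sequential-comparison idea. One deliberate difference, labeled as such: the tracking variable is named largest in this sketch so that it doesn't hide Python's built-in max function. The function name largest_sequential is also hypothetical.

```python
def largest_sequential(a, b, c):
    """Keep a running 'largest so far' and update it one value at a time."""
    largest = a              # start by assuming the first value is largest
    if b > largest:
        largest = b
    if c > largest:
        largest = c
    return largest

print(largest_sequential(3, 17, -204))   # 17
```

Extending this to more values only means adding one more comparison per value, which is why this approach scales better than the previous two.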
Advantages: Easy to understand, expandable
Disadvantage: Requires an extra variable
3.3.2.3. Python max solution
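Python's built-in max function accepts either several separate values or a single collection of values. A brief sketch, with made-up values:

```python
a, b, c = 3, 17, -204

largest = max(a, b, c)                        # several arguments
largest_of_many = max([2, 6, 4, 12, 90, 15])  # or one list of values

print(largest, largest_of_many)   # 17 90
```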
Advantages: Built-in function comes pre-defined in Python.
Disadvantages: Most languages don't have this function pre-defined, so you still need to know how to write your own!
3.4. Lists, and Looping through lists
3.4.0. Overview
One of the most powerful data structures in Python is the list, a sequential collection of items.
3.4.1. Lists
The list (called an array in most other programming languages) is an incredibly useful and powerful data type. It allows us to refer to a collection of data using a single variable.
3.4.2. Intro to lists
The list data structure
A list is a sequence of data values. Lists may be made up of any type of value: strings, ints, floats, even other lists. Lists are indicated with square brackets enclosing the items in the list, which are separated by commas.
Individual items in a list can be referenced by "slicing" (just as we did with strings) using their index or a range of indexes.
Lists are perhaps the single most powerful data structure in Python. We'll be using them a lot!
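The points above (mixed types, square brackets, indexing, and slicing) can be seen in a short sketch. The list contents and variable names are assumptions chosen for illustration.

```python
mixed = ["apples", 3, 4.5, ["nested", "list"]]

first_item = mixed[0]      # index 0 is the first item
last_two = mixed[2:4]      # a slice is itself a list
inner = mixed[3][0]        # index into the list stored at position 3

print(first_item, last_two, inner)
```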
Try these
What do you think the output will be for these statements?
3.4.3. Using lists
3.4.3.0. Iterating through a list
How are lists so powerful? Just as we've used a for loop to run through a range of numbers, we can easily set up a loop to run through a series of items in a list.
Two ways of going through a list
We actually have two different ways of going through a list. Which way you'll choose to write your list iteration depends on what you need to do.
Looping through a list with an index variable
You could run through the items in the list this way:
Here, the index i changes as we go through the list, so each time we refer to shopping_list[i], we get a new value. This is the loop to use if you want to remember the location(s) of specific value(s) in your list as part of the program you're writing.
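An index-based loop along these lines might look like the sketch below. The list contents follow the shopping_list example used in this section; the variable orange_position is invented here to show how the index lets us remember where something was found.

```python
shopping_list = ["apples", "oranges", "bananas"]

orange_position = None
for i in range(len(shopping_list)):
    print(i, shopping_list[i])        # i is the position, shopping_list[i] the item
    if shopping_list[i] == "oranges":
        orange_position = i           # remember where we found it

print("oranges found at index", orange_position)
```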
Looping through the list with an iterator
You can also go through all the items in the list this way, using iteration:
Just as in a for loop with numbers, this loop will repeat: the first time through, item will represent the first piece of data in the list shopping_list ("apples"), the second time through it will be "oranges", and the third time through, "bananas".
The advantage to this loop is that you don't need an index variable like i to refer to each item in the list. The disadvantage is that this loop runs through each item once, from beginning to end. If you need more flexibility in your program, you'll need to use the index-strategy mentioned above to go through the loop.
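A sketch of the iterator-style loop, using the same three-item shopping_list. The running total_letters count is an invented extra, included only so the loop does something with each item.

```python
shopping_list = ["apples", "oranges", "bananas"]

total_letters = 0
for item in shopping_list:
    print(item)                   # item is the value itself, not its position
    total_letters += len(item)

print("Total letters:", total_letters)   # Total letters: 20
```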
3.4.3.1. Some list operations
Useful list operations
Some of the more useful methods and operations that can be performed with lists include:
- languages = ['French','Hindi','Mandarin','Spanish'] - initializes a list
- listA[n] - gets the item located at position n
- listA[i:j] - slices a subset from the larger list, starting at index i and ending at j-1
- len(listA) - determines the length of a list
- shopping_list = [] - initializes an empty list
- shopping_list.append("ice cream") - adds an item to the end of a list
- for item in listA: - iterates through the list, with item taking on successive values of elements in listA
Problem: Making a shopping list
Write a small program that initializes an empty shopping list, and then uses a while loop to have the user enter a series of items that are added to the shopping list. When the user enters "", the program uses a second loop—a for loop this time—to print out all the items on the shopping list.
3.4.3.2. Finding the largest value
A lesson or two back, we looked at some ways to try to identify the maximum of three values that had been entered. Let's see if we can use a list to do a better job with that problem.
Maximum of 7 values in a list
Write a program that puts the numbers 2, 6, 4, 12, 90, 15, 13 into a list, and then write a loop that goes through the list one by one to find the largest number in the list.
Here are a couple of ways to do that. This first way goes through the list using an index variable.
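A possible index-based version is sketched here. The variable names numbers and largest_index are assumptions; tracking the index means we know where the largest value sits as well as what it is.

```python
numbers = [2, 6, 4, 12, 90, 15, 13]

largest_index = 0                        # assume the first item is largest
for i in range(1, len(numbers)):
    if numbers[i] > numbers[largest_index]:
        largest_index = i                # found a bigger one, remember where

print(numbers[largest_index], "at index", largest_index)   # 90 at index 4
```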
This second way iterates through the list without an index variable:
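A possible version without an index variable is sketched below; the names numbers, value, and largest are chosen for this example.

```python
numbers = [2, 6, 4, 12, 90, 15, 13]

largest = numbers[0]          # start with the first item
for value in numbers:
    if value > largest:
        largest = value       # keep the biggest value seen so far

print(largest)   # 90
```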
3.4.3.3. Some additional list methods, operations, and strategies
More list methods, operations, strategies
- print(mylist[:])
- print(mylist[::-1])
- if (item in listA): - boolean expression, checks to see if an item is in the list
- listA + listB - concatenates two lists into one
- listA * <integer> - repeats a list a number of times
- listA.append(x) - add an element to end of listA
- listA.sort() - sort listA
- listA.reverse() - reverse listA
- listA.index(x) - find the first location of x in listA
- listA.insert(i,x) - insert x into listA at index i
- listA.count(x) - count how many occurrences of x in listA
- listA.remove(x) - remove the first occurrence of x from listA
- listA.pop(i) - returns element i from listA, and removes it from list. .pop() acts on the last item of the list.
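To see how several of these methods change a list step by step, here is a small sketch. The starting values are arbitrary, and the list is named list_a here; each comment shows the state of the list after that line runs.

```python
list_a = [3, 1, 2]

list_a.append(1)             # [3, 1, 2, 1]
list_a.sort()                # [1, 1, 2, 3]
list_a.reverse()             # [3, 2, 1, 1]
list_a.insert(1, 9)          # [3, 9, 2, 1, 1]
ones = list_a.count(1)       # 2
where = list_a.index(2)      # 2
list_a.remove(1)             # [3, 9, 2, 1]
last = list_a.pop()          # returns 1, list is now [3, 9, 2]
backwards = list_a[::-1]     # [2, 9, 3], list_a itself is unchanged
combined = list_a + [0] * 2  # [3, 9, 2, 0, 0]

print(list_a, backwards, combined)
```

Note that sort, reverse, insert, append, and remove change the list in place, while slicing, +, and * build new lists.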
3.5. Applications using lists
3.5.0. Overview
Now that we know a little bit about how to interact with lists, let's figure out some of the ways they can actually be used.
Sometimes it's useful to be able to make a short list of items and then use that list of items to help solve a larger problem.
In this next problem, we're going to use a list of the vowels (not including 'y'): vowels = ['a','e','i','o','u']. We're also going to use a powerful boolean operator in to identify when a letter is in that list:
Finding vowels
Write a program that asks the user to enter a word, and then tells them how many vowels there are in the word. The program should use a list called vowels which stores the 5 vowels in it. When the program runs through the word entered by the user, it will check each character to see if it's in the list of vowels, and increment a counter when one is found. After going through the word, print out the number of vowels found.
Here's one solution:
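A sketch of one possible approach follows. The function name count_vowels is hypothetical, the word is passed in as a parameter instead of being read with input(), and converting to lower case first is an added assumption so that capital vowels are counted too.

```python
def count_vowels(word):
    """Count how many characters of word appear in the vowels list."""
    vowels = ['a', 'e', 'i', 'o', 'u']
    counter = 0
    for letter in word.lower():     # assumption: treat "A" the same as "a"
        if letter in vowels:        # the in operator checks list membership
            counter += 1
    return counter

print(count_vowels("Mississippi"))   # 4
```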
Can you see how to improve the program to look for the occurrence of y's, but perhaps only count them if no other vowels have been found?
3.5.1. Finding Divisible Numbers
Up to this point we've been using relatively simple examples of lists. Let's examine some more serious applications of lists.
The % (mod) operator calculates the remainder from an integer division.
The mod operator
The mod operator, %, returns the remainder from a whole-number division operation.
Example:
3 % 2 returns 1
4 % 2 returns 0
5 % 3 returns 2
10.0 % 2.5 returns 0.0
One obvious application for this operation is to determine whether one number is evenly divisible by another.
Even or odd?
Write an if statement to determine whether the integer stored in the variable n is even or odd.
Expanding on this idea:
Evenly divisible?
Write an if statement to determine whether the integer stored in the variable n is evenly divisible by the integer stored in the variable x.
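Both checks come down to comparing a remainder with 0. Here is a minimal sketch with example values for n and x (both made up, and x is assumed to be nonzero); the result variables parity and divisible are also invented for illustration.

```python
n = 42
x = 7

# Even or odd: a remainder of 0 after dividing by 2 means even
if n % 2 == 0:
    parity = "even"
else:
    parity = "odd"

# Evenly divisible: a remainder of 0 after dividing by x
if n % x == 0:
    divisible = True
else:
    divisible = False

print(n, "is", parity)
print(n, "divisible by", x, "?", divisible)
```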
3.5.2. Identifying if a number is prime
Based on what we know, write the following program.
Identifying a prime
Write a program that asks the user to enter an integer n and determines whether or not the number entered is prime. Recall that a prime number is an integer greater than 1 that is evenly divisible only by itself and 1.
Your program should use a loop that goes through every number from 2 up to n - 1 to find out if n is prime.
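One way to sketch that strategy is shown below. The function name is_prime is an assumption, and n is passed in as a parameter instead of being read from the user. This is the straightforward version described above, not an optimized one.

```python
def is_prime(n):
    """Return True if n is prime, trying every divisor from 2 to n - 1."""
    if n < 2:
        return False                # primes are greater than 1
    for divisor in range(2, n):
        if n % divisor == 0:
            return False            # found an even divisor, so not prime
    return True                     # nothing divided evenly

for number in [1, 2, 9, 17, 91, 97]:
    print(number, is_prime(number))
```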
Now, let's think about expanding on that idea.
A list of primes
Write a program that creates a table of the first thousand primes. Recall that a prime number is an integer greater than 1 that is evenly divisible only by itself and 1.
Your prime numbers should be stored in a list called "primes," and your strategy should consist of:
- identifying the number you are about to check for "primeness"
- going through the list of previous primes and seeing if any of them divide into the number evenly. If a prime number does divide evenly into the number we're checking, it's not prime and we can stop looking.
- if none of our prime numbers divide into the number we're checking, then this number must be prime, so add it to the list of prime numbers.
- loop back up to get the next number to check for primeness
Here's one solution to this problem.
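The strategy in the steps above could be sketched like this. The function name first_primes and its count parameter are assumptions introduced for illustration; the list itself is called primes, as the problem asks.

```python
def first_primes(count):
    """Build a list of the first `count` primes using the list so far."""
    primes = []
    candidate = 2                       # the number we are about to check
    while len(primes) < count:
        is_prime = True
        for p in primes:                # go through the primes found so far
            if candidate % p == 0:
                is_prime = False        # a prime divides it evenly
                break                   # so we can stop looking
        if is_prime:
            primes.append(candidate)    # nothing divided it: it's prime
        candidate += 1                  # move on to the next number
    return primes

primes = first_primes(1000)
print(primes[:10])
print("The 1000th prime is", primes[-1])
```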
3.5.3. Lists and Strings
You may find it useful to convert from strings to lists, or vice versa.
Converting between strings and lists
- join takes the elements from a list and assembles them into a string, with each element separated by the indicated separator.
my_friends = ["Kathy","Dana","Gary"]
print(','.join(my_friends))
Kathy,Dana,Gary
To just convert the list to a string without any separators, use ''.join(myList)
- split takes a string and splits it up into a list, with the specified separator (a space in the example below) used to separate elements of the list.
words = "I love you dearly!".split(" ")
print(words)
['I', 'love', 'you', 'dearly!']
for word in words:
    print(word)
I
love
you
dearly!
3.6. Nested loops → Loop inside a loop
Just as we can nest if-else statements inside other if-else statements to work on multiple levels of a problem, we can nest loops inside other loops.
Pseudocode example: Going through a document word by word
Pseudocode example: finding the total assets of a bank
3.6.1. Nested counting loops
Here's a more practical example of a nested loop. This one demonstrates an odometer effect.
odometer.py
Enter this program and run it to see what effect is produced.
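For readers without the listing at hand, here is a sketch of what an odometer-style nested loop can look like. It is an illustration under assumptions, not the original odometer.py: it uses three digits and collects the readings in a list called readings so they can be inspected afterward.

```python
readings = []
for hundreds in range(10):          # slowest-changing digit
    for tens in range(10):
        for ones in range(10):      # fastest-changing digit
            readings.append("{0}{1}{2}".format(hundreds, tens, ones))

print(readings[:12])    # the ones digit rolls over into the tens digit
print(readings[-1])     # 999
```

The innermost loop runs through all of its values before the loop outside it advances by one, just as the rightmost wheel of an odometer turns fastest.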
3.6.2. More nested loops
Let's try writing another nested loop and using it to print a two-dimensional object on the screen.
The draw_boxes program
Write a program called draw_boxes that asks the user to enter a number. The program then uses that number as a parameter in a function called boxy which prints out a large box composed of n-by-n "square-bracket boxes" printed on the screen.
If the user enters 3, for example, the program will need to call the function boxy(3), which will then produce the output:
Just as above, we need to print a box composed of rows of square brackets, with each row consisting of columns of square brackets.
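A sketch of one way boxy might be written follows, assuming that each "square-bracket box" is the two-character string "[]". Returning the list of rows is an addition made here so the result is easy to check; the exercise itself only asks for printing.

```python
def boxy(n):
    """Print an n-by-n box of "[]" pairs and return the rows as a list."""
    rows = []
    for row in range(n):            # one pass per row of the box
        line = ""
        for col in range(n):        # one "[]" per column in this row
            line += "[]"
        print(line)
        rows.append(line)
    return rows

boxy(3)
```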
Once you've mastered the basic boxy function, try this one:
The draw_rectangles program
Modify the previous program so that it draws rectangles. The user enters a width and a height in the main program, and a modified version of your boxy function—call it rexy()—takes that information as parameters and uses it to print an appropriate figure composed of "[]" as before.
3.6.3. Nested loops for traversing a grid
One very common type of loop pattern involves using two loops to work through a 2-dimensional grid or table.
Generally:
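The general pattern can be sketched as an outer loop over rows with an inner loop over columns. The sizes ROWS and COLS and the visited list are assumptions used only to show the order in which cells are reached.

```python
ROWS = 2
COLS = 3

visited = []
for row in range(ROWS):             # outer loop: one pass per row
    for col in range(COLS):         # inner loop: every column in that row
        visited.append((row, col))  # do something with cell (row, col)

print(visited)
```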
There are a lot of uses for such a nested loop. Here's a practical one:
What does this nested loop do?
This program, after printing out a couple of header lines, has a row loop that runs from 0 to 9. Inside that loop is a second col loop that takes on the values 0 to 9. The print statement uses formatting to print out the product row * col in a space that's 3 characters wide. The effect:
It's a multiplication table!
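A loop matching that description might look like the sketch below. The header text and the lines list are assumptions: the headers are placeholders, and lines is kept only so that the rows can be examined after printing.

```python
print("Multiplication table")       # placeholder header lines
print("-" * 30)

lines = []
for row in range(10):               # rows 0 through 9
    line = ""
    for col in range(10):           # cols 0 through 9
        line += "{0:3d}".format(row * col)   # each product, 3 characters wide
    print(line)
    lines.append(line)
```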
3.6.4. Nested loops for creating a grid
In the previous multiplication table example we didn't actually have a "grid" that we went through—we just printed out some numbers in table form.
Let's actually create a table of that information, using a "list of lists."
In this important data-storage strategy, we'll have a table list that keeps track of each row in the table:
These are presented here as rows going down the screen, but that's just a convenient way of writing it to help us visualize what's happening.
So, I can put any kind of data that I want into each of the rows in that list, and what I'm going to do is put in another list for each of those row elements. After doing that, it looks like this:
We can access any of those pieces of data—any col, in any row of the table—in this way:
And how would we access every element in the table, one at a time?
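Both ideas can be sketched with a small hand-built table. The 3-by-3 contents and the names table, entry, and all_values are assumptions chosen for illustration: table[row][col] picks out one element, and a pair of nested loops visits every element.

```python
table = [
    [0, 0, 0],      # row 0
    [0, 1, 2],      # row 1
    [0, 2, 4],      # row 2
]

entry = table[2][1]     # row 2, then col 1 within that row

all_values = []
for row in range(len(table)):               # every row...
    for col in range(len(table[row])):      # ...and every col in that row
        all_values.append(table[row][col])

print(entry)
print(all_values)
```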
times_table.py
Write a program that creates a table (a "list of lists") and stores the products for row × col in that table.
Then write a pair of nested loops that demonstrate the values stored in the rows and columns.
This code listing demonstrates how we can build that table based on a specified number of ROWS and COLS.
#!/usr/bin/env python3
"""
multiplication_table.py
This program creates a multiplication table as a "list of lists."
"""

__author__ = "Richard White"
__version__ = "2022-09-06"

def main():
    grid = []
    ROWS = 11
    COLS = 13
    # Create the table
    for row in range(ROWS):
        grid.append([])                     # Put a row in the table
        for col in range(COLS):
            grid[row].append(row * col)     # For this row, append a new col
    # Print the table
    for row in range(ROWS):                 # For every row...
        for col in range(COLS):             # ... and for every col in that row...
            # print out the entry, with formatting
            # so that they all line up nicely
            print("{0:4d}".format(grid[row][col]), end='')
        print()                             # at end of row, move to next line

if __name__ == "__main__":
    main()