Intro to Computer Science

Unit 3 - Data Structures: Strings and Lists

Table of Contents

  1. Strings; Slicing
  2. String Methods
  3. String Formatting
  4. Lists; Looping through a list
  5. Applications using Lists
  6. Nested loops

3.0. Strings and Slicing

3.0.0. Overview

We've spent some time looking at numbers in Python: int and float values. Now we can turn our attention to two different types of data collections: strings (str) and lists (list).

In both cases, individual elements of the structure can be referred to by their index, or position, starting at position 0.

Let's get started!

3.1. Basic Strings and Slicing

The str data type

The str (string) data type is used to work with alphanumeric text.

>>> print("Hello, world!")
>>> username = input("Please enter your name: ")

Strings are simply a sequence of characters, and we can easily refer to them by their index, or position in the string.

Slicing

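Accessing a string via its indices is called slicing. The indices are given by enclosing them in square brackets [ ] after the string: a single index picks out one character, and a pair of indices separated by a colon picks out a range of characters.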

Note that the indices for the characters in a string start at 0, not 1. Thus, if password = "secret", password[1] is "e", not "s".

You can slice a range of characters from a string by indicating the first position (inclusive) and the last position (exclusive) of the characters that you want to use. Thus, if password = "secret", password[4:6] is "et", the characters at positions 4 and 5.

If you want to slice all the way to the end of a string you can leave off the last index: password[3:] is "ret".

You can also leave off the first index if you want to start at the beginning of the string: password[:4] is "secr", the characters at positions 0-3 (inclusive).
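The slicing rules above can be checked with a few lines of Python. This is only a small sketch using the same example string; the variable names other than password are made up for illustration.

```python
# Checking the slicing rules with the example string from the text.
password = "secret"

first_char = password[0]     # "s": indices start at 0
second_char = password[1]    # "e"
last_two = password[4:6]     # "et": positions 4 and 5 (position 6 is excluded)
from_three = password[3:]    # "ret": position 3 through the end
up_to_four = password[:4]    # "secr": positions 0 through 3

print(first_char, second_char, last_two, from_three, up_to_four)
```

Notice that the second number in a slice is always one past the last character you get back.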

Try these

Predict what will be produced by each of the following statements.

>>> greet = "Hello, world!"
>>> print(greet[1]) # careful!
>>> print(greet[0], greet[12])
>>> print(greet[7:12])
>>> print(greet[-1])
>>> print(greet[0:4])
>>> print(greet[7:])
>>> print(greet[:5])
>>> print(greet[1:-1])

3.1.1. Additional string operations

In addition to slicing, there are a number of string operations that may be useful:

More string functions

Predict what will be produced by each of the following statements.

>>> a_string = "Hello"; b_string = "world"
>>> print(a_string + b_string) # concatenation
>>> print(1 + b_string) # careful!
>>> print(3 * b_string) # repetition
>>> print(len(b_string))
>>> for i in range(len(b_string)):
...     print(i, b_string[i])
>>> for i in range(len(b_string),0,-1):
...     print(i, b_string[i]) # This won't work--why?
>>> for i in range(len(b_string),0,-1):
...     print(i, b_string[i-1]) # Ah! That's better!
>>> print(b_string[::-1]) # Tricky!

Of these operations, being able to determine the length of a string and being able to concatenate two strings are definitely among the most valuable.

Try this: Odd numbers

Write a short program that prints out just the odd numbers from 1 to 100: 1, 3, 5, 7, 9, ..., 99

Try this: Odd letters

Write a short program that prints out just the odd-numbered letters in "Polytechnic School": o, y, e, ....

3.1.2. ASCII, and converting strings to numbers

The American Standard Code for Information Interchange (ASCII) represents each alphanumeric character as a coded number, which is how every character that we type, view on the screen, or keep in a file gets represented in the memory of the computer. It's occasionally convenient for us to be able to convert from a character to its ASCII code, and back again.
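As a quick illustration of that round trip, here is a minimal sketch using Python's built-in ord and chr functions. The variable names are just for this example.

```python
# Convert a character to its code number and back again.
letter = "A"
code = ord(letter)           # 65, the ASCII code for "A"
back_again = chr(code)       # "A"

# Consecutive letters have consecutive codes.
next_letter = chr(code + 1)  # "B"

print(letter, code, back_again, next_letter)
```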

String conversion functions

Let's see what some of those ASCII codes look like.

Write a loop...

In the Python interpreter, write a short loop that prints out the characters associated with the numbers 32 through 126.

for i in range(32,127): print(i,"=",chr(i))

It's not much of a secret

Write a short program that has the user enter a string, and then prints out the ASCII code value for each character in the string.

plaintext = input("Enter a phrase and I'll encode it for you: ")
for i in range(len(plaintext)):
    print(ord(plaintext[i]), end=' ')

We can use ord, chr, and float to do various conversions:

Predict what will be produced by each of the following statements.

>>> print("1" + "2")
>>> print(float("1" + "2")) # careful!
>>> print(float("1") + float("2"))
>>> print(float(12)) # will this work?
>>> print(ord("b")) # converts to ascii
>>> print(chr(98)) # converts from ascii

3.2. String Methods

3.2.0. Overview

Objects like strings (values of the str type) in Python come with a set of built-in functions called methods. We can use string methods to interact with strings.

3.2.1. Methods

In a couple of weeks we're going to introduce the idea of an object. In Python, just about everything is an "object," and objects may be grouped into classes. The objects "Richard", "Fletch", and "Ms. Bush" are all objects of the class str (strings). The function main() is an object of the class function. The values 3, 17, and -204 are all objects of the class int (integers).

Later on we'll go over this in more detail, and learn how we can even design our own classes so that we can create our own objects. For now, though, it's important to realize that most of these objects have methods that allow you to do stuff with the object.

Because we're looking at strings right now, let's take a look at some string methods that you can use to manipulate the string in question.

How to use a Method

You use a method by indicating the object being used, a period (.), and the method name followed by a pair of parentheses. Depending on the method you're using, there may be arguments that you'll need to include in the parentheses.

Example: "Mississippi".count("s") uses the string method "count" to count the number of "s"s in the string "Mississippi" and gives us a value of 4.
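To make the object.method(arguments) pattern concrete, here is a short sketch with three standard string methods that take zero, one, and two arguments. The variable names are invented for this example.

```python
river = "Mississippi"

s_count = river.count("s")            # one argument: the text to count
shouted = river.upper()               # no arguments, but the parentheses are still required
swapped = river.replace("ss", "SS")   # two arguments: what to find, what to put in its place

print(s_count, shouted, swapped)
```

None of these calls changes river itself. Each method hands back a new value, which is why the results are stored in new variables.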

Here are some other string methods that you may find useful at some point.

Predict what will be produced by each of the following statements, and try them out in the Python interpreter to get some familiarity with them.

>>> name = "jules verne"
>>> print(name.capitalize())
>>> print(name.title())
>>> print("DON'T SHOUT!".lower())
>>> print(name.upper())
>>> print(name.replace("jules","phileas"))
>>> print(name.split()) # splits into a list; default separator is whitespace
>>> print("Wow. That's cool. Right?".split("."))
>>> print(name.find("verne"))
>>> print(name.count("e"))
>>> print(" ".join(["i","love","you"])) # joins a list of items into a string, separated by " "

3.2.2. Practical Use of String Methods

Let's take a look at one quick use of one of the string methods that we might be able to use right away.

In writing programs, you'll have occasion to ask the user to respond to a "yes or no" question. What's one way that we can check their response?

keep_going = input("Would you like to play again? (Yes/No)")
if keep_going == "Yes":
    ...

This will work, perhaps, but what if they just enter a "Y" instead of the full word "Yes"? You might suggest that they follow the directions more carefully, but in the interests of keeping our program running more cleanly, perhaps we could do this:

keep_going = input("Would you like to play again? (Y/N)")
if keep_going == "Y":
    ...

But what if they type "Yes" now? "Yes" is not equal to "Y". But maybe this will work...

keep_going = input("Would you like to play again? (Y/N)")
if keep_going[0] == "Y":
    ...

That looks at the first character of the string and checks to see if it's a "Y". But what if they enter a lower-case "y"?

keep_going = input("Would you like to play again? (Y/N)")
if keep_going[0].lower() == "y":
    ...

This is the answer we want. It'll accept a broad range of responses without failing—"Yes","YES","Y","yes","y", and even "yeah"—and give our program the best chance of continuing uninterrupted.

One other common strategy is to give the user a default value that will be automatically selected if s/he just hits the <Enter> key. (This default value in a text interface is usually indicated by capitalizing it.) When a string is expected and the <Enter> key is pressed, the null string ('' or "") is stored in the variable. We can check whether the default value was chosen by looking for the null string, and otherwise check whether the first character is a y.

keep_going = input("Would you like to play again? (Y/n)")   # "Y" indicates a default value
if keep_going == "" or keep_going[0].lower() == "y":
    ...

Note that this statement works as intended, although a slight variation on it will not. Take a look at this version of the same condition:

keep_going = input("Would you like to play again? (Y/n)")   # "Y" indicates a default value
if keep_going[0].lower() == "y" or keep_going == "":
    ...

It turns out that this won't work. If the user hits the [Enter] key without entering anything, keep_going has a value of "". Python evaluates the expression from left to right, so it first tries to examine the 0th character of that string. There isn't one, so Python raises an IndexError and the program fails.

If we check for the null string "" first, however, the problem goes away. Python still evaluates the boolean expression from left to right, and if the left check, keep_going == "", is True, Python knows it doesn't need to evaluate the right part of the or expression: once one of the expressions is True, the whole thing is True, and Python moves on without bothering to run the rest of the condition.
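The difference between the two orderings can be seen without typing anything at a prompt. The sketch below wraps each condition in a small function; the function names are made up for this illustration and are not part of the lesson's programs.

```python
def wants_to_continue(answer):
    """Return True for "", "y", "Y", "yes", "Yeah", and so on."""
    # The empty-string test comes first, so answer[0] is only reached
    # when there is at least one character to look at.
    return answer == "" or answer[0].lower() == "y"

def wants_to_continue_unsafe(answer):
    """Same two tests in the other order: fails on the empty string."""
    return answer[0].lower() == "y" or answer == ""

print(wants_to_continue(""))        # default answer
print(wants_to_continue("Yeah"))
print(wants_to_continue("no"))

try:
    wants_to_continue_unsafe("")
except IndexError as error:
    print("Reversed order fails:", error)
```

The same idea applies to and expressions: if the left side is False, Python skips the right side.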

Clean up your language, part 1

Write an automated censor program that reads in a line of text from the user and prints a new line where all of the "four-letter" words have been replaced by "****". Select 2-3 bad words that you want to identify and censor.

Sample interaction:

Enter a sentence, and keep it clean! This poopy computer is a darn pain!
You wrote: This poopy computer is a darn pain!
Censored : This ****y computer is a **** pain!
See, I fixed it for you!

3.3. String Formatting

3.3.0. Overview

Sometimes, you want to get your output printed out just right, with spacing and alignment that you've specified. Let's see a few examples of how you can do that.

3.3.1. Formatting Output

Up to this point, we haven't worried very much about the format of our output—simple print statements have worked just fine.

If you do need to make your output look nice, or print data in columns, you can use Python3's formatting method, .format.

Formatting in Python2 vs Python3

While Python2 and Python3 are very similar in most ways, there are some significant differences. One of the areas of Python that has changed is the way that output may be formatted.

We don't do too much formatting of output in here, but if you find you need to and you look for formatting examples online, be sure that the examples you look at are for Python3 and not Python2.

Formatting Output

The basic way to format output in Python3 is as follows:

print("{0} is the {1}".format(value0, value1))

Thus:

>>> print("{0} is the {1}".format("Richard", "best"))
Richard is the best

Or:

>>> print("The value of {0} is {1:.4f}".format("pi", 3.14159265358))
The value of pi is 3.1416

Note that there is a much wider variety of format possibilities available than what is listed here. For example, this snippet produces a very nicely formatted table:

print("{0:>8s} | {1:>8s} ".format("Number", "Square"))
for i in range(1,11):
    print("{0:8d} | {1:8d} ".format(i,i*i))

  Number |   Square
       1 |        1
       2 |        4
       3 |        9
       4 |       16
       5 |       25
       6 |       36
       7 |       49
       8 |       64
       9 |       81
      10 |      100

See the Python3 documentation for further information on how to use print formatting.
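To unpack the pieces used in the format strings above, here is a small sketch. Inside the braces, the number before the colon picks which argument to use, and the part after the colon says how to print it: a width, an optional alignment (> for right, < for left), and a type (s for string, d for integer, f for float). The variable names are only for illustration.

```python
# Field 0 and field 1, each right-aligned in 8 columns.
header = "{0:>8s} | {1:>8s}".format("Number", "Square")
row = "{0:8d} | {1:8d}".format(7, 49)     # integers are right-aligned by default

rounded = "{0:.2f}".format(3.14159)       # 2 digits after the decimal point
left = "{0:<8s}|".format("pi")            # left-aligned in 8 columns

print(header)
print(row)
print(rounded)
print(left)
```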

3.3.2. Program Design Exercise

Multiple solutions

A user comes to you wanting a program that takes three numbers—a, b, and c— and prints the one that has the largest value.

Write the program.

3.3.2.0. "Compare each pair" solution

Write a series of 3 if statements: one to identify if a is the largest, one to identify if b is the largest, and one for c.

Here's a "brute force" solution that certainly finds the right answer.

if (a > b) and (a > c):
    print(a)
if (b > a) and (b > c):
    print(b)
if (c > a) and (c > b):
    print(c)

Advantages: Easy to understand this program.

Disadvantages: What happens if we have to find the largest value of 100 numbers??!

3.3.2.1. Multibranch "decision tree" solution

Use if-else statements nested inside of an if-else statement.

if (a > b):
    if (a > c):
        print(a)
    else:
        print(c)
else:
    if (b > c):
        print(b)
    else:
        print(c)

Advantages: It works.

Disadvantages: Harder to understand, and doesn't allow for easy expansion. What if we have to find the largest value of 100 numbers??!

3.3.2.2. "Sequential comparisons" solution

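Use a variable called largest to store the largest value seen so far. (Avoid naming the variable max: that would hide Python's built-in max function, which the next solution uses.)

largest = a
if b > largest:
    largest = b
if c > largest:
    largest = c
print(largest)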

Advantages: Easy to understand, expandable

Disadvantage: Requires an extra variable

3.3.2.3. Python max solution

print(max(a, b, c))

Advantages: Built-in function comes pre-defined in Python.

Disadvantages: Most languages don't have this function pre-defined, so you still need to know how to write your own!

3.4. Lists, and Looping through lists

3.4.0. Overview

One of the most powerful data structures in Python is the list, a sequential collection of items.

3.4.1. Lists

The list (called an array in most other programming languages) is an incredibly useful and powerful data type. It allows us to refer to a collection of data using a single variable.

3.4.2. Intro to lists

The list data structure

A list is a sequence of data values. Lists may be made up of any type of value: strings, ints, floats, even other lists. Lists are indicated with square brackets enclosing the items in the list, which are separated by commas.

my_data = ['Richard','White','626-845-1235',50]
summer_months = ["June","July","August"]
rand_nums = [3, 2.2, 14, -5, 0]
my_friends = ["Kathy","Dana","Gary"]

Individual items in a list can be referenced by "slicing" (just as we did with strings) using their index or a range of indexes.

my_best_friend = my_friends[1]
print(summer_months[-1])
print(my_data[1:3]) # prints my_data[1] and my_data[2]

Lists are perhaps the single most powerful data structure in Python. We'll be using them a lot!

Try these

What do you think the output will be for these statements?

mylist = ["A","B","C","D","F"]
print(mylist[2])
print(mylist[2:4])
print(len(mylist))

3.4.3. Using lists

3.4.3.0. Iterating through a list

How are lists so powerful? Just as we've used a for loop to run through a range of numbers, we can easily set up a loop to run through a series of items in a list.

Two ways of going through a list

We actually have two different ways of going through a list. Which way you'll choose to write your list iteration depends on what you need to do.

Looping through a list with an index variable

You could run through the items in the list this way:

>>> shopping_list = ["apples", "oranges", "bananas"]
>>> for i in range(len(shopping_list)):
...     print("I need to buy",shopping_list[i])
...
I need to buy apples
I need to buy oranges
I need to buy bananas
>>>

Here, the index i changes as we go through the list, so each time we refer to shopping_list[i], we get a new value. This is the loop to use if you want to remember the location(s) of specific value(s) in your list as part of the program you're writing.

Looping through the list with an iterator

You can also go through all the items in the list this way, using iteration:

>>> for item in shopping_list:
...     print("I need to buy",item)
...
I need to buy apples
I need to buy oranges
I need to buy bananas
>>>

Just as in a for loop with numbers, this loop will repeat: the first time through, item will represent the first piece of data in the list shopping_list ("apples"), the second time through it will be "oranges", and the third time through, "bananas".

The advantage to this loop is that you don't need an index variable like i to refer to each item in the list. The disadvantage is that this loop runs through each item once, from beginning to end. If you need more flexibility in your program, you'll need to use the index strategy mentioned above to go through the list.

3.4.3.1. Some list operations

Useful list operations

Some of the more useful methods and operations that can be performed with lists include:
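A few of the standard ones are sketched here. Which operations matter most at this point is an assumption on our part, and the list colors is just a made-up example. Every method shown is a standard part of Python's list type.

```python
colors = ["red", "green"]

colors.append("blue")            # add to the end: ["red", "green", "blue"]
colors.insert(0, "violet")       # add at index 0: ["violet", "red", "green", "blue"]
colors.remove("green")           # delete the first matching value
last = colors.pop()              # remove and return the last item ("blue")

how_many = len(colors)           # number of items left (2)
has_red = "red" in colors        # True if the value is somewhere in the list
where_red = colors.index("red")  # position of the first match (1)

colors.sort()                    # sort in place: ["red", "violet"]
combined = colors + ["white"]    # + builds a new, longer list

print(colors, last, how_many, has_red, where_red, combined)
```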

Problem: Making a shopping list

Write a small program that initializes an empty shopping list, and then uses a while loop to have the user enter a series of items that are added to the shopping list. When the user enters "", the program uses a second loop—a for loop this time—to print out all the items on the shopping list.

#!/usr/bin/env python3
"""
shopping_list.py
Has the user enter a series of items for a shopping list until
a sentinel value of "" is entered. Then print out the list.
"""

def main():
    print("Enter items and I'll put them into the shopping list!")
    print("Enter the empty string [Enter] to stop and see your list.")
    shopping_list = []
    item = input("Item: ")
    while item != "":
        shopping_list.append(item)
        item = input("Item: ")
    print("Thanks. Here is your shopping list:")
    for i in range(len(shopping_list)):
        print(shopping_list[i])
    print("Let's go shopping!")

if __name__ == "__main__":
    main()

3.4.3.2. Finding the largest value

A lesson or two back, we looked at some ways to try to identify the maximum of three values that had been entered. Let's see if we can use a list to do a better job with that problem.

Maximum of 7 values in a list

Write a program that puts the numbers 2, 6, 4, 12, 90, 15, 13 into a list, and then write a loop that goes through the list one by one to find the largest number in the list.

Here are a couple of ways to do that. This first way goes through the list using an index variable.

numberList = [2, 6, 4, 12, 90, 15, 13]
position_of_max = 0      # The first number is the biggest we've seen so far
for i in range(1, len(numberList)):
    if numberList[i] > numberList[position_of_max]:
        position_of_max = i
# And now that we've gone through the whole list...
print(numberList[position_of_max])

This second way iterates through the list without an index variable:

numberList = [2, 6, 4, 12, 90, 15, 13]
maxValue = numberList[0]     # First number is the biggest we've seen
for number in numberList:
    if number > maxValue:
        maxValue = number
# And now that we've gone through the whole list...
print(maxValue)

3.4.3.3. Some additional list methods, operations, and strategies

More list methods, operations, strategies

3.5. Applications using lists

3.5.0. Overview

Now that we know a little bit about how to interact with lists, let's figure out some of the ways they can actually be used.

Sometimes it's useful to be able to make a short list of items and then use that list of items to help solve a larger problem.

In this next problem, we're going to use a list of the vowels (not including 'y'): vowels = ['a','e','i','o','u']. We're also going to use a powerful boolean operator in to identify when a letter is in that list:

>>> vowels = ['a','e','i','o','u']
>>> my_letter = "j"
>>> if my_letter in vowels:
...     print("We found a vowel!")
...
>>> if "a" in vowels:
...     print("We found a vowel!")
...
We found a vowel!

Finding vowels

Write a program that asks the user to enter a word, and then tells them how many vowels there are in the word. The program should use a list called vowels which stores the 5 vowels in it. When the program runs through the word entered by the user, it will check each character to see if it's in the list of vowels, and increment a counter when one is found. After going through the word, print out the number of vowels found.

Here's one solution:

userWord = input('Please enter a word: ')
vowelCount = 0
vowels = ['a','e','i','o','u']
print("The vowels in your word are: ", end='')
for letter in userWord:
    if letter in vowels:
        print(letter, " ", end='')
        vowelCount += 1
print()     # finish the line of vowels
print("There were",vowelCount,"vowels.")
print("I'm terribly sorry if I missed any 'y's.")

Can you see how to improve the program to look for the occurrence of y's, but perhaps only count them if no other vowels have been found?

userWord = input('Please enter a word: ')
vowelCount = 0
yCount = 0
print("The vowels in your word are: ", end='')
for letter in userWord:
    if letter in ['a','e','i','o','u']:
        print(letter, " ", end='')
        vowelCount += 1
    elif letter == 'y':
        yCount += 1
if vowelCount == 0:
    if yCount > 0:
        print(yCount * 'y ')
    else:
        print("non-existent.")
        print("I don't think you entered a real word.")

3.5.1. Finding Divisible Numbers

Up to this point we've been using relatively simple examples of lists. Let's examine some more serious applications of lists.

The % (mod) operator calculates the remainder from an integer division.

The mod operator

The mod operator, %, returns the remainder from a whole-number division operation.

Example:

3 % 2 returns 1
4 % 2 returns 0
5 % 3 returns 2
10.0 % 2.5 returns 0.0

One obvious application for this operation is to determine whether one number is evenly divisible by another.
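The examples above can be verified directly. This sketch also pairs % with the integer division operator // to show that the quotient and the remainder together rebuild the original number. The variable names are invented for the example.

```python
examples = [3 % 2, 4 % 2, 5 % 3]   # remainders: [1, 0, 2]
float_example = 10.0 % 2.5         # 0.0 (a float, because the operands are floats)

whole_part = 17 // 5               # 3: how many times 5 goes into 17
left_over = 17 % 5                 # 2: what remains afterward
rebuilt = 5 * whole_part + left_over   # 17 again

print(examples, float_example, whole_part, left_over, rebuilt)
```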

Even or odd?

Write an if statement to determine whether the integer stored in the variable n is even or odd.

if n % 2 == 0:
    print("n is even")
else:
    print("n is odd")

Expanding on this idea:

Evenly divisible?

Write an if statement to determine whether the integer stored in the variable n is evenly divisible by the integer stored in the variable x.

if n % x == 0:
    print("n is evenly divisible by x")
else:
    print("n is not evenly divisible by x")

3.5.2. Identifying if a number is prime

Based on what we know, write the following program.

Identifying a prime

Write a program that asks the user to enter an integer n and determines whether or not the number entered is prime. Recall that a prime number is an integer greater than 1 that is evenly divisible only by itself and 1.

Your program should use a loop that goes through every number between 2 and n to find out if n is prime.

Now, let's think about expanding on that idea.

A list of primes

Write a program that creates a table of the first thousand primes. Recall that a prime number is an integer greater than 1 that is evenly divisible only by itself and 1.

Your prime numbers should be stored in a list called "primes," and your strategy should consist of:

Your program needs to print out the number of the prime, and next to it the value of that prime. Thus, the first few lines of output will be:

Prime #    Prime Value
   1            2
   2            3
   3            5
   4            7

Here's one solution to this problem.
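The sketch below is one possible approach, not necessarily the course's own listing. It assumes a strategy of testing each candidate number against the primes already stored in the list, stopping once a prime's square exceeds the candidate (a shortcut that is our assumption, not part of the problem statement). The function names is_prime and first_primes are made up for this example.

```python
#!/usr/bin/env python3
"""
primes_sketch.py
Builds a list of the first thousand primes and prints them as a table.
"""

def is_prime(n, primes):
    """Return True if no prime found so far divides n evenly."""
    for p in primes:
        if p * p > n:       # no divisor up to the square root: n is prime
            break
        if n % p == 0:      # evenly divisible, so not prime
            return False
    return True

def first_primes(how_many):
    """Return a list holding the first how_many primes."""
    primes = []
    candidate = 2
    while len(primes) < how_many:
        if is_prime(candidate, primes):
            primes.append(candidate)
        candidate += 1
    return primes

def main():
    primes = first_primes(1000)
    print("{0:>7s} {1:>12s}".format("Prime #", "Prime Value"))
    for i in range(len(primes)):
        print("{0:7d} {1:12d}".format(i + 1, primes[i]))

if __name__ == "__main__":
    main()
```

Because the primes list always holds every prime smaller than the candidate, checking against that list is enough; there is no need to try dividing by non-prime numbers.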

3.5.3. Lists and Strings

You may find it useful to convert from strings to lists, or vice versa.

Converting between strings and lists
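As a minimal sketch of the usual tools for this, assuming the conversions meant here are the standard ones: list() breaks a string into characters, split() breaks it into words, and join() glues a list of strings back together. The variable names are only for illustration.

```python
sentence = "the quick brown fox"

letters = list("fox")            # string -> list of characters
words = sentence.split()         # string -> list of words
rejoined = " ".join(words)       # list of words -> string, separated by spaces
hyphenated = "-".join(letters)   # list of characters -> string, separated by "-"

print(letters, words, rejoined, hyphenated)
```

Note that join() only works on a list of strings. A list of numbers has to be converted to strings first.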

3.6. Nested loops → Loop inside a loop

Just as we can nest if-else statements inside other if-else statements to work on multiple levels of a problem, we can nest loops inside other loops.

Pseudocode example: Going through a document word by word

letter_count = 0
for (each word in sentence):
    for (each letter in this word):
        add one to the letter_count
print(letter_count)

Pseudocode example: finding the total assets of a bank

sum = 0
for (each client at bank):
    for (each account for this client):
        sum = sum + money in this account
print(sum)
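The letter-counting pseudocode translates almost line for line into real Python. This is a sketch under the assumption that words are separated by spaces; the sample sentence is made up.

```python
sentence = "nested loops are fun"

letter_count = 0
for word in sentence.split():    # outer loop: runs once per word
    for letter in word:          # inner loop: runs once per letter in that word
        letter_count += 1
print(letter_count)
```

The inner loop starts over from the beginning for every pass of the outer loop, which is the key behavior of any nested loop.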

3.6.1. Nested counting loops

Here's a more practical example of a nested loop. This one demonstrates an odometer effect.

odometer.py

Enter this program and run it to see what effect is produced.

#!/usr/bin/env python3
"""
odometer.py
Demonstrates an odometer using nested loops with print formatting
and clear screen function.
"""

import os       # needed to clear the screen
import time     # needed to slow down the counter

def main():
    for hundreds in range(10):
        for tens in range(10):
            for ones in range(10):
                print("{0:1d}{1:1d}{2:1d}".format(hundreds, tens, ones))
                time.sleep(0.1)
                os.system("clear")   # may need to be altered for your system ("cls" on Windows)

if __name__ == "__main__":
    main()

3.6.2. More nested loops

Let's try writing another nested loop and using it to print a two-dimensional object on the screen.

The draw_boxes program

Write a program called draw_boxes that asks the user to enter a number. The program then uses that number as a parameter in a function called boxy which prints out a large box composed of n-by-n "square-bracket boxes" printed on the screen.

If the user enters 3, for example, the program will need to call the function boxy(3), which will then produce the output:

[][][]
[][][]
[][][]

Just as above, we need to print a box composed of rows of square brackets, with each row consisting of columns of square brackets.

def boxy(n):
    for row in range(n):
        for col in range(n):
            print("[]",end = '')
        print()     # go down to the next line at the end of each row

Once you've mastered the basic boxy function, try this one:

The draw_rectangles program

Modify the previous program so that it draws rectangles. The user enters a width and a height in the main program, and a modified version of your boxy function—call it rexy()—takes that information as parameters and uses it to print an appropriate figure composed of "[]" as before.

3.6.3. Nested loops for traversing a grid

One very common type of loop pattern involves using two loops to work through a 2-dimensional grid or table.

Generally:

for (each row in the table):
    for (each column in a row):
        # do something

There are a lot of uses for such a nested loop. Here's a practical one:

What does this nested loop do?

print("  |  0  1  2  3  4  5  6  7  8  9")
print("--+------------------------------")
for row in range(10):
    print(row,"|",end='')
    for col in range(10):
        print("{0:3d}".format(row * col), end='')
    print()

This program, after printing out a couple of header lines, has a row loop that runs from 0 to 9. Inside that loop is a second col loop that takes on the values 0 to 9. The print statement uses formatting to print out the product row * col in a space that's 3 characters wide. The effect:

  |  0  1  2  3  4  5  6  7  8  9
--+------------------------------
0 |  0  0  0  0  0  0  0  0  0  0
1 |  0  1  2  3  4  5  6  7  8  9
2 |  0  2  4  6  8 10 12 14 16 18
3 |  0  3  6  9 12 15 18 21 24 27
4 |  0  4  8 12 16 20 24 28 32 36
5 |  0  5 10 15 20 25 30 35 40 45
6 |  0  6 12 18 24 30 36 42 48 54
7 |  0  7 14 21 28 35 42 49 56 63
8 |  0  8 16 24 32 40 48 56 64 72
9 |  0  9 18 27 36 45 54 63 72 81

It's a multiplication table!

3.6.4. Nested loops for creating a grid

In the previous multiplication table example we didn't actually have a "grid" that we went through—we just printed out some numbers in table form.

Let's actually create a table of that information, using a "list of lists."

In this important data-storage strategy, we'll have a table list that keeps track of each row in the table:

table = [ row0 ,
          row1 ,
          row2 ,
          row3 ,
          row4 ]

These are presented here as rows going down the screen, but that's just a convenient way of writing it to help us visualize what's happening.

So, I can put any kind of data that I want into each of the rows in that list, and what I'm going to do is put in another list for each of those row elements. After doing that, it looks like this:

table = [ row0 [ col0 , col1, col2, col3] ,
          row1 [ col0 , col1, col2, col3] ,
          row2 [ col0 , col1, col2, col3] ,
          row3 [ col0 , col1, col2, col3] ,
          row4 [ col0 , col1, col2, col3] ]

We can access any of those pieces of data—any col, in any row of the table—in this way:

print(table[row][col])
print(table[2][0])

And how would we access every element in the table, one at a time?

for row in range(len(table)):
    for col in range(len(table[row])):
        print(table[row][col],end=' ')
    print()
print("Done")
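Here is a small, concrete "list of lists" to try those ideas on. The numbers are made up so that each value shows its own position: the tens digit is the row plus one, and the ones digit is the column.

```python
table = [
    [10, 11, 12, 13],   # row 0
    [20, 21, 22, 23],   # row 1
    [30, 31, 32, 33],   # row 2
]

one_value = table[2][0]     # row 2, column 0
whole_row = table[1]        # a single index gives back an entire row (itself a list)

# Visit every element once, row by row, and add them all up.
total = 0
for row in range(len(table)):
    for col in range(len(table[row])):
        total += table[row][col]

print(one_value, whole_row, total)
```

The first index always chooses the row and the second chooses the column within that row.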

times_table.py

Write a program that creates a table (a "list of lists") and stores the products for row × col in that table.

Then write a pair of nested loops that demonstrate the values stored in the rows and columns.

This code listing demonstrates how we can build that table based on a specified number of ROWS and COLS.

#!/usr/bin/env python3
"""
multiplication_table.py
This program creates a multiplication table as a "list of lists."
"""

__author__ = "Richard White"
__version__ = "2022-09-06"

def main():
    grid = []
    ROWS = 11
    COLS = 13
    
    # Create the table
    for row in range(ROWS):
        grid.append([])                     # Put a row in the table
        for col in range(COLS):      
            grid[row].append(row * col)     # For this row, append a new col

    # Print the table
    for row in range(ROWS):                 # For every row...
        for col in range(COLS):             # ... and for every col in that row...
                                            # print out the entry, with formatting
                                            # so that they all line up nicely
            print("{0:4d}".format(grid[row][col]), end='')
        print()                             # at end of row, move to next line
         

if __name__ == "__main__":
    main()