Unit 5.5: Text Files
Topics covered in this unit
After completing this unit you will:
- know how to use a Scanner object to read from a File reference to an external text file on your computer
- know how to use a PrintWriter to write to an external text file
5.5.0. Overview
One common way of storing information digitally is in a text-file format: characters arranged in lines of text. The names of these files often end with a short extension such as .txt identifying them as text files, although other extensions exist as well. If you use BlueJ to manage your projects for this class, you may have noticed that your projects are automatically created with a file called README.TXT. This is a text file, sometimes called a plain-text file, that can be read by any text editor.
Like most programming languages, Java allows you to read information from a text file for use in a program, and to write information to a text file for storage. This can be convenient if you wish to store the output from a program's calculations, for example. Or perhaps you have a game, and you want to keep track of the highest score ever achieved. The game program can save high scores into the text file while it's running and retrieve those scores the next time the game is run.
This is a short but useful unit that will show you exactly how this process works.
Let's get started.
5.5.1. Overview: Text files
Let's start at the beginning.
Text files
Text files are files on your computer that are made of "plain text," simple representations of characters that don't include any information about font, size, style, color, etc. The letter "A" is just an A.
Example of a text file called todo.txt:

```
To-Do List
===========
1. Grocery shopping at Ralph's
2. Finish studying for test
3. Call Aaron
```
A word processor works with text as well, but is also concerned with styling the text to vary its appearance. Text editors don't concern themselves with these kinds of differences. (Most text editors do allow you to select a preferred font, size, etc. for the text you work with, but once selected, all of the text has that appearance.)
5.5.1.1. ASCII encoding
In the early days of computing, the American Standard Code for Information Interchange (ASCII) assigned a number to each of 128 characters, using seven bits per character. Each character was usually stored in an eight-bit byte, and "extended ASCII" character sets used the eighth bit to reach 256 characters. These characters were mostly English letters, digits, and punctuation, which worked well for English-language speakers.
Bits, byte values, and char values
Although the AP Computer Science A course doesn't cover these data types, you may be interested to see them. Here's an example of taking the binary value 01000001 (equivalent to the decimal value 65) and storing it in a byte:

```java
byte example = 0b01000001;     // binary literal for 65
System.out.println(example);   // Produces the output 65
```
And if we take that byte value and cast it as a char, we get the character associated with that number, 65:

```java
System.out.println((char) example);   // Produces the output A
```
Keeping in mind that everything the computer does is just manipulating binary digits of 0s and 1s, it's great to be able to have bytes that represent English characters.
But what about France? Taiwan? Ukraine? How do we deal with characters from different alphabets?
5.5.1.2. UTF-8 encoding
Given that all of the English characters took up a byte (eight bits) of data, we obviously need more bits to be able to represent all the other characters there are in the world... and don't forget the emojis!
Unicode and UTF-8
In order to accommodate more characters, the Unicode Standard was created. It assigns each character a unique integer called a code point; for example, "A" is code point 65 and "é" is code point 233. UTF-8 is one of several encodings that describe how these integers (and thus their associated Unicode characters) are stored as bytes, using between one and four bytes per character.
Under UTF-8, the same byte that we used before still prints out an "A". But if we include some additional bytes, we'll see some more interesting characters.
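Here's a short sketch of that idea. It builds byte arrays by hand and asks Java to decode them as UTF-8 using the standard String(byte[], Charset) constructor. The class name Utf8Demo and the particular characters chosen are just for illustration, and whether the accented letter and the emoji actually appear on screen depends on your console or IDE being able to display them.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Demo
{
    // Interpret an array of bytes as UTF-8 encoded text
    public static String decode(byte[] bytes)
    {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args)
    {
        byte[] letterA = { 0b01000001 };                    // 1 byte: 65, same as ASCII
        byte[] eAcute = { (byte) 0xC3, (byte) 0xA9 };       // 2 bytes: e with an acute accent
        byte[] grin = { (byte) 0xF0, (byte) 0x9F,
                        (byte) 0x98, (byte) 0x80 };         // 4 bytes: grinning-face emoji

        System.out.println(decode(letterA));                // A
        System.out.println(decode(eAcute));                 // the accented e (if displayable)
        System.out.println(decode(grin));                   // the emoji (if displayable)

        // Each decoded String holds one Unicode character; print its code point
        System.out.println(decode(eAcute).codePointAt(0));  // 233
        System.out.println(decode(grin).codePointAt(0));    // 128512
    }
}
```

Notice that the one-byte "A" decodes exactly as it did under ASCII, while the other characters need two and four bytes.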
We won't spend further time here developing the concept of UTF-8, although we'll include its capabilities in some of the applications below as a matter of best practice.
5.5.2. Reading from a text file
We've already learned to use a Scanner object to read standard input from a keyboard. We've had the user enter numeric and textual values as shown in the example here.
```java
Scanner in = new Scanner(System.in);   // requires import java.util.Scanner;
System.out.print("Enter your age: ");
double age = in.nextDouble();
System.out.print("Cool. Enter your name, too: ");
String name = in.next();               // next() reads a single word
System.out.println("Thanks, " + name + "!");
System.out.println("Hope you're enjoying being " + age + " years old!");
```
We also want to be able to use a Scanner object to read from a file stored on the computer, sometimes called an "external file" because it's external to the program code. To do this, we'll need to create a File object that the Scanner can use.
Here's an example of opening up a text file of integers, using a Scanner to read through it, and printing out the sum of those values. Note that we're not yet using UTF-8 encoding for our files, primarily because we're just working with numbers. We'll see how to encode for UTF-8 soon.
Use the documentation in the code to figure out how everything works.
Using a Scanner and File to access an external file
Data file called values.txt containing a series of integers to be added:

```
12
23
34
45
56
```
Main program to open that file, read in the numbers, add them, and print out the sum.
```java
/**
 * Demo reading from an external file
 *
 * @author Richard White
 * @version 2024-08-11
 */

import java.util.Scanner;                // So we can receive a series of inputs
                                         // from System.in (standard input) and
                                         // from a File object
import java.io.File;                     // Allows us to reference a file of data
                                         // stored on the computer.
import java.io.FileNotFoundException;    // The exception (error) that is thrown
                                         // if the external file isn't found; main
                                         // must list it in its "throws" clause.

public class SumValues
{
    public static void main(String[] args) throws FileNotFoundException
    {
        Scanner in = new Scanner(System.in);          // Reads standard (keyboard) input
        System.out.println("Enter the name of the textfile containing a series of integers,");
        System.out.println("one integer per line. This file should be in the same directory");
        System.out.println("as the program.");
        System.out.print("Filename: ");
        String inputFile = in.next();
        File inFile = new File(inputFile);            // creates a File object we can read from
        Scanner diskReader = new Scanner(inFile);     // Different Scanner from before!
        int sum = 0;
        while (diskReader.hasNextInt())               // while we still have integers to read in...
        {
            int value = diskReader.nextInt();         // read in an integer
            System.out.println("Adding the value " + value);   // inform the user
            sum += value;                             // add the integer to our running sum
        }
        System.out.println("The sum of these values is: " + sum);
        diskReader.close();                           // close the diskReader Scanner
    }
}
```
Output from run:
```
Enter the name of the textfile containing a series of integers,
one integer per line. This file should be in the same directory
as the program.
Filename: values.txt
Adding the value 12
Adding the value 23
Adding the value 34
Adding the value 45
Adding the value 56
The sum of these values is: 170
```
Reading data in from an external file is an incredibly useful strategy, particularly where large amounts of data are involved.
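The SumValues program simply declares throws FileNotFoundException, so a mistyped filename stops the program with an error message. Here's a sketch of one alternative that catches the exception and prints a friendlier message instead. It also moves the adding loop into its own method, sumInts, which accepts any Scanner, so the same code works whether the Scanner reads a file or a String. The names SumValuesSafely and sumInts are our own choices for this example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class SumValuesSafely
{
    // Add up every int the Scanner can read, stopping at the first non-int token
    public static int sumInts(Scanner reader)
    {
        int sum = 0;
        while (reader.hasNextInt())
        {
            sum += reader.nextInt();
        }
        return sum;
    }

    public static void main(String[] args)
    {
        Scanner in = new Scanner(System.in);
        System.out.print("Filename: ");
        String inputFile = in.next();
        try
        {
            Scanner diskReader = new Scanner(new File(inputFile));   // may throw
            System.out.println("The sum of these values is: " + sumInts(diskReader));
            diskReader.close();
        }
        catch (FileNotFoundException e)
        {
            // Runs only if the file couldn't be opened
            System.out.println("Sorry, I couldn't find a file named " + inputFile);
        }
    }
}
```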
5.5.2.1. Reading a CSV file
If more than one piece of data is located on each line of a file, those data are often separated by commas. A text file containing such data is often called a CSV file, for "comma-separated values."
Let's see how to handle that.
Keeping track of people in a Contacts application, for example, might include identifying their name, emailAddress, and phoneNumber.
Here's a file, contacts.csv, that contains just such data.
```
Richard White, rwhite@polytechnic.org, 626-297-1011
Garrett Wolfe, garysimon@gmail.com, 213-455-3213
Susan Smith, ssmith@ua.edu, 001-452-344-1002
```
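One possible way to handle this file, offered as a sketch: read contacts.csv one line at a time with nextLine(), then break each line apart with the String method split(","), trimming the space that follows each comma. The class name ReadContacts and the helper parseContact are hypothetical. This simple approach assumes no field contains a comma of its own; real CSV files sometimes wrap such fields in quotation marks, which split would not handle.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadContacts
{
    // Split one CSV line into its fields, removing spaces around each one
    public static String[] parseContact(String line)
    {
        String[] fields = line.split(",");
        for (int i = 0; i < fields.length; i++)
        {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    public static void main(String[] args) throws FileNotFoundException
    {
        Scanner diskReader = new Scanner(new File("contacts.csv"));
        while (diskReader.hasNextLine())
        {
            String line = diskReader.nextLine();
            if (line.trim().isEmpty())
            {
                continue;                                 // skip blank lines
            }
            String[] fields = parseContact(line);
            String name = fields[0];
            String emailAddress = fields[1];
            String phoneNumber = fields[2];
            System.out.println(name + " | " + emailAddress + " | " + phoneNumber);
        }
        diskReader.close();
    }
}
```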
5.5.3. Writing to a text file
It's often convenient to be able to save data or program results by writing them to an external file on disk.
Writing to an external text file
Here is a demonstration program similar to the one above. The new code for writing to an external file is everything involving the PrintWriter named diskWriter, plus the prompt for the output filename.
```java
/**
 * Demo reading from an external file and writing the results
 * to another external file
 *
 * @author Richard White
 * @version 2024-08-11
 */

import java.util.Scanner;                // So we can receive a series of inputs
                                         // from System.in (standard input) and
                                         // from a File object
import java.io.File;                     // Allows us to reference a file of data
                                         // stored on the computer.
import java.io.FileNotFoundException;    // The exception thrown if a file can't be
                                         // found (or, for output, can't be created).
import java.io.PrintWriter;              // While a Scanner reads in values, a
                                         // PrintWriter writes out values to a
                                         // text file.

public class SumValuesAndWrite
{
    public static void main(String[] args) throws FileNotFoundException
    {
        Scanner in = new Scanner(System.in);          // Reads standard (keyboard) input
        System.out.println("Enter the name of the textfile containing a series of integers,");
        System.out.println("one integer per line. This file should be in the same directory");
        System.out.println("as the program.");
        System.out.print("Filename: ");
        String inputFile = in.next();
        System.out.println("Enter the name of a file to which the results will be written.");
        System.out.print("Filename: ");
        String outputFile = in.next();
        File inFile = new File(inputFile);            // creates a File object we can read from
        Scanner diskReader = new Scanner(inFile);     // Different Scanner from before!
        PrintWriter diskWriter = new PrintWriter(outputFile);   // We don't have to create a File
                                                                // object for writing. An existing
                                                                // file with this name is erased.
        int sum = 0;
        diskWriter.printf("Numbers being added\n====================\n");
        while (diskReader.hasNextInt())
        {
            int value = diskReader.nextInt();
            System.out.println("Adding the value " + value);
            diskWriter.printf("%10d\n", value);       // right-align each value in 10 columns
            sum += value;
        }
        System.out.println("The sum of these values is: " + sum);
        diskWriter.printf("=====================\n");
        diskWriter.printf("%10d\n", sum);
        diskReader.close();                           // close the diskReader Scanner
        diskWriter.close();                           // close the diskWriter PrintWriter
    }
}
```
The program produces a file such as results.txt which, when examined with a text editor, includes these lines:
```
Numbers being added
====================
        12
        23
        34
        45
        56
=====================
       170
```
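Section 5.5.2 mentioned that we'd see how to use UTF-8 with files. Here is a sketch, not part of the demo above, that writes lines containing non-English characters using the PrintWriter(File, String) constructor, whose second argument names the character encoding, and then reads the first line back with the matching Scanner(File, String) constructor. The file name greetings.txt and the names Utf8RoundTrip, writeLines, and readFirstLine are made up for this example. The non-English text is written with \u escapes so the source file itself stays plain ASCII.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class Utf8RoundTrip
{
    // Write each String as its own line, encoded as UTF-8
    public static void writeLines(File file, String[] lines) throws IOException
    {
        PrintWriter diskWriter = new PrintWriter(file, "UTF-8");
        for (String line : lines)
        {
            diskWriter.println(line);
        }
        diskWriter.close();
    }

    // Read the first line back, telling the Scanner that the file is UTF-8
    public static String readFirstLine(File file) throws IOException
    {
        Scanner diskReader = new Scanner(file, "UTF-8");
        String line = diskReader.nextLine();
        diskReader.close();
        return line;
    }

    public static void main(String[] args) throws IOException
    {
        File file = new File("greetings.txt");
        String[] lines = {
            "Bonjour, \u00e7a va?",                          // French, with c-cedilla
            "\u041f\u0440\u0438\u0432\u0456\u0442"           // Ukrainian "Pryvit" (hello)
        };
        writeLines(file, lines);
        System.out.println(readFirstLine(file));
    }
}
```

Reading a UTF-8 file with a Scanner that assumes a different encoding can turn each accented letter into two or more strange-looking characters, which is why the encoding is named on both sides.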
5.5.4. Text file-based applications
Now that we have some idea of how to read from a text file and how to write data to a text file, let's see how we can use those tools to build some interesting applications.
Reading from a text file can be challenging. How should we read from our file?
- One bit at a time?
- One byte at a time?
- One character at a time?
- One line at a time?
There are other questions as well. Do spaces count when we're reading text? Sometimes we want to work with whitespace like spaces, tabs, and blank lines, and sometimes we don't.
It can get complicated.
Let's take a look at some common strategies for dealing with text and text files. Apply each of these strategies as needed for your particular application.
5.5.4.1. Reading Tokens
Java's Scanner separates input into tokens using whitespace, but sometimes that's not what you want.
Assume we have the following text file, lyrics.txt:

```
I'm Jumpin' Jack Flash
it's a gas, gas, gas.
```
What happens when the following code is executed?
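```java
// Inside a main method declared "throws FileNotFoundException"
// (requires imports of java.io.File, java.io.FileNotFoundException, java.util.Scanner)
String inputFile = "lyrics.txt";
File inFile = new File(inputFile);            // creates a File object we can read from
Scanner diskReader = new Scanner(inFile);     // a Scanner that reads from the file
while (diskReader.hasNext())
{
    String word = diskReader.next();
    System.out.println(word);
}
diskReader.close();
```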
By default, the next() method in the Scanner class gets the next "token": a group of characters separated from other characters by whitespace. Whitespace includes spaces, tabs, and line breaks, so blank lines are skipped as well.
Here's the output that is produced.
```
I'm
Jumpin'
Jack
Flash
it's
a
gas,
gas,
gas.
```
This output might be just fine for some purposes, but if we were trying to count how many different words there are, we'd see two separate tokens, "gas," and "gas.", the difference being the comma and the period, which aren't even part of the actual word.
How can we get just the words, and not the commas and periods?
There are a number of strategies to try. Here's one: just after we initialize the diskReader, include this line:
```java
diskReader.useDelimiter("[\t \n,.]+");
```
A delimiter is a character that marks the start or end of a given string of characters. Here, in the square brackets, we're giving a list of all the characters that we want to indicate the beginning or end of a string:
- a tab character (\t)
- a space
- a newline character (\n)
- a comma
- a period
The + after the square brackets indicates that we can have one or more of these characters marking the beginning or end of a word, so a comma followed by a space counts as a single separator.
What's the output now?
```
I'm
Jumpin'
Jack
Flash
it's
a
gas
gas
gas
```
That's more what we're looking for here!
This notation for describing patterns of characters is called a regular expression, and regular expressions are an extremely important tool in computer science.
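To try both behaviors without creating lyrics.txt, it helps to know that a Scanner can also be constructed from a String. This sketch (the names TokenDemo and tokens are our own) collects the tokens produced by the default whitespace delimiter and by the regular expression above.

```java
import java.util.ArrayList;
import java.util.Scanner;

public class TokenDemo
{
    // Read every token the Scanner produces into a list
    public static ArrayList<String> tokens(Scanner reader)
    {
        ArrayList<String> words = new ArrayList<String>();
        while (reader.hasNext())
        {
            words.add(reader.next());
        }
        reader.close();
        return words;
    }

    public static void main(String[] args)
    {
        String lyrics = "I'm Jumpin' Jack Flash\nit's a gas, gas, gas.";

        Scanner defaultReader = new Scanner(lyrics);      // splits on whitespace only
        System.out.println(tokens(defaultReader));
        // [I'm, Jumpin', Jack, Flash, it's, a, gas,, gas,, gas.]

        Scanner wordReader = new Scanner(lyrics);
        wordReader.useDelimiter("[\t \n,.]+");             // whitespace, commas, periods
        System.out.println(tokens(wordReader));
        // [I'm, Jumpin', Jack, Flash, it's, a, gas, gas, gas]
    }
}
```

One caution: text files created on Windows often end each line with \r\n, and \r isn't in our bracket list, so a pattern such as "[\\s,.]+" (where \s matches any whitespace character, including \r) may be the safer choice.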
5.5.4.2. Reading one character at a time
It can be useful to process characters one at a time. To do this, set the delimiter to "":
```java
diskReader.useDelimiter("");
```
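Now the characters are read in one at a time for processing by the program. Each call to next() returns a one-character String, so the token-printing loop prints every character on its own line, including each space, each punctuation mark, and the line break at the end of the first line. Written across a single line, the visible characters are:

I ' m J u m p i n ' J a c k F l a s h i t ' s a g a s , g a s , g a s .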
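As a sketch of what that processing might look like, this hypothetical CharCounter class tallies letters, spaces, and other characters, reading one character per call to next(). Here it scans a String; passing new Scanner(new File("lyrics.txt")) instead would read the file (with the usual throws FileNotFoundException on main).

```java
import java.util.Scanner;

public class CharCounter
{
    // Returns {letters, spaces, others} for everything the Scanner reads
    public static int[] countChars(Scanner reader)
    {
        reader.useDelimiter("");                // empty delimiter: one character per token
        int letters = 0;
        int spaces = 0;
        int others = 0;
        while (reader.hasNext())
        {
            char c = reader.next().charAt(0);   // each token is a one-character String
            if (Character.isLetter(c))
            {
                letters++;
            }
            else if (c == ' ')
            {
                spaces++;
            }
            else
            {
                others++;                       // apostrophes, punctuation, newlines...
            }
        }
        return new int[] { letters, spaces, others };
    }

    public static void main(String[] args)
    {
        int[] counts = countChars(new Scanner("I'm Jumpin' Jack Flash\nit's a gas, gas, gas."));
        System.out.println("Letters: " + counts[0]);   // Letters: 30
        System.out.println("Spaces:  " + counts[1]);   // Spaces:  7
        System.out.println("Other:   " + counts[2]);   // Other:   7
    }
}
```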
5.5.4.3. Reading one line at a time
Often each line in a file will include the information for a single record of data, with each record split up into multiple fields, separated by commas or some other delimiter.
For these cases, the Scanner should use the nextLine() method, which reads in a line of text (without the newline character) for further processing:

```java
while (in.hasNextLine())
{
    String line = in.nextLine();
    // analyze the line of text
}
```
How do you actually go about analyzing a line of text? That depends on what the line looks like and what your needs are. Let's look at two examples which each use a slightly different strategy of analysis.
Analyzing a line with a Scanner
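One common approach, shown here as a sketch, is to create a second Scanner whose source is the line itself, then use next() and nextInt() to pull out the pieces. The line format (a one-word name followed by some integer scores, such as "Aaron 90 85 77"), the file name scores.txt, and the names ScoreLine and describe are all hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ScoreLine
{
    // Given a line like "Aaron 90 85 77", report the name and the average score
    public static String describe(String line)
    {
        Scanner lineReader = new Scanner(line);   // a Scanner that reads from the String
        String name = lineReader.next();          // first token: the name
        int total = 0;
        int count = 0;
        while (lineReader.hasNextInt())           // remaining tokens: the scores
        {
            total += lineReader.nextInt();
            count++;
        }
        lineReader.close();
        double average = (count > 0) ? (double) total / count : 0.0;
        return name + ": " + average;
    }

    public static void main(String[] args) throws FileNotFoundException
    {
        Scanner diskReader = new Scanner(new File("scores.txt"));
        while (diskReader.hasNextLine())
        {
            String line = diskReader.nextLine();
            if (!line.trim().isEmpty())           // next() would fail on a blank line
            {
                System.out.println(describe(line));
            }
        }
        diskReader.close();
    }
}
```

The advantage of a line Scanner is that it already knows how to skip spaces and convert text like "90" into an int.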
Here is another way of looking at a line.
Analyzing a line as a String of characters
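Alternatively, we can treat the line as an ordinary String and take it apart with String methods such as indexOf, substring, trim, and charAt. This sketch pulls the three fields out of a line from contacts.csv by locating its two commas, then walks through the phone number one char at a time to count its digits. The names ContactFields, fields, and countDigits are hypothetical, and the code assumes every line contains exactly two commas.

```java
public class ContactFields
{
    // Split "name, email, phone" into its three parts using indexOf and substring
    public static String[] fields(String line)
    {
        int firstComma = line.indexOf(",");
        int secondComma = line.indexOf(",", firstComma + 1);   // search after the first comma
        String name = line.substring(0, firstComma).trim();
        String email = line.substring(firstComma + 1, secondComma).trim();
        String phone = line.substring(secondComma + 1).trim();
        return new String[] { name, email, phone };
    }

    // Examine a String one char at a time and count how many are digits
    public static int countDigits(String text)
    {
        int digits = 0;
        for (int i = 0; i < text.length(); i++)
        {
            if (Character.isDigit(text.charAt(i)))
            {
                digits++;
            }
        }
        return digits;
    }

    public static void main(String[] args)
    {
        String line = "Garrett Wolfe, garysimon@gmail.com, 213-455-3213";
        String[] parts = fields(line);
        System.out.println("Name:  " + parts[0]);
        System.out.println("Email: " + parts[1]);
        System.out.println("Phone: " + parts[2] + " (" + countDigits(parts[2]) + " digits)");
    }
}
```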