Quick Links

It's pretty easy to read the contents of a Linux text file line by line in a shell script---as long as you deal with some subtle gotchas. Here's how to do it the safe way.

Files, Text, and Idioms

Each programming language has a set of idioms. These are the standard, no-frills ways to accomplish a set of common tasks. They're the elementary or default way to use one of the features of the language the programmer is working with. They become part of a programmer's toolkit of mental blueprints.

Actions like reading data from files, working with loops, and swapping the values of two variables are good examples. The programmer will know at least one way to achieve their ends in a generic or vanilla fashion. Perhaps that will suffice for the requirement at hand. Or maybe they'll embellish the code to make it more efficient or applicable to the specific solution they are developing. But having the building-block idiom at their fingertips is a great starting point.

Knowing and understanding idioms in one language makes it easier to pick up a new programming language, too. Knowing how things are constructed in one language and looking for the equivalent---or the closest thing---in another language is a good way to appreciate the similarities and differences between programming languages you already know and the one you're learning.

Reading Lines From a File: The One-Liner

In Bash, you can use a while loop on the command line to read each line of text from a file and do something with it. Our text file is called "data.txt." It holds a list of the months of the year.

January

February

March

.

.

October

November

December

Our simple one-liner is:

while read line; do echo $line; done < data.txt

while read line; do echo $line; done < data.txt in a terminal window

The while loop reads a line from the file, and the execution flow of the little program passes to the body of the loop. The echo command writes the line of text in the terminal window. The read attempt fails when there are no more lines to be read, and the loop is done.

One neat trick is the ability to redirect a file into a loop. In other programming languages, you'd need to open the file, read from it, and close it again when you'd finished. With Bash, you can simply use file redirection and let the shell handle all of that low-level stuff for you.

Of course, this one-liner isn't terribly useful. Linux already provides the cat command, which does exactly that for us. We've created a long-winded way to replace a three-letter command. But it does visibly demonstrate the principles of reading from a file.

That works well enough, up to a point. Suppose we have another text file that contains the names of the months. In this file, the escape sequence for a newline character has been appended to each line. We'll call it "data2.txt."

January\n

February\n

March\n

.

.

October\n

November\n

December\n

Let's use our one-liner on our new file.

while read line; do echo $line; done < data2.txt

while read line; do echo $line; done < data2.txt in a terminal window

The backslash escape character " \ " has been discarded. The result is that an "n" has been appended to each line. Bash is interpreting the backslash as the start of an escape sequence. Often, we don't want Bash to interpret what it is reading. It can be more convenient to read a line in its entirety---backslash escape sequences and all---and choose what to parse out or replace yourself, within your own code.

If we want to do any meaningful processing or parsing on the lines of text, we'll need to use a script.

Reading Lines From a File With a Script

Here's our script. It's called "script1.sh."

        #!/bin/bashCounter=0while IFS='' read -r LinefromFile || [[ -n "${LinefromFile}" ]]; do ((Counter++)) echo "Accessing line $Counter: ${LinefromFile}"done < "$1"
    

We set a variable called Counter to zero, then we define our while loop.

The first statement on the while line is IFS='' . IFS stands for internal field separator. It holds values that Bash uses to identify word boundaries. By default, the read command strips off leading and trailing whitespace. If we want to read the lines from the file exactly as they are, we need to set IFS to be an empty string.

We could set this once outside of the loop, just like we're setting the value of Counter . But with more complex scripts---especially those with many user-defined functions in them---it is possible that IFS could be set to different values elsewhere in the script. Ensuring that IFS is set to an empty string each time the while loop iterates guarantees that we know what its behavior will be.

We're going to read a line of text into a variable called LinefromFile . We're using the -r (read backslash as a normal character) option to ignore backslashes. They'll be treated just like any other character and won't receive any special treatment.

There are two conditions that will satisfy the while loop and allow the text to be processed by the body of the loop:

  • read -r LinefromFile : When a line of text is successfully read from the file, the read command sends a success signal to the while , and the while loop passes the execution flow to the body of the loop. Note that the read command needs to see a newline character at the end of the line of text in order to consider it a successful read. If the file is not a POSIX compliant text file, the last line may not include a newline character. If the read command sees the end of file marker (EOF) before the line is terminated by a newline, it will not treat it as a successful read. If that happens, the last line of text will not be passed to the body of the loop and will not be processed.
  • [ -n "${LinefromFile}" ] : We need to do some extra work to handle non-POSIX compatible files. This comparison checks the text that is read from the file. If it isn't terminated with a newline character, this comparison will still return success to the while loop. This ensures that any trailing line fragments are processed by the body of the loop.

These two clauses are separated by the OR logical operator " || " so that if either clause returns success, the retrieved text is processed by the body of the loop, whether there is a newline character or not.

In the body of our loop, we're incrementing the Counter variable by one and using echo to send some output to the terminal window. The line number and the text of each line are displayed.

We can still use our redirection trick to redirect a file into a loop. In this case, we're redirecting $1, a variable that holds the name of the first command line parameter that passed to the script. Using this trick, we can easily pass in the name of the data file that we want the script to work on.

Copy and paste the script into an editor and save it with the filename "script1.sh." Use the chmod command to make it executable.

chmod +x script1.sh

chmod +x script1.sh in a terminal window

Let's see what our script makes of the data2.txt text file and the backslashes contained within it.

./script1.sh data2.txt

./script1.sh data2.txt in a terminal window

Every character in the line is displayed verbatim. The backslashes are not interpreted as escape characters. They're printed as regular characters.

Passing the Line to a Function

We're still just echoing the text to the screen. In a real-world programming scenario, we'd likely be about to do something more interesting with the line of text. In most cases, it is a good programming practice to handle the further processing of the line in another function.

Here's how we could do it. This is "script2.sh."

        #!/bin/bashCounter=0function process_line() { echo "Processing line $Counter: $1"}while IFS='' read -r LinefromFile || [[ -n "${LinefromFile}" ]]; do ((Counter++)) process_line "$LinefromFile"done < "$1"
    

We define our Counter variable as before, and then we define a function called process_line() . The definition of a function must appear before the function is first called in the script.

Our function is going to be passed the newly read line of text in each iteration of the while loop. We can access that value within the function by using the $1 variable. If there were two variables passed to the function, we could access those values using $1 and $2 , and so on for more variables.

The while loop is mainly the same. There is only one change inside the body of the loop. The echo line has been replaced by a call to the process_line() function. Note that you don't need to use the "()" brackets in the name of the function when you are calling it.

The name of the variable holding the line of text, LinefromFile , is wrapped in quotation marks when it is passed to the function. This caters for lines that have spaces in them. Without the quotation marks, the first word is treated as $1 by the function, the second word is considered to be $2 , and so on. Using quotation marks ensures that the entire line of text is handled, altogether, as $1. Note that this is not the same $1 that holds the same data file passed to the script.

Because Counter has been declared in the main body of the script and not inside a function, it can be referenced inside the process_line() function.

Copy or type the script above into an editor and save it with the filename "script2.sh." Make it executable with chmod :

chmod +x script2.sh

chmod +x script2.sh in a terminal window

Now we can run it and pass in a new data file, "data3.txt." This has a list of the months in it, and one line with many words on it.

January

February

March

.

.

October

November \nMore text "at the end of the line"

December

Our command is:

./script2.sh data3.txt

./script2.sh data3.txt in a terminal window

The lines are read from the file and passed one by one to the process_line() function. All the lines are displayed correctly, including the odd one with the backspace, quotation marks, and multiple words in it.

Building Blocks Are Useful

There's a train of thought that says that an idiom must contain something unique to that language. That's not a belief that I subscribe to. What's important is that it makes good use of the language, is easy to remember, and provides a reliable and robust way to implement some functionality in your code.