Files

Getting data into programs:

Reading files

The structure of files

Step 1: Open the file. Step 2: Read the file one line at a time.

file = open("mydata.txt")
for line in file:
  print(line)
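Trace this program (for example, on Python Tutor) on a file that has multiple lines to see how it reads the file one line at a time. Notice that the last character of each line is the newline character "\n" (except possibly the last line, if the file does not end with a newline). Because print adds its own newline, the output appears double-spaced.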
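A small sketch that makes the newline visible. It writes a two-line file into a temporary directory (the file name mydata.txt and its contents are illustrative assumptions) and prints repr() of each line, so the "\n" is shown instead of being acted on. The helper name show_raw_lines is hypothetical.

```python
import os
import tempfile

def show_raw_lines(path):
    """Return repr() of every line obtained by iterating over the file."""
    raw = []
    file = open(path)
    for line in file:
        raw.append(repr(line))  # repr shows the '\n' explicitly
    file.close()
    return raw

# Create a throwaway file with two lines (illustrative contents).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mydata.txt")
    with open(path, "w") as out:
        out.write("Hello world\nHello foo\n")
    for shown in show_raw_lines(path):
        print(shown)
# prints:
# 'Hello world\n'
# 'Hello foo\n'
```

If the file's last line has no trailing newline, that line is returned without "\n".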

The strip method returns a new string with all leading and trailing whitespace (including newline and tab characters) removed; the original string is unchanged:

file = open("mydata.txt")
for line in file:
  print(line.strip())
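Note that strip also removes leading whitespace, which matters if indentation in the file is meaningful. A quick comparison on a single illustrative line, using rstrip("\n") (a standard str method) as an alternative that removes only the trailing newline:

```python
raw = "  indented\tline\n"  # a line as it might be read from a file

print(repr(raw.strip()))        # 'indented\tline'   (leading spaces removed too)
print(repr(raw.rstrip("\n")))   # '  indented\tline' (only the newline removed)
print(repr(raw))                # '  indented\tline\n' (original is unchanged)
```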

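Does the following code print the contents of the file twice?

file = open("mydata.txt")
for line in file:
  print(line)
print("------")
for line in file:
  print(line)
No! Internally, the file object maintains a current position that advances past every line read from it. Once the first loop reaches the end of the file, the second loop finds nothing left to read, so the lines are printed only once, followed by the separator. (Calling open("mydata.txt") again would create a new file object whose position starts at the beginning of the file.)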
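The current position can be observed directly. This sketch (hypothetical helper read_twice, illustrative file contents) iterates over one file object twice, then uses the standard seek(0) method to move the position back to the start of the file.

```python
import os
import tempfile

def read_twice(path):
    """Iterate over ONE file object twice, counting the lines seen each time."""
    file = open(path)
    first = sum(1 for line in file)   # consumes the file up to end of file
    second = sum(1 for line in file)  # position is already at the end: 0 lines
    file.seek(0)                      # move the position back to the start
    third = sum(1 for line in file)   # the lines can now be read again
    file.close()
    return first, second, third

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mydata.txt")
    with open(path, "w") as out:
        out.write("one\ntwo\nthree\n")
    print(read_twice(path))  # (3, 0, 3)
```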

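Example lab code (readlines returns a list of all the lines in the file, each still ending with '\n'; the list comprehension strips them):

def process_input(filename):
    with open(filename, 'r') as file:
        lines = file.readlines()
    lines = [line.strip() for line in lines]
    ...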
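To see what readlines returns without creating a file, this sketch uses io.StringIO as a stand-in for an open file (an assumption for illustration; it supports the same reading methods). It also shows read().splitlines(), another standard way to get the lines without their line endings.

```python
import io

text = "first\nsecond\nthird\n"

# readlines() returns every remaining line as a list; each keeps its "\n".
print(io.StringIO(text).readlines())  # ['first\n', 'second\n', 'third\n']

# Stripping each line, as in the lab code above.
print([line.strip() for line in io.StringIO(text).readlines()])  # ['first', 'second', 'third']

# read() returns the whole file as one string; splitlines() drops the line endings.
print(io.StringIO(text).read().splitlines())  # ['first', 'second', 'third']
```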

The readline method reads a single line from the file. It returns a string that includes the newline character '\n' at its end (except possibly for the last line of the file). It returns the empty string only when the end of file (EOF) has been reached; a blank line in the file is returned as '\n', not ''. readline also advances the current position in the file, so the next call to readline returns the next line.

file = open("mydata.txt")
line = file.readline()
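A minimal sketch of the usual readline loop: keep calling readline until it returns the empty string. The function name count_lines_with_readline is hypothetical, and io.StringIO stands in for open("mydata.txt"). The blank line in the sample data shows that "\n" does not end the loop.

```python
import io

def count_lines_with_readline(file):
    """Count lines by calling readline() until it returns "" (end of file)."""
    count = 0
    line = file.readline()
    while line != "":           # "" means EOF; a blank line is "\n"
        count += 1
        line = file.readline()  # advances the position to the next line
    return count

file = io.StringIO("Hello world\n\nHello bar\n")  # stands in for a real file
print(count_lines_with_readline(file))  # 3
print(repr(file.readline()))            # '' : still at end of file
```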

Exercise: write a recursive function reverseLines that accepts an open file object and prints the lines of that file in reverse order.
Example input:

Hello world
Hello foo
Hello bar
baz hello
Expected output:
baz hello
Hello bar
Hello foo
Hello world
Is this problem self-similar? What is a file that is very easy to reverse? Hint: reversing the lines of a file can be done by (1) reading a line L from the file, (2) printing the rest of the lines in reverse order (this is the self-similar step), and (3) printing the line L.
def reverseLines(file):
    line = file.readline()
    if len(line) != 0:
        # recursive case: print the rest of the file reversed, then this line
        reverseLines(file)
        print(line.strip())

reverseLines(open("input.txt"))
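Where is the base case? It is implicit: when readline returns the empty string (end of file), the if condition is false and the function returns without printing anything.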
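To try reverseLines on the example input without creating input.txt, one can pass an io.StringIO object, which supports readline just like a file (a substitution made here for illustration). The function body follows the version above, with the implicit base case spelled out as a comment.

```python
import io

def reverseLines(file):
    line = file.readline()
    if len(line) != 0:
        # recursive case: print the rest of the file reversed, then this line
        reverseLines(file)
        print(line.strip())
    # base case: readline returned "" (end of file), so do nothing

reverseLines(io.StringIO("Hello world\nHello foo\nHello bar\nbaz hello\n"))
# prints:
# baz hello
# Hello bar
# Hello foo
# Hello world
```

Each line adds one level of recursion, so a file with more lines than Python's recursion limit (sys.getrecursionlimit(), 1000 by default in CPython) raises RecursionError.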

Directories

A directory is a cataloging structure which contains references to other computer files, and possibly other directories. It is also called a folder on some operating systems.

All of the directory operations below live in the os module, so start with import os.

os.listdir(dir) returns the names of the files (and directories) in directory dir as a list of strings. The names do not include the parent path, and they come back in arbitrary order.
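A small sketch of os.listdir on a temporary directory with illustrative contents. Because the order is arbitrary, the hypothetical helper sorted_listing sorts the names before returning them.

```python
import os
import tempfile

def sorted_listing(path):
    """Names in directory path, sorted (os.listdir itself gives no order)."""
    return sorted(os.listdir(path))

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative contents: two empty files and one subdirectory.
    for name in ["minor1.pdf", "minor2.pdf"]:
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "lab2"))
    print(sorted_listing(tmp))  # ['lab2', 'minor1.pdf', 'minor2.pdf']
```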

A file or a directory can be nested in another directory. The sequence of directory names leading to a file, followed by the filename itself and separated by forward-slash / characters, is called a path, e.g., a/b/c. If the first character is a forward slash, the path is an absolute path starting from the root directory /; otherwise it is a relative path, interpreted from the current working directory. (This is the convention on Unix-like systems such as Linux and macOS.)
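A sketch of building and classifying paths with standard os.path functions, assuming a Unix-like system where the separator is "/". os.path.join inserts the separator, os.path.isabs tells absolute from relative, and os.path.abspath shows where a relative path points from the current working directory.

```python
import os

relative = os.path.join("a", "b", "c")  # joins with "/" on Linux and macOS
print(relative)                  # a/b/c
print(os.path.isabs(relative))   # False: interpreted from the current directory
print(os.path.isabs("/a/b/c"))   # True: starts at the root directory /

# abspath prefixes the current working directory to a relative path.
print(os.path.abspath(relative) == os.path.join(os.getcwd(), "a", "b", "c"))  # True
```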

os.path.isfile(path) returns True if and only if path refers to an existing regular file.

os.path.isdir(path) returns True if and only if path refers to an existing directory.

os.path.basename(path) returns the last component of path, that is, the filename without the parent directories leading up to it.

In all cases, path can be either relative or absolute.
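A sketch exercising these three functions on a temporary directory (the names lab2 and hello_world.cpp are borrowed from the example tree for illustration). A path that does not exist is neither a file nor a directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "lab2")
    os.mkdir(folder)
    path = os.path.join(folder, "hello_world.cpp")
    open(path, "w").close()  # create an empty file

    print(os.path.isfile(path), os.path.isdir(path))        # True False
    print(os.path.isfile(folder), os.path.isdir(folder))    # False True
    missing = os.path.join(tmp, "no_such_file")
    print(os.path.isfile(missing), os.path.isdir(missing))  # False False
    print(os.path.basename(path))                           # hello_world.cpp

# basename works on the string alone; the path need not exist.
print(os.path.basename("courses/col100/minor1.pdf"))  # minor1.pdf
```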

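Exercise: write a recursive function crawl that accepts the path of a file or directory and prints its name; if it is a directory, crawl also prints the names of everything inside it, indenting each nesting level further.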

Example:
courses
    col100
        lab2
            hello_world.cpp
            order_of_evaluation.cpp
        lab3
            if_then_else.cpp
        minor1.pdf
        minor2.pdf
    phl100
        ...

How is this problem self-similar? Crawling a directory can be expressed as printing its name and then crawling each of its entries (files and subdirectories) with one more level of indentation.
Base case? A file that is not a directory: just print its name.

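import os

# Prints the name of the file or directory at path, and (if it is a
# directory) everything inside it, indented one level deeper.
def crawl(path, indent):
    print(indent + os.path.basename(path))
    if os.path.isdir(path):
        # recursive case: crawl each contained file/directory
        # (sorted, because os.listdir returns names in arbitrary order)
        for subfile in sorted(os.listdir(path)):
            crawl(os.path.join(path, subfile), indent + "    ")
    # base case: a plain file, whose name has already been printed

crawl(".", "")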
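To check the crawling logic without depending on the contents of the current directory, one can build a small tree in a temporary directory. This sketch uses a hypothetical variant, crawl_lines, that returns the lines instead of printing them (which makes the result easy to compare) and sorts each listing because os.listdir order is arbitrary. The tree is a small portion of the courses example.

```python
import os
import tempfile

def crawl_lines(path, indent=""):
    """Like crawl, but returns the printed lines as a list of strings."""
    lines = [indent + os.path.basename(path)]
    if os.path.isdir(path):
        # recursive case: each entry is crawled one indentation level deeper
        for subfile in sorted(os.listdir(path)):
            lines += crawl_lines(os.path.join(path, subfile), indent + "    ")
    return lines  # base case: a plain file contributes only its own name

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "courses")
    os.makedirs(os.path.join(root, "col100", "lab2"))
    open(os.path.join(root, "col100", "lab2", "hello_world.cpp"), "w").close()
    open(os.path.join(root, "col100", "minor1.pdf"), "w").close()
    print("\n".join(crawl_lines(root)))
# prints:
# courses
#     col100
#         lab2
#             hello_world.cpp
#         minor1.pdf
```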