Hacks, Leaks, and Revelations

Chapter 7: An Introduction to Python

The skills you’ve learned in the last few chapters are instrumental for investigating leaked datasets, but having basic programming knowledge is even more powerful. Using Python or other programming languages, you can give your computer precise instructions for performing tasks that existing tools or shell scripts don’t allow. For example, you could write a Python script that scours a million pieces of video metadata to determine where the videos were filmed. In my experience, Python is also simpler, easier to understand, and less error-prone than shell scripts.

This chapter provides a crash course on the fundamentals of Python programming. You’ll learn to write and execute Python scripts and use the interactive Python interpreter. You’ll also use Python to do math, define variables, work with strings and Boolean logic, loop through lists of items, and use functions. Future chapters rely on your understanding of these basic skills.

Exercise 7-1: Install Python

Some operating systems, including most versions of Linux and macOS, come with Python preinstalled, and it’s common to have multiple versions of Python installed at once. This book uses Python 3. After you follow the Python installation instructions for your operating system in this exercise, you should be able to run Python scripts with the python3 (for Linux and Mac) or python (for Windows) command.

Windows

Download and install the latest version of Python 3 for Windows from https://www.python.org. During installation, check the box Add Python 3.x to PATH (where 3.x is the latest Python 3 version), which allows you to run the python command in PowerShell without using the Python program’s absolute path.

Wherever this chapter instructs you to open a terminal, use PowerShell instead of an Ubuntu terminal. You can also learn to use Python in Ubuntu with WSL by following this chapter’s Linux instructions, but running Python directly in Windows makes reading and writing data on your Windows-formatted USB disk much faster.

Windows users should replace all instances of python3 with python when running the example code in this chapter.

Linux

Open a terminal and make sure the python3, python3-pip, and python3-venv packages are installed, using this apt command:

sudo apt install python3 python3-pip python3-venv

This command either installs the latest version of Python 3 available in the Ubuntu repositories (as well as a few related packages you’ll need for this chapter) or does nothing if the packages are already installed.

macOS

Open a terminal and run the following Homebrew command to make sure python3 is installed:

brew install python3

This command either installs the latest version of Python 3 available in Homebrew or does nothing if it’s already installed.

Exercise 7-2: Write Your First Python Script

Now that you’ve downloaded Python, you’ll write and run a simple Python script that displays some text in your terminal.

In your text editor, create a new file called exercise-7-2.py (all Python scripts end in .py). The first time you open a Python script in VS Code, it asks if you want to install the Python extension. I recommend doing so in order to enable VS Code to make suggestions as you’re typing. The extension also has various features for highlighting syntax errors and helping you format your code nicely.

Enter the following code (or copy and paste it from https://github.com/micahflee/hacks-leaks-and-revelations/blob/main/chapter-7/exercise-7-2.py), then save the file:

print("hacks")
print("leaks")
revelations = "revelations".upper()
print(revelations)

As with shell scripts, Python scripts run instructions one line at a time, starting at the top. When you run this code, print("hacks") calls a function called print() and passes the string hacks into it, displaying hacks in your terminal window. The second line similarly displays leaks. (I'll explain strings in greater detail in the Python Basics section, and functions in the Functions section.)

Next, the script defines a variable called revelations and sets its value to the uppercase version of the string revelations. To find the uppercase version of that string, the program calls the upper() method, which is a type of function. The final line then displays what’s stored in the revelations variable: REVELATIONS.

NOTE I have fond memories of retyping snippets of code from books. When I was a teenager, I taught myself web and video game development by reading programming books and typing the code samples I found into my own editor. I always found that actually retyping the code, rather than copying and pasting it, helped make the concepts stick, so I recommend doing that for the exercises in this book.

In a terminal, change to your exercises folder and run the script you just created with the following command (Windows users, remember to replace python3 with python):

micah@trapdoor chapter-7 % python3 exercise-7-2.py

The argument in this command is the path to the script that you want to run, exercise-7-2.py. You should get the following output:

hacks
leaks
REVELATIONS

Try making a few changes to your script, such as editing the strings or adding more print() lines, and run it after each change to see the results.

Python Basics

In this section, you’ll learn to write code in the interactive Python interpreter, comment your code, start doing simple math in Python, and use strings and lists. This gentle introduction to Python syntax will let you quickly try out some code on your own, before you dive into more advanced topics.

As you read, don’t be shy about searching online for answers to any Python questions you might have beyond what this book covers. I frequently find solutions to Python problems on websites like Stack Overflow, a forum where people can ask technical questions and others can answer them.

The Interactive Python Interpreter

The Python interpreter is a command line program that lets you run Python code in real time, without writing scripts first, allowing you to quickly test commands. To open the Python interpreter, you run the python3 command without any arguments, like so:

micah@trapdoor ~ % python3
--snip--
Type "help", "copyright", "credits" or "license" for more information.
>>>

The interpreter starts by telling you exactly which version of Python you’re using. Similar to a command line interface, it gives you the prompt >>> and waits for you to enter a Python command.

Run the following command:

>>> print("Hello World!")
Hello World!
>>>

Entering print("Hello World!") and pressing ENTER should immediately run your code, displaying Hello World! on the next line. Exit the interpreter and return to the shell by running exit() or pressing CTRL-D.

In the remainder of this book, if my examples include the >>> prompt, that means they’re running in the Python interpreter. Run the same code in your own interpreter as you follow along.

Comments

Writing code can be confusing even to experienced programmers, so it’s always a good idea to comment your code: add inline notes to yourself or to others who might read your program. If you describe the purpose of a specific portion of code in plain English (or whatever language you speak), whoever looks at this code in the future can understand the gist of what it’s doing at a glance.

If a line of code starts with a hash mark (#), the whole line is a comment. You can also add a hash mark after some code, followed by your comment. For example, run the following lines of code:

>>> # This is a comment
>>> x = 10 # This sets the variable x to the value 10
>>> print(x)
10

This is exactly the same as comments in shell scripting, which you learned about in Chapter 3. Python ignores comments, since they’re intended for humans.

Math with Python

Computers, which are technically complicated calculators, are great at doing math. It might not be immediately apparent, but investigating datasets means constantly dealing with basic math: calculating disk space, counting files, searching for keywords, and sorting lists. Here’s how a few basic mathematical operations work in Python:

Operators

The arithmetic operators for addition (+), subtraction (−), multiplication (×), and division (/) are mostly the same in Python: +, -, and /, with an asterisk (*) for multiplication.

Variables

In math, a variable is a placeholder, normally a letter like x. In math, variables often represent unknowns that it's your job to solve for, but Python variables are never unknown: they always have a value. Name your Python variables something descriptive like price or number_of_retweets rather than single letters without clear meanings. Variables in Python can represent much more than just numbers, as you'll see later in this chapter.

Expressions

An expression is a bit like a sentence made up of numbers, variables, and operators. For example, here are a few expressions:

1 + 1
100 / 5
x * 3 + 5

Like sentences, expressions need to have the correct syntax. Just like “potato the inside” isn’t a valid sentence, 1 1 + isn’t a valid expression. Enter the following expressions in the Python interpreter to see how it evaluates them:

>>> 1 + 1
2
>>> 100 / 5
20.0
>>> 3.14 * 2
6.28

Just like a calculator, Python respects the order of operations. It also supports using parentheses:

>>> 100 - 12 * 2
76
>>> (100 - 12) * 2
176

As in the rest of math, Python won’t allow you to divide by zero:

>>> 15 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

You define a variable in Python by saving a value inside that variable with the equal sign (=). Try defining price and sales_tax variables and then using them in an expression:

>>> price = 100
>>> sales_tax = .05
>>> total = price + (price * sales_tax)
>>> print(total)
105.0

You can’t use variables that you haven’t yet defined. For example, if you use an undefined variable x in an expression, you’ll get an error:

>>> x * 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

Instead of just setting a variable equal to some value, you'll often want to modify its existing value by a certain amount. For example, if you're keeping track of the total price of items in a shopping cart in the total variable and want to add 10 dollars to that total, you would update the variable like so:

total = total + 10

Python’s += operator performs the same operation:

total += 10

The += operator adds the number on the right to the variable on the left. The Python operators -=, *=, and /= work the same way. In your Python interpreter, define a variable, then try changing its value using these operators.
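As a quick check, the following sketch applies each of these operators in turn to a single variable. The starting value of 100 is arbitrary, and the comments track the value after each step. Note that /= always produces a float, just like the / operator.

```python
# Start with a running total, then adjust it in place
total = 100
total += 10  # same as total = total + 10, now 110
total -= 30  # same as total = total - 30, now 80
total *= 2   # same as total = total * 2, now 160
total /= 4   # same as total = total / 4, now 40.0 (division gives a float)
print(total)  # 40.0
```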

Strings

A string is a sequence of characters. Any time you need to load, modify, or display text, you store it in a string. If you load the contents of a text file into a variable in Python (for example, a 5MB EML file that includes attachments), that’s a string. But strings are also often very short: in Exercise 7-2, you used the strings "hacks", "leaks", and "revelations".

In Python, strings must be enclosed in either single quotes (') or double quotes ("). Run the following examples, which demonstrate how to use each type of quote. Here is a string with double quotes:

>>> "apple" 
'apple'

Here is the same string with single quotes:

>>> 'apple' # The same string with single quotes
'apple'

Use double quotes if you have single quotes within the string:

>>> "She's finished!"
"She's finished!"

Use single quotes if you have double quotes within the string:

>>> 'She said, "Hello" '
'She said, "Hello" '

Some of the same techniques you learned in Chapter 3 to work with strings in your shell also apply to strings in Python. If your string uses double quotes, you can escape them like so:

>>> "She said, \"Hello\" "
'She said, "Hello" '

You can similarly escape single quotes in a single-quote string:

>>> 'She\'s finished!'
"She's finished!"

Like numbers, strings can be stored in variables. Run the following code to define first_name and last_name variables, replacing my name with yours:

>>> first_name = "Micah"
>>> last_name = "Lee"

In Python, f-strings are strings that can contain variables. To use an f-string, put the letter f before the quotes, then put variable names in braces ({ and }). For example, run the following commands to display the values of the variables you just defined:

>>> print(f"{first_name} {last_name}")
Micah Lee
>>> full_name = f"{first_name} {last_name}"
>>> print(f"{first_name}'s full name is {full_name}, but he goes by {first_name}")
Micah's full name is Micah Lee, but he goes by Micah

Place expressions inside f-strings in order to evaluate them:

>>> print(f"1 + 2 + 3 + 4 + 5 = {1 + 2 + 3 + 4 + 5}")
1 + 2 + 3 + 4 + 5 = 15

Python will evaluate the expression for you, in this case 1 + 2 + 3 + 4 + 5, and just print the result, which is 15.

Exercise 7-3: Write a Python Script with Variables, Math, and Strings

In this exercise, you’ll practice the concepts you’ve learned so far by writing a simple Python script that uses variables and a few basic math expressions and prints some strings. The script calculates how old a person is in months, days, hours, minutes, and seconds, given their name and an age (in years), and then displays this information. In your text editor, create a new file called exercise-7-3.py and define these two variables:

name = "Micah"
age_years = 38

Replace the values of name and age_years with your own name and age.

Next, define some more variables that represent age in different units: months, days, hours, minutes, and seconds. Start with months:

age_months = age_years * 12

Add a days variable:

age_days = age_years * 365

Finally, define variables for hours, minutes, and seconds:

age_hours = age_days * 24
age_minutes = age_hours * 60
age_seconds = age_minutes * 60

Now that you’ve defined the variables, you can display them to the user. Since the numbers in this exercise are going to get big, you’ll include commas to make them easier to read. For example, run this code in the interpreter to display the variable number with commas using an f-string, adding :, after the variable name within the braces:

>>> number = 1000000
>>> print(f"the number is: {number}")
the number is: 1000000
>>> print(f"the number is: {number:,}")
the number is: 1,000,000

Back in the Python script, add code to display all of the values, like this:

print(f"{name} is {age_years:,} years old")
print(f"That would be {age_months:,} months old")
print(f"Which is {age_days:,} days old")
print(f"Which is {age_hours:,} hours old")
print(f"Which is {age_minutes:,} minutes old")
print(f"Which is {age_seconds:,} seconds old")

This code uses {name} to display the value of the name variable. That variable is a string, so it doesn't make sense to try to separate it with commas. The rest of the variables are numbers, though, so the code includes :, inside the braces for all of them to include commas in the output. (The age_years value doesn't need commas unless you happen to be older than 1,000, but it doesn't hurt to use the :, syntax; it adds a comma only if one is needed.)

Save the file in your text editor. (A complete copy of the script is available at https://github.com/micahflee/hacks-leaks-and-revelations/blob/main/chapter-7/exercise-7-3.py.) In a terminal, change to your exercises folder for this exercise and run the script. Here’s what happens when I do so:

micah@trapdoor chapter-7 % python3 exercise-7-3.py
Micah is 38 years old
That would be 456 months old
Which is 13,870 days old
Which is 332,880 hours old
Which is 19,972,800 minutes old
Which is 1,198,368,000 seconds old

When you run the script with your name and age, try changing the age and running it again to see how the numbers change.
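For reference, here is a sketch of the complete script assembled from the snippets above. It should behave like the version in the book's GitHub repository, though that copy may differ in comments or layout. Because it multiplies by 365, it ignores leap years, so treat the day count and everything derived from it as an approximation.

```python
# exercise-7-3.py: display a person's age in several units
name = "Micah"
age_years = 38

# Convert years to other units (365 days per year, so leap years are ignored)
age_months = age_years * 12
age_days = age_years * 365
age_hours = age_days * 24
age_minutes = age_hours * 60
age_seconds = age_minutes * 60

# Display each value; :, adds thousands separators to the numbers
print(f"{name} is {age_years:,} years old")
print(f"That would be {age_months:,} months old")
print(f"Which is {age_days:,} days old")
print(f"Which is {age_hours:,} hours old")
print(f"Which is {age_minutes:,} minutes old")
print(f"Which is {age_seconds:,} seconds old")
```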

Lists and Loops

You’ll often need to manage lists when investigating datasets. For example, you might work with lists of filenames or rows in a spreadsheet. In this section, you’ll learn how to store lists as variables and loop through those lists in order to run the same code for each list item. You did something similar in Chapter 4 with for loops in the shell, but this time you’ll be working in Python.

Defining and Printing Lists

In Python, lists are defined with brackets ([ and ]), with each item in the list separated by commas (,). You might have a list of numbers:

[1, 2, 3]

Or of strings:

["one", "two", "three"]

Or an empty list:

[]

Just as variables can contain numbers or strings, they can also contain lists. Use this line of code to store a list of letters in the Hebrew alphabet, spelled out using Latin characters, in the hebrew_letters variable:

>>> hebrew_letters = ["aleph", "bet", "gimel", "dalet", "he", "vav", "zayin",
"chet", "tet", "yod", "kaf", "lamed", "mem", "nun", "samech", "ayin", "pe",
"tsadi", "qof", "resh", "shin", "tav"]

Now use the print() function to display the items in the hebrew_letters variable:

>>> print(hebrew_letters)
['aleph', 'bet', 'gimel', 'dalet', 'he', 'vav', 'zayin', 'chet', 'tet', 'yod',
'kaf', 'lamed', 'mem', 'nun', 'samech', 'ayin', 'pe', 'tsadi', 'qof', 'resh',
'shin', 'tav']

You can make long lists easier to read by entering each item in the list on its own line, indented, like this:

hebrew_letters = [
    "aleph",
--snip--
    "tav"
]

Each item in a list has an index, a number that represents where in the list that item is located. The index of the first item is 0, the second is 1, the third is 2, and so on. To select a list item, you append brackets with the item’s index to the end of the list. For example, to select the first letter in the hebrew_letters list, use hebrew_letters[0]:

>>> print(hebrew_letters[0])
aleph
>>> print(hebrew_letters[1])
bet

The first line of code uses the print() function to display the item from the hebrew_letters list at index 0 (aleph), and the second line displays the item at index 1 (bet).

Now use negative numbers to select items starting from the end of the list, like so:

>>> print(hebrew_letters[-1])
tav
>>> print(hebrew_letters[-2])
shin

You can use the len() function to count the number of items in a list. For example, run the following code to get the number of items in the hebrew_letters list:

>>> print(len(hebrew_letters))
22

This code uses the print() function to display the output of the len() function. You could get the same result by storing the output of the len() function in a variable:

>>> length_of_hebrew_alphabet = len(hebrew_letters)
>>> print(length_of_hebrew_alphabet)
22

The first line of code runs len(hebrew_letters) and stores the result in the length_of_hebrew_alphabet variable. The second line uses the print() function to display that result.
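With len() in hand, you can see exactly what a negative index means: index -n refers to the same item as index len(list) - n. This sketch redefines hebrew_letters so it runs on its own, then checks that relationship for the last two letters.

```python
hebrew_letters = ["aleph", "bet", "gimel", "dalet", "he", "vav", "zayin",
    "chet", "tet", "yod", "kaf", "lamed", "mem", "nun", "samech", "ayin", "pe",
    "tsadi", "qof", "resh", "shin", "tav"]

# A negative index counts back from the end of the list
last = hebrew_letters[-1]
also_last = hebrew_letters[len(hebrew_letters) - 1]
print(last, also_last)  # tav tav

# With 22 items, index -2 is the same as index 22 - 2, which is 20
print(hebrew_letters[-2] == hebrew_letters[20])  # True
```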

You don’t have to store a list in a variable to select items from it. For example, run this code to display the second item (at index 1) in the list [1,2,3]:

>>> print([1,2,3][1])
2

The append() method lets you add items to lists. For example, run the following code to add a new color to a list of favorites:

>>> favorite_colors = ["red", "green", "blue"]
>>> favorite_colors.append("black")
>>> print(favorite_colors)
['red', 'green', 'blue', 'black']

This code defines the variable favorite_colors as a list of strings containing red, green, and blue. It then adds another string, black, to the list by using the append() method, before finally displaying the value of the favorite_colors variable, using the print() function.

When writing code that analyzes datasets, you’ll often create an empty list and then append items to that list to make the data easier to work with. For example, you’ll learn in Chapter 13 about the code I wrote while investigating America’s Frontline Doctors, an anti-vaccine group. To properly analyze a dataset of hundreds of thousands of files containing patient information, I wrote code that created an empty list, opened each file, and appended the pertinent patient data to that list.

Running for Loops

In Chapter 4, you used a for loop to unzip each BlueLeaks ZIP file. Python also has for loops, and they work the same way they do in shell scripting: by running a snippet of code, called a block, on each item in a list. A for loop has the following syntax:

for variable_name in list_name:

This syntax is followed by a block of indented code. Once you choose a new variable to define in variable_name, you can use it in your code block.

For example, run the following code to loop through the hebrew_letters list, store each item in the variable letter, and then display that item:

>>> for letter in hebrew_letters:
...     print(letter)
...

After you enter the for loop, which ends in a colon (:), the Python interpreter changes the prompt from >>> to ... and waits for you to enter the code block that will run for each item. Indent every line in your block with the same number of spaces, then end your block with a blank line. In this example, the code block that runs is just one line: print(letter).

The code should return the following output:

aleph
bet
--snip--
shin
tav

In this example, the for loop runs 22 times, once for each item in the list, and stores the item in the variable letter. The first time it loops, the value of letter is aleph. The second time, the value is bet, and so on.

NOTE Indentation tells Python which lines of code are part of your code blocks. If some lines in a block are indented with four spaces but others with two or three spaces, your Python code won't work. To keep things simple, I recommend always indenting with four spaces. When writing scripts in VS Code, you can indent multiple lines of code by selecting them with your mouse and pressing TAB (which indents four spaces for you), or unindent them by pressing SHIFT-TAB.

The following, slightly more complicated, example uses the len() function to count not the number of items in a list but the number of characters in a string:

>>> for letter in hebrew_letters:
...     count = len(letter)
...     print(f"The letter {letter} has {count} characters")
...
The letter aleph has 5 characters
The letter bet has 3 characters
The letter gimel has 5 characters
--snip--
The letter resh has 4 characters
The letter shin has 4 characters
The letter tav has 3 characters

This code tells you how many characters are used to spell the word for each Hebrew letter in the Latin alphabet.
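The empty-list-then-append pattern described earlier fits naturally inside a for loop. The following sketch collects the length of each letter's name into a new list instead of printing it; the variable name lengths is just an illustrative choice.

```python
hebrew_letters = ["aleph", "bet", "gimel", "dalet", "he", "vav", "zayin",
    "chet", "tet", "yod", "kaf", "lamed", "mem", "nun", "samech", "ayin", "pe",
    "tsadi", "qof", "resh", "shin", "tav"]

# Start with an empty list, then append one result per loop iteration
lengths = []
for letter in hebrew_letters:
    lengths.append(len(letter))

# One length per letter, in the same order as hebrew_letters
print(lengths)
```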

You can use for loops to loop through strings as well, since a string is a sequence of characters:

>>> word = "hola"
>>> for character in word:
...     print(character)
...
h
o
l
a

A for loop can run as many times as your dataset requires. For example, in Chapter 9, you'll write code that opens each of the hundreds of spreadsheets in the BlueLeaks dataset and uses a for loop to run your block of code on each row.

In the next section, you’ll learn to make your programs more dynamic and useful by determining which blocks of code should run under which circumstances.

Control Flow

Python scripts start at the top and run one line of code at a time, but they don’t always run these lines consecutively. In for loops, for example, the same block of code might run over and over again before the loop completes and the program continues to the next line. The order in which your lines of code run is your program’s control flow.

As you start writing code, you’ll often alter the control flow by telling your computer to do different things in different situations. If you write a program that loops through a list of files in a dataset, for instance, you may want to run different code when the program reaches a PDF document than when it encounters an MP4 video.

This section teaches you how to run certain blocks of code under certain conditions. To do this, you’ll learn how to compare values, use if statements based on these comparisons, and express arbitrarily complicated conditions using Boolean logic, all of which allow you to control the flow of your program. You’ll need this sort of logic whenever you write code that searches a dataset for something specific and then responds according to what it finds.

Comparison Operators

As mentioned earlier in this chapter, expressions that use the arithmetic operators +, -, /, and * generally evaluate to numbers: 1 + 1 evaluates to 2, for example. Expressions in Python also use the following comparison operators to compare terms:

< Less than

<= Less than or equal to

> Greater than

>= Greater than or equal to

== Equal to (not to be confused with a single equal sign (=), which defines a variable)

!= Not equal to

A Boolean is a type of value that is either True or False. Expressions that use comparison operators evaluate to Booleans instead of numbers, as in the following examples:

>>> 100 > 5
True
>>> 100 < 5
False
>>> 100 > 100
False
>>> 100 >= 100
True
>>> 0.5 < 1
True
>>> 0.999999 == 1
False
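Because comparisons evaluate to Booleans, you can store their results in variables just like numbers or strings. In this short sketch, is_big is an arbitrary name, and the built-in type() function, which reports the type of a value, confirms that the result is a bool.

```python
# The result of a comparison is a value you can store like any other
is_big = 100 > 5
print(is_big)        # True
print(type(is_big))  # <class 'bool'>
```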

You can use these same operators to compare strings, too. In Python, saying that one string is less than another means that the former comes before the latter in alphabetical order, as in the following examples:

>>> "Alice" == "Bob"
False
>>> "Alice" != "Bob"
True
>>> "Alice" < "Bob"
True
>>> "Alice" > "Bob"
False
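"Alphabetical order" is a useful simplification, but it holds only when the strings use the same capitalization. Python actually compares strings character by character using each character's Unicode code point, which the built-in ord() function returns. Since every uppercase letter has a lower code point than every lowercase letter, mixed-case comparisons can be surprising:

```python
# ord() returns a character's Unicode code point
print(ord("A"), ord("B"), ord("Z"), ord("a"))  # 65 66 90 97

# The first characters differ, so they decide the comparison
print("Zebra" < "apple")   # True, because "Z" (90) < "a" (97)
print("apple" < "Banana")  # False, because "a" (97) > "B" (66)

# Lowercasing both strings restores the alphabetical result
print("apple".lower() < "Banana".lower())  # True
```

This is another reason to lowercase strings before comparing them, as the next example shows.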

String comparisons are case sensitive. If you don't care about capitalization and want to just see whether the strings are made up of the same words, make them both lowercase before you compare them:

>>> name1 = "Vladimir Putin"
>>> name2 = "vladimir putin"
>>> name1 == name2
False
>>> name1.lower() == name2.lower()
True

This technique allows you to determine whether the data you’re evaluating fulfills a given condition. For example, in Chapter 11, you’ll write code to analyze the metadata of over a million videos uploaded to the far-right social network Parler. Using comparison operators, you’ll determine which videos were filmed on January 6, 2021, in Washington, DC, during the insurrection after Trump lost the 2020 election.

if Statements

You use if statements to tell your code to do something under certain conditions but not others. The syntax for an if statement is if expression: followed by an indented block of code. If the expression evaluates to True, then the code block runs. If the expression evaluates to False, the code doesn't run, and the flow moves on to the next line.

For example, run the following code:

>>> password = "letmein"
>>> if password == "letmein":
...     print("ACCESS GRANTED")
...     print("Welcome")
...
ACCESS GRANTED
Welcome
>>>

This code sets the value of the password variable to letmein. That means the expression in the if statement (password == "letmein") evaluates to True and the code block runs, so it displays ACCESS GRANTED and Welcome.

Now try including the wrong password in your if statement:

>>> password = "yourefired"
>>> if password == "letmein":
...     print("ACCESS GRANTED")
...     print("Welcome")
...
>>>

This time, because you set the password to "yourefired", the expression password == "letmein" evaluates to False, and Python doesn’t run the if statement’s code block.

An if statement can optionally incorporate an else block so that if the condition is true, one code block runs, and if it’s false, another block runs:

if password == "letmein":
    print("ACCESS GRANTED")
    print("Welcome")
else:
    print("ACCESS DENIED")

You can also incorporate elif blocks, short for “else if.” These let you make another comparison if the first comparison is false, as shown in Listing 7-1.

if password == "letmein":
    print("ACCESS GRANTED")
    print("Welcome")
elif password == "open sesame":
    print("SECRET AREA ACCESS GRANTED")
else:
    print("ACCESS DENIED")

Listing 7-1: Comparing if, elif, and else statements

In this code, the if statement evaluates the password == "letmein" expression. If it evaluates to True, the code block runs and displays the ACCESS GRANTED and Welcome messages. If the expression evaluates to False, the program moves on to the elif block, which evaluates the password == "open sesame" expression. If that evaluates to True, it runs the block of code that displays SECRET AREA ACCESS GRANTED. If it evaluates to False, the program moves on to the else code block, which displays ACCESS DENIED.

Nested Code Blocks

You can also accomplish the results of Listing 7-1 with multiple if statements and no elif, using nested code blocks, or indented blocks of code inside other indented blocks of code:

if password == "letmein":
    print("ACCESS GRANTED")
    print("Welcome")
else:
    if password == "open sesame":
        print("SECRET AREA ACCESS GRANTED")
    else:
        print("ACCESS DENIED")

This code is functionally the same as Listing 7-1.
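To convince yourself that the two versions really behave the same, you could run both on a few passwords and compare the results, as in this sketch. It stores each branch's message in a variable instead of printing it, keeps only the first message from the letmein branch for brevity, and tests an arbitrary set of passwords.

```python
passwords = ["letmein", "open sesame", "yourefired"]
results = []

for password in passwords:
    # Version 1: if/elif/else, as in Listing 7-1
    if password == "letmein":
        flat = "ACCESS GRANTED"
    elif password == "open sesame":
        flat = "SECRET AREA ACCESS GRANTED"
    else:
        flat = "ACCESS DENIED"

    # Version 2: nested if/else blocks
    if password == "letmein":
        nested = "ACCESS GRANTED"
    else:
        if password == "open sesame":
            nested = "SECRET AREA ACCESS GRANTED"
        else:
            nested = "ACCESS DENIED"

    results.append([flat, nested])
    print(f"{password}: {flat} | {nested} | same: {flat == nested}")
```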

The more complicated your code, the more nested code blocks may come in handy. You might include for loops inside your if statement code blocks, or if statements inside for loops, or even for loops inside for loops.

You might prefer elif statements to nested if statements purely for readability purposes: it’s easier to read and write code with 100 elif statements than code that’s indented 100 times because it has 100 nested if statements.

Searching Lists

The Python in operator, which tells you whether an item appears in a list, is useful for working with lists. For example, to check whether the number 42 appears in a list of numbers, you can use in as follows:

favorite_numbers = [7, 13, 42, 101]
if 42 in favorite_numbers:
    print("life, the universe, and everything")

To the left of the in operator is a potential item inside a list, and to the right is the list. If the item is in the list, then the expression evaluates to True. If not, it evaluates to False.

You can also use not in to check if an item isn’t in a list:

if 1337 not in favorite_numbers:
    print("mess with the best, die like the rest")

Additionally, you can use in to search for smaller strings inside of larger strings:

sentence = (
    # Adjacent string literals inside parentheses are joined into one string
    "What happens in the coming hours will decide how bad the Ukraine "
    "crisis gets for the vulnerable democracy in Russian President Vladimir Putin's "
    "sights but also its potentially huge impact on Americans and an already deeply "
    "unstable world."
)
if "putin" in sentence.lower():
    print("Putin is mentioned")

This code defines the variable sentence, then checks to see if the string putin is inside the lowercase version of that sentence.
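Combining a for loop, an if statement, and the in operator gives you a simple case-insensitive keyword search, the kind of check you'll run constantly against leaked documents. In this sketch, the snippets list is made-up placeholder text standing in for real documents, and keyword and matches are illustrative names.

```python
# Made-up placeholder text, standing in for documents in a real dataset
snippets = [
    "An article that mentions Putin by name",
    "A recipe for borscht",
    "Another article, this one about PUTIN",
]

keyword = "putin"
matches = []

for snippet in snippets:
    # Lowercase each snippet so capitalization doesn't affect the search
    if keyword in snippet.lower():
        matches.append(snippet)

print(f"{len(matches)} of {len(snippets)} snippets mention {keyword}")
print(matches)
```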

Logical Operators

It’s possible to describe any scenario, no matter how complicated, using the logical operators and, or, and not. Like comparison operators, logical operators also evaluate to True or False, and they let you combine comparisons.

For example, say you like astronomy and want to know if it’s a good time for stargazing. Let’s set this up as a logical expression: if ((it’s dark out) and (it’s not raining) and (it’s not cloudy)) or (you have access to the James Webb Space Telescope), then yes. Otherwise, no. Logical operators let you define this sort of logic in your Python code.
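Here is one way that stargazing rule could be written in Python. The Boolean variables is_dark, is_raining, is_cloudy, and has_jwst_access are hypothetical stand-ins for real observations; change their values to try other scenarios. The following paragraphs explain each operator the expression uses.

```python
# Hypothetical conditions; edit these to try different scenarios
is_dark = True
is_raining = False
is_cloudy = True
has_jwst_access = False

# Good stargazing: dark, dry, and clear, or a space telescope at your disposal
good_for_stargazing = (is_dark and not is_raining and not is_cloudy) or has_jwst_access
print(good_for_stargazing)  # False: it's cloudy, and there's no JWST access
```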

Like other operators, the and and or operators compare an expression on the left with an expression on the right. With and, if both sides are true, the whole expression is true. If either is false, the whole expression is false. For example:

>>> True and True
True
>>> True and False
False
>>> 100 > 5 and 100 < 5
False

With or, if either expression is true, the whole expression is true. The whole expression is false only when both expressions are false. For example:

>>> True or False
True
>>> False or False
False
>>> 100 > 5 or 100 < 5
True

The not operator differs from the others in that it doesn't use an expression to the left, just to the right. It flips true to false, and false to true. For example:

>>> not True
False
>>> not 100 < 5
True

In sum, use and to determine whether two things are both true, use or to determine whether at least one of two things is true, and use not to change a true to a false or vice versa. For example, consider this code:

if country == "US" and age >= 21:
    print("You can legally drink alcohol")
else:
    if country != "US":
        print("I don't know about your country")
    else:
        print("You're too young to legally drink alcohol")

The first if statement has an expression that compares two other expressions, country == "US" and age >= 21. If country is US and age is greater than or equal to 21, the expression simplifies to True and True. Since both Booleans are true, this evaluates to simply True, and the code block after the if statement runs, printing You can legally drink alcohol to the screen.

The first else block determines what happens if that expression evaluates to False. For example, if country is Italy, but age is 30, the expression simplifies to False and True. Since at least one of the Booleans is false, this evaluates to simply False, so the code block after else runs. Likewise, if country is US but age is 18, then the expression simplifies to True and False. This, too, evaluates to False, so the code block after else runs.
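To watch this simplification happen, you can print each comparison on its own and then the combined expression. This small sketch uses the Italy example values from the text:

```python
# The example values from the text: a 30-year-old in Italy
country = "Italy"
age = 30

print(country == "US")                # False
print(age >= 21)                      # True
print(country == "US" and age >= 21)  # False, so the else block would run
```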

Inside that else block is a simple if statement without Boolean logic: if country isn’t US, the screen displays I don't know about your country. Otherwise (meaning country is US), it displays You're too young to legally drink alcohol.

Just like with math, you can use parentheses in if statements to compare multiple expressions. For example, the drinking age in the US is 21 and the drinking age in Italy is 18. Let’s add Italy to this program, this time incorporating an or operator:

if (country == "US" and age >= 21) or (country == "Italy" and age >= 18):
    print("You can legally drink alcohol")
else:
    if country not in ["US", "Italy"]:
        print("I don't know about your country")
    else:
        print("You're too young to legally drink alcohol")

In plain English, the first if statement tells the program that if your country is the US and you’re at least 21 or if your country is Italy and you’re at least 18, then you can legally drink. In either case, the whole expression in the if statement is true, and the program prints You can legally drink alcohol. If just one of those is true and not the other (for instance, if you’re a 19-year-old Italian), the whole statement is still true. That’s what or means: if either of the things you’re comparing is true, then the whole expression is true.
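To check every branch of this program in one run, you can put the same if statement inside a for loop over a few sample people. The (country, age) pairs below are hypothetical test cases, and the code saves each message in a list as well as printing it:

```python
# Hypothetical test cases; each item is a (country, age) pair, and the
# for loop unpacks it into the country and age variables
people = [("US", 25), ("US", 19), ("Italy", 19), ("France", 30)]
messages = []

for country, age in people:
    if (country == "US" and age >= 21) or (country == "Italy" and age >= 18):
        message = "You can legally drink alcohol"
    else:
        if country not in ["US", "Italy"]:
            message = "I don't know about your country"
        else:
            message = "You're too young to legally drink alcohol"
    messages.append(message)
    print(f"{country}, {age}: {message}")
```

The 19-year-old in Italy can drink, while the 19-year-old in the US can’t, which matches the or logic described above.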

Use the operator not to turn True values into False or False values into True. For example:

if country == "US" and not age >= 21:
    print("Sorry, the drinking age in the US is 21")

You could replace not age >= 21 with age < 21 for the same result.
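Python evaluates comparison operators like >= before it applies not, so not age >= 21 means not (age >= 21). If you want to convince yourself that this matches age < 21, a quick sketch can compare the two expressions for every age from 0 to 100:

```python
# Compare "not age >= 21" with "age < 21" for every age from 0 to 100
mismatches = []
for age in range(0, 101):
    # Comparisons run before not, so this is the same as not (age >= 21)
    if (not age >= 21) != (age < 21):
        mismatches.append(age)

print(f"Ages where the expressions disagree: {mismatches}")
```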

Exception Handling

Python programs may abruptly quit with an error called an exception. This is typically known as “throwing an exception” (Python’s own documentation calls it “raising an exception”). Exception handling lets your code catch an exception and run a different code block in response, instead of quitting with an error.

You’ve seen a few examples of exceptions already in this chapter, like when you tried to divide by zero (something you can’t do in math) or to use a variable that hasn’t been defined:

>>> 15 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> x * 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

In these cases, Python threw a ZeroDivisionError exception and a NameError exception, respectively.

You can write code that catches exceptions when they’re thrown, allowing you to handle them gracefully. For example, let’s say you have a list of names called names, and you want to display the first name in the list:

>>> names = ["Alice", "Bob", "Charlie"]
>>> print(f"The first name is {names[0]}")
The first name is Alice

This code displays the value at names[0], or the first item in the names list. This works as expected as long as the list contains at least one name. But what if names is empty?

>>> names = []
>>> print(f"The first name is {names[0]}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

In this case, because the list is empty, index 0 doesn’t exist, so Python throws an IndexError exception.

You can catch this exception using try and except statements, like this:

try:
    print(f"The first name is {names[0]}")
except:
    print("The list of names is empty")

Python first attempts to run the code block after the try statement. If that block finishes without hitting an exception, Python skips the except block and moves on to the next line of code after it. However, if the try block hits an exception, Python stops running that block and runs the code in the except block before moving on.

Here’s what it looks like when there’s no exception:

>>> names = ["Alice", "Bob", "Charlie"]
>>> try:
...     print(f"The first name is {names[0]}")
... except:
...     print("The list of names is empty")
...
The first name is Alice

In this case, the code block after the try statement ran successfully, so the control flow moved on past the except block.

Here’s what it looks like when the exception is thrown, but the code catches it and handles it gracefully:

>>> names = []
>>> try:
...     print(f"The first name is {names[0]}")
... except:
...     print("The list of names is empty")
...
The list of names is empty

The code block after the try statement ran, but Python threw an IndexError exception when it evaluated names[0]. Instead of crashing and displaying an error, this code caught the exception and the except block ran. In this case, the except statement runs if any exception is thrown in the try block, but you can get more granular than that by using different except statements for different types of exceptions. Consider the following example:

try:
    --snip--
except ZeroDivisionError:
    # This catches ZeroDivisionError exceptions
    --snip--
except NameError:
    # This catches NameError exceptions
    --snip--
except IndexError:
    # This catches IndexError exceptions
    --snip--
except:
    # This catches any other exceptions that haven't been caught yet
    --snip--

By writing except followed by the name of a specific exception you’re interested in catching, like except IndexError:, you can write different code to handle different types of exceptions. You’ll revisit exception handling in Chapter 10, when you learn how to work with JSON data, and in the Chapter 14 case study on neo-Nazi chat logs.
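Here’s a small, concrete version of that pattern. The scores list and the (index, divisor) pairs are hypothetical, chosen so that each attempt ends differently:

```python
scores = [90, 85, 77]
# Hypothetical (index, divisor) pairs: one that works, one with an index
# that's out of range, and one that divides by zero
attempts = [(0, 2), (5, 2), (1, 0)]
messages = []

for index, divisor in attempts:
    try:
        result = scores[index] / divisor
        message = f"scores[{index}] / {divisor} = {result}"
    except IndexError:
        message = f"There is no score at index {index}"
    except ZeroDivisionError:
        message = "You can't divide by zero"
    messages.append(message)
    print(message)
```

Because each except clause names a specific exception, any other kind of error, such as a NameError caused by a typo, would still stop the program with a traceback. That makes unexpected bugs easier to spot than a bare except: would.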

Now that you know how control flow works in Python, you’ll practice some basic Python syntax and make comparisons using if statements and Boolean logic in the next exercise.

Exercise 7-4: Practice Loops and Control Flow

In social media slang, a common form of mockery is to employ alternating caps, or switching from uppercase to lowercase and back to uppercase, when quoting people. For example, here’s the text of a viral tweet from the now-suspended Twitter account @BigWangTheoryy:

*failing classes*
Me: “Can I get some extra credit?”
Professor: “cAn i GEt SomE eXtRa creDiT?”

In this exercise, you’ll write a Python script that starts with some text and converts it into alternating caps style, using the control flow concepts you learned in the previous section.

In your text editor, create a new file called exercise-7-4.py, and start by defining the variable text, like this:

text = "One does not simply walk into Mordor"

The simplest way to write this script is to start with an empty string, called alternating_caps_text, and then loop through the characters in text, adding characters to alternating_caps_text one at a time and alternating their capitalization as you do so. Add a second line to your script defining that variable, like this:

alternating_caps_text = ""

Next, you’ll define a Boolean variable called should_be_capital. Each time you loop through a character in text, you’ll use this Boolean to keep track of whether the current character should be capital or lowercase. For this example, start with a capital letter:

should_be_capital = True

Beneath that line, add the main part of the script:

for character in text:
    if should_be_capital:
        alternating_caps_text += character.upper()
        should_be_capital = False
    else:
        alternating_caps_text += character.lower()
        should_be_capital = True

Using a for loop, this code loops through the characters in text, storing each character in the character variable. It then adds these characters to alternating_caps_text, switching between upper- and lowercase.

During each iteration of the for loop, character is the next character in text, the variable containing the "One does not simply walk into Mordor" string. The first time the code loops, character is O. When the code reaches the if statement, should_be_capital evaluates to True, so the code block runs. The += operator appends character.upper() (the uppercase version of character) to alternating_caps_text. Since the code just added a capital letter, you want it to add a lowercase letter next, so you set should_be_capital to False. The code block ends, and the code starts its second loop.

During the second iteration, character is n and should_be_capital is False, so when the code reaches the if statement, the else block runs instead. This is similar to the if block, except that it appends the lowercase version of character, character.lower(), to alternating_caps_text and sets should_be_capital back to True. So far, alternating_caps_text is On.

During the third iteration, character is e and should_be_capital is True, so the if block runs again, appending a capital E to alternating_caps_text and setting should_be_capital to False again. The code continues in this way for the rest of the characters in text. Note that the uppercase and lowercase versions of the space character, " ".upper() and " ".lower(), are identical. The upper() and lower() methods also don’t change punctuation characters such as commas, periods, and exclamation points.

When this for loop is finished, all you have left to do is display the value of alternating_caps_text by adding this line to your script:

print(alternating_caps_text)

Your Python script is complete (you can also find a complete copy at https://github.com/micahflee/hacks-leaks-and-revelations/blob/main/chapter-7/exercise-7-4.py). Run your script. Here’s the output I get:

micah@trapdoor chapter-7 % python3 exercise-7-4.py
OnE DoEs nOt sImPlY WaLk iNtO MoRdOr

Now change the value of text and run the script again. For example, I changed the value to "There are very fine people on both sides":

micah@trapdoor chapter-7 % python3 exercise-7-4.py
ThErE ArE VeRy fInE PeOpLe oN BoTh sIdEs
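In both outputs, spaces still flip should_be_capital even though they look the same in either case, so whether a word starts with a capital letter depends on how many characters came before it. If you’d rather alternate across letters only, one possible variation (not part of the exercise) uses the string method isalpha(), which returns True for letters, to pass spaces and punctuation through unchanged:

```python
text = "One does not simply walk into Mordor"
alternating_caps_text = ""
should_be_capital = True

for character in text:
    if not character.isalpha():
        # Spaces and punctuation are added as-is and don't flip the Boolean
        alternating_caps_text += character
    elif should_be_capital:
        alternating_caps_text += character.upper()
        should_be_capital = False
    else:
        alternating_caps_text += character.lower()
        should_be_capital = True

print(alternating_caps_text)  # OnE dOeS nOt SiMpLy WaLk InTo MoRdOr
```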

You’ve gained a beginner’s understanding of using lists and loops and controlling the flow of execution. I’ll conclude the chapter with one more fundamental programming skill: breaking your code down into simpler chunks using functions.

Functions

The more complicated your programs get, the more important it is to break the problems you’re trying to solve down into smaller chunks and work on them individually. This allows you to focus on the bigger picture, using those smaller chunks of code as building blocks. In this section, you’ll learn how to do this using functions.

Functions, the fundamental building blocks of programming, are reusable chunks of code. They take arguments (the values that you pass into a function) as input and can return a value after they finish running. You’ve already used a few functions that come with Python, like print() and len(), but you can also define your own functions and use them as many times as you want without having to rewrite that code.

The def Keyword

You can define a new function using the def keyword. For example, this code defines a function called test(), which prints a string to your terminal:

>>> def test():
...     print("this is a test function")
...
>>> test()
this is a test function

Function definition lines end with a colon and are followed by an indented code block that defines exactly what the function does: in this case, it displays the string this is a test function. This test() function doesn’t include any arguments, which means every time you run it, it will do the exact same thing.

Listing 7-2 defines a slightly more complicated function, sum(), that adds two numbers together.

def sum(a, b):
    return a + b

Listing 7-2: Defining an example function

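This new function takes a and b as arguments and returns the sum of those two variables. For any function that takes more than one argument, like this one, you separate the arguments with commas (,). (Python also has a built-in sum() function that adds up the items in a list; defining your own function with the same name replaces the built-in one for the rest of your script or interpreter session.)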

Each variable has a scope, which describes which parts of your code can use that variable. The arguments of a function (in this case, a and b), as well as any variables defined inside the function, have a scope that can be accessed only by code in that function’s code block. In other words, you can use these a and b variables only inside the sum() function, and they won’t be defined outside of that code block.
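A quick way to see scope in action is to call sum() and then try to use one of its arguments outside the function. This sketch catches the resulting NameError with the try and except statements covered earlier in this chapter:

```python
def sum(a, b):
    return a + b

total = sum(2, 3)
print(total)  # 5

# a only exists inside sum(), so using it out here raises a NameError
try:
    print(a)
except NameError:
    message = "a isn't defined outside of sum()"
    print(message)
```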

You can think of defining a function as telling Python, “I’m making a new function with this name, and here’s what it does.” However, the function itself won’t run until you call it. Consider the following Python script:

def sum(a, b):
    return a + b

red_apples = 10
green_apples = 6
total_apples = sum(red_apples, green_apples)

print(f"There are {total_apples} apples")

First, the code defines a function called sum() to be a code block with just a return statement. This function doesn’t run yet. The code then defines the red_apples variable, setting its value to 10, and the green_apples variable, setting its value to 6.

The next line starts with total_apples =, but before Python can set the value of that variable, it needs to learn what that value should be. To do that, the code first calls the sum() function, passing in the arguments red_apples and green_apples as a and b. Now that the code is finally calling this function, return a + b runs. In this function call, a is red_apples and b is green_apples. The function returns a + b, which is 16. Now that the sum() function has returned, the code defines a variable called total_apples, setting its value to the return value of the sum() function, 16.

Finally, the code calls the print() function, passing in an f-string as an argument, which displays the total_apples variable. It will display the message There are 16 apples.

Default Arguments

Function definitions can also have default arguments, which means passing in values for them is optional. If you don’t pass in a value for a default argument when you call the function, its default value is used instead.

For example, consider this function, which, given a name and, optionally, a number of exclamation marks and question marks, prints a greeting using its arguments:

def greet(name, num_exclamations=3, num_questions=2):
    exclamations = "!" * num_exclamations
    questions = "?" * num_questions
    print(f"Hello {name}{exclamations}{questions}")

The argument name is a positional argument, which means when you call this function, the first argument you pass in always has to be name. However, num_exclamations and num_questions are default arguments, so passing values in for those is optional. The greet() function defines the strings exclamations and questions and sets them to a series of exclamation points and question marks. (In Python, when you multiply a string by a number, you get the original string repeated multiple times; for example, "A" * 3 evaluates to the string AAA.) The code then displays Hello, followed by the value of name, followed by the number of exclamation points and question marks passed into the function.

This function has one positional argument (name) and two default arguments (num_exclamations and num_questions). You can call it just passing in name, without passing values in for the default arguments, and they will automatically be set to 3 and 2, respectively:

>>> greet("Alice")
Hello Alice!!!??

You can also keep the default value for one of the default arguments but choose a value for another. When you choose a value for a default argument by writing its name, an equal sign, and the value, you’re using a keyword argument. For example:

>>> greet("Bob", num_exclamations=5, num_questions=5)
Hello Bob!!!!!?????
>>> greet("Charlie", num_questions=0)
Hello Charlie!!!
>>> greet("Eve", num_exclamations=0)
Hello Eve??

The first function call uses keyword arguments for both num_exclamations and num_questions; the second function call uses a keyword argument only for num_questions and uses the default argument for num_exclamations; and the third function call uses a keyword argument for num_exclamations and uses the default argument for num_questions.
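Keyword arguments aren’t the only way to fill in default arguments: you can also pass values by position, in the order the arguments are defined. The following sketch uses make_greeting(), a hypothetical variant of greet() that returns its greeting instead of printing it so the results are easy to check (return values are covered in the next section):

```python
def make_greeting(name, num_exclamations=3, num_questions=2):
    """Hypothetical variant of greet() that returns the greeting instead of printing it"""
    exclamations = "!" * num_exclamations
    questions = "?" * num_questions
    return f"Hello {name}{exclamations}{questions}"

# Values for default arguments can also be passed by position, in order
dave = make_greeting("Dave", 1, 1)
# Keyword arguments can appear in any order after the positional argument
mallory = make_greeting("Mallory", num_questions=1, num_exclamations=0)

print(dave)     # Hello Dave!?
print(mallory)  # Hello Mallory?
```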

Return Values

Functions become a lot more useful when they take some input, do some computation, and then return a value, known as the return value. The greet() function just described displays output, but it doesn’t return a value that you could save in a variable or pass into other functions. However, the len() function you used earlier takes input (a list or a string), does some computation (calculates the length of the list or string), and returns a value (the length).

Here’s an example of a function that takes a string s as an argument and returns the number of vowels in the string:

def count_vowels(s):
    number_of_vowels = 0
    vowels = "aeiouAEIOU"
    for c in s:
        if c in vowels:
            number_of_vowels += 1

    return number_of_vowels

This function brings together many of the concepts covered in this chapter so far: it defines the variable number_of_vowels as 0, then defines the variable vowels as a string containing lowercase and uppercase English vowels. Next, it uses a for loop to loop through each character in s, the string that’s passed into the function.

In each loop, the code uses an if statement to check whether the character is a vowel (since vowels contains both lowercase and uppercase letters, this code considers both a and A to be vowels). If the character is a vowel, the code increases the number_of_vowels variable by one. Finally, it returns number_of_vowels, which equals however many vowels it counted in s.

Here are a few examples of calling this function and passing in different strings:

>>> count_vowels("THINK")
1
>>> count_vowels("lizard")
2
>>> count_vowels("zzzzzzz")
0
>>>

When you define a variable, you can set its value to the return value of a function just by setting the variable equal to that function call:

>>> num_vowels_think = count_vowels("THINK")
>>> num_vowels_lizard = count_vowels("lizard")

This code defines the variable num_vowels_think and sets its value to the return value of count_vowels("THINK"), or the number of vowels in the string THINK. It also defines the variable num_vowels_lizard and sets its value to the return value of count_vowels("lizard").

You can then use those variables to define new variables:

>>> total_vowels = num_vowels_think + num_vowels_lizard
>>> print(total_vowels)
3

This code adds those two variables together, saving their sum in a new variable called total_vowels. It then prints the value of total_vowels to the terminal.

When a return statement runs, the function immediately ends, so return is also useful if you want to stop a function early. For example, the following is_exciting() function loops through the characters in a string s, checking each one to see whether it’s an exclamation point:

def is_exciting(s):
    for character in s:
        if character == "!":
            return True

    return False

If the function finds an exclamation point, it returns True, immediately stopping the function. If it checks each character and finds no exclamation points, it returns False. For example, if you call this function and pass in the string !@#$, the function will return True during the first iteration of the loop and immediately end—it will never even get to the second iteration. If you pass in the string hello!, it won’t return True until the last iteration of the loop, since it doesn’t find the ! until the end of the string. And if you pass in the string goodbye, it will loop through the entire string and not find an exclamation point, so it will return False.
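You can check those three cases directly. For this particular job, the in operator you used earlier to search for putin gives the same answer without a loop, so this sketch prints it alongside the function’s result:

```python
def is_exciting(s):
    for character in s:
        if character == "!":
            return True

    return False

# The example strings from the text
examples = ["!@#$", "hello!", "goodbye"]
for s in examples:
    # Compare the loop-based function with the in operator
    print(s, is_exciting(s), "!" in s)
```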

Docstrings

To make your code self-documenting, you can write its documentation as part of the code itself, using docstrings, rather than in a separate document. Docstrings are strings enclosed by three double quotes (""") or three single quotes (''') on either side, placed on the first line of a function’s body, right after the function definition line. When you run the function, the docstring has no effect on what it does, but Python can use it to pull up documentation about the function on request. Docstrings are optional, but they can help other people understand your code.

For example, here’s how you’d define the sum() function with a docstring:

>>> def sum(a, b):
...     """This function returns the sum of a and b"""
...     return a + b

This is exactly the same as the sum() function defined in Listing 7-2, except it includes a docstring.

If you run the help() function, passing in the name of a function (without parentheses) as the argument, the Python interpreter will display documentation for that function. For example, running help(sum) gives you the following output:

Help on function sum in module __main__:

sum(a, b)
    This function returns the sum of a and b

The help() function works for any function, though it’s much more useful when the programmer who wrote that function included a docstring. In this case, it tells you that it’s showing you help for the function called sum() in the __main__ module. You’ll learn more about modules in Chapter 8, but they’re essentially Python files containing code, such as functions, that you can reuse; __main__ is the name Python gives to the script or interactive session that’s currently running. Try running help(print) or help(len) to view the docstrings for the print() and len() functions.

Press Q to get out of the help interface and back to the Python interpreter.
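You can also read a docstring without opening the help interface. Python stores it in the function’s __doc__ attribute, which is handy when you just want the text:

```python
def sum(a, b):
    """This function returns the sum of a and b"""
    return a + b

# The docstring is stored as a string on the function itself
print(sum.__doc__)  # This function returns the sum of a and b
```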

Exercise 7-5: Practice Writing Functions

In this exercise, you’ll turn the script you wrote in Exercise 7-4 into a function. You can then call this function multiple times, passing text into it so that it returns an alternating caps version of that text each time.

In your text editor, create a new file called exercise-7-5.py and create a new function called alternating_caps(), which takes in the argument text, like this:

def alternating_caps(text):
    """Returns an aLtErNaTiNg cApS version of text"""

Next, copy the code from Exercise 7-4 and paste it into this function, making sure to indent it so that it aligns with the docstring. Delete the line that defines the text value; instead, define text by passing it into the function as an argument. Also change the last line of the Exercise 7-4 code from print(alternating_caps_text) to return alternating_caps_text. This function shouldn’t display the alternating caps version of a string; it should create a variable containing this version of a string and return it.

Your complete function should look like this (you can also find a copy at https://github.com/micahflee/hacks-leaks-and-revelations/blob/main/chapter-7/exercise-7-5.py):

def alternating_caps(text):
    """Returns an aLtErNaTiNg cApS version of text"""
    alternating_caps_text = ""
    should_be_capital = True

    for character in text:
        if should_be_capital:
            alternating_caps_text += character.upper()
            should_be_capital = False
        else:
            alternating_caps_text += character.lower()
            should_be_capital = True

    return alternating_caps_text

Now that you have a function—a reusable chunk of code—you can use it as many times as you want. Call this function a few times, remembering to display its return value using the print() function, like this:

print("Hacks, Leaks, and Revelations")
print(alternating_caps("This book is amazing"))
print(alternating_caps("I'm learning so much"))

You can change the text that you pass in to the alternating_caps() function calls to whatever you want.

Here’s what it looks like when I run this script:

micah@trapdoor chapter-7 % python3 exercise-7-5.py
Hacks, Leaks, and Revelations
ThIs bOoK Is aMaZiNg
I'M LeArNiNg sO MuCh

While the output of this script is displayed in a mocking tone, I hope that the sentiment is true for you!
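Because alternating_caps() returns its result instead of displaying it, you can pass that result straight into other functions. This sketch combines it with the count_vowels() function from earlier in this chapter to confirm that changing capitalization doesn’t change the number of vowels or the length of the text:

```python
def alternating_caps(text):
    """Returns an aLtErNaTiNg cApS version of text"""
    alternating_caps_text = ""
    should_be_capital = True

    for character in text:
        if should_be_capital:
            alternating_caps_text += character.upper()
            should_be_capital = False
        else:
            alternating_caps_text += character.lower()
            should_be_capital = True

    return alternating_caps_text


def count_vowels(s):
    number_of_vowels = 0
    vowels = "aeiouAEIOU"
    for c in s:
        if c in vowels:
            number_of_vowels += 1

    return number_of_vowels


original = "This book is amazing"
mocking = alternating_caps(original)

# Changing capitalization doesn't change how many vowels there are
print(count_vowels(original), count_vowels(mocking))  # 7 7
# Nor does it change the length of the string
print(len(original), len(mocking))  # 20 20
```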

Summary

This chapter has covered several basic Python programming concepts you’ll rely upon in future investigations. You learned to write simple Python scripts that incorporate the major features of the language, including variables, if statements, for loops, and functions. You’re ready to continue your Python programming journey in the next chapter, this time writing code to directly investigate datasets.


Information Wants to be Free

Everyone should have access to the information in this book. To remove barriers to access, I've made Hacks, Leaks, and Revelations available for free online under a Creative Commons license. If you can afford it, show your support and buy a copy today!