Hacks, Leaks, and Revelations

Chapter 3: The Command Line Interface

Back in the days of the command-line interface, users were all Morlocks who had to convert their thoughts into alphanumeric symbols and type them in, a grindingly tedious process that stripped away all ambiguity, laid bare all hidden assumptions, and cruelly punished laziness and imprecision.

—Neal Stephenson, In the Beginning … Was the Command Line

If you’re like most people, you interface with your computer primarily via its graphical desktop environment: you move the pointer with your mouse or trackpad and click icons to run programs and open documents. Programs open in windows that you can resize, maximize, minimize, and drag around the screen. You can run various programs at once in separate windows and switch between them. However, there’s an alternative, incredibly powerful interface you can use to communicate with your computer and give it instructions: the command line interface (CLI).

Command line interfaces are text-based, rather than graphical, interfaces to interact with your computer. Instead of clicking on icons, you enter commands to run programs in a terminal emulator (normally referred to just as a terminal). After running a command, you’ll typically see text-based output displayed in the terminal.

In this chapter, you’ll learn the basic command line skills you need to follow along with the rest of this book. Whether you’re using Windows, macOS, or Linux, you’ll learn how to install and uninstall software via the command line, how filepaths work, how to navigate around the folders on your computer, and how to use text editors. You’ll also write your first shell script, a file containing a series of commands.

Introducing the Command Line

To prepare you to start working on the command line, this section explains some fundamentals: what shells are, how users and paths work in different operating systems, and the concept of privilege escalation.

The Shell

The shell is the program that lets you run text-based commands, while the terminal is the graphical program that runs your shell. When you open a terminal and see a blinking text cursor waiting for commands, you’re using a shell. When hackers try to break into a computer, their initial goal is to “pop a shell,” or access the text-based interface that allows them to run whatever commands they want.

All operating systems, even mobile ones like Android and iOS, have shells. This book focuses on Unix shells, the kind that come with macOS and Linux (but Windows users can also use them). Most versions of Linux use a shell called bash, and macOS uses one called zsh. These shells are very similar, and for the purposes of this book you can think of them as interchangeable.
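If you’re not sure which shell your computer uses by default, a quick check like the following can tell you. This is a minimal sketch: it assumes the SHELL environment variable is set, which is typical in macOS and Linux but not guaranteed, and the login_shell variable name is just an illustration.

```shell
# SHELL usually holds the path to your default shell,
# such as /bin/bash or /bin/zsh. Fall back to "unknown" if it is unset.
login_shell="${SHELL:-unknown}"
echo "Default shell: $login_shell"
```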

Windows, on the other hand, comes with two shells: an older one called Command Prompt (or cmd.exe) and a newer one called PowerShell. The syntax—rules that define what different commands mean—used by Windows shells is very different from that used by Unix shells. If you’re a Windows user, you’ll primarily work in a Unix shell for the examples in this book. Setting up your computer to run Linux directly in Windows will be this chapter’s first exercise.

To make your shell do something, such as run a program, you carefully enter the desired command and then press ENTER (or RETURN on Mac keyboards). To quit the shell, enter exit and press ENTER. Shells are finicky: you need to enter commands using the correct capitalization, punctuation, and spacing, or they won’t work. Typos usually result in nothing more serious than error messages, however, and it’s easy to go back and fix a mistake in a command. I’ll explain how to do so in the “Editing Commands” section later in this chapter.

Users and Paths

Although operating systems like Windows, macOS, and Linux are different in some ways, they all share basic building blocks, including users and paths.

All operating systems have users, separate accounts that different people use to log in to the same computer. Users generally have home folders, also known as home directories, where their files live. Figure 3-1 shows my terminal in Ubuntu, a popular Linux distribution.

Figure 3-1: My Ubuntu terminal

My username is micah and the name of my Ubuntu computer is rogue. Your terminal will look different depending on your operating system, username, and computer name.

All operating systems also have filesystems, the collection of files and folders available on the computer (you got a brief introduction to filesystems in Chapter 1 while encrypting your USB disk). In a filesystem, each file and folder has a path, which you can think of like the location, or address, of that file. For example, if your username is alice, the path of your home folder in different operating systems would look as follows:

Windows: C:\Users\alice
macOS: /Users/alice
Linux: /home/alice

Windows filesystems operate differently from macOS or Linux filesystems in a few key ways. First, in Windows, disks are labeled with letters. The main disk, where Windows itself is installed, is the C: drive. Other disks, like USB disks, are assigned other letters. In Windows, folders in a path are separated with a backslash (\), while other operating systems use forward slashes (/). In Linux, paths are case sensitive, but not in Windows and macOS (by default). For example, in Linux you can store one file called Document.pdf and another called document.pdf in the same folder. If you try to do the same in Windows, saving the second file overwrites the first.

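Let’s look at some example paths. If your username is alice and you download a file called Meeting Notes.docx into the Downloads folder, here’s what that path would look like:

Windows: C:\Users\alice\Downloads\Meeting Notes.docx
macOS: /Users/alice/Downloads/Meeting Notes.docx
Linux: /home/alice/Downloads/Meeting Notes.docx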

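When you plug in a USB disk, it’s mounted to different paths for different operating systems. If your disk is labeled datasets, the path representing the location of that disk might look as follows:

Windows: D:\ (or whatever drive letter Windows assigns to the disk)
macOS: /Volumes/datasets
Linux: /media/alice/datasets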

It’s important to understand how to read paths, since you’ll need to include the location of your dataset or files it contains in the commands you run.
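To make the anatomy of a path concrete, here’s a small sketch that splits one into its folder and filename. It relies on dirname and basename, two standard Unix utilities that this chapter doesn’t otherwise cover, and the path itself is only an example; the file doesn’t need to exist for these commands to work.

```shell
# An example Linux path. The quotes keep the space inside one value.
path="/home/alice/Downloads/Meeting Notes.docx"

# dirname keeps everything before the final slash: the containing folder.
folder=$(dirname "$path")

# basename keeps everything after the final slash: the filename.
file=$(basename "$path")

echo "Folder: $folder"
echo "File:   $file"
```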

User Privileges

Most users have limited privileges in an operating system. However, the root user in Linux and macOS and the administrator user in Windows have absolute power. While alice may not be able to save files into bob’s home folder, for example, the root user has permissions to save files anywhere on the computer. When a Mac asks you to enter your user password to change system preferences or install software, or when a Windows machine asks if you want to allow a program to make changes to your computer, the operating system is asking for your consent before switching from your unprivileged user account to the root or administrator user account.

Most of the time when you’re working in a terminal, you run commands as an unprivileged user. To run a command that requires root (or administrative) privileges in Linux and macOS, such as to install a new program, just put sudo in front of it and press ENTER, and you’ll be prompted to enter the password for your regular user account.

As an example, the whoami command tells you which user just ran a command. On my computer, if I enter whoami without sudo, the output is micah. However, if I enter sudo whoami, which requires me to type my password, the output is root:

micah@rogue:~$ whoami
micah
micah@rogue:~$ sudo whoami
[sudo] password for micah:
root

If you recently ran sudo, you can run it again for a few minutes without having to re-enter your password.
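If you’d like to check whether a command is running with root privileges without typing a password, the following sketch is one way to do it. It uses the id command, which isn’t covered in this chapter, and relies on the fact that the root user’s numeric ID is 0 on Unix-like systems. The user_id and status variable names are just illustrations.

```shell
# id -u prints the numeric ID of the current user; root is always 0.
user_id=$(id -u)

if [ "$user_id" -eq 0 ]; then
  status="root"
else
  status="unprivileged"
fi

# id -un prints the username, much like whoami does.
echo "Running as $(id -un) ($status)"
```

Run normally, this should report an unprivileged user. Saved as a script and run with sudo in front of it, it should report root.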

WARNING Be very careful when running commands as root, since running the wrong commands as the root user can accidentally delete all of your data or break your operating system. Before using sudo, make sure you have a clear understanding of what you’re about to do.

You can use sudo to gain root access only if your current user has administrator access. If you’re the only user on your computer, you’re probably an administrator. To find out, try using sudo and see whether you get a “permission denied” error.

Figure 3-2 shows a comic by Randall Munroe from his XKCD website that succinctly demonstrates the power of sudo.

Figure 3-2: Demanding a sandwich with sudo

Before learning more command line code, Windows users must install Ubuntu (see Exercise 3-1). Mac or Linux users can skip to the “Basic Command Line Usage” section.

Exercise 3-1: Install Ubuntu in Windows

To work with Ubuntu on a Windows machine, you could install both Windows and Linux or use a virtual machine within Windows, as mentioned in Chapter 1. However, for this book’s purposes, it’s simplest to use the Windows Subsystem for Linux (WSL), a Microsoft technology that lets you run Linux programs directly in Windows. Opening an Ubuntu window in WSL will, in turn, open a bash shell and let you install and run Ubuntu software. (Technically, WSL does use a VM, but it’s fast, managed by Windows, and unobtrusive, running entirely behind the scenes.)

To install WSL, open a PowerShell window as an administrator: click Start, search for powershell, right-click Windows PowerShell, choose Run as Administrator, and click Yes. Figure 3-3 shows this process, which may look slightly different depending on your version of Windows.

Figure 3-3: Running PowerShell as an administrator in Windows

In your administrator PowerShell window, enter the following command and press ENTER:

wsl --install -d Ubuntu

This installs the Windows Subsystem for Linux, then downloads and installs Ubuntu Linux on your computer.

Your screen should now look something like this:

PS C:\Windows\system32> wsl --install -d Ubuntu
Installing: Windows Subsystem for Linux
Windows Subsystem for Linux has been installed.
Downloading: WSL Kernel
Installing: WSL Kernel
WSL Kernel has been installed.
Downloading: GUI App Support
Installing: GUI App Support
GUI App Support has been installed.
Downloading: Ubuntu
The requested operation is successful. Changes will not be effective until the
system is rebooted.
PS C:\Windows\system32>

The final line of this output tells you to reboot your computer. Enter exit and press ENTER (or just close the window) to quit PowerShell, then reboot. After you log in to Windows again, you should see an Ubuntu window informing you that the installation may take a few more minutes to complete. Then the window should present you with a prompt asking you to create a new user:

Please create a default UNIX user account. The username does not need to match
your Windows username.
For more information visit: https://aka.ms/wslusers
Enter new UNIX username:

Ubuntu needs to keep track of its own users rather than the existing users on your Windows computer.

With the Ubuntu terminal window in focus, enter a username and press ENTER. The terminal should then prompt you to create a password:

New password:

Either use the same password you use to log in to your Windows account or create a new one and save it in your password manager. Enter your password and press ENTER. While you’re typing, nothing will appear in the Ubuntu terminal.

The terminal should now prompt you to re-enter your new password; do so and press ENTER, which should drop you into an Ubuntu shell with a prompt and a blinking cursor. My prompt says micah@cloak:~$ because my username is micah and the name of my Windows computer is cloak:

New password:
Retype new password:
passwd: password updated successfully
Installation successful!
--snip--
micah@cloak:~$

You can now open Ubuntu in your Windows computer. From this point on, when instructed to open a terminal or run some command line code, use an Ubuntu terminal window unless I specify otherwise.

From within your Ubuntu shell, you can access your Windows disks in the /mnt folder. For example, you can access the C: drive in /mnt/c and the D: drive in /mnt/d. Suppose I download a document using my web browser and want to access it from Ubuntu. The path to my Downloads folder in Windows is /mnt/c/Users/micah/Downloads, so the document would be in that folder. If I want to access the BlueLeaks data that I downloaded to my USB disk from Ubuntu, then assuming that D: is the USB disk’s drive, the path would be /mnt/d/BlueLeaks.
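As a quick sanity check, this sketch tests whether a Windows C: drive is visible from your shell before listing it. It assumes the default WSL mount location of /mnt/c described above, and the location variable is only there for illustration. On a Mac or a regular Linux machine, the folder won’t exist and the script just says so.

```shell
# /mnt/c/Users exists only where a Windows C: drive is mounted at /mnt/c.
if [ -d /mnt/c/Users ]; then
  location="wsl"
  # List the Windows user folders, such as /mnt/c/Users/micah.
  ls /mnt/c/Users
else
  location="elsewhere"
  echo "No Windows C: drive found at /mnt/c"
fi
```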

For more details on using Windows and WSL, including information on common problems related to using USB disks in WSL, as well as disk performance issues and various ways to deal with them, check out Appendix A. Wait until you’ve worked through at least Chapter 4 to start implementing these solutions, since the instructions involve more advanced command line concepts introduced in that chapter.

Basic Command Line Usage

In this section, you’ll learn to use the command line to explore files and folders on your computer. This is a prerequisite to working with datasets, which are just folders full of files and other folders. You’ll learn how to open a terminal, list files in any folder, distinguish between relative and absolute paths, switch to different folders in your shell, and look up documentation on commands from within your terminal.

NOTE When learning command line skills, you can always look things up if you run into problems—I still do this every day. You’re likely not the first person to encounter any given command line issue, so with a few well-worded internet searches, you can find someone else’s solution.

Opening a Terminal

To get started, skip to the subsection for your operating system to learn how to open a terminal. Throughout this chapter, keep a terminal open while you’re reading to test all the commands.

The Windows Terminal

Open the Ubuntu app by clicking Start in the bottom-left corner of the screen, searching for ubuntu, and clicking Ubuntu.

You’ll use Ubuntu most often for this book, but you may need to open the native Windows terminals occasionally as well. You can likewise open PowerShell and Command Prompt by clicking Start and searching for them. Check out the Microsoft program Windows Terminal (https://aka.ms/terminal), which lets you open different terminals in different tabs, choosing between PowerShell, Command Prompt, Ubuntu, and others. If you choose to install it, you can open it the same way.

Pin the Ubuntu app or Windows Terminal app to your taskbar so you can quickly open it in the future: right-click its icon and select Pin to Taskbar.

The macOS Terminal

Open the Terminal app by opening Finder, going to the Applications folder, double-clicking the Utilities folder, and double-clicking Terminal. Figure 3-4 shows my macOS terminal running zsh, the default macOS shell. My username is micah, and the name of my Mac is trapdoor.

Figure 3-4: My macOS terminal

Pin the Terminal app to your dock so you can quickly open it in the future. To do so, after you open Terminal, press CTRL and click the Terminal icon on your dock, then choose Options > Keep in Dock.

The Linux Terminal

In most Linux distributions, open the Terminal app by pressing the Windows key, typing terminal, and pressing ENTER. If you’re running Ubuntu (or any other Linux distribution that uses the GNOME graphical environment), pin the Terminal app to your dock so you can quickly open it in the future. To do so, right-click the Terminal icon and select Add to Favorites.

Clearing Your Screen and Exiting the Shell

As you practice using the terminal in the following sections, you’ll sometimes want to start fresh, without having to see all the previous commands you ran or their output or error messages. Run this simple command to declutter your terminal:

clear

This clears everything off the screen, leaving you with nothing but a blank command prompt. Make sure to do this only if you no longer need to see the output of your previous commands. (In the Windows Command Prompt and PowerShell, use cls instead of clear.)

When you’re done using the CLI, exit your shell by running this command:

exit

You can also close the terminal window to exit. If you’re running a program when you close the terminal, that program will quit as well.

Exploring Files and Directories

When you open a terminal, your shell starts out in your user’s home folder, represented as a tilde (~). The folder you’re currently in is your current working directory, or just working directory. If you ever forget what directory you’re in, run the pwd command (short for “print working directory”) to find out.

Running the ls command in your terminal lists all of the files in your working directory. You can use this command to check the contents of folders you’re working with. If there are no files or only hidden files, ls won’t list anything. To check for hidden files, modify the ls command using -a (short for --all):

ls -a

When you add anything to the end of a command, like -a, you’re using a command line argument. Think of arguments as settings that change how the program you’re running will act—in this case, by showing hidden files instead of hiding them.

By default, the ls command displays files in a format intended to take up as few lines in your terminal as possible. However, you may want to display one file per line for easier reading and to get more information about each file, such as its size, when it was last modified, permissions, and whether it’s a folder. Using the -l argument (short for --format=long) formats the output as a list.

You can use both -a and -l at the same time like so:

ls -al

Running this command on my Mac gives me the following output:

total 8
drwxr-x---+ 13 micah  staff   416 Nov 25 11:34 .
drwxr-xr-x   6 root   admin   192 Nov  9 15:51 ..
-rw-------   1 micah  staff     3 Nov  6 15:30 .CFUserTextEncoding
-rw-------   1 micah  staff  2773 Nov 25 11:33 .zsh_history
drwx------   5 micah  staff   160 Nov  6 15:31 .zsh_sessions
drwx------+  3 micah  staff    96 Nov  6 15:30 Desktop
drwx------+  3 micah  staff    96 Nov  6 15:30 Documents
drwx------+  3 micah  staff    96 Nov  6 15:30 Downloads
drwx------+ 31 micah  staff   992 Nov  6 15:31 Library
drwx------   3 micah  staff    96 Nov  6 15:30 Movies
drwx------+  3 micah  staff    96 Nov  6 15:30 Music
drwx------+  3 micah  staff    96 Nov  6 15:30 Pictures
drwxr-xr-x+  4 micah  staff   128 Nov  6 15:30 Public

The first column of this output describes the type of file, either a directory (another name for a folder) or an ordinary file, as well as the file’s permissions. Directories start with d, and ordinary files start with a hyphen (-). The second column represents the number of links to the file, which isn’t relevant for the purposes of this book.

The third and fourth columns represent the user and the group that own the file. In addition to users, operating systems have groups of users that can have their own permissions. For example, in Linux, all users allowed to use sudo are in the sudo group. If you create or download a file, its user is normally your username and its group is normally your default group (staff in the macOS output above). The fifth column is the file size in bytes. For example, the file called .zsh_history is 2,773 bytes.

The next three columns of the output represent the time and date when the file was last modified, and the final column shows the filename.
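To see these arguments in action without touching your real files, you could try something like the following sketch. It builds a throwaway folder using mktemp, touch, and mkdir, which are standard utilities not covered in this section, and the file and folder names are made up for the example.

```shell
# Create a scratch folder so the example doesn't touch your real files.
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/.hidden.txt"
mkdir "$demo/reports"

# Plain ls skips names that start with a dot.
visible=$(ls "$demo")
echo "$visible"

# -a includes hidden files, along with the . and .. entries.
everything=$(ls -a "$demo")
echo "$everything"

# Adding -l shows permissions, links, owner, group, size, and date.
ls -al "$demo"

# Clean up the scratch folder.
rm -r "$demo"
```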

To see a listing of files in a folder other than the working directory, add the path to that folder to the end of the ls command. For example, this is how I’d create a listing of files in my code/hacks-leaks-and-revelations folder, which contains the files released with this book:

ls -la code/hacks-leaks-and-revelations

I’d get the following output:

total 96
drwxr-xr-x  22 micah  staff    704 Jul 27 09:28 .
drwxr-xr-x  12 micah  staff    384 Jul 27 09:28 ..
drwxr-xr-x  12 micah  staff    384 Jul 27 09:28 .git
drwxr-xr-x   3 micah  staff     96 Jul 27 09:28 .github
-rw-r--r--   1 micah  staff     30 Jul 27 09:28 .gitignore
-rw-r--r--   1 micah  staff  35149 Jul 27 09:28 LICENSE
-rw-r--r--   1 micah  staff   6997 Jul 27 09:28 README.md
drwxr-xr-x   6 micah  staff    192 Jul 27 09:28 appendix-b
drwxr-xr-x   5 micah  staff    160 Jul 27 09:28 chapter-1
drwxr-xr-x   6 micah  staff    192 Jul 27 09:28 chapter-10
drwxr-xr-x  11 micah  staff    352 Jul 27 09:28 chapter-11
drwxr-xr-x  13 micah  staff    416 Jul 27 09:28 chapter-12
drwxr-xr-x   8 micah  staff    256 Jul 27 09:28 chapter-13
drwxr-xr-x   3 micah  staff     96 Jul 27 09:28 chapter-14
drwxr-xr-x   5 micah  staff    160 Jul 27 09:28 chapter-2
drwxr-xr-x  10 micah  staff    320 Jul 27 09:28 chapter-3
drwxr-xr-x  13 micah  staff    416 Jul 27 09:28 chapter-4
drwxr-xr-x  13 micah  staff    416 Jul 27 09:28 chapter-5
drwxr-xr-x  10 micah  staff    320 Jul 27 09:28 chapter-6
drwxr-xr-x  12 micah  staff    384 Jul 27 09:28 chapter-7
drwxr-xr-x  20 micah  staff    640 Jul 27 09:28 chapter-8
drwxr-xr-x  14 micah  staff    448 Jul 27 09:28 chapter-9

You’ll download your own copy of these files in Exercise 3-7.

Navigating Relative and Absolute Paths

Programs often require you to provide paths to files or folders, usually when you run a program that works with specific files on your computer. The path that I passed into ls in the previous section, code/hacks-leaks-and-revelations, is a relative path, meaning it’s relative to the current working directory, my home folder. Relative paths can change. For example, if I change my working directory from my home folder (/Users/micah) to just /Users, the relative path to that folder changes to micah/code/hacks-leaks-and-revelations.

The absolute path to the code/hacks-leaks-and-revelations folder is /Users/micah/code/hacks-leaks-and-revelations, which always provides the location of that folder regardless of my working directory. Absolute paths start with a forward slash (/), which is also known as the root path.

You can use two keywords to access relative paths to specific folders: . (dot), which represents a relative path to the current folder, and .. (dot dot), which represents a relative path to the parent folder (the folder that contains the current folder).
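The following sketch shows both keywords at work inside a path. It assumes a made-up folder layout (parent, child, and sibling.txt) created in a scratch location with mktemp, mkdir, and touch, none of which are covered in this section.

```shell
# Build a small folder tree in a scratch location.
demo=$(mktemp -d)
mkdir -p "$demo/parent/child"
touch "$demo/parent/sibling.txt"

# A path ending in . refers to the folder itself, so this lists parent.
same=$(ls "$demo/parent/.")

# A path ending in .. steps up one level, so this also lists parent.
up=$(ls "$demo/parent/child/..")

echo "$same"
echo "$up"

# Clean up the scratch folder.
rm -r "$demo"
```

Both listings should show the same contents, since parent/. and parent/child/.. are two ways of naming the same folder.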

Changing Directories

The cd command (which stands for “change directory”) allows you to change to a different folder. To change your working directory to a given folder, run:

cd path

For path, substitute the path to the folder to which you’d like to move. You can use either a relative or an absolute path.

Suppose I’m using macOS and have downloaded BlueLeaks to a datasets USB disk plugged into my machine. After opening a terminal, I can run the following command to change my working directory to the BlueLeaks folder, using the absolute path to the folder:

cd /Volumes/datasets/BlueLeaks

Alternatively, I can use a relative path to the folder, running the following command from my home folder:

cd ../../Volumes/datasets/BlueLeaks

Why does the relative path start with ../.. in this example? When I open the terminal, the working directory is my home folder, which in macOS is /Users/micah. The relative path .. would be its parent folder, /Users; the relative path ../.. would be /; the relative path ../../Volumes would be /Volumes; and so on.

As noted earlier, the tilde symbol (~) represents your home folder. No matter what your working directory is, you can run the following to go back to your home folder:

cd ~

Use the following syntax to move to a folder inside your home folder:

cd ~/folder_name

For example, the following command would move you to your Documents folder:

cd ~/Documents

If you run ls again after a cd command, the output should show you the files in the folder to which you just moved.
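Here’s a short practice run that combines cd, pwd, and ls. It’s a sketch that works in a scratch folder rather than on a real USB disk; the datasets/BlueLeaks layout only mimics the example above, and mktemp and mkdir are standard utilities not covered in this section.

```shell
# Build a practice folder tree in a scratch location.
demo=$(mktemp -d)
mkdir -p "$demo/datasets/BlueLeaks"

# Absolute path: works no matter what the working directory is.
cd "$demo/datasets/BlueLeaks"
pwd

# Relative path: .. moves up one level, into datasets.
cd ..
parent=$(pwd)

# Relative path: move back down into BlueLeaks.
cd BlueLeaks
child=$(pwd)

# ls now lists the folder you moved to (empty in this example).
ls

# Leave the scratch folder before deleting it.
cd /
rm -r "$demo"
```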

Using the help Argument

Most commands let you use the argument -h, or --help, which displays detailed instructions explaining what the command does and how to use it. For example, try running the following:

unzip --help

This command should show instructions on all of the different arguments that are available to you when using the unzip command, which is used to extract compressed ZIP files.

Here’s the output I got when I ran that command on my Mac:

UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
--snip--
  -p  extract files to pipe, no messages     -l  list files (short format)
  -f  freshen existing files, create none    -t  test compressed archive data
  -u  update files, create if necessary      -z  display archive comment only
  -v  list verbosely/show version info       -T  timestamp archive to latest
  -x  exclude files that follow (in xlist)   -d  extract files into exdir
--snip--

This output briefly describes what each argument for the unzip command does. For example, if you use the -l argument, the command shows a list of all of the files and folders inside the ZIP file without actually unzipping it.

Accessing Man Pages

Many commands also have manuals, otherwise known as man pages, which give more detail about how to use those commands. Run the following to access a command’s man page:

man command_name

For example, to read the manual for the unzip command, run:

man unzip

The output should display a longer explanation of how to use the unzip command and its arguments.

Use the up and down arrows and the page up and page down keys to scroll through the man pages, or press / and enter a term to search. For example, to learn more details about how the unzip command’s -l argument works, press / and enter -l, then press ENTER. This should bring you to the first time -l appears on the man page. Press N to move on to the next occurrence of your search term.

When you’re finished, press Q to quit the man page.

Tips for Navigating the Terminal

This section introduces ways to make working on the command line more convenient and efficient, along with tips for avoiding and fixing errors. It also shows how to handle problematic filenames, such as those with spaces, quotes, or other special characters. A basic understanding of these concepts will save you a lot of time in the future.

Entering Commands with Tab Completion

Shells have a feature called tab completion that saves time and prevents errors: enter the first few letters of a command or a path, then press TAB. Your shell will fill in the rest if possible.

For example, both macOS and Ubuntu come with a program called hexdump. In a terminal, enter hexd and press TAB. This should automatically fill in the rest of the hexdump command. Tab completion also works for paths. For example, Unix-like operating systems use the /tmp folder to store temporary files. Enter ls /tm and press TAB. Your shell should add the p to finish typing out the full command.

If you enter only the first couple letters of a command or a path, there may be more than one way for your shell to complete your line of code. Assuming that you have both Downloads and Documents folders in your home folder, type ls ~/Do and press TAB. You’ll hear a quiet beep, meaning that the shell doesn’t know how to proceed. Press TAB one more time, and it should display the options, like this:

Documents/  Downloads/

If you enter a c so that your command so far is ls ~/Doc and press TAB, the command should complete to ls ~/Documents/. If you enter a w so that your command so far is ls ~/Dow and press TAB, it should complete to ls ~/Downloads/.

If you’ve already typed out the path of a folder, you can also press TAB to list files in that folder, or to automatically complete the filename if there’s only one file in the folder. For example, say I have my datasets USB disk, on which I’ve downloaded BlueLeaks, plugged into my Mac. If I want to change to my BlueLeaks folder, I can enter the following and press TAB:

cd /Vo

This completes the command as follows:

cd /Volumes/

I press TAB again, and my computer beeps and lists the folders in /Volumes, which in my case are Macintosh HD and datasets. I enter d, so my command is cd /Volumes/d, and press TAB, and the shell completes the command as follows:

cd /Volumes/datasets/

I press TAB again. My computer beeps again and lists all of the files and folders in my datasets USB disk. I enter B (the first letter of BlueLeaks) and press TAB, which gives me:

cd /Volumes/datasets/BlueLeaks/

Finally, I press ENTER to change to that folder.

Editing Commands

You can also edit commands. When you start typing a command, you can press the left and right arrow keys to move the cursor, allowing you to edit the command before running it. You can also press HOME and END—or, if you’re using a Mac keyboard, CONTROL-A and CONTROL-E—to go to the beginning and end of a line, respectively. You can also cycle between commands you’ve already run using the up and down arrows. If you just ran a command and want to run it again, or to modify it and then run it, press the up arrow to return to it. Once you find the command you’re looking for, use the arrow keys to move your cursor to the correct position, edit it, and then press ENTER to run it again.

For example, I frequently get “permission denied” errors when I accidentally run commands as my unprivileged user when I should have run them as root. When this happens, I press the up arrow, then CONTROL-A to go to the beginning of the line, add sudo, and press ENTER to successfully run the command.

Dealing with Spaces in Filenames

Sometimes filenames contain multiple words separated by spaces. If you don’t explicitly tell your shell that a space is part of a filename, the shell assumes that the space is there to separate parts of your command. For example, this command lists the files in the Documents folder:

ls -lh ~/Documents

Under the hood, your shell takes this string of characters and splits it into a list of parts that are separated by spaces: ls, -lh, and ~/Documents. The first part, ls, is the command to run. The rest of the parts are the command’s arguments. The -lh argument tells the program to display the output as a list and make the file sizes human-readable. That is, it will convert the file sizes into units that are easier to read, like kilobytes, megabytes, and gigabytes, rather than a large number of bytes. The ~/Documents argument means you want to list the files in that folder.

Suppose you want to use the same command to list the files in a folder with a space in its name, like ~/My Documents. You’ll run into problems if you enter this command:

ls -lh ~/My Documents

When your shell tries to separate this command into parts, it will come up with ls, -lh, ~/My, and Documents; that is, it sees ~/My Documents as two separate arguments, ~/My and Documents. It will try to list the files in the folder ~/My (which doesn’t exist), then also list files in the folder Documents, which isn’t what you intended.

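To solve this problem, put the name of the folder in quotes, leaving the tilde outside them so that the shell still expands it to your home folder (a tilde inside quotes is treated as a literal ~ character):

ls -lh ~/"My Documents"

The shell sees anything within quotes as a single entity and joins it to any unquoted text it touches. In this case, ls is the command and its arguments are -lh followed by ~/My Documents.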

Alternatively, you can use a backslash (\) to escape the space:

ls -lh ~/My\ Documents

In the Unix family of operating systems, the backslash is called the escape character. When the shell parses that string of characters, it treats an escaped space (\ followed by a space) as a part of the name. Again, the shell reads ls as the command and -lh and ~/My Documents as its arguments.
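You can compare quoting and escaping side by side with a sketch like this one. It creates a practice My Documents folder in a scratch location instead of your home folder; report.txt is a made-up filename, and mktemp, mkdir, and touch are standard utilities not covered in this section.

```shell
# Create a practice folder with a space in its name.
demo=$(mktemp -d)
mkdir "$demo/My Documents"
touch "$demo/My Documents/report.txt"

# Quoting keeps the space inside a single argument.
quoted=$(ls "$demo/My Documents")

# Escaping the space with a backslash does the same thing.
escaped=$(ls "$demo"/My\ Documents)

echo "$quoted"
echo "$escaped"

# Clean up the scratch folder.
rm -r "$demo"
```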

Using Single Quotes Around Double Quotes

You can use the escape character to escape more than spaces. Suppose you want to delete a file whose name has a space and quotes in it, like Say "Hello".txt. You can use the rm command to delete files, but the following syntax won’t work:

rm Say "Hello".txt

Your shell will split this command into the words rm, Say, and Hello.txt. You might think you could solve this by simply adding more quotes

rm "Say "Hello".txt"

but that won’t work either, since you’re quoting something that contains quotes already. Instead, surround the argument with single quotes ('), like this:

rm 'Say "Hello".txt'

Your shell will read this command as rm and the argument as Say "Hello".txt, exactly as you intended.

Avoid putting spaces, quotes, or other troublesome characters in filenames whenever possible. Sometimes you can’t avoid them, especially when working with datasets full of someone else’s files. Tab completion helps in those cases, allowing you to enter just enough of the filename so that when you press TAB, your shell will fill out the rest for you. To delete a file in your working directory called Say “Hello”.txt, for example, entering rm Sa, then pressing TAB, completes the command to rm Say\ \"Hello\".txt with the correct escape characters included, so you don’t have to provide the proper syntax yourself.
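If you’d like to practice without risking real files, the following sketch works in a throwaway folder. The filenames and the workdir variable are made up for this example. It creates two awkwardly named files, then deletes one using single quotes and the other using backslashes, the same form that tab completion produces.

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)          # throwaway folder with a unique name
cd "$workdir"

touch 'Say "Hello".txt'       # created with single quotes
touch Say\ \"Goodbye\".txt    # created with backslash escapes
created=$(ls | wc -l | tr -d ' ')

rm Say\ \"Hello\".txt         # backslashes, as tab completion writes it
rm 'Say "Goodbye".txt'        # single quotes
remaining=$(ls | wc -l | tr -d ' ')

cd "$start_dir"
rmdir "$workdir"
echo "created $created files, $remaining left after rm"
```

Both styles name the same files, so you can create a file with one style and delete it with the other.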

Installing and Uninstalling Software with Package Managers

Of the many powerful command line tools that let you quickly work with datasets, only some come preinstalled; you’ll need to install the rest yourself. While you’re likely used to installing software by downloading an installer from a website and then running it, the command line uses package managers, programs that let you install, uninstall, and update software. Nearly all CLI software is free and open source, so Linux operating systems come with large collections of software that you can easily install or uninstall with a single command. Package management projects are also available for macOS (Homebrew) and Windows (Chocolatey).

If you’re using Linux, you likely use a package manager called apt. This is what the popular Linux operating systems like Ubuntu and Debian use, as well as all of the Linux distributions based on them (including Ubuntu in WSL). If your Linux distribution doesn’t use apt, you’ll need to look up the package manager documentation for your operating system.

PACKAGE MANAGEMENT FOR NON-UBUNTU LINUX USERS

You should be able to follow along with this book no matter what version of Linux you’re using. Several other Debian-based Linux distributions, like Linux Mint and Pop!_OS, also rely on apt. If you’re using one of these, the apt commands in this book should work, though the names of software packages may be slightly different. If you encounter that issue, run apt search [software_name] to find the name of the package that you should be installing for your operating system.

If you’re using a version of Linux that doesn’t use apt as its package manager, you’ll need to slightly modify this book’s commands to use your Linux distribution’s package manager. For example, if you’re running Fedora, Red Hat, CentOS, or other similar Linux distributions, you’ll use a package manager called DNF (for older versions of these distributions, the package manager is called yum). See Fedora’s documentation at https://docs.fedoraproject.org/en-US/quick-docs/dnf/ for more details on using DNF. Arch Linux uses a package manager called pacman (https://wiki.archlinux.org/title/Pacman).

If you’re using a Linux distribution not mentioned here, read your operating system’s package management documentation and learn how to search for, install, uninstall, and update software from the terminal. When you come across an apt command in this book, use your operating system’s package manager software instead. Other Linux commands covered in this book should be the same regardless of your distribution.

If you’re using a Mac, start with Exercise 3-2 to learn how to use Homebrew. If you’re using Linux or Windows with WSL, skip to Exercise 3-3 to learn how to use apt. This book mostly uses Unix shells and doesn’t cover Chocolatey, which installs Windows software instead of Linux software.
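If you’re not sure which package manager your system has, a rough check like the following can help. This is only a sketch: the list of names and the order they’re checked in are my assumptions, and a system can have more than one package manager installed.

```shell
found_pm=""
# Check a few common package managers and stop at the first one found
for pm in apt dnf yum pacman brew; do
    if command -v "$pm" >/dev/null 2>&1; then
        found_pm=$pm
        break
    fi
done
echo "Package manager: ${found_pm:-none found}"
```

The command -v built-in prints the location of a program if your shell can find it, and it fails quietly if it can’t.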

Exercise 3-2: Manage Packages with Homebrew on macOS

To install Homebrew, macOS’s package manager, open a browser and go to Homebrew’s website at https://brew.sh, where you should find the command to install the tool. Copy and paste the installation command into your terminal and press RETURN:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

This command uses a program called cURL, which I’ll discuss later in this chapter, to download a shell script from GitHub. It then runs that script using the bash shell. The script itself uses sudo, meaning that if you enter your password, it will run commands as root on your computer.
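To see how the "$(...)" part of that command works without downloading anything, consider this minimal sketch. The fake_download function is a hypothetical stand-in for cURL: instead of fetching a script from the internet, it prints a one-line script. It assumes bash is installed and available as bash.

```shell
# Stand-in for the downloaded installer: printf plays the role of curl here
fake_download() {
    printf '%s\n' 'echo "Hello from the downloaded script"'
}

# The shell first runs the command inside $(...), then hands its output
# to bash -c, which runs that output as a script
output=$(bash -c "$(fake_download)")
echo "$output"
```

Whatever the inner command prints is what bash ends up running, which is why you should run this kind of command only with sources you trust.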

This is what the output looks like on my Mac:

==> Checking for 'sudo' access (which may request your password)...
Password:

Enter the password you use to log in to your Mac and press RETURN to change your status from unprivileged user to root. No characters will appear in the terminal while you’re typing.

After you enter your password, Homebrew should show you a list of paths for files that it will install. The output should end with the following message:

Press RETURN to continue or any other key to abort:

Press RETURN and wait for Homebrew to finish installing. If any problems arise, Homebrew will fail and show you an error message.

WARNING Copying and pasting commands into your terminal can be dangerous: if a hacker tricks you into running the wrong shell script, they could hack your computer. Copy and paste commands in your terminal only from sources you trust.

Now that you’ve installed Homebrew, you have access to the brew command, which you can use to install more software. To check whether Homebrew has a certain program available to install, run:

brew search program_name

For example, Neofetch is a CLI program that displays information about your computer. To see if it’s available in Homebrew, run:

brew search neofetch

The output should list the packages that have neofetch in their names or descriptions; in this case, Neofetch should be listed. Similarly, combine brew search with other program names to check whether they’re available to install.

When you find a package you want to install, run:

brew install program_name

For example, to install Neofetch, run:

brew install neofetch

This should download and install the neofetch tool. Try running it:

neofetch

Figure 3-5 shows Neofetch running on my Mac. The figure is black-and-white in print, but if you run the command on your computer, you should see a rainbow of colors.

Figure 3-5: Running Neofetch on my Mac

Uninstall programs with the brew uninstall command. For example, run the following to uninstall Neofetch:

brew uninstall neofetch

To update all programs you’ve installed with Homebrew to their latest versions, run:

brew upgrade --greedy

Run brew help to see some examples of how to use this command.

Now that you have a package manager installed, you’ll practice using the command line in Exercise 3-4.

Exercise 3-3: Manage Packages with apt on Windows or Linux

You must run most apt commands as root. Before installing or updating software, make sure your operating system has an up-to-date list of available software by opening a terminal and running the following:

sudo apt update

When I run that command on my Linux computer, I get this output:

Hit:1 http://us.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:3 http://us.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:4 http://us.archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
178 packages can be upgraded. Run 'apt list --upgradable' to see them.

This tells me I have 178 packages that can be upgraded. Run the following to upgrade your own software:

sudo apt upgrade

Here’s the output when I run that command:

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
--snip--
178 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
64 standard security updates
Need to get 365 MB of archives.
After this operation, 2,455 kB of additional disk space will be used.
Do you want to continue? [Y/n]

Type Y and press ENTER to install the updates.
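If you’d rather review what’s waiting before you upgrade, the apt list --upgradable command mentioned in the output above shows the individual packages. The following sketch counts them. It assumes that apt prints a one-line “Listing...” header before the package names, and it reports unknown on systems without apt.

```shell
if command -v apt >/dev/null 2>&1; then
    # Skip the assumed one-line header with tail, then count the remaining lines
    upgradable=$(apt list --upgradable 2>/dev/null | tail -n +2 | wc -l | tr -d ' ')
else
    upgradable="unknown"
fi
echo "Packages that can be upgraded: $upgradable"
```

Listing packages doesn’t change anything on your system, so this doesn’t need sudo.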

You’re now ready to install new software. To check whether the package manager has a certain program available to install, run:

apt search program_name

You don’t need to use sudo with this search command because it’s not installing or uninstalling anything. However, once you find a package you want to install, run:

sudo apt install program_name

For example, Neofetch is a CLI program that displays information about your computer. To see if Neofetch is available in your package manager, run:

apt search neofetch

The output should show a list of packages that have neofetch in their names or descriptions; in this case, Neofetch should be listed.

To install the Neofetch tool, run:

sudo apt install neofetch

You should see a list of packages that you must install in order to use Neofetch. Press Y and then ENTER to download and install them all.

Once installation is complete, try running Neofetch:

neofetch

Figure 3-6 shows Neofetch running on my Ubuntu computer. The figure is black-and-white in print, but if you run the command on your computer, the output should appear in several different colors.

Figure 3-6: Running Neofetch on my Ubuntu computer

Uninstall packages with the sudo apt remove command. For example, to uninstall Neofetch, run:

sudo apt remove neofetch

Now that you have a package manager installed, you’ll practice using the command line in Exercise 3-4.

Exercise 3-4: Practice Using the Command Line with cURL

In this exercise, you’ll learn how to determine whether you have a command installed, download web pages, save the output of a command to a file using redirection, and view the contents of files directly from the terminal.

The cURL program is a common way to load web pages from the command line. To load all of the HTML code for the website https://www.torproject.org, for example, run the following command:

curl https://www.torproject.org

To see if cURL is installed, use the which command:

which curl

If cURL is installed, the output should show you the path where the program is installed on your computer (something like /usr/bin/curl). If not, you should see either no output or a “not found” message, depending on your shell, before returning to the shell prompt.

If you don’t have cURL, use your package manager to install it. Enter sudo apt install curl for Windows with WSL and Linux machines or brew install curl for Macs. Then run which curl again, and you should see the path to the cURL program.
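You can wrap this kind of check in a small function and reuse it for any program. In the sketch below, check_installed is a hypothetical helper name, and it uses the shell’s built-in command -v, which does roughly the same job as which.

```shell
# Hypothetical helper: say whether your shell can find a program
check_installed() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 is installed at $(command -v "$1")"
    else
        echo "$1 is not installed"
    fi
}

check_installed ls
check_installed no-such-program-12345
```

Try calling check_installed curl after defining the function to repeat the check from this exercise.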

Download a Web Page with cURL

When you load a web page, your web browser renders a human-readable version of its content based on the page’s HTML, CSS, and JavaScript code. To see the raw HTML content from the web page hosted at https://example.com, run the following command in your terminal:

curl https://example.com

If you load that site in a browser and then view the HTML source by pressing CTRL-U in Windows or Linux, or ⌘-U in macOS, you should see the same HTML code that this command displays in your terminal.

Some websites are designed to show you text that’s easy to read in a terminal when you access them through cURL, as opposed to showing you HTML. For example, https://ifconfig.co will tell you your IP address, geolocate it, and tell you what country and city it thinks you’re in. Try running the following command:

curl https://ifconfig.co

This should display your IP address. Next, run the following:

curl https://ifconfig.co/country

When I run this command, my output is United States. You can try connecting to a VPN server in another country and then run it again; it should detect your web traffic as coming from that other country.

Save a Web Page to a File

Run the following commands to load https://example.com and save it to a file:

cd /tmp
curl https://example.com > example.html

The first line of code changes your working directory to /tmp, a temporary folder where files you store get deleted automatically. The second line loads https://example.com, but instead of displaying the site’s contents in the terminal, it redirects them into the file example.html. You may see a brief progress meter from cURL, but not the HTML itself.

The > character takes the output of the command to its left and saves it into the file named to its right. This is called redirection. Since you changed to the /tmp folder before running the curl command and the filename you provided was a relative path, the output was saved to the file /tmp/example.html.
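Redirection works with any command, not just curl. As a quick sketch, the following uses echo and a scratch file (the outfile variable is just a name picked for this example) to show that > replaces whatever the file held before.

```shell
outfile=$(mktemp)                  # scratch file with a unique name
echo "first run" > "$outfile"
echo "second run" > "$outfile"     # > replaces the previous contents
contents=$(cat "$outfile")
echo "$contents"
rm "$outfile"
```

Only the second line survives, so take care when redirecting into a file you want to keep.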

Run a directory listing to make sure you’ve stored the file correctly:

ls -lh

This should list all the files in /tmp, which should include a file called example.html. Try displaying the contents of that file in your terminal using the cat command:

cat /tmp/example.html

The terminal isn’t always a good place to view a file’s contents. For example, long lines will wrap, which may make them difficult to comprehend. In the following section, you’ll learn more about the different types of files and how to work with them more easily in the command line.

Text Files vs. Binary Files

There are many different types of files, but they all fit into one of two categories: text files and binary files.

Text files are made up of letters, numbers, punctuation, and a few special characters. Source code, like Python scripts (discussed in Chapters 7 and 8); shell scripts; and HTML, CSS, and JavaScript files are all examples of text files. Spreadsheets in CSV (comma-separated values) format and JSON files (discussed in Chapters 9 and 11, respectively) are also text files. These files are relatively simple to work with. You can use the cat command to display text files, as you did in the previous exercise.

Binary files are made up of data that’s more than just letters, numbers, and punctuation. They’re designed for computer programs, not humans, to understand. If you try to view the contents of a binary file using the cat command, you’ll just see gibberish. Instead, you must use specialized programs that understand those binary formats. Office documents like PDFs, Word documents, and Excel spreadsheets are binary files, as are images (like PNG and JPEG files), videos (like MP4 and MOV files), and compressed data like ZIP files.

NOTE The term binary file is technically a misnomer, because all files are represented by computers as binary—strings of ones and zeros.

Text files aren’t always easy to understand (if you’re not familiar with HTML, viewing it might look like gibberish), but it’s at least possible to display them in a terminal. This isn’t true for binary files. For example, if you try using cat to display the contents of binary files like PNG images in your terminal, the output will look something like this:

?PNG

IHDR?L??
?D?ؐ???? Pd@?????Y????????u???+?2???ע???@?!N???? ^?K??Eׂ?(??U?N????E??ł??.?ʛ?u_??|?????g?s?ܙ{?@;?
?sQ
 ?x?)b?hK'?/??L???t?+???eC????+?@????L??????/@c@웗7?qĶ?F
                                                        ?L????N??4Ӈ4???!?????
--snip--

Your terminal can’t display all of the characters that make up PNG images, so those characters just don’t get displayed. If you want to see the information stored in a PNG, you need to open it in software that’s designed to view images.
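If you’re curious what’s inside a binary file, you can peek at its bytes with a tool that prints them safely rather than sending them raw to your terminal. The sketch below is one way to do that, not a technique the rest of this book relies on. It writes the eight bytes that begin every PNG file into a scratch file, then uses the standard od command with the -c option to show each byte as a character or a numeric code.

```shell
sample=$(mktemp)
# Write the 8-byte signature that begins every PNG file (octal escapes)
printf '\211PNG\r\n\032\n' > "$sample"

# od -c shows each byte as a character, an escape like \n, or an octal number
dump=$(od -c "$sample")
echo "$dump"

size=$(wc -c < "$sample" | tr -d ' ')
echo "The file is $size bytes long"
rm "$sample"
```

The letters P, N, and G are readable, while the other bytes appear as codes because they don’t correspond to printable characters.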

To work with the files in datasets or write shell scripts and Python code, you’ll need a text editor, a program designed to edit text files. You’ll install a text editor in Exercise 3-5 to prepare for writing your first shell script.

Exercise 3-5: Install the VS Code Text Editor

In this exercise, you’ll download the free and open source text editor Visual Studio Code (VS Code) and practice using it to view a file. Download VS Code from https://code.visualstudio.com and install it. (If you’re already familiar with another text editor, feel free to keep using that one instead.)

VS Code comes with a command called code that makes it easy to open files in VS Code directly from your terminal. Once VS Code is finished installing, run the following commands:

curl https://example.com > /tmp/example.html
code /tmp/example.html

The first line of code saves the HTML from https://example.com in the file /tmp/example.html, just like you did in Exercise 3-4. The second line opens this file in VS Code.

When you open new files and folders in VS Code, it asks whether you trust each file’s author, giving you the option to open the file in Restricted Mode. For the exercises in this book, you can open files without using Restricted Mode.

When you open example.html, it should look something like this:

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans",
"Helvetica Neue", Helvetica, Arial, sans-serif;

    }
--snip--

The output shows the same HTML code that you saw in your terminal when you ran cat /tmp/example.html in Exercise 3-4, but this time it should be much easier to read. VS Code and many other text editors have a feature called syntax highlighting, where different parts of the file appear in different colors. This makes it far quicker and easier for your brain to interpret source code, and also for you to catch mistakes in syntax.

VS Code is highly customizable and includes a wide variety of extensions that add extra functionality and make the program more pleasant to use. When you open new types of files, for instance, VS Code might ask if you’d like to install extensions to better support those files.

NOTE To learn more about VS Code’s other features, including when to use Restricted Mode, check out the documentation at https://code.visualstudio.com/docs.

Now that you have some experience running commands in a shell and have set up a text editor, you’ll write your first shell script in Exercise 3-6.

Exercise 3-6: Write Your First Shell Script

As mentioned earlier, a shell script is a text file that contains a list of shell commands. When you tell your shell to run the script, it runs those commands one at a time. Many commands are themselves shell scripts, such as the man command you used earlier in this chapter.

Navigate to Your USB Disk

Make sure your datasets USB disk is plugged in and mounted, and open up a terminal. To change your working directory to the datasets disk, skip to the subsection for your operating system.

Windows

After mounting your USB disk, open File Explorer and click This PC on the left. This page shows all of your connected drives and their drive letters. Note your USB disk’s drive letter, then change your working directory to the disk by running the following command, replacing d with the correct drive letter:

cd /mnt/d/

Your shell’s working directory should now be your datasets USB disk. To check, run ls to view the files on this disk.

macOS

After mounting your datasets USB disk, open a terminal and change your working directory to the disk by running the following command:

cd /Volumes/datasets

Your shell’s working directory should now be your datasets USB disk. To check, run ls to view the files on this disk.

Linux

After mounting your datasets USB disk, open a terminal and change your working directory to the disk. In Linux, the path to your disk is probably something like /media/<username>/datasets. For example, my username is micah, so I would run this command:

cd /media/micah/datasets

Your shell’s working directory should now be your datasets USB disk. To check, run ls to view the files on this disk.
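The three paths above can be summarized in a short sketch that guesses the right one for your system. Treat it as a starting point only: it assumes your disk is named datasets, that it’s drive D in Windows, and that a Linux system whose /proc/version mentions Microsoft is running under WSL (a common heuristic, not a guarantee).

```shell
case "$(uname -s)" in
    Darwin)
        datasets_path="/Volumes/datasets" ;;
    Linux)
        if grep -qi microsoft /proc/version 2>/dev/null; then
            datasets_path="/mnt/d"                      # WSL; assumes drive letter D
        else
            datasets_path="/media/$(whoami)/datasets"
        fi ;;
    *)
        datasets_path="" ;;                             # unrecognized system
esac
echo "Your datasets disk is probably at: $datasets_path"
```

If the guess is wrong for your computer, use the path you found with the instructions for your operating system instead.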

Create an Exercises Folder

The mkdir command creates a new folder. Now that your terminal’s working directory is your USB disk, run the following commands to create a new folder called exercises, and then switch to it:

mkdir exercises
cd exercises

Now make a folder for your Chapter 3 exercises:

mkdir chapter-3
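As an aside, mkdir can also create a folder and the folders inside it in a single step when you add the -p option. This sketch tries it in a throwaway location (the base variable is a name chosen for this example) so that it doesn’t touch your USB disk.

```shell
base=$(mktemp -d)                        # throwaway folder with a unique name
mkdir -p "$base/exercises/chapter-3"     # creates exercises and chapter-3 together
mkdir -p "$base/exercises/chapter-3"     # repeating it is harmless with -p
listing=$(ls "$base/exercises")
echo "$listing"
rm -r "$base"
```

Without -p, the second mkdir would complain that the folder already exists.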

Next, you’ll open the exercises folder in VS Code.

Open a VS Code Workspace

Each VS Code window is called a workspace. You can add folders to your workspace, which allows you to easily open any files in that folder or create new ones. To open a VS Code workspace for your exercises folder, run the following command:

code .

If the argument that you pass into code is a folder, like . (the current working directory), VS Code will add that folder to your workspace. If the path is a file, like in Exercise 3-5 when you opened /tmp/example.html, it will open just that file.

Next, create a new file in the chapter-3 folder. To do this, right-click the chapter-3 folder, choose New File, name your file exercise-3-6.sh, and press ENTER. This should create a new file that you can edit. Since the file extension is .sh, VS Code should correctly guess that it’s a shell script and use the right type of syntax highlighting.

Figure 3-7 shows a VS Code workspace with the exercises folder added and the empty file exercise-3-6.sh created.

The VS Code window is split into two main parts. The Explorer panel on the left shows the contents of all of the folders added to your workspace. In this case, it shows exercises and everything it contains: a chapter-3 folder and the exercise-3-6.sh file you just created. The right side of the window is the editor, where you’ll enter your shell script.

Figure 3-7: VS Code with the exercises folder open in a workspace

Write the Shell Script

Enter the following text into exercise-3-6.sh in VS Code and save the file:

#!/bin/bash
echo "Hello world! This is my first shell script."
# Display the current user
echo "The current user is:"
whoami
# Display the current working directory
echo "The current working directory is:"
pwd

The first line that starts with #! is called the shebang, and it tells the shell which interpreter—the program that opens and runs the script—to use. In this case, the shell will use /bin/bash, meaning you’re writing a bash script. In this book, you’ll add that same shebang to the top of all of your shell scripts. Even if you’re working from a shell besides bash, this shebang tells your computer to run the current script using bash.

In shell scripts, lines that start with the hash character (#) are called comments, and they don’t affect how the code itself works; if you removed the comments from this script, it would run the same way. The first character of the shebang is a hash character, which means that it’s technically a comment in bash and zsh.

Comments like # Display the current user work as notes to remind you what your code does when you come back to a script you wrote months or years earlier. Anyone else who works with your code, perhaps trying to fix something or add features, will appreciate your comments for the same reason.

The echo command displays text to the terminal. The whoami command displays the name of the user running the script. The pwd command displays the current working directory.
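Once the script works, you might experiment with a variation. The following sketch, which isn’t part of the exercise, places each command’s output inside the sentence itself by wrapping the command in $( ), the same syntax that the Homebrew installation command uses. The variable names are my own.

```shell
#!/bin/bash
# Variation on exercise-3-6.sh: embed each command's output in the sentence
user_line="The current user is: $(whoami)"
dir_line="The current working directory is: $(pwd)"
echo "$user_line"
echo "$dir_line"
```

The shell runs whoami and pwd first, then substitutes what they print into the text before echo displays it.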

Run the Shell Script

Before you can run a script, you need to make it executable by giving it permission to run as a program. The chmod command lets you change permissions on files with the following syntax:

chmod permissions filename

To mark a file as executable, use +x as the permissions argument. Run the following command in your terminal (from within your exercises folder):

chmod +x ./chapter-3/exercise-3-6.sh

You can now run the script by entering either its absolute path or its relative path:

./chapter-3/exercise-3-6.sh

Starting your command with ./ tells your shell that you’re entering the relative path to a script.
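To see what chmod +x actually changes, you can compare a file’s permissions before and after. This sketch uses a scratch file rather than your exercise script. It reads the first four characters of the ls -l output, which describe the file type and the owner’s read, write, and execute permissions.

```shell
script=$(mktemp)                                  # scratch file, created without execute permission
printf '%s\n' '#!/bin/bash' 'echo "It runs"' > "$script"

before=$(ls -l "$script" | cut -c1-4)
chmod +x "$script"
after=$(ls -l "$script" | cut -c1-4)

echo "before: $before, after: $after"
rm "$script"
```

The x that appears in the fourth position means the file’s owner is now allowed to run it as a program.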

Here’s the output I get when I run this script on my Mac:

Hello world! This is my first shell script.
The current user is:
micah
The current working directory is:
/Volumes/datasets/exercises

The current user is micah and the current working directory is /Volumes/datasets/exercises.

This script shows you different output depending on your working directory. To demonstrate the differences, here’s what happens when I switch to my home folder and then run it again:

micah@trapdoor exercises % cd ~
micah@trapdoor ~ % /Volumes/datasets/exercises/chapter-3/exercise-3-6.sh
Hello world! This is my first shell script.
The current user is:
micah
The current working directory is:
/Users/micah

This time, the current working directory in the output has changed to /Users/micah. Try switching to your own home folder with cd ~ and running the script again.

The script also shows different output depending on which user is running it. So far I’ve been running it as micah, but here’s what the output looks like when I run it as root:

micah@trapdoor ~ % sudo /Volumes/datasets/exercises/chapter-3/exercise-3-6.sh
Password:
Hello world! This is my first shell script.
The current user is:
root
The current working directory is:
/Users/micah

This time, the output lists the current user as root. Try running the script as root on your own computer.

You’ll write many more scripts throughout this book. I’ve included a copy of the code for every exercise in this book’s online resources. In Exercise 3-7, you’ll download a copy of all of this code.

Exercise 3-7: Clone the Book’s GitHub Repository

Programmers store source code in git repositories (or git repos for short), which are composed of a collection of files (usually source code) and the history of how they have changed over time. By storing your scripts this way, you can host them on GitHub, a popular website for hosting git repos. Git repos help you share your code with others, and they make it easier for multiple people to write code for the same project. When you clone a git repo, you download a copy of it to your computer.

This book comes with a git repo at https://github.com/micahflee/hacks-leaks-and-revelations containing the code for every exercise and case study in this book, along with additional instructions and source code related to the book’s appendixes. In this exercise, you’ll clone this repo and store the copy locally on your computer.

First, check whether the git program is installed on your machine:

which git

If git is installed, you’ll see its path in the output, like /usr/bin/git. If it’s not installed, this command won’t display anything in the terminal. In that case, install git by entering the appropriate command for your operating system: brew install git for macOS users, or sudo apt install git for Linux and WSL users.

Next, in your terminal, change to your USB disk folder. On my macOS computer, I do this with the following command:

cd /Volumes/datasets

If necessary, replace the path in my command with the appropriate path to your datasets USB disk for your operating system.

Once you’re in the datasets disk, run this command to clone the repo:

git clone https://github.com/micahflee/hacks-leaks-and-revelations.git

This should create a new folder called hacks-leaks-and-revelations containing all of the code from the book’s repo.

Finally, add the book’s git repo folder to your VS Code workspace. In VS Code, click File ▸ Add Folder to Workspace, then browse for the hacks-leaks-and-revelations folder on your USB disk. This will add the book’s code to your VS Code workspace so you can easily browse through all of the files.

You now have access to solutions for all future exercises! In the following chapters, I’ll walk you through the process of writing your own scripts from scratch, but you can also run the complete scripts taken from the git repo or copy and paste their code into your own code.

Summary

In this chapter, you’ve learned the basics of command line theory, including how to use the shell in a terminal, run various shell commands, and navigate the shell using features like tab completion. You installed software directly in the terminal using a package manager, and you wrote your first simple shell script.

In the next chapters, you’ll put these techniques into practice to explore hundreds of gigabytes of data, make datasets searchable, convert email from a proprietary format to an open format, and write Python code. You’ll start in the following chapter by taking a deeper dive into the BlueLeaks dataset.

Information Wants to be Free

Everyone should have access to the information in this book. To remove barriers to access, I've made Hacks, Leaks, and Revelations available for free online under a Creative Commons license. If you can afford it, show your support and buy a copy today!