Hacks, Leaks, and Revelations


Appendix A: Solutions to Common WSL Problems

It’s hard for me to imagine doing the kind of data analysis work I do without Linux. However, many Linux tools that I rely on every day simply don’t exist in the Windows ecosystem. Using just Windows, you can’t make your datasets searchable using Aleph, for example, or quickly spin up popular SQL servers to import leaked databases. Windows Subsystem for Linux (WSL) allows you to do these tasks and considerably more, including running command line tools, in Windows without needing to set up your own Linux VM.

WSL generally works well, but you may encounter a few issues, particularly related to disk performance, when you attempt to crunch data stored on Windows-formatted disks from your Linux terminal. Some tasks may take your computer hours or days to finish when they should take just minutes. This appendix teaches Windows users more about the quirks of the Linux filesystem, as well as various possible solutions to the disk performance problem. It’s optional for Windows users, but if you do run into a situation where programs are taking considerably longer to run than I describe, this appendix should help you diagnose the problem and come up with a solution.

I recommend waiting to read this appendix until after you finish Chapter 3, where you’ll install WSL, and Chapter 4, where you’ll learn the command line code required to implement the performance solutions described here. You might need to reference this appendix in Chapter 5, while you’re bind-mounting in Docker or indexing datasets in Aleph; in Chapter 11, while you’re using Linux tools to extract over a million files; and in Chapter 12, while you’re importing 20GB of data into a SQL database running in Docker.

Understanding WSL’s Linux Filesystem

Before solving any problems that you might encounter with WSL, you’ll need to understand how and where WSL stores your Linux files. In this section, you’ll learn how the WSL Linux filesystem works in Windows, how to access Linux files in Windows, and, conversely, how to access Windows files in Linux. I also outline some of the simpler problems you might encounter and how to solve them.

Your WSL Linux filesystem contains much more than just the data you store there. It includes a complete copy of the Ubuntu operating system, and when you install new programs using apt, it installs those into the Linux filesystem, too. Just as in a real Ubuntu system, / is the root folder, and it contains all of the usual default folders for Ubuntu systems. In your Ubuntu terminal, list the folder names in the root filesystem by running the ls / command. This should give you the following output:

bin dev home lib lib64 lost+found mnt proc run snap sys usr
boot etc init lib32 libx32 media opt root sbin srv tmp var

This output lists all of the folders inside the root folder in your Ubuntu system. Your home folder is in /home. For example, since my Ubuntu username is micah, my Ubuntu home folder is /home/micah. If you’re using Windows 11 or newer, you should be able to browse your Linux files directly from File Explorer. When you’re browsing your Linux files, you see all of the Linux files, not just the data in your home folder. If you’re running a new enough version of Windows, change to your home folder (cd ~) and then run the command explorer.exe . to open your current working folder in Linux in Windows File Explorer.

NOTE If you ever need to run a Windows program in Linux (such as explorer.exe), run Linux programs in Windows, or otherwise do more advanced tasks in WSL, check out Microsoft’s detailed documentation at https://learn.microsoft.com/en-us/windows/wsl/filesystems.

For example, Figure A-1 shows a list of Linux files viewed in the Windows File Explorer app.

Figure A-1: Browsing Linux files in File Explorer in Windows

It’s also helpful to understand the different behaviors of Windows and Linux filesystems. The Windows filesystem format is NTFS, and the most popular Linux filesystem format is ext4. Each Linux file has separate permissions for reading, writing, and executing, but NTFS doesn’t store these Linux-style permissions; it uses its own, separate access control system.

If you’re accessing a Windows filesystem within Linux (when you access /mnt/c in WSL, for example), Linux treats every file as having read, write, and execute permissions, and you can’t change these permissions by default. This often isn’t a problem, but it does mean that if you copy files from Linux to Windows, you’ll lose their original permissions. If you want to be able to use file permissions, you’ll need to work with files on a proper Linux filesystem instead.
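To see Linux permissions at work, here is a minimal sketch you can run in an Ubuntu terminal. It creates a scratch file on the Linux filesystem, changes its permissions, and prints the result in octal. The helper name show_mode is my own invention for illustration, and the sketch assumes the GNU version of stat that ships with Ubuntu.

```shell
# show_mode is a hypothetical helper: print a file's permission bits in octal.
show_mode() {
    stat -c '%a' "$1"
}

# Create a scratch file on the Linux filesystem and change its permissions.
scratch=$(mktemp)
chmod 640 "$scratch"
show_mode "$scratch"    # prints 640
rm -f "$scratch"
```

If you try the same chmod on a file under /mnt/c, then given the default behavior described above, you should expect the mode to stay as it was.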

Under the hood, the WSL Linux filesystem is stored in a single file on the C: drive, inside your user’s home folder. On my Windows 11 computer, the filename is C:\Users\micah\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\ext4.vhdx. The path on your computer will be slightly different, but the file containing your Linux filesystem will still be called ext4.vhdx. The more data you store in your Linux filesystem, the bigger the ext4.vhdx file gets.

For example, suppose you want to save your datasets directly to your Linux filesystem in the folder ~/datasets, rather than to a USB disk. Downloading the BlueLeaks dataset to that folder will cause your ext4.vhdx file to grow hundreds of gigabytes larger. Because your whole WSL Linux filesystem is stored on your C: drive, this means you only have as much disk space available in Linux as you have free space on that drive. In addition to the limits of free space on your C: drive, by default, your WSL Linux filesystem can only take up a maximum of 256GB.

If you want to store more data than this, you’ll need to take additional steps to expand the size of your Linux filesystem. Find detailed instructions for doing this at https://learn.microsoft.com/en-us/windows/wsl/vhd-size.
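Before downloading a large dataset, it can help to check how much room your Linux filesystem has left. The following sketch, run from an Ubuntu terminal, prints the overall usage of the root filesystem and the size of a single folder. The folder_size helper is a hypothetical name of my own, and ~/datasets is just the example folder used in this appendix.

```shell
# Show total, used, and free space on the filesystem that holds /.
df -h /

# folder_size is a hypothetical helper: print how much space a folder uses,
# or "missing" if the folder does not exist.
folder_size() {
    if [ -d "$1" ]; then
        du -sh "$1" | cut -f1
    else
        echo "missing"
    fi
}

folder_size "$HOME/datasets"
```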

Finally, you might come across a situation where you open an Ubuntu terminal, try to access a Windows-formatted USB disk in /mnt, and find that it’s just not there. This is because you need to have mounted your Windows USB disk (plugged it in and, if it’s encrypted, entered your BitLocker password) before opening the Ubuntu shell. If you can’t access a USB disk from Ubuntu at a path like /mnt/d (assuming your USB disk is mounted to the D: drive), you’ll need to restart WSL.

To do so, open PowerShell and run wsl --shutdown. This will close all open Ubuntu terminals and stop any running Docker containers. Afterward, open an Ubuntu terminal again, and you should be able to access that USB disk.
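If you aren’t sure whether Ubuntu can see your USB disk, a quick check like the following sketch can tell you before you start a long-running task. The disk_available helper is a hypothetical name, and it uses a rough test (the folder exists and isn’t empty) rather than anything WSL-specific. The /mnt/d path assumes your disk is mounted to the D: drive.

```shell
# disk_available is a hypothetical helper: succeed only if the given path is
# a non-empty directory, a rough sign that the drive is mounted.
disk_available() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

if disk_available /mnt/d; then
    echo "/mnt/d is available"
else
    echo "/mnt/d is not available; run wsl --shutdown in PowerShell and reopen Ubuntu"
fi
```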

With the basics of WSL out of the way, let’s discuss the primary problem you might encounter: disk performance.

The Disk Performance Problem

Using Windows files in WSL, like the disks mounted in /mnt, has major performance issues. Reading from and writing to disks takes considerably longer when you’re working with files on a Windows disk than when you’re working with them on WSL’s Linux filesystem, like those in /home. Some disk-intensive tasks, like extracting a compressed file that contains a million small files, might take several hours on a Windows disk, when the same task could be completed in seconds on a Linux disk. These performance issues can severely cut into your ability to get anything done.
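You can get a rough feel for the difference on your own machine with a sketch like this one, which writes many tiny files into a scratch folder and reports how long it took. The time_small_files helper is a hypothetical name, the file count is arbitrary, and whole-second timing is crude, so treat the numbers as an indication rather than a benchmark.

```shell
# time_small_files is a hypothetical helper: write COUNT tiny files into a
# scratch folder under DIR, clean up, and print the elapsed whole seconds.
time_small_files() {
    dir=$1
    count=${2:-1000}
    work=$(mktemp -d "$dir/bench.XXXXXX") || return 1
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "x" > "$work/file_$i.txt"
        i=$((i + 1))
    done
    end=$(date +%s)
    rm -rf "$work"
    echo $((end - start))
}

# Time a folder on the Linux filesystem.
time_small_files /tmp 1000
```

To compare, run it again against a folder on a Windows disk, for example time_small_files /mnt/d 1000, assuming your USB disk is mounted to the D: drive.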

For this reason, Microsoft recommends that you store data in the same operating system filesystem as the tools you plan to use. Using Linux tools like find, grep, and unzip (see Chapter 4) or making your data searchable using Aleph (see Chapter 5) will work best if you store your datasets in the Linux filesystem. Meanwhile, analyzing datasets with Windows software, such as 7-Zip, will work best if those datasets are stored in the Windows filesystem.

By far the simplest solution to the performance issue is to store all of your datasets in your Linux filesystem and use Linux tools to work with them. However, datasets are often too large to fit in the C: drive. For example, BlueLeaks alone takes up over half the disk space I have available on the laptop I’m using right now. This gives you no choice but to store the datasets on a USB disk like the datasets disk that you encrypted in Chapter 1. If you don’t have enough space on your C: drive to work with all of the data that you’d like to, you’ll need to implement one of the solutions I discuss in the following section.

You can sometimes work around disk performance problems by simply using native Windows programs rather than Linux programs to do your most disk-intensive tasks. Throughout this book, I suggest using native Windows tools when appropriate. For example, the Linux programs unzip and tar are used to extract compressed files. Extracting files saved on a Windows disk using these tools can be extremely slow if you’re working from WSL. Instead, you can just use a Windows archive program like 7-Zip, as you did in Chapter 4 to unzip BlueLeaks. Since extracting compressed files with 7-Zip doesn’t involve WSL at all, there’s no disk performance problem.

Likewise, when you’re running Python scripts, you can use the Windows version of Python (downloadable from https://www.python.org) rather than the Ubuntu version. This allows your Python scripts to crunch data from your Windows-formatted USB disk, bypassing the WSL performance problem, which is why Chapter 7 advises you to use PowerShell instead of WSL.

This technique can only take you so far, though. There aren’t always native Windows alternatives. For instance, you can’t run Docker containers, which are required for running software like Aleph, without WSL. For those cases, you’re better off storing your data in a Linux filesystem using one of the following solutions.

Solving the Disk Performance Problem

This section covers two potential solutions to the WSL disk performance problem: storing only datasets you’re actively working with in your WSL Linux partition, or storing your entire WSL Linux partition on a USB disk with more disk space than your C: drive has available. I recommend the first, simpler option if you just want to be able to easily work through this book. The second option is a better long-term solution if you plan to routinely work with large datasets using Linux tools in Windows in the future.

Storing Only Active Datasets in Linux

If you don’t have enough free space for all your datasets on your C: drive, you can store just those you need for your current work in your Linux filesystem, keeping the rest on your USB disk. For example, you could copy the folders that you’re actively working with into your Ubuntu home folder (that is, in ~/datasets) and keep them there while you’re conducting a specific investigation. When you no longer need to work with these files, copy them back to your external Windows disk and delete them from your home folder to clear up space.

Storing Your Linux Filesystem on a USB Disk

Another option is to move your whole Ubuntu installation (that is, the ext4.vhdx file that contains your Linux filesystem) to a USB disk with lots of free space. This way, if you store all your datasets in your Ubuntu home folder (in ~/datasets), the data will physically be stored on your USB disk instead of on your internal C: drive.

With this solution, you won’t be able to open an Ubuntu terminal unless your USB disk is plugged in and mounted, which is less convenient. If you try opening an Ubuntu terminal while your USB disk isn’t mounted, you’ll get the error message The system cannot find the path specified. However, I recommend using this method if you plan to investigate your own datasets in the future. Here’s how to implement it.

Open a PowerShell terminal and create a new folder on your external USB disk to store your Linux filesystem by running the following command:

New-Item -Path D:\WSL -ItemType 'directory'

In PowerShell, the New-Item command creates a new file or folder. In this case, it’s creating a new folder at D:\WSL. If you’d like to store your ext4.vhdx file somewhere else, you can change the path when you run this command.

WSL lets you install multiple Linux distributions (that is, versions of Linux) at the same time. In order to move one of them to your USB disk, you’ll need to know its name. It’s probably called Ubuntu, but you can check by running the wsl --list command:

PS C:\Users\micah> wsl --list
Windows Subsystem for Linux Distributions:
Ubuntu (Default)
docker-desktop-data
docker-desktop

This output lists each WSL distribution that you have installed. In my case, I have Ubuntu (my default distribution), as well as docker-desktop-data and docker-desktop, which are both used by Docker Desktop. If you’ve worked through Chapter 5 and installed Docker Desktop yourself, you should have these WSL distributions too.

The following steps show you how to move a WSL distribution from the C: drive to a USB disk. These instructions focus on the Ubuntu distribution, but if any other WSL distributions are taking up too much space on C: as well, you could follow the same steps to move them to a USB disk, making sure to change the distribution name when you run the commands. I don’t recommend moving the Docker distributions, though, as this might cause issues with Docker Desktop working correctly.

Once you’ve confirmed the name of the distribution you’d like to move to your disk, export a copy of it with the following command, replacing Ubuntu with the appropriate name:

wsl --export Ubuntu D:\WSL\backup.tar

This should save a copy of all of the data from your chosen distribution into the file D:\WSL\backup.tar. Once this finishes, unregister the Ubuntu distribution on your C: drive from WSL by running this command:

wsl --unregister Ubuntu

This will remove the distribution from your computer and delete the Linux filesystem file ext4.vhdx. That’s okay, because you just made a backup.

Next, import your backup, this time telling WSL that you want your data for this distribution to be in D:\WSL:

wsl --import Ubuntu D:\WSL D:\WSL\backup.tar

This command creates a new WSL distribution, in this case called Ubuntu. Now make that distribution into your default WSL distribution by running the following command:

wsl --set-default Ubuntu

When you later open a WSL terminal, it should now open a shell in the default distribution you just chose.

At this point, you’ve moved your Ubuntu filesystem from the C: drive to the D: drive (or whatever path you changed it to when you ran your own commands). The original ext4.vhdx file stored on C: should no longer exist, and you should have a new one at D:\WSL\ext4.vhdx.

Now that you’ve restored your distribution from the temporary backup file, D:\WSL\backup.tar, you can delete that file by running the following command in your PowerShell terminal:

Remove-Item -Path D:\WSL\backup.tar

Open a new Ubuntu terminal. It should work, with one problem: you’ll automatically be logged in as the root user, while before you would automatically log in as an unprivileged user. To fix this, you’ll create a file called /etc/wsl.conf, using the nano text editor described in Chapter 4.

Run the following command to open nano:

nano /etc/wsl.conf

Enter the following two lines into the text editor to set your default user, changing micah to whatever your username was before you moved ext4.vhdx to a USB disk:

[user]
default=micah

Press CTRL-O, followed by ENTER, to save the file, and then press CTRL-X to exit. Back in your PowerShell terminal, shut down WSL by running this command:

wsl --shutdown

When you open a new Ubuntu terminal, you should now be logged in as your normal unprivileged user, rather than the root user.
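If you’d rather not use nano, the same two lines can be written from the command line. The sketch below uses a hypothetical helper, write_wsl_conf, and previews the result in a scratch file instead of touching /etc/wsl.conf. It then checks whether the current shell is running as root, which is a quick way to confirm the fix after restarting WSL.

```shell
# write_wsl_conf is a hypothetical helper: write the two-line [user] section
# to FILE for the given USERNAME, replacing whatever FILE contained before.
write_wsl_conf() {
    printf '[user]\ndefault=%s\n' "$1" > "$2"
}

# Preview the contents in a scratch file rather than /etc/wsl.conf.
preview=$(mktemp)
write_wsl_conf micah "$preview"
cat "$preview"
rm -f "$preview"

# After restarting WSL, confirm that you are no longer root (user ID 0).
if [ "$(id -u)" -eq 0 ]; then
    echo "still logged in as root"
else
    echo "logged in as $(id -un)"
fi
```

Because the helper overwrites its target, it only suits a wsl.conf file that has no other settings in it.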

Running your Linux filesystem off of a USB disk should now work, but your Linux filesystem is still limited to the default 256GB of data, even if your USB disk is bigger. The final step is to expand the size of your Linux filesystem so that it can take up as much space as you have available on your USB disk.

In PowerShell, shut down WSL by running:

wsl --shutdown

Next, open a Command Prompt shell as an administrator. (Since you need to open it as an administrator, it’s simplest to just open Command Prompt directly instead of opening a Command Prompt tab in Windows Terminal.) Click Start, search for command prompt, right-click Command Prompt, and click Run as Administrator. In your administrator Command Prompt, open the Windows CLI program DiskPart by running the following:

diskpart

This program helps you manage your computer’s drives and partitions. When you open it, it should drop you into an interactive shell similar to the Python shell. In DiskPart, you must first select the disk that you’d like to resize, in this case the ext4.vhdx file. Run this command, substituting the correct path to ext4.vhdx on your machine:

DISKPART> select vdisk file="D:\WSL\ext4.vhdx"

DiskPart successfully selected the virtual disk file.

You can then check the current size of your Linux filesystem by running this command:

DISKPART> detail vdisk

Device type ID: 0 (Unknown)
Vendor ID: {00000000-0000-0000-0000-000000000000} (Unknown)
State: Added
Virtual size:  256 GB
Physical size: 7664 MB
Filename: D:\WSL\ext4.vhdx
Is Child: No
Parent Filename:
Associated disk#: Not found.

The output of detail vdisk shows you information about the virtual disk that you’re inspecting, including the total maximum size of your Linux filesystem in the Virtual size field, along with the actual disk space the Linux filesystem is currently using in the Physical size field.

Next, you’ll resize your virtual disk to be larger than 256GB. First you need to determine how big you want it to be. If your USB disk is empty except for this ext4.vhdx file, then it’s reasonable to select the size of the entire USB disk. You can find the total size of your USB disk by right-clicking your disk in File Explorer and clicking Properties. In the General tab, you’ll see information about disk space usage, including the total capacity of the disk. You should avoid making your virtual disk larger than the total size of your USB disk; if you do, you might run out of disk space in your USB disk without WSL realizing it, leading to unpredictable Linux problems.

Once you determine how big you want your Linux partition to be, make sure you know that number in megabytes. For example, if you want its maximum size to be 1TB, then that would be 1,048,576MB. Check the File Size Units and Conversions box in Chapter 4 for information on how to calculate this. Now run the following command to resize it, replacing 1048576 with your chosen maximum size in megabytes:

DISKPART> expand vdisk maximum=1048576

  100 percent completed

DiskPart successfully expanded the virtual disk file.
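The megabyte value passed to expand vdisk is simple arithmetic that you can work out ahead of time in any Ubuntu terminal. This sketch uses two hypothetical helpers, tb_to_mb and gb_to_mb, and assumes the binary units used in this appendix, where 1TB is 1,024GB and 1GB is 1,024MB.

```shell
# tb_to_mb and gb_to_mb are hypothetical helpers that convert whole
# terabytes or gigabytes to megabytes using binary units.
tb_to_mb() {
    echo $(( $1 * 1024 * 1024 ))
}

gb_to_mb() {
    echo $(( $1 * 1024 ))
}

tb_to_mb 1      # prints 1048576
gb_to_mb 256    # prints 262144
```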

Run detail vdisk again to confirm that the Virtual size field now shows the new maximum size you just set, then exit DiskPart by running the exit command:

DISKPART> exit

Leaving DiskPart...

Close the administrator Command Prompt.

You’ve now expanded the virtual disk size in ext4.vhdx, but you still need to expand the actual Linux partition on this disk. To do that, you need to determine the path to the virtual disk itself.

First, open an Ubuntu terminal and run the following command:

sudo mount -t devtmpfs none /dev

This command makes sure that /dev is mounted in WSL. The /dev folder is a special Linux folder containing files that each represent a piece of hardware attached to your computer, including hard drives. When I run this command, I get the following output:

[sudo] password for micah:
mount: /dev: none already mounted on /dev.

Since that command uses sudo, you’ll first need to type your password to proceed. If you see the warning message mount: /dev: none already mounted on /dev, you can safely ignore it; this means that /dev was already mounted.

Run the following command:

mount | grep ext4

This command runs mount, which outputs all of the filesystems that are mounted in Linux, then pipes that output to grep to filter that down to just the ext4 filesystems. The output should show you the path to the virtual hard drive. For example, my output tells me the path to my hard drive is /dev/sdc:

/dev/sdc on / type ext4 (rw,relatime,discard,errors=remount-ro,data=ordered)
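If your mount output contains several ext4 lines, you can pick out the device for the root folder with a small awk filter. This is a sketch only: the root_device helper is a hypothetical name, and the example feeds it the sample line from my output above rather than live data.

```shell
# root_device is a hypothetical helper: read mount output on stdin and print
# the device that is mounted at / with filesystem type ext4.
root_device() {
    awk '$2 == "on" && $3 == "/" && $5 == "ext4" { print $1 }'
}

# Try it on the sample line shown above.
printf '%s\n' '/dev/sdc on / type ext4 (rw,relatime,discard,errors=remount-ro,data=ordered)' | root_device
# prints /dev/sdc

# On a live system, pipe the real output through it instead:
#   mount | root_device
```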

Finally, resize your Linux partition to take up as much space as it can by running the following command, making sure to use the correct path to your virtual hard drive from the previous command:

sudo resize2fs /dev/sdc

When I run this, I get the following output:

resize2fs 1.45.5 (07-Jan-2020)
Filesystem at /dev/sdc is mounted on /; on-line resizing required
old_desc_blocks = 32, new_desc_blocks = 128
The filesystem on /dev/sdc is now 268435456 (4k) blocks long.

If all went well, you should see a message confirming that the partition has been resized and is now as large as the full virtual disk. You’re done! You’ve expanded your Linux filesystem so that you can fit many more datasets on it.
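To double-check the result, you can convert the block count that resize2fs reports into gigabytes and compare it with the size you chose. The blocks_to_gb helper below is a hypothetical name, and it assumes 4KB blocks, as shown in my output.

```shell
# blocks_to_gb is a hypothetical helper: convert a count of 4 KB blocks, as
# reported by resize2fs, into gigabytes (binary units).
blocks_to_gb() {
    echo $(( $1 * 4 / 1024 / 1024 ))
}

blocks_to_gb 268435456    # prints 1024, which is 1TB

# Confirm the new size from inside Linux.
df -h /
```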

Next Steps

In my opinion, Microsoft made Windows a considerably more useful operating system by building WSL into it, particularly when you need to analyze hacked and leaked datasets. If you’re going to use WSL on a regular basis, I recommend that you read through the official documentation at https://learn.microsoft.com/en-us/windows/wsl/. Those docs cover topics beyond the scope of this appendix, such as using WSL with VS Code, installing SQL database software directly in WSL instead of using it in Docker, running graphical Linux apps directly in Windows, and more.

In addition to using WSL, you may want to familiarize yourself with Linux in general by working through this book, trying out various Linux distributions by running them in VMs on your computer (see Chapter 1), and playing with Linux servers in the cloud (see Chapter 4).


Information Wants to be Free

Everyone should have access to the information in this book. To remove barriers to access, I've made Hacks, Leaks, and Revelations available for free online under a Creative Commons license. If you can afford it, show your support and buy a copy today!