Since I started writing this book a year and a half ago, the feeling that I’m drowning in datasets has only increased. In a single recent week, I received 18GB of data from a police intelligence firm, another 100GB of data from a mass transit agency’s police department, and a copy of the Transportation Security Administration’s No Fly List. This is a fairly typical week, and, as usual, I haven’t had a chance to look at any of them yet.
I’m so happy that you’ve finished reading this book, because now you can use your newfound skills to help investigate this never-ending flood of datasets. There aren’t nearly enough of us with these skills, so I’m excited that you’ve joined the ranks. I hope you’ll use your skills to discover and publish secret revelations and make a positive impact on the world while you’re at it.
This book is crammed with technical information, but it’s far from a comprehensive guide to investigating leaked and hacked datasets. I merely scratched the surface of a wide swath of technologies that come into play, like using the command line, programming in Python, using Docker containers, working with SQL databases, and analyzing structured data. There are countless books dedicated to each of these topics. But while there’s a lot left to learn, you should now have a solid foundation to build on.
The best way to gain confidence in these skills, and to learn more, is to jump in headfirst and just start using them. Go to the DDoSecrets website, see what the collective has published recently, and subscribe to its newsletter so you’ll get email alerts when new datasets are released. If you find a dataset that looks interesting and is available for anyone to download, launch your BitTorrent client, download it, and see if you can make sense of it. If you find a dataset released under limited distribution, meaning that DDoSecrets will give it only to journalists and researchers (like you!), request access. As long as you plan on publishing any revelations you find, you shouldn’t have a problem gaining access.
Depending on the dataset you’re looking at, you might hit technical hurdles that aren’t covered in this book and that you don’t know how to solve. I often come across data that I don’t recognize and don’t know how to handle. Most of the time, I end up searching the internet to figure out my next steps. Sometimes I even learn how to use new technologies that I have no prior experience with, like new types of databases or software, so I can import and explore the data. As your skills grow, you’ll be able to do the same using online documentation and, most importantly, trial and error. Don’t be afraid to experiment.
As you’re exploring new datasets, automate as much of your work as possible by writing simple Python scripts like the ones sprinkled throughout this book. Regularly writing code is, by far, the best way to get better at programming. Also, publish your interesting findings, even if they’re minor. If you don’t work for a newsroom, start a blog and publish your work there. The more investigations you publish, the more likely it is that potential sources will notice you, reach out to you over secure channels, and send you datasets to analyze. Be precise in your reporting and, as much as possible, show your work. Investigating leaked and hacked datasets is cool, and people will love to read about the details you’ve discovered, how you discovered them, and how you verified that they’re true.
Good luck! Get in touch at [email protected] to let me know if you find any revelations.