Decoding a Data Story
The journey of a data story is often an unpredictable one. My first story with Citizen Codex was no different, starting from a single question that evolved over five months and needed the entire team on board to get across the finish line. This project pushed me to wear multiple hats (researcher, writer, data analyst, and web developer), challenging me to exercise a wide range of skills and collaborate with a variety of people to bring it all together.
Inspired by behind-the-scenes looks at cool data journalism projects, like Alvin Chang’s Big Stacks and Kontinentalist’s Kawan, I wanted to break down my process and share my biggest takeaways from developing this story from start to finish.
The Pitch
Landing on the ‘right’ idea for a data story is often the most difficult part. It’s easy to ask questions, but harder to scope the process of answering them when data is involved. To start, I pitched four article ideas, each with an overview of potential data sources, a hypothesis, and my inspiration or motivation. For each, I tried my best to answer the following questions: Is it worth talking about? Why is it interesting? What unique perspective can I offer?
Out of the four, the book ban idea stood out to me as the most promising. I’d identified a clean, straightforward data source and the question seemed topically relevant, especially during this election year. Most importantly, having worked at my campus library for four years in college and grown up in the censorship-prone country of Indonesia, I had a personal interest in the story. You’re not always going to work on stories that are personally relevant to you, but there’s often a bit of extra passion and curiosity when you do.
The Analysis
This is when the road really started twisting and turning.
Since so much had been written on the topic of book bans, I first attempted to start my analysis using data that was already available: the American Library Association’s annual list of the top 10 most challenged books from 2001 to 2022. I visualized thematic trends over time and highlighted the themes that sparked controversy for each book.
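To give a flavor of that first pass, here’s a minimal sketch of the theme-trend analysis in Python. The file name and columns are hypothetical stand-ins (“ala_top10.csv” with a year and a semicolon-separated themes field) for however you end up structuring the ALA list:

```python
# A minimal sketch of the theme-trend analysis. The CSV name and its
# columns ("year", "themes" as semicolon-separated challenge reasons)
# are hypothetical; adapt them to however you encode the ALA list.
import pandas as pd
import matplotlib.pyplot as plt

books = pd.read_csv("ala_top10.csv")

# Explode the multi-theme column so each (year, theme) pair is one row.
themes = (
    books.assign(theme=books["themes"].str.split(";"))
         .explode("theme")
         .assign(theme=lambda df: df["theme"].str.strip().str.lower())
)

# Count how often each theme appears per year across the top-10 lists.
trend = themes.groupby(["year", "theme"]).size().unstack(fill_value=0)

# Plot only the handful of most frequent themes over time.
top = trend.sum().nlargest(5).index
trend[top].plot(figsize=(10, 5), title="Most frequent challenge themes, 2001-2022")
plt.ylabel("Appearances in ALA top-10 list")
plt.tight_layout()
plt.show()
```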
The results were interesting, but it quickly became clear there wasn’t much of a story. One issue was sample size: with only 10 books each year and no ban counts for each title, the data was too sparse to be meaningful. Relying on an existing, simple dataset also limited my ability to offer an original analysis, since I’d only be reiterating what the ALA and other literacy groups had already concluded.
After brainstorming with the team, I pivoted to pursue a more novel angle: the groups behind the bans themselves. Specifically, I was interested in their funding sources. The same groups kept cropping up in my research, seemingly with tons of volunteers and money to disrupt schools and libraries across the country. This left me with more questions: Where were they getting their money? How much does it take to organize a book-banning campaign?
I compiled a list of all the book-banning organizations I could find and used ProPublica’s Nonprofit Explorer to track down data on their finances. While the breadth of data contained in these tax forms seemed promising, another roadblock quickly emerged: only three organizations on the list had grossed enough revenue (above $50,000) to be required to file a Form 990. That brought me back to the same problem of too few data points.
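For what it’s worth, Nonprofit Explorer also exposes a public JSON API, which makes this kind of lookup scriptable. Below is a rough sketch; the v2 endpoints are real, but the exact response field names I use (“filings_with_data”, “tax_prd_yr”, “totrevenue”) are from memory and worth verifying against ProPublica’s API docs:

```python
# A rough sketch of pulling revenue figures from ProPublica's Nonprofit
# Explorer API. The v2 endpoints exist, but the response field names
# below are assumptions; check them against
# https://projects.propublica.org/nonprofits/api before relying on this.
import requests

BASE = "https://projects.propublica.org/nonprofits/api/v2"

def find_ein(name: str) -> int | None:
    """Search for an organization by name and return the first EIN hit."""
    resp = requests.get(f"{BASE}/search.json", params={"q": name})
    resp.raise_for_status()
    orgs = resp.json().get("organizations", [])
    return orgs[0]["ein"] if orgs else None

def yearly_revenue(ein: int) -> dict[int, int]:
    """Map tax year -> total revenue from the org's filed Form 990s."""
    resp = requests.get(f"{BASE}/organizations/{ein}.json")
    resp.raise_for_status()
    filings = resp.json().get("filings_with_data", [])
    return {f["tax_prd_yr"]: f["totrevenue"] for f in filings}

ein = find_ein("Moms for Liberty")
if ein:
    print(yearly_revenue(ein))
```

In practice, smaller groups tend to surface with no usable financial filings at all, which is exactly the sample-size wall described above.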
Still, the numbers were significant: the forms showed that millions of dollars were collectively being funneled into these book-banning efforts across just three groups. This pointed to a systematic, well-funded campaign, undercutting reports that attributed many of these bans to just a handful of overly concerned individuals. Naturally, this led to even more questions: Where and how were these groups spending their money? Could we map out their presence and see how it affects book bans locally?
Out of all the groups we tracked, Moms for Liberty had the strongest geographic presence, with a Facebook page for each chapter and the most buzz around their activities. I scraped their Facebook pages for details on each chapter using Selenium and, after a lot of data cleaning, combined the dataset with PEN America’s indexes of school book bans, matching on county. PEN America was gracious enough to guide us through their dataset and offer insight into the national discourse around book bans, which is central to their work.
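The scraping half of that step depends on Facebook’s ever-shifting markup, so I won’t reproduce it here, but the county join is straightforward to sketch in pandas. The file names and columns below (“m4l_chapters.csv”, “pen_bans.csv”, each with state and county fields) are hypothetical:

```python
# A simplified sketch of the county-level join. Both input files are
# hypothetical: "m4l_chapters.csv" is the scraped chapter list and
# "pen_bans.csv" is PEN America's index, one row per banned title.
import pandas as pd

chapters = pd.read_csv("m4l_chapters.csv")
bans = pd.read_csv("pen_bans.csv")

def norm(s: pd.Series) -> pd.Series:
    """Normalize county names so the two sources match ('Polk County' -> 'polk')."""
    return (s.str.lower()
             .str.replace(r"\s+county$", "", regex=True)
             .str.strip())

for df in (chapters, bans):
    df["county_key"] = norm(df["county"])
    df["state"] = df["state"].str.upper().str.strip()

# Count bans per county, then flag counties that also have a chapter.
ban_counts = (bans.groupby(["state", "county_key"])
                  .size()
                  .rename("ban_count")
                  .reset_index())
merged = ban_counts.merge(
    chapters[["state", "county_key"]].drop_duplicates().assign(has_chapter=True),
    on=["state", "county_key"],
    how="left",
).fillna({"has_chapter": False})

print(merged.groupby("has_chapter")["ban_count"].describe())
```

Normalizing the county names before merging matters more than it sounds; “Miami-Dade”, “Miami Dade County”, and stray whitespace will otherwise silently drop matches.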
After most of the data analysis was complete, I worked with our design team to create mock-ups of the page, then built it with SvelteKit. Overall, the project took just under five months from end to end.
Final Takeaways
At every stage of the project, I constantly had to pivot and adapt to unforeseen obstacles, data limitations, and design failures. This flexibility can feel frustrating, but it’s a crucial part of the process; only by letting the story grow and evolve organically can it become the best version of itself, even if it isn’t how you pictured it at the start.
At the end of the day, each story is just going to shape up in its own way. It’s the double-edged sword of data journalism, as I’ve come to learn: you can’t exactly follow a formula when creating distinct, interesting, and visually compelling data stories, so each new project is going to be a challenge. You just have to embrace it.