Saving New Sounds: Podcasts and Preservation
Jeremy Wade Morris / University of Wisconsin-Madison
To the dismay of many radio and podcast scholars (or maybe just me?), the 2014 release of Serial – the investigative, true-crime, narrative non-fiction podcast produced by Sarah Koenig, Julie Snyder and This American Life – stands as a watershed moment in podcasting’s history. Despite more than a decade of interesting and engaging web audio experiments throughout the 2000s, many pop press narratives around podcasting suggest it wasn’t until the slickly produced and undeniably captivating reexamination of the case of Hae Min Lee and Adnan Syed that podcasts finally arrived as a media form worthy of attention. Thanks to Serial, podcasts officially became water cooler talk, every bit as worthy of everyday conversation as an episode of The Office, The Bachelorette or the latest superhero franchise film.
While I may personally quibble with the outsized attention Serial gets in the much longer history of digital audio and podcasting, as someone interested in preserving and tracing podcasting’s history, there’s no doubt the moment Serial represents is a notable one, precisely because of the cultural traction with certain audiences that the show was able to achieve. 
Given this significance, it would have seemed odd *not* to include a copy of the show when we began building PodcastRE, so it was one of the first shows we added to the database. But the show, like so many others in the database, highlights some of the difficulties podcasts represent as digital objects, and as sonic artifacts we’d like to preserve.
For example, here’s the opening 35 seconds of the first episode of Serial, taken from the version that is stored on our database’s servers, which log files tell me we captured on 2016-08-04 at 17:02:08.
Brought to you by RocketMortgage? Wait? Where’s the iconic MailChimp ad? The one that was the source of so much online review, discussion and meme-ing? The one that was as much part of the show as any of Koenig’s cell phone calls or Best Buy parking lot maps? The simple answer to the “where’s MailChimp” question is that after its initial run, Serial took on a number of sponsors beyond MailChimp. The RocketMortgage ad is there thanks to a process called dynamic insertion (or dynamic advertising), where new ads are put into old episodes in order to satisfy current advertising partnerships and contracts. The more difficult answer is that technologies like dynamic insertion show how modular and variable digital artifacts are, and how saving anything digital entails difficult questions about what to save and how to do so in a systematic way. Media archivists of all kinds face similar questions, to be sure: TV scholars, for example, might debate whether or not we should be saving the show or the flow. But given how integral and integrated host-read and other inventive advertisements are in the podcasting format, it’s often difficult to distinguish between text and paratext. Podcast ads seem in some ways closer to product placement in a film than to a 30-second commercial.
I also have access to the original MailChimp ad, thanks to a file I had downloaded for a class I was teaching back in 2014, during the original launch of the show.
But to complicate things a little further, here’s what you’ll get if you visit the Serial episode 1 record in PodcastRE today, which streams live from whatever is on Serial’s current website.
In the first minute, you’ll hear an update from Koenig about their newer S-Town podcast along with the same RocketMortgage ad from our downloaded version. You can also see from the transcript – provided through a partnership with the kind folks at the now-defunct AudioSear.ch – that their transcription was based on a version of the file where Squarespace was the sponsor (and that their automated transcription bot had a hard time distinguishing between cereal and serial).
I share these various versions to show that podcasts, like so many artifacts of digital culture, are unstable objects. As downloadable files, podcasts can be moved, copied and played and so they seem stable enough to save easily. In fact, for the first decade or so of podcasting’s existence (and for many non-ad-supported podcasts still), they were largely static and unchanging. But as the podcasting industry grows up and experiments with new forms of, and technologies for, advertising and monetization, the coherence of the audio file is more in question now than it ever has been. Should a database account for these multiple versions?
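One way a database might account for multiple versions is to fingerprint every captured file and keep each distinct fingerprint as its own version of the same episode. The sketch below is only an illustration of that idea, not a description of how PodcastRE works: the `EpisodeRecord` class, its field names and the `add_capture` method are all hypothetical, and the caller supplies the audio bytes and the capture time.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class EpisodeRecord:
    """Hypothetical record: one episode, possibly many captured versions."""
    guid: str
    # Maps a SHA-256 hex digest to the list of times that exact file was captured.
    versions: dict = field(default_factory=dict)

    def add_capture(self, audio_bytes: bytes, captured_at: str) -> bool:
        """Log a capture; return True if this exact file has not been seen before."""
        digest = hashlib.sha256(audio_bytes).hexdigest()
        is_new = digest not in self.versions
        self.versions.setdefault(digest, []).append(captured_at)
        return is_new


# Usage with placeholder bytes standing in for real audio files.
record = EpisodeRecord(guid="example-episode")
record.add_capture(b"original sponsor + episode", "first capture")
record.add_capture(b"inserted sponsor + episode", "second capture")
record.add_capture(b"inserted sponsor + episode", "third capture")
print(len(record.versions))
```

A byte-level hash is a blunt instrument: it flags a file as new even if only an embedded tag changed, and it says nothing about where in the audio the difference lies. It does, however, let an archive notice that the "same" episode has quietly become a different file.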
Beyond the audio, there are also multiple versions of metadata for the show. The RSS feed for the podcast delivers a series of details about the episode (run time, author, title, genre, sample rate, etc.), but so too does the ID3 tag embedded in the audio file of the show. For many of the shows in our database, it’s clear the two sets of metadata don’t always match, and even the metadata within the RSS feed for the same podcast can differ depending on which platform it’s hosted on (e.g. iTunes, Stitcher, Soundcloud, etc.). Early data from PodcastRE’s MySQL tables serve as keen reminders that there are no true standards for encoding files or feeds with metadata.
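The mismatch between feed metadata and embedded metadata can be made concrete with a small comparison. The sketch below is a minimal illustration under stated assumptions: it reads a few fields from an RSS item using Python's standard XML parser and the iTunes podcast namespace, and it assumes the ID3 values have already been extracted into a plain dictionary by some tag-reading tool. The function names `rss_item_metadata` and `metadata_mismatches`, and the choice of fields, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the itunes:* elements in podcast RSS feeds.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def rss_item_metadata(rss_xml: str, index: int = 0) -> dict:
    """Pull a few episode-level fields from one <item> of an RSS feed."""
    root = ET.fromstring(rss_xml)
    item = root.findall("./channel/item")[index]
    return {
        "title": item.findtext("title"),
        "author": item.findtext(f"{{{ITUNES_NS}}}author"),
        "duration": item.findtext(f"{{{ITUNES_NS}}}duration"),
    }


def _norm(value) -> str:
    """Compare loosely: missing values become '', case and outer spaces are ignored."""
    return (value or "").strip().lower()


def metadata_mismatches(rss: dict, id3: dict) -> dict:
    """Return {field: (rss_value, id3_value)} for shared fields that disagree."""
    diffs = {}
    for key in sorted(rss.keys() & id3.keys()):
        if _norm(rss[key]) != _norm(id3[key]):
            diffs[key] = (rss[key], id3[key])
    return diffs
```

Only fields present in both sources are compared, so a genre that appears in the ID3 tag but not in the feed is silently skipped. Deciding whether such gaps count as disagreements, and which source should win when values conflict, is exactly the kind of prioritization question an archive has to settle for itself.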
I could make several arguments, some silly and some serious, about why the MailChimp ad is a crucial part of the show and one that media historians should be concerned about preserving (e.g. it resonated with thousands of the show’s listeners, it was a palpable demonstration of attempts to monetize podcasting, it’s an oddly, audibly engaging and perhaps culturally troubling ad, etc.). The key point though is less about the specific ad, or even this particular show. Rather, it’s about the challenge that saving digital sounds of all kinds entails. How do we address objects that are at once sonic, but also subject to filetypes and formats that change according to industrial, economic, or technical demands? How do we prioritize which metadata to save or use in building search tools given how highly variable and unstandardized podcast metadata is?
So we can quibble about whether or not Serial really was as watershed a moment for podcasting as it is often presented. But before we even get to the larger, more culturally pressing questions of *which* podcasts are notable or significant or worth preserving, there’s a whole slew of more technical and mundane questions about *how* to save these new sounds that need to be addressed.
1. Jeremy Wade Morris considers some of the oddities/challenges that podcasts present as digital objects.
2. PodcastRE record for Serial, S01E01 (author’s screen grab)
3. Three versions of metadata for Episode 32 of the Aca-Media podcast. On the left, ID3 tag data from the iTunes version of the file. In the center, the same ID3 tag data from the version of the file available on the Aca-Media site, with a different genre and fewer overall categories. On the right, metadata from the RSS feed at the show and episode level. (author’s screen grab)