Saving New Sounds: The Sonic Web
Jeremy Wade Morris / University of Wisconsin-Madison

Screenshot of soundwave with absences

All web histories are marked by their absences – by what cannot be captured in a dynamic and often-changing environment of code, objects, pages, sites and spheres.[ ((Brügger, N. (2009) ‘Website History and the Website as an Object of Study.’ New Media & Society, 11(1&2): 115-32.))] But web sounds are doubly vulnerable. First, most of the web archiving tools available today, like the Internet Archive’s Wayback Machine, Archive.Today or Pinboard, are primarily built on visual metaphors and thus neglect, or at least push to the background, the role of audio. Second, preserving audio formats often requires preserving the sounds themselves as well as the technologies on which to play those sounds. Like so many other digital media, web audio is hard to hear not just because it is hard to find and save but because it is hard to know what to save along with it to make it playable in the future. Driscoll and Diaz’s claim that ‘the role of music, sound, and noise in computer games remains relatively under examined’[ ((Driscoll, K. & Diaz, J. (2009) ‘Endless Loop a Brief History of Chip Tunes,’ Transformative Works and Cultures, 2(1).))] also applies to web sounds more generally, and even more acutely to attempts to preserve the longer history of the web’s soundscapes.

The web has a rich sonic history, from early websites and bulletin board sites where users traded sound-related texts like MIDI files, song lyrics and guitar tablature, to the noises of new technologies and practices like the sound of the dial-up modem or ‘You’ve got Mail!’, to higher-bandwidth audio practices like file sharing, mp3 downloads, streaming music and podcasts. But the current tools for archiving and displaying this history are highly visual. The Internet Archive’s Wayback Machine, for example, can certainly help us to see what Napster’s website looked like between 1998 and 2001, or what people were talking about in the forums of the web’s popular sonic hangouts (e.g. Winamp, Shoutcast, Mp3.com, MySpace, etc.). There are even early screengrabs of the Internet Underground Music Archive (IUMA), one of the earliest large-scale communities for sharing and selling sounds, discovering new music and connecting with independent musicians, dating back to 1996. But trying to access the sounds from website snapshots proves a difficult task.

Take the screenshot below: users can see that in 1996 the IUMA was boasting that the site was ‘Now Real Audio Enriched!’. Though the images are missing, users can get a sense of the site’s purpose and mission.

Screenshot of IUMA.com from the Wayback Machine Oct. 25, 1996

Clicking through the Real Audio link redirects users to another snapshot from Dec. 1996, a link which even retains some of the site’s images. Users can select music by genre, or find bands by a specific location, by random selection, or by a direct search.

Screenshot of IUMA.com from the Wayback Machine Dec. 19, 1996

Here’s where the history stops, though: trying to navigate to any of the sonic content in these links returns error pages. At one point, I was promisingly redirected to a more recent page, from May 2003, but the mp3 and Real Audio links did not resolve.

Screenshot of IUMA.com from the Wayback Machine May 10, 2003

Like so many of the other audio and media links in the Wayback Machine’s archives, the source audio has either moved or is no longer playable. This snapshot-based approach to web preservation and display is crucial, allowing users to time-hop through various representations of the past, but it foregrounds the visual: it ensures a site’s text and layout are preserved while capturing less of the other media associated with historical sites. Even if I were somehow able to grab the Real Audio file, the likelihood of being able to play that file on today’s technology remains slim. Before mp3, m4a, AAC and wav files became ubiquitous, there were dozens of proprietary formats, such as Real Audio, a2b and Liquid Audio, that never had the staying power of these more popular formats. Still, they hold pieces of the web’s sonic past.

The Internet Archive is an amazing resource for scholars and historians, and its Wayback Machine is not the only tool for tracking down the web’s sounds. The site has a robust audio collection of old radio, podcasts, community audio, audiobooks and a vast collection of live concerts. Following the thread above, they also have a sizeable collection of audio from the IUMA (over 45,000 files at last check), including one song by Tom Andes, the random artist I was redirected to above. Sounds like these are invaluable and some of the only remaining traces we have of these sonic communities. But they are also stripped of much of their original context. The IUMA sound collection provides access to sounds but doesn’t really provide a sense of how these songs and sounds appeared within the original site. If the visual snapshots from the Wayback Machine are silent, the sound collections are hard to visualize.

Schafer’s original conception of the soundscape is primarily a sonic one,[ ((Schafer, R. M. (1977) The Tuning of the World. New York, NY: Knopf.))] but the link to landscapes emphasizes how sound is shaped by the structures, bodies and objects that make up any given scene. On the Web, it’s often not enough to save the sound of a particular song, technology or web object; the contextual material that helps create that sound must be preserved as well. Preserving the soundscape means preserving much more than just the sounds; it requires a multi-sensorial effort to document the Web in all its multi-mediated complexity. If ‘digital sources necessitate a rethinking of the historian’s toolkit,'[ ((Milligan, I. (2012) ‘Mining the ‘Internet Graveyard’: Rethinking the Historians’ Toolkit.’ Journal of the Canadian Historical Association, 23(2): 21-64.))] they also require rethinking how the very practices of archiving and historiography take place.

All history is marked by its absences. As Jonathan Sterne notes, the very endeavor of doing media historiography would not be possible without absence: ‘It is the absence of the past, the impossibility of finding direct access to it, that makes possible the writing, reading, and contemplation of history. History’s condition of impossibility— the irreducible distance of finitude—is thus its condition of possibility.'[ ((Sterne, J. (2010) ‘Rearranging the Files: On Interpretation in Media History.’ The Communication Review, 13(1): 75-87.))] We can recover sounds from the Web’s past but even the most complete archive (like a download of all the IUMA files) is still merely an archive of fragments and gaps.

*Note: This paper expands on an analysis in the forthcoming chapter “Hearing the Past: The Sonic Web from MIDI to Music.”

Image Credits:

1. Screenshot of soundwave with absences (author’s screen grab)
2. Screenshot of IUMA.com from the Wayback Machine Oct. 25, 1996 (author’s screen grab)
3. Screenshot of IUMA.com from the Wayback Machine Dec. 19, 1996 (author’s screen grab)
4. Screenshot of IUMA.com from the Wayback Machine May 10, 2003 (author’s screen grab)





Saving New Sounds: Podcasts and Preservation
Jeremy Wade Morris / University of Wisconsin-Madison

Jeremy Wade Morris considers some of the oddities/challenges that podcasts present as digital objects

To the dismay of many radio and podcast scholars (or maybe just me?), the 2014 release of Serial – the investigative, true-crime, narrative non-fiction podcast produced by Sarah Koenig, Julie Snyder and This American Life – stands as a watershed moment in podcasting’s history. Despite over a decade’s worth of interesting and engaging web audio experiments taking place for most of the 2000s, many pop press narratives around podcasting suggest it wasn’t until the slickly produced and undeniably captivating reexamination of the case of Hae Min Lee and Adnan Syed that podcasts finally arrived as a media form worthy of attention. Thanks to Serial, podcasts officially became water cooler talk,[ ((See Berry’s (2015) recap of the press coverage.))] every bit as worthy of everyday conversation as an episode of The Office, The Bachelorette or the latest superhero franchise film.

While I may personally quibble with the outsized attention Serial gets in the much longer history of digital audio and podcasting, as someone interested in preserving and tracing podcasting’s history, there’s no doubt the moment Serial represents is a notable one, precisely because of the cultural traction with certain audiences that the show was able to achieve. [ ((Berry, Richard. 2015. “A Golden Age of Podcasting? Evaluating Serial in the Context of Podcast Histories” Journal of Radio & Audio Media 22 (2):170-178. doi: 10.1080/19376529.2015.1083363))]

Given this significance, it would have seemed odd *not* to include a copy of the show when we began building PodcastRE, so it was one of the first shows we added to the database. But the show, like so many others in the database, highlights some of the difficulties podcasts represent as digital objects, and as sonic artifacts we’d like to preserve.

For example, here’s the opening 35 seconds of the first episode of Serial, taken from the version that is stored on our database’s servers, which log files tell me we captured on 2016-08-04 at 17:02:08.

Brought to you by RocketMortgage? Wait? Where’s the iconic MailChimp ad? The one that was the source of so much online review, discussion and memeing? The one that was as much part of the show as any of Koenig’s cell phone calls or Best Buy parking lot maps? The simple answer to the “where’s MailChimp” question is that after its initial run, Serial took on a number of other sponsors beyond MailChimp. The RocketMortgage ad is there thanks to a process called dynamic insertion (or dynamic advertising), where new ads are put into old episodes in order to satisfy current advertising partnerships and contracts. The more difficult answer is that technologies like dynamic insertion show how modular and variable digital artifacts are, and how saving anything digital entails difficult questions about what to save and how to do so in a systematic way. Media archivists of all kinds face similar questions, to be sure: TV scholars, for example, might debate whether or not we should be saving the show or the flow. But given how integral and integrated host-read and other inventive advertisements are in the podcasting format, it’s often difficult to distinguish between text and paratext. Podcast ads seem in some ways closer to product placement in a film than to a 30-second commercial.

I also have access to the original MailChimp ad, thanks to a file I had downloaded for a class I was teaching back in 2014, during the original launch of the show.

But to complicate things a little further, here’s what you’ll get if you visit the Serial episode 1 record in PodcastRE today, which streams live from whatever is on Serial’s current website.

Author screenshot of the record in PodcastRE for Serial, S01E01

In the first minute, you’ll hear an update from Koenig about their newer S-Town podcast along with the same RocketMortgage ad from our downloaded version. You can also see from the transcript – provided through a partnership with the kind folks at the now defunct AudioSear.ch – that their transcription was based on a version of the file where Squarespace was the sponsor (and that their automated transcription bot had a hard time distinguishing between cereal and serial).

I share these various versions to show that podcasts, like so many artifacts of digital culture, are unstable objects. As downloadable files, podcasts can be moved, copied and played and so they seem stable enough to save easily. In fact, for the first decade or so of podcasting’s existence (and for many non-ad-supported podcasts still), they were largely static and unchanging. But as the podcasting industry grows up and experiments with new forms of, and technologies for, advertising and monetization, the coherence of the audio file is more in question now than it ever has been. Should a database account for these multiple versions?
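One way a database might account for multiple versions is to fingerprint each captured file and keep every distinct fingerprint alongside the date it was first seen. The following is a minimal sketch under that assumption, not a description of how PodcastRE actually stores files; the class name `EpisodeVersions`, the episode identifier and the byte strings standing in for audio are all hypothetical.

```python
import hashlib
from collections import defaultdict


class EpisodeVersions:
    """Track distinct captures of the same episode by content hash.

    Hypothetical illustration only: not PodcastRE's actual schema.
    """

    def __init__(self):
        # episode_id -> {sha256 hex digest: date the version was first captured}
        self._versions = defaultdict(dict)

    def record(self, episode_id, audio_bytes, captured_at):
        """Store a capture; return True if these bytes were not seen before."""
        digest = hashlib.sha256(audio_bytes).hexdigest()
        known = self._versions[episode_id]
        is_new = digest not in known
        if is_new:
            known[digest] = captured_at
        return is_new

    def count(self, episode_id):
        """Number of distinct versions captured for an episode."""
        return len(self._versions.get(episode_id, {}))


if __name__ == "__main__":
    archive = EpisodeVersions()
    # Placeholder bytes stand in for real audio files with different ads.
    archive.record("serial-s01e01", b"mailchimp ad + episode", "2014")
    archive.record("serial-s01e01", b"rocketmortgage ad + episode", "2016-08-04")
    print(archive.count("serial-s01e01"))
```

A byte-level hash only signals that two captures differ. It cannot say whether the difference is a swapped ad, a new intro, or simply a re-encoded file, so a human (or an audio-comparison tool) would still need to listen.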

Beyond the audio, there are also multiple versions of metadata for the show. The RSS feed for the podcast delivers a series of details about the episode (run time, author, title, genre, sample rate, etc.), but so too does the ID3 tag embedded in the audio file of the show. For many of the shows in our database, it’s clear the two different sets of metadata don’t always match, and even the metadata within the RSS feed for the same podcast can differ depending on which platform it’s hosted on (e.g. iTunes, Stitcher, SoundCloud, etc.). Early data from PodcastRE’s MySQL tables serve as keen reminders that there are no true standards for encoding files or feeds with metadata.

Author screenshot of 3 versions of metadata for Episode 32 of the Aca-Media podcast (on the left, ID3 tag data from the iTunes version of the file. In the center, the same ID3 tag data from the version of the file available on the Aca-Media site, with a different genre and fewer overall categories. On the right, metadata from the RSS feed at the show and episode level)
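The kind of mismatch described above can be surfaced with a simple field-by-field comparison. This is a rough sketch using only Python’s standard library: the sample feed, its values and the function names are invented for illustration, and the ID3 fields are assumed to have been extracted already into a plain dictionary (reading real ID3 tags would need an audio-tagging library, which is not shown here).

```python
import xml.etree.ElementTree as ET

# Namespace used by iTunes-style podcast tags in RSS feeds.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

# Invented sample feed; real feeds carry many more fields.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 32</title>
      <itunes:author>Example Host</itunes:author>
      <itunes:duration>45:10</itunes:duration>
    </item>
  </channel>
</rss>"""


def rss_item_fields(feed_xml):
    """Pull a few episode-level fields from the first <item> in a feed."""
    root = ET.fromstring(feed_xml)
    item = root.find("./channel/item")
    return {
        "title": item.findtext("title"),
        "author": item.findtext(f"{{{ITUNES_NS}}}author"),
        "duration": item.findtext(f"{{{ITUNES_NS}}}duration"),
    }


def metadata_mismatches(rss, id3):
    """Return {field: (rss_value, id3_value)} for every field that differs.

    A field present in only one source is reported with None on the other side.
    """
    diffs = {}
    for key in sorted(set(rss) | set(id3)):
        rss_value, id3_value = rss.get(key), id3.get(key)
        if rss_value != id3_value:
            diffs[key] = (rss_value, id3_value)
    return diffs


if __name__ == "__main__":
    # Hypothetical ID3 values, as if already read from the audio file.
    id3_fields = {"title": "Episode 32", "author": "Example Host", "genre": "Podcast"}
    print(metadata_mismatches(rss_item_fields(SAMPLE_FEED), id3_fields))
```

Reporting both values rather than picking a winner leaves the harder decision, which source to trust when building search tools, to the archivist.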

I could make several arguments, some silly and some serious, about why the MailChimp ad is a crucial part of the show and one that media historians should be concerned about preserving (e.g. it resonated with thousands of the show’s listeners, it was a palpable demonstration of attempts to monetize podcasting, it’s an oddly, audibly engaging and perhaps culturally troubling ad, etc.). The key point though is less about the specific ad, or even this particular show. Rather, it’s about the challenge that saving digital sounds of all kinds entails. How do we address objects that are at once sonic, but also subject to filetypes and formats that change according to industrial, economic, or technical demands? How do we prioritize which metadata to save or use in building search tools given how highly variable and unstandardized podcast metadata is?

So we can quibble about whether or not Serial really was as watershed a moment for podcasting as it is often presented. But before we even get to the larger, more culturally pressing questions of *which* podcasts are notable or significant or worth preserving, there’s a whole slew of more technical and mundane questions about *how* to save these new sounds that need to be addressed.

Image Credits:

1. Photo credit: www.nicolassolop.com
2. PodcastRE record for Serial, S01E01 (author’s screen grab)
3. 3 versions of metadata for Episode 32 of the Aca-Media podcast. On the left, ID3 tag data from the iTunes version of the file. In the center, the same ID3 tag data from the version of the file available on the Aca-Media site, with a different genre and fewer overall categories. On the right, metadata from the RSS feed at the show and episode level. (author’s screen grab)





Saving New Sounds: Podcasts and Preservation
Jeremy Wade Morris / University of Wisconsin-Madison

Golden Age of Podcasts for Everyone!

We are, as commentators have noted, in the midst of a “Golden Age of Podcasts”: a moment where the choice for quality digital audio abounds, and where new voices and listeners connect daily through earbuds, car stereos, home speakers or office computers. Depending on how you define it, podcasting is either just over 10 years old, more than 20 years old, or merely the latest soundwave in radio’s much longer history. [ ((Bottomley, Andrew J. (2016) “Internet Radio: A History of a Medium in Transition.” [Dissertation] Order No. 10154207. The University of Wisconsin – Madison. ProQuest Dissertations.))] However you date it, in the decade since 2004, when the term “podcasting” was inadvertently coined, the format has exploded: there are now over 300,000 podcasts and 8 million episodes in over 100 languages, with new ones launching every day. [ ((Hammersley, Ben. (2004, February 12). “Audible Revolution.” The Guardian. Section T1. Accessed July 13, 2007 http://www.theguardian.com/media/2004/feb/12/broadcasting.digitalmedia.))]

Given how ubiquitous and available podcasts are, you might assume they would not face the same preservation risks as, say, old radio tape reels, transcription discs or celluloid film stock. Podcasts are largely free and their near-instant availability on multiple devices makes them seem as if they are in endless supply. They take up relatively few megabytes, which makes it easy to store a lot of them, and they are often available through multiple channels and aggregators (iTunes, Stitcher, SoundCloud, etc.).

Author screenshot of the premium paywall for the WTF with Marc Maron podcast.

But podcasts are surprisingly vulnerable; podcast feeds end abruptly, cease to be maintained, or become housed in proprietary databases, like iTunes, which are difficult to search with any rigor. Many podcasts get put behind paywalls as they get popular, or as back catalogues become a potential source of revenue. Then there’s the precarity of the very platforms that help make up podcasting’s diffuse and sometimes DIY infrastructure: I recently heard from an independent podcaster who had been hosting their show via the file management app/website Dropbox, but once that company made significant changes to its “public folder” feature, the podcaster was left scrambling to find another solution for where to host their files (and had to return to older shows to update the URLs and locations of new files).

Author screenshot of a contingent platform, from a Dropbox press release.

It’s not just under-resourced independent podcasters whose files are at risk though. Well-known Internet entrepreneur and former MTV VJ Adam Curry shares a similar story. In 2014, he sent out a tweet asking his 40,000+ followers a relatively straightforward question: “Looking for a full archive of Daily Source Code mp3s.” The podcast he was trying to track down, the Daily Source Code, was an early (2004) and relatively popular podcast that helped shape the emerging format. It was an odd request, in some ways, since Curry was the creator, host and producer of the Daily Source Code (which ran from 2004 to 2013, for over 860 episodes). It’s not entirely clear what happened to Curry’s original copies of the shows, but it’s clear he doesn’t have them: “For a number of [stupid and careless] reasons, I am not in possession of most of these.” [ ((Curry, Adam (2014, January). “The Daily Source Code Archive Project: Bringing The DSC Back”. [blog] Accessed October 22, 2016 http://blog.curry.com/2014/01/15/theDailySourceCodeArchiveProject.html))] If the very people producing these new artifacts of audio culture aren’t necessarily saving their work, who is?

Author screenshot of a tweet by Adam Curry looking for archives of his own show.

Of course, we can’t fault Curry for not saving the shows. If you’ve ever produced a podcast, you know that just getting the audio up and running, day after day, week after week, is accomplishment enough. There are countless hosts, producers and engineers without the foresight, budgets or means to label, store and archive their audio. Also, because of the mundane nature of a lot of podcasts, many podcasters probably do not realize the audio they are making is shaping the early stages of this emerging format, and doing so in a way that media historians, scholars and hobbyists might later want to analyze, research, teach and reference.

Unfortunately, we know this from precedent. Much of radio’s history has been lost to the vagaries of time and only now are we starting to make sense of what we’re missing. The Radio Preservation Task Force, for example, is working hard to try and preserve what remains of radio’s past, but claims that close to 75% of historical radio recordings in the U.S. have already been lost, destroyed, or are otherwise inaudible. The numbers are similar, if not worse, for silent films.

Podcasts might be newer than pre-1975 radio, and more digital and accessible than silent films, but this alone doesn’t ensure their continued existence. We are deep enough into our experiences with technologies like the world wide web, spinning disc hard drives, and error 404s to know that digital objects bring new challenges for saving, locating and retrieving data over time. [ ((Brügger, Niels (ed.). 2010. Web History. New York: Peter Lang))] Thankfully, sites like The Internet Archive are addressing some of these challenges, and providing new tools for thinking through, and doing, digital histories. The Internet Archive also has a growing audio database, part of which is devoted to podcasts. There are also a number of libraries that are beginning to bolster their digital audio collections and to take podcasts seriously as a format that deserves attention and long-term stewardship.

For the last few years, I’ve been coordinating a revolving team of students, technicians and faculty (primarily Dr. Eric Hoyt), in order to build a site to preserve podcasts and make them more researchable for audio scholars and enthusiasts. You can try out the beta version of PodcastRE (short for Podcast Research) to search for keywords and metadata associated with the 240,000+ audio files and over 1300 podcast feeds. There are also several thousand interactive transcripts (thanks to the good folks at AudioSearch). It’s far from comprehensive, but it’s growing daily and it will, when it’s complete, make podcasts and other born-digital audio as easy to use and research as textual resources you’d find in a library. It’ll also create a repository for these often vulnerable and ephemeral media texts.

Author screenshot of http://podcastre.org, the beta version of the database we are building to help preserve podcasts and make them more useable for researchers.

Ultimately, we hope the database will allow media and sound researchers to ask questions about podcasts and podcasting: how do podcasts differ, sonically and aesthetically, from radio? What new voices and perspectives do podcasts make audible and which ones do they silence? In what ways are the traditional conventions of the broadcasting industry shaping this new outlet? How are producers and consumers reimagining broadcasting in light of podcasts? But we’re also hoping researchers from a broad array of disciplines and fields will be able to use podcasts and audio as resources to address a wide range of humanistic and scientific questions.

Whether we’re in some new golden age of audio, or whether we’re just hearing the vibrations of radio reformatted, we can at least hopefully agree that podcasting is a vibrant and growing space for new kinds of listening publics. [ ((Berry, Richard. 2016. “Podcasting: Considering the evolution of the medium and its association with the word ‘radio’.” Radio Journal: International Studies in Broadcast & Audio Media 14 (1):7-22. doi: 10.1386/rjao.14.1.7_1; Hilmes, Michele. 2013. “The New Materiality of Radio: Sound on Screens.” In Radio’s New Wave: Global Sound in the Digital Era, edited by Jason Loviglio and Michele Hilmes, 43-61. New York: Routledge; Lacey, Kate. 2013. Listening Publics: The Politics and Experience of Listening in the Media Age. Cambridge, UK; Malden, MA: Polity Press.))] If so, you’d think we’d have a more comprehensive strategy for saving these new sounds than optimistically assuming podcast producers are keeping proper backup copies of their shows, or that platforms like Dropbox, iTunes or SoundCloud will continue to provide the same kinds of services for the foreseeable future.

By virtue of the fact they are taking part in a format’s infancy, today’s podcasters are making history by default. What today’s podcasters are producing will have value in the future, if not for its content, then for what it tells us about radio and audio’s longer history, about who has the right to communicate and by what means. [ ((Sterne, Jonathan, Jeremy Wade Morris, Michael Baker, and Ariana Moscote Freire. 2008. “The Politics of Podcasting.” Fibreculture (13). Available at http://thirteen.fibreculturejournal.org/fcj-087-the-politics-of-podcasting/))] If we’re not making efforts to preserve podcasts now, we’ll likely find ourselves in the same sonic conundrum many radio historians now find themselves in: writing, researching and thinking about a past they can’t fully hear.

Luckily for Curry, shortly after his tweet for help, he discovered that a “super friend of the show” had a copy of the entire Daily Source Code archive and was uploading it and making it available to fans through BitTorrent Sync. As with much of what we have left of radio’s golden age, fans and enthusiasts were helping rebuild the missing archive. As a result, one of podcasting’s first big shows wasn’t lost to time. The same can’t be said for many other feeds that have already disappeared and the many more that might if we don’t make preserving podcasts a priority.

Image Credits:

1. Golden Age of Podcasts for Everyone!
2. Author screenshot of the premium paywall for the WTF with Marc Maron podcast. (author’s screen grab)
3. Author screenshot of a contingent platform, from a Dropbox press release. (author’s screen grab)
4. Author screenshot of a tweet by Adam Curry looking for archives of his own show. (author’s screen grab)
5. Author screenshot of http://podcastre.org, the beta version of the database we are building to help preserve podcasts and make them more useable for researchers. (author’s screen grab)
