Is Your Podcast Video Unwatchable? Try These Fixes.
Make your remote interviews less static, rethink your voiceover for video, and get creative with your B-roll strategy
Welcome to the Creativity Business, a newsletter about earning attention and differentiating yourself as a marketer or content creator. If you’re not a subscriber, sign up and get content & differentiation strategy delivered to your inbox every two weeks for free.
I was not fully prepared for the interest in my last post about video strategy for audio podcasters. Thank you for all the great feedback and welcome to all the new subscribers!
This is not going to become a video strategy newsletter, but I did want to follow up on some ideas raised in the original post and answer some of the questions I’ve been getting. The biggest theme was curiosity about ways to make more compelling video for shows that use a lot of remote interview recordings.
I put on my director/producer/editor hats and tried to figure out how I would approach making remote recordings more visually engaging. I’m sure there are many more and better creative ideas in practice, but this edition of the newsletter features the things I would personally experiment with to see if I could make it work.
But first, let’s address the reason why we need to make remote recordings more engaging in the first place…
Why are remote video recordings a bad video strategy?
Video is obviously a visual medium. Our eyes consume visual material differently than our ears consume audio. Our eyes (and our brains) like variety, movement, and action. Zoom, Meet, Riverside, or Descript recordings are the opposite of what our eyes like. (Riverside and Descript have some great editing tools, and Descript just announced some really smart new ones—I’m just talking about the raw recordings themselves.)
Remote recordings shot on webcams are static, locked-off talking heads. There is no movement. There is no variety. There is nothing interesting visually happening at all.
Imagine if, after a long day, your partner comes home and asks you to watch an hour-long video. “What’s it about?” you ask. “We put a phone on a tripod in the corner of a boardroom and recorded a meeting at work.”
Can you feel the instant dread in your stomach right now? An hour of a locked-off video. No shot changes. No camera movement. Nothing.
Editing and visual variety are at the heart of great video
We will not tolerate unedited video on a tripod. We get bored SO quickly.
Enjoyable video is composed of a variety of shots and a variety of footage. The shots change rapidly.
This means that if you want to earn and sustain attention… you need to have a lot of shots, and you need to make a lot of edits.
Watch the first minute of anything on the first page of Netflix. Turn on your TV and watch any show or any commercial. Watch a popular Mr Beast video on YouTube, like this:
Pay attention to how often the shot changes. Pay attention to the visual pacing created by cuts. This is what audio-first shows are competing against.
Have a look at the editing in the opening of this Diary of a CEO episode. It’s designed to hook you in and make you want to watch the episode (also Mr Beast!)—it’s really well produced and edited. Again, this is what you’re competing against for the time and attention of audiences on YouTube.
Now look at the YouTube video of an audio-first podcast recorded remotely. (Out of courtesy, I will not link to anyone in particular 😜) It’s almost guaranteed to be astonishingly sluggish visually. When your mind begins to wander, look down at the timeline. How long did you last before you got bored? Even more importantly, look at how many views the visually uninteresting video has so far.
I know some of you will argue that many people listen on YouTube instead of watching… but that’s likely in part because it’s visually boring. If it were visually amazing, I’m confident you’d get a lot more eyeballs and engagement.
At bare minimum, if you’re trying to decide what to watch on YouTube—whether it’s on your TV, your phone, a laptop, or anything else—you are looking for signals that your choice will be a valuable use of your time and attention. A remote recording thumbnail or a boring first minute will prevent most people from giving you a chance.
However, if you decided to make your show visually engaging, nothing but positive things could happen, right?
Turn Voiceover Into Standups
Voiceover is the audio tool for connective tissue in a story or interview. You write to bridge gaps, segue between ideas, or add additional information not provided by others. For audio, this is all done in a recording studio (or more likely these days, in your home office).
What if, in addition to doing the voiceover in the pristine audio environment, you also had your host head out into the real world and record the same words again on video? In news and current affairs, this is called a standup, and it’s the video equivalent of a voiceover.
You know what this looks like from watching any news broadcast. In between interview clips, the reporter is suddenly out on a sidewalk, talking directly to the camera. Sometimes they’re walking. Sometimes they’re demonstrating something visually. Most of the time, they are in a strategic location that is directly or thematically related to the story being told. Visually, it’s powerful because the reporter is looking directly into the lens—they’re talking directly to you.
You might be asking, why don’t you just shoot the video in the audio studio?
Because it’s boring.
See for yourself…
If you don’t have any voiceover in your show… maybe you could find reasons to add some. It could be the introduction to the show, the ending of the show, or transitions between different segments or parts. Creating opportunities like this that add visual variety is not hard creatively.
Is it extra work recording the voiceover in a great audio location and then shooting the same words somewhere else that is visually interesting on video? Yes, it is. But it doesn’t cost much and it will instantly spice up your show visually.
Consider Hybrid Recording: Remote + an IRL Videographer
On remote recordings, the locations are generally not optimized for video because they’re mostly home offices, basements, or workplaces. There is no lighting. There is no thought given to the background or location.
Location plays a role in video. Every visual element says something and adds to the story being told. Pre-pandemic, audio companies used to record a lot more often on location with humans showing up to do tape syncs. During the pandemic, we all moved to remote recording software. Now that the pandemic is fading into the background, we are still hooked on remote recording software. We’ve gotten… lazy?
If video is important to your strategy, it might well be worth getting an actual human videographer to meet with an interview subject, take them to an interesting location, and record for both audio and video at the same time. You can prep the videographer with question lines and storylines, or you can still be piped in and conduct the interview yourself remotely. What you need is someone with a visual sensibility, one-to-three cameras, mics, and some basic lighting to make your guest’s interview look amazing.
While on location, get the videographer to record footage (b-roll) of the interview subject doing things. If you’re interviewing someone at a landfill, shoot extra footage of the landfill. Shoot the subject’s car pulling up at the landfill. Shoot the subject walking through the landfill. Shoot them rifling through material in the landfill. Basically, shoot them doing things related to what they’re talking about in the interview.
Instantly, the same interview now looks great, is set in an appropriately visually interesting location, and has coverage that can be used to hide edits, add visual variety, and help tell the story visually.
Attention span is different for video than audio
There are exceptions to every rule, but generally speaking, people have much shorter attention spans when they are watching video online compared to listening to an audio podcast. Audio podcasts can deliver spectacular amounts of listening time every episode if they are done well, sometimes getting 85-95% completion rates on 30-45 minute shows.
Video, by comparison, sheds viewers much more quickly. Most videos are lucky to even sniff an audio-like completion rate on a three-minute video. Short-form social videos on TikTok and Reels are training us to swipe to the next video the second we get bored.
Be honest: what is one of the first things you check before you decide to start watching a video on YouTube? It’s either length or view count, right? You are far more likely to say “yes” to a video of one minute compared to one hour. Or five minutes compared to thirty minutes. And there aren’t very many audio podcasts shorter than thirty minutes.
What’s the point? Your video really has to be compelling the entire time. And that means it likely needs to be shorter than your audio podcast by a considerable margin. The second you get bored watching your own podcast’s video is the telltale sign that every other viewer has almost certainly already left before you.
Make videos short. Leave people wanting more. Always.
Clips, Segments, or Parts?
So knowing that YouTube generally doesn’t earn the same attention span as audio, how should you proceed? Here are three different strategies you could consider for your show:
Clips
You could start with identifying short clips and use them as your primary video strategy. Not clips designed to promote your audio podcast, but short clips of the remote recordings that are interesting all by themselves. Standalone short-form video content that provides value for a viewer. If you can get several clips from each audio episode that work well and they are short enough that the lack of visual variety isn’t a massive problem, you’re off to the races.
Segments
If your show has a format and the format has several recurring segments, could one or more of those segments be isolated and distributed as a compelling video?
Parts
If your show is telling an extended story or unpacking an idea in a way that is hard to segment, you could consider chunking the episode into parts: Part 1, Part 2, Part 3, etc. TikTok creators do this all the time and it can be a very effective strategy for teasing people from one part into the next one when well executed.
Perhaps there are implications I’m not considering with having the audio and video versions of the show being different from each other, but from my point of view, you have to start with creating the best possible experience on each platform and that should be the priority.
Show, don’t tell. Visuals can tell stories without sound. Not using visual storytelling is a costly mistake.
There is a mantra in video production: show, don’t tell. Don’t have a person tell me a story or tell me how something works. Instead, show me what is happening. Show me how something works visually.
A common quality marker for video is that you should be able to know what’s happening even with the sound muted. This is the antithesis of how audio storytellers work. Human voices, music, sound effects, and well-written scripts are audio’s superpowers—they allow audio storytellers to bring ideas and images to life in our imaginations in ways that video can’t.
It’s no wonder audio storytellers might not find it natural to embrace purely visual storytelling, but that is what great video requires.
If an interview subject is talking about how to check various foods for freshness at a grocery store, it is so much more powerful to show someone picking up different fruits and vegetables in various states of freshness, and squishing them, poking them, etc., than it is to just hear about it.
If you think of video as a series of shots that form sequences, the selection of shots and the order in which they assemble into a sequence are all creative storytelling elements. Most video podcasts of locked-off talking heads obviously can’t pass this visual storytelling test.
So what can you do? In the above example, if you can’t get a videographer to take the subject to a grocery store and show you, grab your phone and go to the grocery store and get lots of footage of picking up fruits and vegetables and then squeezing and poking them. Get loads of different shots and angles.
If you can show people what you’re talking about, the show becomes much more visually engaging.
Creative B-Roll options
If heading out into the real world to shoot fruits and vegetables on your phone isn’t appealing, here are some weirder ways you could add additional visual variety to your show:
Create your own visual elements, like drawings by hand, or diagrams or posters in Canva.
You could try basic animation to add visual variety and storytelling (think South Park, not Pixar).
Stock video is plentiful and even exists inside software like Descript. Search for stock video related to what your subjects are talking about and use it for visual variety. It took me under three minutes in Descript to put this little fruit and vegetable testing montage together:
Stock photos are even more plentiful, many of them free or low-cost. You can pull a Ken Burns (or even use the Ken Burns Effect in iMovie) and have slow pans and zooms of photos of what the interviewee is talking about. Instant visual variety.
You could generate AI images related to what the interviewee is talking about and put a Ken Burns Effect on them.
You can now even generate short pieces of AI video/film with several different AI platforms that can serve as custom b-roll. Here’s the fairly surreal offering from Canva’s AI video generator. It took one minute to generate this grocery store b-roll:
Takeaways
How can you add visual variety to the video of your audio podcast?
What can you afford to add to your production process to make your videos worth the time and attention of viewers?
What elements can you rethink, like voiceover, that could instantly add production value and variety to your show?
What’s a B-roll strategy that works for you and your production workflows?
How long should the video version of your audio podcast be? How can you chunk it into smaller pieces that are better suited for video consumption? Is that something you’re comfortable with?
What are the small experiments or pilot projects you could undertake to ‘play’ with video creatively in the near future?
Earn It Updates
Ryan Alford and the Right About Now podcast
Ryan Alford is a force of nature and a ton of fun to talk with about marketing. His passion and energy are contagious and it’s obvious why he’s got such a terrific community around this very popular marketing show. Thanks for a great conversation, Ryan, and hope to do it again, soon.
Pilgrim Video Chat
Tobin Dalrymple is already nailing the idea of turning voiceover into video standups in interesting locations. Check out this snow-themed video chat - I love it!
Radiodays Europe in Athens March 9-11
The Earn It World Tour 2025 continues! I am getting very excited to head to Athens and talk about (shocker) how to earn attention as a marketer or advertiser, and also about creative ways to develop shows that truly stand out from the crowd. If you’re in audio and based in Europe, this is going to be a terrific event and I hope to see you there! (Use promo code PAR25ATH for 10% off)
Serving Instead of Selling: Quill Webinar March 18th
On March 18th, I’m joining Fatima Zaidi, CEO of Quill and CoHost, for a live webinar to talk about what really makes branded podcasts stand out.
(Spoiler: It’s not downloads or slick production. It’s about delivering a gift to your audience.)
We’ll cover:
✅ Why great branded podcasts are from you, not about you
🎧 How to create content that earns trust (not just airtime)
📊 Metrics that matter beyond downloads
Save the date:
📅 Date: March 18th
🕒 Time: 12:00-12:45 PM EST
Secure your spot: https://quill-podcasting.wistia.com/live/events/fbxwelqvk8
What’s Earned My Attention Recently
PSYCHEDUP Mental Illness podcast
I’ve had the pleasure of helping Dr. Diane McIntosh and the team at JAR Audio bring this podcast about mental illness to life. Each episode is a deep dive on a single mental illness, explaining what it is, what it’s like to have it, how to treat it with both therapy and medication, and debunking the cultural stigmas. This season focuses on ADHD, Depression, OCD, Panic Disorder, Bipolar Disorder, and PTSD. If you or anyone you know and love is suffering from one of these illnesses, PSYCHEDUP is a really informative show.
So Win
I know I’m late to be talking about a Super Bowl ad, but I love this one. (So Win makes me feel the same way as when I think about the concept of “earning it”. Set a high bar. Do the work. Reap the rewards.)
“Attention is the rarest and purest form of generosity.”
Simone Weil
(H/T Waking Up app from Sam Harris)
That’s it for this edition of The Creativity Business. What other video strategies are you experimenting with? Have you seen any unique video strategies for audio-first shows? Let me know what you’ve found!
And if you’re in the audio business, find these ideas interesting, and would like to talk more about video strategy, just reply to this email and we can set up a call.
Thanks for reading,
Steve
Really insightful breakdown, Steve! Your points about adding visual variety and rethinking voiceovers as standups resonated deeply. Especially appreciated the reminder that video storytelling should be compelling even with the sound off—that's a powerful creative benchmark.
I’m curious if you've experimented much with AI-generated B-roll. The grocery store example was surreal, but it feels like AI might soon offer more realistic or creatively distinct visuals that could bridge the gap for remote interviews.
Thanks again for these practical tips!
Awesome suggestions, thank you!