
Audio Decision Framework
The 5-step process: define emotional goal → identify audience → match format → select genre and voice → validate and comply.
What you'll learn in this guide
- 5 steps: a repeatable decision process that eliminates audio guesswork (source: ZorgSocial Audio Framework)
- 6 core emotional goals that cover 95% of advertising objectives (source: Emotional Branding Research)
- 3× faster audio production when teams follow a documented decision framework (source: ZorgSocial Workflow Analytics)
- 40% reduction in creative revision cycles with upfront audio strategy (source: Agency Production Benchmarks 2024)
Audio Decision Framework
When you are unsure where to start, the Audio Decision Framework guides you through five sequential steps: defining your emotional goal, identifying your audience, matching format to audio, selecting genre and voice, and validating compliance.
Audio Decision Quick-Reference Matrix
| Decision Step | Key Question | Primary Input | Output / Deliverable |
|---|---|---|---|
| 1. Emotional Goal | What single emotion should the audience feel? | Campaign brief, brand values | One primary emotion + one secondary emotion |
| 2. Audience ID | Who are we speaking to and what do they listen to? | Audience personas, listening data | Audience audio profile (age, culture, habits) |
| 3. Format Match | What format and platform is the ad running on? | Media plan, platform specs | Audio format brief (duration, specs, constraints) |
| 4. Genre & Voice | Which genre and voice deliver the target emotion to this audience? | Genre table (G-02), Voice table (G-03) | Genre selection + voice casting brief |
| 5. Validate & Comply | Is this audio legally and culturally compliant? | Industry rules (G-09), licensing (G-12) | Compliance sign-off + licence confirmation |
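The matrix describes a strictly ordered pipeline in which each step produces one deliverable. The sketch below shows how that ordering rule could be enforced for a team that tracks its brief in code. `AudioBrief` and its methods are hypothetical names and are not a ZorgSocial API.

```python
from typing import Dict, Optional

# Step names and deliverables transcribed from the quick-reference matrix.
STEPS = [
    ("Emotional Goal", "One primary emotion + one secondary emotion"),
    ("Audience ID", "Audience audio profile (age, culture, habits)"),
    ("Format Match", "Audio format brief (duration, specs, constraints)"),
    ("Genre & Voice", "Genre selection + voice casting brief"),
    ("Validate & Comply", "Compliance sign-off + licence confirmation"),
]
STEP_NAMES = [name for name, _ in STEPS]


class AudioBrief:
    """Hypothetical tracker: records each step's deliverable, in order."""

    def __init__(self) -> None:
        self.deliverables: Dict[str, str] = {}

    def complete(self, step: str, deliverable: str) -> None:
        if step not in STEP_NAMES:
            raise ValueError(f"Unknown step: {step!r}")
        # Steps are sequential: every earlier step must already have a deliverable.
        for earlier in STEP_NAMES[: STEP_NAMES.index(step)]:
            if earlier not in self.deliverables:
                raise ValueError(f"Complete {earlier!r} before {step!r}")
        self.deliverables[step] = deliverable

    def next_step(self) -> Optional[str]:
        """Return the first step without a deliverable, or None when the brief is done."""
        for name in STEP_NAMES:
            if name not in self.deliverables:
                return name
        return None


brief = AudioBrief()
brief.complete("Emotional Goal", "Trust (primary), Calm (secondary)")
print(brief.next_step())  # Audience ID
```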
Step 1: Define Your Emotional Goal
Every piece of audio advertising exists to make someone feel something. Before you choose a genre, a voice, a tempo, or a sound effect, you must answer one question: What single emotion should the audience feel after hearing this ad?
This is the foundational step that everything else flows from. Get this wrong, and every downstream decision (genre, voice, tempo, SFX) will be misaligned. Get it right, and the rest of the framework becomes almost intuitive.
The Six Core Emotional Goals:
Most advertising objectives map to one of six emotional targets. Choose ONE as your primary goal and optionally one as a secondary:
Trust: The audience should feel confident, reassured, and safe. Trust is the primary emotion for financial services, healthcare, insurance, legal services, and government communications.
- Audio cues: warm voice, moderate tempo (80–100 BPM), acoustic or orchestral genre, minimal SFX
- What to avoid: aggressive music, fast-talking voiceover, electronic beats (can feel impersonal)
Joy: The audience should feel happy, uplifted, and positive. Joy is the primary emotion for lifestyle brands, food and beverage, entertainment, children's products, and celebration campaigns.
- Audio cues: bright voice, upbeat tempo (110–130 BPM), pop or acoustic genre, playful SFX
- What to avoid: minor keys, slow tempos, serious or authoritative voice tones
Excitement: The audience should feel energised, anticipating, and motivated to act. Excitement is the primary emotion for product launches, sales events, sports, gaming, and limited-time offers.
- Audio cues: energetic voice, fast tempo (120–140 BPM), electronic or hip-hop genre, dynamic SFX (whooshes, impacts)
- What to avoid: slow builds, ambient textures, whispery voices
Calm: The audience should feel peaceful, centred, and in control. Calm is the primary emotion for wellness, luxury, spa/hospitality, meditation apps, and premium real estate.
- Audio cues: gentle voice, slow tempo (60–80 BPM), ambient or classical genre, nature SFX (water, birds)
- What to avoid: percussion-heavy music, fast speech, staccato rhythms
Inspiration: The audience should feel motivated, aspirational, and empowered. Inspiration is the primary emotion for education, nonprofit, career platforms, personal development, and brand purpose campaigns.
- Audio cues: building voice (starts quiet, grows confident), rising tempo, cinematic or orchestral genre, crescendo
- What to avoid: static energy, monotone delivery, repetitive loops
Urgency: The audience should feel time pressure, scarcity, and the need to act immediately. Urgency is the primary emotion for flash sales, countdown campaigns, limited inventory, and direct-response advertising.
- Audio cues: fast-paced voice, high tempo (130+ BPM), percussive or electronic genre, ticking/countdown SFX
- What to avoid: relaxed pacing, ambient music, meandering intros
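The tempo guidance above can also serve as a quick sanity check on a candidate track. This sketch assumes you already know the track's BPM. The ranges are copied from the audio cues, and Urgency's "130+" is treated as open-ended. The guide gives no fixed range for Inspiration, so the check abstains for that emotion.

```python
from typing import Optional

# BPM ranges from the "Audio cues" bullets above. None as an upper bound means
# open-ended ("130+"); None for the whole entry means the guide gives no range.
TEMPO_RANGES = {
    "Trust": (80, 100),
    "Joy": (110, 130),
    "Excitement": (120, 140),
    "Calm": (60, 80),
    "Inspiration": None,  # "rising tempo": judge the arc, not a single BPM
    "Urgency": (130, None),
}


def tempo_fits(emotion: str, bpm: float) -> Optional[bool]:
    """True/False when the guide gives a range for this emotion, None otherwise."""
    bounds = TEMPO_RANGES[emotion]
    if bounds is None:
        return None
    low, high = bounds
    return low <= bpm and (high is None or bpm <= high)


print(tempo_fits("Calm", 72))         # True
print(tempo_fits("Trust", 128))       # False
print(tempo_fits("Inspiration", 90))  # None
```

The ranges overlap: Joy and Excitement share 120–130 BPM, and Excitement and Urgency share 130–140 BPM. Tempo alone therefore never identifies the emotion. It only rules tracks out.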
The Single-Emotion Rule: Choose ONE primary emotion. An ad that tries to make people feel "calm AND excited" will make them feel nothing. If you have a secondary emotion, it should complement the primary, not contradict it. For example, Trust (primary) + Calm (secondary) works; Trust (primary) + Urgency (secondary) creates confusion.
Document it. Write your emotional goal on the campaign brief before any audio production begins. This becomes the measuring stick for every creative decision that follows.
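The Single-Emotion Rule is easy to check mechanically before the brief is signed off. In this sketch, `CONFLICTING_PAIRS` is an assumption. It contains only the two clashes this guide names (calm with excited, Trust with Urgency), and a team would extend it with its own judgement.

```python
from typing import List, Optional

EMOTIONS = {"Trust", "Joy", "Excitement", "Calm", "Inspiration", "Urgency"}

# Assumption: only the contradictory pairings named in this guide.
CONFLICTING_PAIRS = {
    frozenset({"Calm", "Excitement"}),
    frozenset({"Trust", "Urgency"}),
}


def validate_emotional_goal(primary: str, secondary: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the goal passes the rule."""
    problems = []
    if primary not in EMOTIONS:
        problems.append(f"Unknown primary emotion: {primary}")
    if secondary is not None:
        if secondary not in EMOTIONS:
            problems.append(f"Unknown secondary emotion: {secondary}")
        elif secondary == primary:
            problems.append("Secondary emotion repeats the primary")
        elif frozenset({primary, secondary}) in CONFLICTING_PAIRS:
            problems.append(f"{secondary} contradicts {primary}")
    return problems


print(validate_emotional_goal("Trust", "Calm"))     # []
print(validate_emotional_goal("Trust", "Urgency"))  # ['Urgency contradicts Trust']
```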
Step 2: Identify the Audience
The same emotional goal requires completely different audio execution depending on WHO the audience is. Trust sounds different to a 25-year-old fintech user than to a 55-year-old private banking client. Joy sounds different to a teenager than to a parent.
Build an Audience Audio Profile:
For every campaign, answer these five questions about your target audience before making any audio decisions:
1. Age Range: What generation are they? Age is often the strongest predictor of genre preference:
- Gen Z (18–27): Hip-hop, lo-fi beats, electronic, trending TikTok sounds. Short attention span: audio must hook in the first 1–2 seconds
- Millennials (28–43): Indie, pop, acoustic, podcast-style narration. Comfortable with longer formats but expect production quality
- Gen X (44–59): Rock, R&B, classic pop, authoritative voiceover. Value substance over style
- Boomers (60+): Classical, jazz, easy listening, warm broadcast voice. Prefer clear, measured delivery
2. Cultural Background: What musical traditions resonate? This is critical for MENA markets:
- Gulf Arabic audience: Khaleeji music cues, oud and percussion, Gulf dialect voiceover
- Levantine audience: Lebanese/Syrian pop influences, Levantine dialect
- Egyptian audience: Shaabi rhythms, Egyptian Arabic dialect, dramatic vocal style
- Pan-Arab (MSA) audience: Modern Arabic pop, MSA voiceover, avoid regional-specific dialects
- Expatriate/international audience: Western pop, English voiceover, globally recognisable sounds
- Mixed audience: Test both Arabic and English variants (see G-11 A/B Testing)
3. Platform Habits: Where do they consume content?
- TikTok-first audience: Expect trending sounds and music-forward, short-form content. Audio must work without context: scrollers decide in 0.5 seconds
- Instagram/Facebook audience: Sound-off default. Audio enhances but must not be essential. Strong visual-first, audio-second design
- YouTube audience: Longer attention span, higher audio expectations. Pre-roll audio must differentiate from the content
- Podcast audience: Highly audio-literate. Expect native, conversational ads. Reject anything that sounds "addy"
- LinkedIn audience: Professional, measured, authoritative. Audio should feel like a boardroom, not a nightclub
4. Content Sensitivity: Which campaign topics require audio restraint?
- Healthcare and illness: gentle, empathetic tone. No upbeat music behind serious health messaging
- Financial hardship: no luxury-sounding music. Avoid anything that feels tone-deaf to the audience's situation
- Loss and bereavement: minimal music, soft voice. Silence can be more powerful than sound
- Regulatory topics: clear, measured delivery. No dramatic music that might seem manipulative
5. What Do They Already Listen To? The most valuable audience insight for audio decisions is what this audience chooses to listen to in their free time. Use Spotify Wrapped data, podcast listenership surveys, radio format ratings, and social listening to understand your audience's existing audio preferences. Your ad audio should feel at home in their listening environment, not alien to it.
The Audience Audio Profile Template: Document your findings in a one-page Audience Audio Profile:
- Target age: [range]
- Cultural/linguistic background: [details]
- Primary platform(s): [list]
- Content sensitivities: [notes]
- Preferred listening: [genres, artists, podcast types]
- Audio "do nots": [specific sounds or styles to avoid]
This profile becomes the filter for Step 4 (Genre & Voice selection).
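If the profile is stored alongside other campaign data, it can be kept as a small typed record. In this sketch the field names are hypothetical and simply mirror the template lines. An empty field counts as unfinished, so a profile with no sensitivities should say "none" explicitly rather than leave the field blank.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class AudienceAudioProfile:
    # One field per line of the one-page template above.
    target_age: str
    cultural_background: str
    primary_platforms: List[str]
    content_sensitivities: List[str] = field(default_factory=list)
    preferred_listening: List[str] = field(default_factory=list)
    audio_do_nots: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of template fields still empty, so Step 4 isn't run on a half-filled profile."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


profile = AudienceAudioProfile(
    target_age="18-35",
    cultural_background="Saudi, Arabic-speaking",
    primary_platforms=["TikTok"],
    audio_do_nots=["aggressive alarm sounds"],
)
print(profile.missing_fields())  # ['content_sensitivities', 'preferred_listening']
```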
Step 3: Match Format to Audio
Different ad formats impose different audio constraints. A 6-second bumper ad on YouTube requires fundamentally different audio thinking than a 60-second podcast mid-roll. The format determines your audio budget: how many elements you can include and how complex your sound design can be.
Format-to-Audio Mapping:
6-Second Bumper (YouTube, Social Pre-Roll)
- Audio budget: ONE element only, either a sonic logo, a single voice line, or a music sting
- No time for music beds, voiceover AND SFX together. Pick one and make it count
- Best use: brand recall (sonic logo), single message ("Sale starts Friday"), or attention grab (SFX hook)
- Worst mistake: trying to cram a 30-second script into 6 seconds with speed-reading
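The arithmetic behind that mistake is simple: pace multiplied by duration caps a script's word count. The sketch below takes its example inputs from the 140 WPM pace used in the walkthroughs later in this guide. Real pace depends on the voice and the language.

```python
import math


def word_budget(duration_s: float, wpm: float) -> int:
    """Whole words a voiceover can deliver in duration_s seconds at wpm words per minute."""
    return math.floor(wpm * duration_s / 60)


def required_wpm(words: int, duration_s: float) -> float:
    """Pace a script of `words` words would need to fit duration_s seconds."""
    return words * 60 / duration_s


print(word_budget(30, 140))  # 70    -> a full 30-second script
print(word_budget(6, 140))   # 14    -> all a 6-second bumper holds at the same pace
print(required_wpm(70, 6))   # 700.0 -> the speed-read the bumper would demand
```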
15-Second Social Ad (TikTok, Reels, Stories)
- Audio budget: voice OR music-forward, rarely both at full intensity
- Structure: 2-second hook (SFX or voice question) → 10-second message → 3-second CTA
- Music role: set the emotional tone in the first 2 seconds. Trending sounds on TikTok can boost algorithmic reach
- Voice role: one clear message delivered conversationally. No "announcer voice": it triggers scroll-away
30-Second Spot (Digital, Radio, TV)
- Audio budget: full production β voice, music bed, SFX accents, and sonic logo
- Structure: 5-second hook → 15-second body → 5-second CTA → 5-second brand close
- This is the workhorse format. The music bed should support the voice without competing. Dynamic range matters: create a mini arc with a beginning, middle, and end
- Mix priority: the voice sits 6–8 dB above the music bed. SFX are used sparingly for emphasis, not decoration
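Two numbers in the 15- and 30-second specs are worth checking in code. First, the segment timings must add up to the slot. Second, a dB offset translates into a concrete amplitude ratio of 10^(dB/20), so a 6 dB lift roughly doubles the voice's amplitude relative to the bed. In the sketch below, the template dictionary is transcribed from the structures above and the function names are hypothetical.

```python
from typing import List, Tuple

# Segment templates (name, seconds) from the 15- and 30-second structures above.
STRUCTURES = {
    15: [("hook", 2), ("message", 10), ("cta", 3)],
    30: [("hook", 5), ("body", 15), ("cta", 5), ("brand close", 5)],
}


def timeline(duration: int) -> List[Tuple[str, int, int]]:
    """(segment, start_s, end_s) for a template; raises if segments don't fill the slot."""
    segments = STRUCTURES[duration]
    total = sum(length for _, length in segments)
    if total != duration:
        raise ValueError(f"{duration}s template sums to {total}s")
    result, start = [], 0
    for name, length in segments:
        result.append((name, start, start + length))
        start += length
    return result


def db_to_amplitude_ratio(db: float) -> float:
    """Linear amplitude ratio for a level difference in dB (20*log10 convention)."""
    return 10 ** (db / 20)


print(timeline(30))
# [('hook', 0, 5), ('body', 5, 20), ('cta', 20, 25), ('brand close', 25, 30)]
print(round(db_to_amplitude_ratio(6), 2), round(db_to_amplitude_ratio(8), 2))  # 2.0 2.51
```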
60-Second Podcast Mid-Roll
- Audio budget: voice-primary with optional subtle music bed
- Structure: host-read or conversational. Should sound like part of the podcast, not an interruption
- Music bed: if used, keep it minimal (a light texture underneath, not a produced track)
- Key principle: authenticity. Podcast listeners are audio-literate and will immediately detect (and resent) a pre-produced "radio ad" inserted into their show
Long-Form Audio (Brand Story, Audio Documentary, 2+ Minutes)
- Audio budget: full cinematic production β multiple voice tracks, music movements, layered SFX, soundscaping
- Structure: narrative arc with distinct chapters. Music evolves (not loops). Voice delivery changes with the emotional journey
- This format rewards production quality. It is the audio equivalent of a brand film
Platform-Specific Audio Specs:
- TikTok: Sound-on default. Audio is essential. Target −12 to −14 LUFS. Trending sounds boost reach
- Instagram Reels: Sound-off default. Audio enhances but is not required. Caption-first design
- YouTube: Skippable pre-roll can be skipped after 5 seconds, so the first 5 seconds of audio must hook the viewer or the impression is wasted spend
- Spotify/Audio Streaming: Audio-only, high attention. Full production quality expected. −14 LUFS
- Radio: Broadcast standards. −23 LUFS. Disclaimer requirements are strict
- LinkedIn: Professional context. Authoritative voice, measured pace. Music should feel corporate-appropriate
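To hit these targets, measure the mix's integrated loudness with an ITU-R BS.1770 meter and apply the difference as gain. The sketch below makes several assumptions:
- The table holds only the platforms above that list a LUFS figure.
- TikTok's figure is kept as a window rather than a single target.
- The result is approximate, because BS.1770 gating can shift slightly with level.
- Raising gain can push true peaks over a platform's limit, and this sketch does not check for that.

```python
from typing import Dict, Tuple

# Integrated-loudness targets (min, max) in LUFS, as listed above.
PLATFORM_TARGET_LUFS: Dict[str, Tuple[float, float]] = {
    "tiktok": (-14.0, -12.0),
    "spotify": (-14.0, -14.0),
    "radio": (-23.0, -23.0),
}


def gain_to_target(measured_lufs: float, platform: str) -> float:
    """dB of gain that moves a mix into the platform's target window (0.0 if already inside)."""
    low, high = PLATFORM_TARGET_LUFS[platform]
    if measured_lufs < low:
        return low - measured_lufs   # too quiet: positive gain
    if measured_lufs > high:
        return high - measured_lufs  # too loud: negative gain (attenuate)
    return 0.0


print(gain_to_target(-10.0, "radio"))    # -13.0
print(gain_to_target(-18.0, "spotify"))  # 4.0
print(gain_to_target(-13.0, "tiktok"))   # 0.0
```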
Device Considerations:
- Mobile-first (most social): optimise for phone speakers (see G-12 Mobile Optimisation)
- Desktop (LinkedIn, YouTube long-form): wider frequency range available, but still design for phone speakers as backup
- Smart speakers and connected audio (podcast, streaming): high-quality audio expected. Full frequency range
Step 4: Select Genre and Voice
With your emotional goal defined (Step 1), your audience profiled (Step 2), and your format constraints understood (Step 3), you now have the context needed to make the two most impactful audio decisions: genre and voice.
Genre Selection: The Emotional-Audience Intersection
The right genre sits at the intersection of your emotional goal and your audience's preferences. Use the Genre Selector tool (or G-02 Music Genres reference table) and apply this logic:
Start with the emotional goal:
- Trust → Acoustic, Classical, Ambient, Soft Jazz
- Joy → Pop, Acoustic Pop, Afrobeats, Reggae
- Excitement → Electronic, Hip-Hop, Rock, Drum & Bass
- Calm → Ambient, Classical, Lo-fi, Nature Soundscapes
- Inspiration → Cinematic/Orchestral, Indie, Gospel, Acoustic
- Urgency → Electronic, Percussive, Trap, Drum & Bass
Then filter by audience:
- If the emotional goal suggests "Acoustic" but the audience is Gen Z TikTok-first → switch to Lo-fi or Indie Electronic (same emotional register, audience-appropriate genre)
- If the goal suggests "Electronic" but the audience is 55+ private banking → switch to Orchestral or Modern Classical (same energy, more culturally aligned)
- If the audience is Gulf Arabic → consider Khaleeji-influenced versions of any genre, or traditional Arabic instrumentation (oud, qanun) with modern production
The Neutrality Principle: When in doubt about which genre to choose, select neutral audio that will not alienate any part of your audience. Ambient textures, soft acoustic guitar, and gentle piano are almost universally inoffensive. They may not excite, but they will not repel. This is safer than choosing a genre that thrills 50% of the audience and annoys the other 50%.
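The two-stage logic above can be sketched as a lookup: start from the emotion, filter by audience, and fall back to neutral audio when unsure. The sketch makes three assumptions:
- The audience tags are hypothetical labels.
- Each swap picks one of the guide's suggested replacements.
- Unknown emotions fall back to the Neutrality Principle's suggestions.

```python
from typing import List, Optional

# Emotion -> candidate genres, from the list above.
GENRES_BY_EMOTION = {
    "Trust": ["Acoustic", "Classical", "Ambient", "Soft Jazz"],
    "Joy": ["Pop", "Acoustic Pop", "Afrobeats", "Reggae"],
    "Excitement": ["Electronic", "Hip-Hop", "Rock", "Drum & Bass"],
    "Calm": ["Ambient", "Classical", "Lo-fi", "Nature Soundscapes"],
    "Inspiration": ["Cinematic/Orchestral", "Indie", "Gospel", "Acoustic"],
    "Urgency": ["Electronic", "Percussive", "Trap", "Drum & Bass"],
}

# Hypothetical audience tags -> genre swaps taken from the filter examples.
AUDIENCE_SWAPS = {
    "gen_z_tiktok": {"Acoustic": "Lo-fi"},
    "55_plus_private_banking": {"Electronic": "Orchestral"},
}

# Neutrality Principle: broadly inoffensive options when the choice is unclear.
NEUTRAL_FALLBACK = ["Ambient", "Soft Acoustic Guitar", "Gentle Piano"]


def shortlist(emotion: str, audience: Optional[str] = None) -> List[str]:
    genres = GENRES_BY_EMOTION.get(emotion)
    if genres is None:
        return list(NEUTRAL_FALLBACK)
    swaps = AUDIENCE_SWAPS.get(audience, {})
    result: List[str] = []
    for genre in genres:
        genre = swaps.get(genre, genre)
        if genre not in result:  # a swap may duplicate a genre already listed
            result.append(genre)
    if audience == "gulf_arabic":
        result = [f"Khaleeji-influenced {g}" for g in result]
    return result


print(shortlist("Trust", "gen_z_tiktok"))  # ['Lo-fi', 'Classical', 'Ambient', 'Soft Jazz']
print(shortlist("Joy", "gulf_arabic")[0])  # Khaleeji-influenced Pop
```

Only the swaps the guide names are encoded. For example, a 55+ private-banking shortlist for Excitement still contains Hip-Hop until a team adds its own rules.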
Voice Selection: The Character of Your Brand's Sound
Voice is the most personal audio element. The right voice creates an instant connection; the wrong voice creates an instant barrier. Use the Voice Matcher tool (or G-03 Voice Styles reference table).
Key Voice Decisions:
Male vs. Female: There is no universal "better"; it depends on the audience and the emotional goal. Test both (see G-11 A/B Testing). General patterns:
- Male voices tend to score higher on authority and depth
- Female voices tend to score higher on warmth and approachability
- For gender-neutral positioning: consider non-binary or androgynous voice options
Delivery Style:
- Authoritative: boardroom energy, measured pace, clear diction. Best for B2B, finance, healthcare
- Conversational: friend-talking-to-friend, natural pauses, casual language. Best for social ads, D2C, lifestyle
- Storyteller: narrative arc, emotional variation, cinematic. Best for brand stories, awareness campaigns
- Energetic: high-energy, fast-paced, enthusiastic. Best for product launches, sales events, sports
- Intimate: close-mic, whisper-adjacent, personal. Best for luxury, wellness, late-night content
Language and Dialect (MENA Markets): This decision is as important as voice style in MENA:
- Gulf Arabic: for Saudi, UAE, Kuwait, Qatar, Bahrain, Oman audiences
- Levantine Arabic: for Lebanon, Syria, Jordan, Palestine audiences
- Egyptian Arabic: for Egyptian audiences (also widely understood across MENA due to media influence)
- MSA (Modern Standard Arabic): for pan-Arab campaigns or when no single dialect fits. Sounds formal, so use it only when formality is appropriate
- English: for international/expatriate segments or code-switching brands
- Bilingual (Arabic + English): for brands that straddle both. Mix must feel natural, not forced
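The country list above implies a simple rule: use the markets' own dialect when every target market shares it, and fall back to MSA when no single dialect fits. The sketch below encodes that rule. It covers only the Arabic options, because choosing English or a bilingual mix depends on the audience segment rather than the country.

```python
from typing import Iterable

# Country -> dialect group, from the list above.
DIALECT_BY_COUNTRY = {
    "Saudi Arabia": "Gulf Arabic", "UAE": "Gulf Arabic", "Kuwait": "Gulf Arabic",
    "Qatar": "Gulf Arabic", "Bahrain": "Gulf Arabic", "Oman": "Gulf Arabic",
    "Lebanon": "Levantine Arabic", "Syria": "Levantine Arabic",
    "Jordan": "Levantine Arabic", "Palestine": "Levantine Arabic",
    "Egypt": "Egyptian Arabic",
}


def choose_arabic_voiceover(countries: Iterable[str]) -> str:
    """Shared dialect if all markets agree; otherwise MSA. Unmapped markets raise KeyError."""
    dialects = {DIALECT_BY_COUNTRY[country] for country in countries}
    if not dialects:
        raise ValueError("No target markets given")
    return dialects.pop() if len(dialects) == 1 else "MSA"


print(choose_arabic_voiceover(["UAE", "Kuwait"]))          # Gulf Arabic
print(choose_arabic_voiceover(["Saudi Arabia", "Egypt"]))  # MSA
```

Scenario 3 below (GCC plus Egypt) lands on MSA, which matches the casting in that walkthrough.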
Voice + Genre Harmony: Voice and genre must complement each other. An authoritative deep voice over lo-fi beats creates dissonance. A whisper-intimate voice over high-energy electronic creates confusion. Match the energy:
- Low-energy voice → low-energy genre
- High-energy voice → high-energy genre
- If the voice and genre fight each other, the audience feels uncomfortable without knowing why
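One way to make the energy-matching rule explicit is to label voice and genre on a shared scale and flag pairings at opposite ends. This is a hypothetical sketch. The three-level scale and the MEDIUM labels are assumptions; only the Energetic and Intimate labels follow directly from the delivery-style descriptions. Energy is also only one axis, so this check will not catch a character clash such as the authoritative-voice-over-lo-fi example.

```python
from enum import IntEnum


class Energy(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Assumed labels for the delivery styles listed earlier; adjust per project.
VOICE_ENERGY = {
    "Authoritative": Energy.MEDIUM,
    "Conversational": Energy.MEDIUM,
    "Storyteller": Energy.MEDIUM,
    "Energetic": Energy.HIGH,
    "Intimate": Energy.LOW,
}


def energies_clash(voice_style: str, genre_energy: Energy) -> bool:
    """True when voice and genre sit at opposite ends of the scale (LOW vs HIGH)."""
    return abs(VOICE_ENERGY[voice_style] - genre_energy) >= 2


print(energies_clash("Intimate", Energy.HIGH))   # True: whisper over high-energy electronic
print(energies_clash("Energetic", Energy.HIGH))  # False
```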
Step 5: Validate and Comply
You have defined your emotional goal, profiled your audience, matched the format, and selected your genre and voice. Before any audio goes into production, Step 5 is the final gate: legal, cultural, and regulatory validation.
Skipping this step is the most expensive mistake in audio advertising. A single compliance violation can result in ad takedowns, regulatory fines, brand damage, and wasted production budget. Validation typically takes around 30 minutes; recovering from a compliance failure can take weeks.
Legal Validation:
Music Licensing Check:
- Is every piece of music properly licensed? (See G-12 Music Licensing for full requirements)
- Does the licence cover all target territories? A licence for UAE may not cover Saudi Arabia
- Does the licence cover the specific platforms? "Social media" licences may exclude paid advertising
- What is the licence duration? A 1-year licence means the ad must be pulled after 12 months
- If using AI-generated music: confirm the platform terms permit commercial advertising use
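The licensing questions above map naturally onto a checklist function. This sketch assumes the licence terms have already been transcribed into a record by hand. `MusicLicence` and its fields are hypothetical and do not follow any licensor's format. Passing these checks does not replace reading the licence itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass
class MusicLicence:
    reference: str
    territories: Set[str]
    platforms: Set[str]
    paid_ads_allowed: bool
    start: date
    end: date  # last day the ad may run
    ai_generated: bool = False
    commercial_ads_permitted: bool = False  # AI music: do platform terms allow advertising?


def licence_issues(lic: MusicLicence, territories: Set[str], platforms: Set[str],
                   paid: bool, flight_start: date, flight_end: date) -> List[str]:
    issues = []
    missing_territories = territories - lic.territories
    if missing_territories:
        issues.append(f"Territories not covered: {sorted(missing_territories)}")
    missing_platforms = platforms - lic.platforms
    if missing_platforms:
        issues.append(f"Platforms not covered: {sorted(missing_platforms)}")
    if paid and not lic.paid_ads_allowed:
        issues.append("Licence excludes paid advertising")
    if flight_start < lic.start or flight_end > lic.end:
        issues.append(f"Flight {flight_start}..{flight_end} outside licence term {lic.start}..{lic.end}")
    if lic.ai_generated and not lic.commercial_ads_permitted:
        issues.append("AI-generated music: commercial advertising use not confirmed")
    return issues


uae_only = MusicLicence("LIC-001", {"UAE"}, {"Instagram", "YouTube"}, True,
                        date(2024, 1, 1), date(2024, 12, 31))
print(licence_issues(uae_only, {"UAE", "Saudi Arabia"}, {"Instagram"}, True,
                     date(2024, 3, 1), date(2024, 6, 30)))
# ["Territories not covered: ['Saudi Arabia']"]
```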
Voice Talent Rights:
- Is there a signed voice talent agreement covering commercial use, territory, duration, and media?
- Does the agreement allow AI voice cloning or modification? (Increasingly important with AI voice tools)
- For celebrity or influencer voices: are likeness rights and endorsement terms clear?
- For AI-generated voices: confirm the voice model was created ethically and legally (not cloned from a real person without consent)
Trademark and Brand References:
- Does the script mention any competitor brands? If so, is the reference factual and non-disparaging?
- Are all brand name pronunciations correct? (Especially important for Arabic transliteration of English brand names)
Cultural Validation (Critical for MENA):
Religious and Social Sensitivity:
- Does the audio respect religious sensitivities? No music or content that could be considered disrespectful during religious observances
- Ramadan-specific: audio tone should shift to reflective, community-focused, and generous during Ramadan. Avoid hard-sell urgency during the holy month
- National Day celebrations: audio should feel patriotic and respectful. Avoid trivialising national identity
- Gender representation in voiceover: ensure representation aligns with local norms and brand values
Language Quality Check:
- Arabic grammar and diacritics (tashkeel) verified by a native speaker β not just a translator
- Dialect consistency: if you chose Gulf Arabic, every word must be Gulf Arabic. One Egyptian word breaks immersion
- Bilingual content: code-switching must feel natural. Forced Arabic-English mixing sounds amateur
- Pronunciation of numbers, dates, and technical terms: verify these are natural in the chosen dialect
Regulatory Validation by Industry:
Financial Services (CBUAE, SAMA, CMA):
- Risk disclaimers read at comparable speed and volume to main claims
- "Past performance" disclaimers for investment products
- Interest rate and fee disclosures must be complete and audible
Healthcare and Pharma (DOH, MOH, SFDA):
- Side effects listed at the same pace and volume as efficacy claims
- "Consult your doctor" statement required for OTC and prescription products
- No audio that implies guaranteed outcomes
Real Estate (RERA, DLD):
- Project registration numbers must be stated
- "Prices subject to change" disclaimer required
- Off-plan marketing regulations vary by emirate
Food and Beverage:
- Health claims must be substantiated and qualified
- "Part of a balanced diet" or equivalent qualifying statement
- Halal certification references must be accurate
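A first-pass script scan for the required statements is easy to sketch, with heavy caveats. The phrase list below paraphrases only some of the rules above and is an assumption. A plain substring match misses legitimate rewordings ("or equivalent"). The scan also says nothing about the pace and volume requirements, which only a listen can verify. Legal review remains the authority.

```python
from typing import Dict, List

# Assumed phrase list, paraphrasing some of the industry rules above.
REQUIRED_PHRASES: Dict[str, List[str]] = {
    "investment": ["past performance"],
    "healthcare": ["consult your doctor"],
    "real_estate": ["prices subject to change"],
    "food_and_beverage": ["balanced diet"],
}


def missing_disclaimers(script: str, industry: str) -> List[str]:
    """Required phrases absent from the script (case-insensitive); unknown industries return []."""
    text = script.lower()
    return [phrase for phrase in REQUIRED_PHRASES.get(industry, []) if phrase not in text]


print(missing_disclaimers("New apartments from AED 1 million. Book a viewing today.", "real_estate"))
# ['prices subject to change']
```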
The Compliance Sign-Off: Before production begins, create a Compliance Sign-Off document:
- Music licensing: confirmed ✓ (with licence reference numbers)
- Voice talent agreement: signed ✓
- Cultural review: approved by regional team ✓
- Regulatory check: approved by legal/compliance team ✓
- Disclaimer text: finalised and approved ✓
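The sign-off works as a hard gate: production starts only when every line is ticked. The sketch below models that gate with hypothetical field names. It is not the ZorgSocial Compliance Checker's interface, just a way to make the checklist machine-readable.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass
class ComplianceSignOff:
    # One flag per line of the sign-off document above.
    music_licensing_confirmed: bool = False
    voice_agreement_signed: bool = False
    cultural_review_approved: bool = False
    regulatory_check_approved: bool = False
    disclaimer_text_final: bool = False
    licence_references: Tuple[str, ...] = ()

    def blockers(self) -> List[str]:
        """Unticked items, plus missing licence references once licensing is marked confirmed."""
        open_items = [f.name for f in fields(self)
                      if isinstance(getattr(self, f.name), bool) and not getattr(self, f.name)]
        if self.music_licensing_confirmed and not self.licence_references:
            open_items.append("licence_references")
        return open_items

    def ready_for_production(self) -> bool:
        return not self.blockers()


sign_off = ComplianceSignOff(True, True, True, True, True)
print(sign_off.blockers())  # ['licence_references']
```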
ZorgSocial Compliance Checker automates the regulatory validation step β it cross-references your audio content against industry-specific rules for your target markets and flags potential violations before production. Use it as the first pass; human compliance review remains the final authority.
Putting It All Together: Real-World Walkthroughs
Theory becomes clear through practice. Here are three real-world scenarios showing the Audio Decision Framework applied end-to-end.
Scenario 1: UAE Bank, New Savings Account Campaign
Step 1 (Emotional Goal): Trust (primary), Calm (secondary). Customers should feel their money is safe and their future is secure.
Step 2 (Audience): 30–50-year-old UAE residents (a mix of Emirati nationals and long-term expats). High financial literacy. Primarily Arabic-speaking, with English as a second language. Platform habits: Instagram, YouTube, banking app push.
Step 3 (Format): 30-second Instagram video ad + 15-second YouTube pre-roll + 60-second podcast mid-roll on a popular UAE finance podcast.
Step 4 (Genre & Voice): Genre: Modern Classical (piano + subtle strings); sophisticated and trustworthy, not stuffy. Voice: male, warm-authoritative, Gulf Arabic dialect. Moderate pace (140 WPM). Final mix at −14 LUFS integrated, with the voice sitting 7 dB above the music bed.
Step 5 (Validate): CBUAE compliance for savings products: "Terms and conditions apply" disclaimer at comparable volume. Music licensed for the UAE and the broader GCC. Voice talent agreement for 12 months across digital platforms.
Result: Professional, trustworthy audio that feels premium without being cold. The Gulf Arabic dialect creates local connection. The modern classical genre signals sophistication without feeling old-fashioned.
Scenario 2: Saudi E-Commerce, Ramadan Flash Sale
Step 1 (Emotional Goal): Urgency (primary), tempered with cultural respect for Ramadan. Not aggressive urgency; more "a generous opportunity you do not want to miss."
Step 2 (Audience): 18–35, Saudi Arabia. Digital-native, TikTok-first. Arabic-speaking. Price-conscious but brand-aware. Shopping behaviour peaks post-iftar (after sunset).
Step 3 (Format): 15-second TikTok ad (vertical, sound-on) + 6-second YouTube bumper + in-app push notification audio.
Step 4 (Genre & Voice): Genre: Modern Arabic Pop with Khaleeji influence, energetic but culturally appropriate for Ramadan. NOT heavy electronic beats. Voice: female, young, enthusiastic, Saudi dialect. Fast but clear (160 WPM). Includes countdown SFX ("3 days left") but NO aggressive alarm sounds.
Step 5 (Validate): No content that trivialises Ramadan. Sale messaging framed as "Ramadan generosity," not "panic buying." Music contains no inappropriate content. Saudi e-commerce disclosure requirements met. Scheduled for post-iftar time slots.
Result: Energetic and timely, but culturally respectful. The Saudi female voice feels authentic to the audience. Khaleeji pop genre signals "local brand that understands us."
Scenario 3: Global SaaS, Product Launch for the MENA Market
Step 1 (Emotional Goal): Excitement (primary), Inspiration (secondary). This is a breakthrough product that will transform how businesses operate.
Step 2 (Audience): 28–45, C-suite and senior managers across the GCC and Egypt. Bilingual (Arabic/English). LinkedIn-first, YouTube secondary. Tech-savvy, time-poor, sceptical of hype.
Step 3 (Format): 30-second LinkedIn video ad + 60-second YouTube explainer with audio narration.
Step 4 (Genre & Voice): Genre: Cinematic/Orchestral with modern electronic undertones, signalling innovation and scale. NOT startup-quirky (this audience is corporate). Voice: male, authoritative but approachable, MSA with a slight Gulf inflection. An English version is also produced for the same campaign (bilingual A/B test per G-11).
Step 5 (Validate): No superlative claims ("the best," "number one") without substantiation. Tech terminology verified for accurate Arabic translation. Music licensed for pan-MENA and English-speaking markets. Both Arabic and English versions tested for brand name pronunciation.
Result: Professional and aspirational without being overhyped. The cinematic genre signals "this is a big deal" without the startup clichés. The MSA voice reaches the broadest pan-Arab professional audience.
The Framework as a Living Document: Print or save the 5-step framework as a one-page reference. Tape it to the wall of your creative studio or pin it in your team's Slack channel. Run every campaign through it before production begins. Over time, your team will internalise the steps and make faster, better audio decisions instinctively.
Apply what you learned in ZorgSocial
Build your Audio Brief with the Decision Framework
Every concept in this guide maps directly to ZorgSocial tools. Explore the step-by-step tutorials for hands-on application.
Production Standards