G-13 · 6 min

Audio Guide

Audio Decision Framework

The 5-step process: define emotional goal → identify audience → match format → select genre and voice → validate and comply.

What you'll learn in this guide

Step 1: Define emotional goal

Step 2: Identify the audience

Step 3: Match format to audio

Step 4: Select genre and voice

Step 5: Validate and comply

Key Statistics

5 Steps

A repeatable decision process that eliminates audio guesswork

ZorgSocial Audio Framework

6

Core emotional goals that cover 95% of advertising objectives

Emotional Branding Research

3×

Faster audio production when teams follow a documented decision framework

ZorgSocial Workflow Analytics

40%

Reduction in creative revision cycles with upfront audio strategy

Agency Production Benchmarks 2024

Overview

Audio Decision Framework

When you are unsure where to start, the Audio Decision Framework guides you through five sequential steps: defining your emotional goal, identifying your audience, matching format to audio, selecting genre and voice, and validating compliance.

Audio Decision Quick-Reference Matrix

Decision Step	Key Question	Primary Input	Output / Deliverable
1. Emotional Goal	What single emotion should the audience feel?	Campaign brief, brand values	One primary emotion + one secondary emotion
2. Audience ID	Who are we speaking to and what do they listen to?	Audience personas, listening data	Audience audio profile (age, culture, habits)
3. Format Match	What format and platform is the ad running on?	Media plan, platform specs	Audio format brief (duration, specs, constraints)
4. Genre & Voice	Which genre and voice deliver the target emotion to this audience?	Genre table (G-02), Voice table (G-03)	Genre selection + voice casting brief
5. Validate & Comply	Is this audio legally and culturally compliant?	Industry rules (G-09), licensing (G-12)	Compliance sign-off + licence confirmation
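The matrix can also be read as an ordered checklist. The sketch below is one hypothetical way to encode it in Python. The `STEPS` list and the `next_step` helper are illustrative names, not part of any ZorgSocial tool. The helper returns the first step whose deliverable is still missing, which reflects the sequential nature of the framework.

```python
# Hypothetical encoding of the quick-reference matrix as an ordered checklist.
# Step names and deliverables come from the table above; the helper is illustrative.
STEPS = [
    ("Emotional Goal", "One primary emotion + one secondary emotion"),
    ("Audience ID", "Audience audio profile"),
    ("Format Match", "Audio format brief"),
    ("Genre & Voice", "Genre selection + voice casting brief"),
    ("Validate & Comply", "Compliance sign-off + licence confirmation"),
]


def next_step(completed):
    """Return the first step (in order) not in `completed`, or None if all are done."""
    for name, _deliverable in STEPS:
        if name not in completed:
            return name
    return None


print(next_step({"Emotional Goal", "Audience ID"}))  # Format Match
```

Because the lookup walks the steps in order, marking a later step as done never lets an earlier one be skipped.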

Step 1 — Define Your Emotional Goal

Every piece of audio advertising exists to make someone feel something. Before you choose a genre, a voice, a tempo, or a sound effect, you must answer one question: What single emotion should the audience feel after hearing this ad?

This is the foundational step that everything else flows from. Get this wrong, and every downstream decision — genre, voice, tempo, SFX — will be misaligned. Get it right, and the rest of the framework becomes almost intuitive.

The Six Core Emotional Goals:

Most advertising objectives map to one of six emotional targets. Choose ONE as your primary goal and optionally one as a secondary:

Trust The audience should feel confident, reassured, and safe. Trust is the primary emotion for financial services, healthcare, insurance, legal services, and government communications.

Audio cues: warm voice, moderate tempo (80–100 BPM), acoustic or orchestral genre, minimal SFX
What to avoid: aggressive music, fast-talking voiceover, electronic beats (can feel impersonal)

Joy The audience should feel happy, uplifted, and positive. Joy is the primary emotion for lifestyle brands, food and beverage, entertainment, children's products, and celebration campaigns.

Audio cues: bright voice, upbeat tempo (110–130 BPM), pop or acoustic genre, playful SFX
What to avoid: minor keys, slow tempos, serious or authoritative voice tones

Excitement The audience should feel energised, full of anticipation, and motivated to act. Excitement is the primary emotion for product launches, sales events, sports, gaming, and limited-time offers.

Audio cues: energetic voice, fast tempo (120–140 BPM), electronic or hip-hop genre, dynamic SFX (whooshes, impacts)
What to avoid: slow builds, ambient textures, whispery voices

Calm The audience should feel peaceful, centred, and in control. Calm is the primary emotion for wellness, luxury, spa/hospitality, meditation apps, and premium real estate.

Audio cues: gentle voice, slow tempo (60–80 BPM), ambient or classical genre, nature SFX (water, birds)
What to avoid: percussion-heavy music, fast speech, staccato rhythms

Inspiration The audience should feel motivated, aspirational, and empowered. Inspiration is the primary emotion for education, nonprofit, career platforms, personal development, and brand purpose campaigns.

Audio cues: building voice (starts quiet, grows confident), rising tempo, cinematic or orchestral genre, crescendo
What to avoid: static energy, monotone delivery, repetitive loops

Urgency The audience should feel time pressure, a sense of scarcity, and the need to act immediately. Urgency is the primary emotion for flash sales, countdown campaigns, limited inventory, and direct-response advertising.

Audio cues: fast-paced voice, high tempo (130+ BPM), percussive or electronic genre, ticking/countdown SFX
What to avoid: relaxed pacing, ambient music, meandering intros

The Single-Emotion Rule: Choose ONE primary emotion. An ad that tries to make people feel "calm AND excited" will make them feel nothing. If you have a secondary emotion, it should complement the primary — not contradict it. For example: Trust (primary) + Calm (secondary) works. Trust (primary) + Urgency (secondary) creates confusion.
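One rough way to sanity-check a primary and secondary pairing is to compare the tempo ranges listed for each goal. The sketch below is a heuristic only: the 10 BPM threshold and the function names are assumptions made for illustration, and the guide itself does not define a numeric compatibility rule. Under that assumption, it reproduces the two examples above.

```python
# Tempo ranges (BPM) as listed for each emotional goal above.
# Inspiration is omitted because the guide gives "rising tempo" rather than a range.
# Urgency is listed as "130+", so its upper bound is left open (None).
TEMPO_BPM = {
    "Trust": (80, 100),
    "Joy": (110, 130),
    "Excitement": (120, 140),
    "Calm": (60, 80),
    "Urgency": (130, None),
}

# ASSUMPTION: 10 BPM is an illustrative threshold, not a figure from the guide.
MAX_GAP_BPM = 10


def tempo_gap(primary, secondary):
    """Return the BPM gap between two goals' tempo ranges (0 if they overlap or touch)."""
    lo_a, hi_a = TEMPO_BPM[primary]
    lo_b, hi_b = TEMPO_BPM[secondary]
    hi_a = float("inf") if hi_a is None else hi_a
    hi_b = float("inf") if hi_b is None else hi_b
    return max(0, lo_a - hi_b, lo_b - hi_a)


def likely_complementary(primary, secondary):
    """Heuristic: goals whose tempo ranges sit close together are easier to combine."""
    return tempo_gap(primary, secondary) <= MAX_GAP_BPM


print(likely_complementary("Trust", "Calm"))     # True
print(likely_complementary("Trust", "Urgency"))  # False
```

Tempo is only one of the audio cues, so a pairing that passes this check still needs a creative review.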

Document it. Write your emotional goal on the campaign brief before any audio production begins. This becomes the measuring stick for every creative decision that follows.

Step 2 — Identify the Audience

The same emotional goal requires completely different audio execution depending on WHO the audience is. Trust sounds different to a 25-year-old fintech user than to a 55-year-old private banking client. Joy sounds different to a teenager than to a parent.

Build an Audience Audio Profile:

For every campaign, answer these five questions about your target audience before making any audio decisions:

1. Age Range — What generation are they? Age is the strongest predictor of genre preference:

Gen Z (18–27): Hip-hop, lo-fi beats, electronic, trending TikTok sounds. Short attention span: audio must hook in the first 1–2 seconds
Millennials (28–43): Indie, pop, acoustic, podcast-style narration. Comfortable with longer formats but expect production quality
Gen X (44–59): Rock, R&B, classic pop, authoritative voiceover. Value substance over style
Boomers (60+): Classical, jazz, easy listening, warm broadcast voice. Prefer clear, measured delivery

2. Cultural Background — What musical traditions resonate? This is critical for MENA markets:

Gulf Arabic audience: Khaleeji music cues, oud and percussion, Gulf dialect voiceover
Levantine audience: Lebanese/Syrian pop influences, Levantine dialect
Egyptian audience: Shaabi rhythms, Egyptian Arabic dialect, dramatic vocal style
Pan-Arab (MSA) audience: Modern Arabic pop, MSA voiceover, avoid regional-specific dialects
Expatriate/international audience: Western pop, English voiceover, globally recognisable sounds
Mixed audience: Test both Arabic and English variants (see G-11 A/B Testing)

3. Platform Habits — Where do they consume content?

TikTok-first audience: Expect trending sounds, music-forward, short-form. Audio must work without context — scrollers decide in 0.5 seconds
Instagram/Facebook audience: Sound-off default. Audio enhances but must not be essential. Strong visual-first, audio-second design
YouTube audience: Longer attention span, higher audio expectations. Pre-roll audio must differentiate from the content
Podcast audience: Highly audio-literate. Expect native, conversational ads. Reject anything that sounds overtly like an ad
LinkedIn audience: Professional, measured, authoritative. Audio should feel like a boardroom, not a nightclub

4. Content Sensitivity — What topics require careful audio treatment? Some campaign topics require audio restraint:

Healthcare and illness: gentle, empathetic tone. No upbeat music behind serious health messaging
Financial hardship: no luxury-sounding music. Avoid anything that feels tone-deaf to the audience's situation
Loss and bereavement: minimal music, soft voice. Silence can be more powerful than sound
Regulatory topics: clear, measured delivery. No dramatic music that might seem manipulative

5. What Do They Already Listen To? The most valuable audience insight for audio decisions is: what does this audience choose to listen to in their free time? Use Spotify Wrapped data, podcast listenership surveys, radio format ratings, and social listening to understand your audience's existing audio preferences. Your ad audio should feel at home in their listening environment — not alien to it.

The Audience Audio Profile Template: Document your findings in a one-page Audience Audio Profile:

Target age: [range]
Cultural/linguistic background: [details]
Primary platform(s): [list]
Content sensitivities: [notes]
Preferred listening: [genres, artists, podcast types]
Audio "do nots": [specific sounds or styles to avoid]

This profile becomes the filter for Step 4 (Genre & Voice selection).
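If the profile is kept alongside campaign data, it could be represented as a small record type. The following is a minimal sketch, assuming a Python dataclass whose field names mirror the template. The class name, field names, and the `missing_fields` helper are hypothetical and not a ZorgSocial format.

```python
from dataclasses import dataclass, field


@dataclass
class AudienceAudioProfile:
    """One-page audience profile; fields mirror the template above."""
    target_age: str
    background: str
    platforms: list = field(default_factory=list)
    sensitivities: list = field(default_factory=list)
    preferred_listening: list = field(default_factory=list)
    do_nots: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Example values loosely follow Scenario 2 later in this guide; the rest are left empty.
profile = AudienceAudioProfile(
    target_age="18-35",
    background="Saudi Arabia, Arabic-speaking",
    platforms=["TikTok"],
)
print(profile.missing_fields())
# ['sensitivities', 'preferred_listening', 'do_nots']
```

Listing the empty fields makes it easy to see which questions still need answering before moving on to Step 4.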

Step 3 — Match Format to Audio

Different ad formats impose different audio constraints. A 6-second bumper ad on YouTube requires fundamentally different audio thinking than a 60-second podcast mid-roll. The format determines your audio budget — how many elements you can include and how complex your sound design can be.

Format-to-Audio Mapping:

6-Second Bumper (YouTube, Social Pre-Roll)

Audio budget: ONE element only — either a sonic logo, a single voice line, or a music sting
No time for music beds, voiceover AND SFX together. Pick one and make it count
Best use: brand recall (sonic logo), single message ("Sale starts Friday"), or attention grab (SFX hook)
Worst mistake: trying to cram a 30-second script into 6 seconds with speed-reading

15-Second Social Ad (TikTok, Reels, Stories)

Audio budget: voice OR music-forward, rarely both at full intensity
Structure: 2-second hook (SFX or voice question) → 10-second message → 3-second CTA
Music role: set the emotional tone in the first 2 seconds. Trending sounds on TikTok can boost algorithmic reach
Voice role: one clear message delivered conversationally. No "announcer voice" — it triggers scroll-away

30-Second Spot (Digital, Radio, TV)

Audio budget: full production — voice, music bed, SFX accents, and sonic logo
Structure: 5-second hook → 15-second body → 5-second CTA → 5-second brand close
This is the workhorse format. Music bed should support the voice without competing. Dynamic range matters — create a mini arc with a beginning, middle, and end
Mix priority: voice sits 6–8 dB above music bed. SFX are used sparingly for emphasis, not decoration

60-Second Podcast Mid-Roll

Audio budget: voice-primary with optional subtle music bed
Structure: host-read or conversational. Should sound like part of the podcast, not an interruption
Music bed: if used, it should be minimal — a light texture underneath, not a produced track
Key principle: authenticity. Podcast listeners are audio-literate and will immediately detect (and resent) a pre-produced "radio ad" inserted into their show

Long-Form Audio (Brand Story, Audio Documentary, 2+ Minutes)

Audio budget: full cinematic production — multiple voice tracks, music movements, layered SFX, soundscaping
Structure: narrative arc with distinct chapters. Music evolves (not loops). Voice delivery changes with the emotional journey
This format rewards production quality. It is the audio equivalent of a brand film
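The segment timings given above for the 15-second and 30-second formats can be turned into a simple cue sheet. The sketch below assumes the segments run back to back with no overlap. The `STRUCTURES` table and the `cue_sheet` function are illustrative names. It also checks that the segments add up to the slot length.

```python
# Segment timings (seconds) as given for the 15-second and 30-second formats above.
STRUCTURES = {
    15: [("hook", 2), ("message", 10), ("CTA", 3)],
    30: [("hook", 5), ("body", 15), ("CTA", 5), ("brand close", 5)],
}


def cue_sheet(total_seconds):
    """Return (segment, start, end) tuples, checking the segments fill the slot exactly."""
    segments = STRUCTURES[total_seconds]
    planned = sum(length for _, length in segments)
    if planned != total_seconds:
        raise ValueError(f"segments total {planned}s, expected {total_seconds}s")
    sheet, start = [], 0
    for name, length in segments:
        sheet.append((name, start, start + length))
        start += length
    return sheet


for name, start, end in cue_sheet(30):
    print(f"{start:>2}s-{end:>2}s  {name}")
```

The same check is useful when a script is adapted between formats, since a segment that overruns is caught before recording.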

Platform-Specific Audio Specs:

TikTok: Sound-on default. Audio is essential. Target –12 to –14 LUFS. Trending sounds boost reach
Instagram Reels: Sound-off default. Audio enhances but is not required. Caption-first design
YouTube: Pre-roll is skippable after 5 seconds. The first 5 seconds of audio must hook the viewer, or the impression is wasted spend
Spotify/Audio Streaming: Audio-only, high attention. Full production quality expected. –14 LUFS
Radio: Broadcast standards. –23 LUFS. Disclaimer requirements are strict
LinkedIn: Professional context. Authoritative voice, measured pace. Music should feel corporate-appropriate
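The loudness figures above lend themselves to a small calculation. This sketch assumes you already have an integrated loudness measurement for your mix from a loudness meter. The gain needed is the target minus the measured value, and a level difference in dB converts to a linear amplitude ratio as 10^(dB/20). The table keys and function names are illustrative, and the TikTok value is the midpoint of the stated range, chosen here as an assumption.

```python
# Loudness targets (LUFS) as listed in the platform specs above.
# TikTok is given as a range (-12 to -14); the midpoint is used here as an ASSUMPTION.
TARGET_LUFS = {
    "TikTok": -13.0,
    "Spotify/Audio Streaming": -14.0,
    "Radio": -23.0,
}


def gain_to_target(measured_lufs, platform):
    """Return the gain in dB needed to move a mix from its measured loudness to the target."""
    return TARGET_LUFS[platform] - measured_lufs


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


print(gain_to_target(-18.0, "Spotify/Audio Streaming"))  # 4.0
print(round(db_to_amplitude_ratio(6), 2))                # 2.0
```

The second line shows why the 6–8 dB voice-over-bed guideline for the 30-second spot is substantial: at 6 dB the voice has roughly twice the amplitude of the music. The gain figure ignores peak limiting, so a mix raised by several dB should still be checked for clipping.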

Device Considerations:

Mobile-first (most social): optimise for phone speakers (see G-12 Mobile Optimisation)
Desktop (LinkedIn, YouTube long-form): wider frequency range available, but still design for phone speakers as backup
Smart speakers and connected audio (podcast, streaming): high-quality audio expected. Full frequency range

Step 4 — Select Genre and Voice

With your emotional goal defined (Step 1), your audience profiled (Step 2), and your format constraints understood (Step 3), you now have the context needed to make the two most impactful audio decisions: genre and voice.

Genre Selection: The Emotional-Audience Intersection

The right genre sits at the intersection of your emotional goal and your audience's preferences. Use the Genre Selector tool (or G-02 Music Genres reference table) and apply this logic:

Start with the emotional goal:

Trust → Acoustic, Classical, Ambient, Soft Jazz
Joy → Pop, Acoustic Pop, Afrobeats, Reggae
Excitement → Electronic, Hip-Hop, Rock, Drum & Bass
Calm → Ambient, Classical, Lo-fi, Nature Soundscapes
Inspiration → Cinematic/Orchestral, Indie, Gospel, Acoustic
Urgency → Electronic, Percussive, Trap, Drum & Bass

Then filter by audience:

If the emotional goal suggests "Acoustic" but the audience is Gen Z TikTok-first → switch to Lo-fi or Indie Electronic (same emotional register, audience-appropriate genre)
If the goal suggests "Electronic" but the audience is 55+ private banking → switch to Orchestral or Modern Classical (same energy, more culturally aligned)
If the audience is Gulf Arabic → consider Khaleeji-influenced versions of any genre, or traditional Arabic instrumentation (oud, qanun) with modern production
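The two-stage logic above (start from the emotional goal, then filter by audience) can be sketched as a lookup followed by substitution. The genre lists and the two substitution rules are taken from the text. The audience keys such as `gen_z_tiktok` are hypothetical labels invented for this example, and this is not how the Genre Selector tool is implemented.

```python
# Genre shortlists per emotional goal, as listed above.
GENRES_BY_GOAL = {
    "Trust": ["Acoustic", "Classical", "Ambient", "Soft Jazz"],
    "Joy": ["Pop", "Acoustic Pop", "Afrobeats", "Reggae"],
    "Excitement": ["Electronic", "Hip-Hop", "Rock", "Drum & Bass"],
    "Calm": ["Ambient", "Classical", "Lo-fi", "Nature Soundscapes"],
    "Inspiration": ["Cinematic/Orchestral", "Indie", "Gospel", "Acoustic"],
    "Urgency": ["Electronic", "Percussive", "Trap", "Drum & Bass"],
}

# Audience substitutions from the two examples above.
# The audience labels are hypothetical keys chosen for this sketch.
SUBSTITUTIONS = {
    ("gen_z_tiktok", "Acoustic"): ["Lo-fi", "Indie Electronic"],
    ("private_banking_55_plus", "Electronic"): ["Orchestral", "Modern Classical"],
}


def shortlist(goal, audience):
    """Start from the goal's genres, then swap any genre the audience rule replaces."""
    result = []
    for genre in GENRES_BY_GOAL[goal]:
        for candidate in SUBSTITUTIONS.get((audience, genre), [genre]):
            if candidate not in result:
                result.append(candidate)
    return result


print(shortlist("Trust", "gen_z_tiktok"))
# ['Lo-fi', 'Indie Electronic', 'Classical', 'Ambient', 'Soft Jazz']
```

An audience with no substitution rule simply receives the unfiltered shortlist for its emotional goal.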

The Neutrality Principle: When you are not sure which genre to choose, select neutral audio that will not alienate any part of your audience. Ambient textures, soft acoustic guitar, and gentle piano are almost universally inoffensive. They may not excite, but they will not repel. This is safer than choosing a genre that thrills 50% of the audience and annoys the other 50%.

Voice Selection: The Character of Your Brand's Sound

Voice is the most personal audio element. The right voice creates an instant connection; the wrong voice creates an instant barrier. Use the Voice Matcher tool (or G-03 Voice Styles reference table).

Key Voice Decisions:

Male vs. Female: There is no universal "better" — it depends on the audience and the emotional goal. Test both (see G-11 A/B Testing). General patterns:

Male voices tend to score higher on authority and depth
Female voices tend to score higher on warmth and approachability
For gender-neutral positioning: consider non-binary or androgynous voice options

Delivery Style:

Authoritative: boardroom energy, measured pace, clear diction. Best for B2B, finance, healthcare
Conversational: friend-talking-to-friend, natural pauses, casual language. Best for social ads, D2C, lifestyle
Storyteller: narrative arc, emotional variation, cinematic. Best for brand stories, awareness campaigns
Energetic: high-energy, fast-paced, enthusiastic. Best for product launches, sales events, sports
Intimate: close-mic, whisper-adjacent, personal. Best for luxury, wellness, late-night content

Language and Dialect (MENA Markets): This decision is as important as voice style in MENA:

Gulf Arabic: for Saudi, UAE, Kuwait, Qatar, Bahrain, Oman audiences
Levantine Arabic: for Lebanon, Syria, Jordan, Palestine audiences
Egyptian Arabic: for Egyptian audiences (also widely understood across MENA due to media influence)
MSA (Modern Standard Arabic): for pan-Arab campaigns or when no single dialect fits. Sounds formal — use only when formality is appropriate
English: for international/expatriate segments or code-switching brands
Bilingual (Arabic + English): for brands that straddle both. Mix must feel natural, not forced

Voice + Genre Harmony: Voice and genre must complement each other. An authoritative deep voice over lo-fi beats creates dissonance. A whisper-intimate voice over high-energy electronic creates confusion. Match the energy:

Low-energy voice → low-energy genre
High-energy voice → high-energy genre
If the voice and genre fight each other, the audience feels uncomfortable without knowing why

Step 5 — Validate and Comply

You have defined your emotional goal, profiled your audience, matched the format, and selected your genre and voice. Before any audio goes into production, Step 5 is the final gate: legal, cultural, and regulatory validation.

Skipping this step is the most expensive mistake in audio advertising. A single compliance violation can result in ad takedowns, regulatory fines, brand damage, and wasted production budget. Validation takes 30 minutes. Recovering from a compliance failure takes weeks.

Legal Validation:

Music Licensing Check:

Is every piece of music properly licensed? (See G-12 Music Licensing for full requirements)
Does the licence cover all target territories? A licence for UAE may not cover Saudi Arabia
Does the licence cover the specific platforms? "Social media" licences may exclude paid advertising
What is the licence duration? A 1-year licence means the ad must be pulled after 12 months
If using AI-generated music: confirm the platform terms permit commercial advertising use
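The territory, platform, and duration questions above can be checked mechanically once the licence terms are written down. The following is a minimal sketch with a hypothetical `MusicLicence` record and example values. It does not model any real licensing system, and it is no substitute for reading the licence itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MusicLicence:
    """Hypothetical record of the licence terms the checklist above asks about."""
    territories: set
    platforms: set
    paid_advertising: bool
    expires: date


def licence_gaps(licence, territories, platforms, campaign_end):
    """Return a list of human-readable gaps; an empty list means no gap was found."""
    gaps = []
    for territory in sorted(set(territories) - licence.territories):
        gaps.append(f"territory not covered: {territory}")
    for platform in sorted(set(platforms) - licence.platforms):
        gaps.append(f"platform not covered: {platform}")
    if not licence.paid_advertising:
        gaps.append("paid advertising not covered")
    if campaign_end > licence.expires:
        gaps.append(f"licence expires {licence.expires} before campaign ends")
    return gaps


licence = MusicLicence(
    territories={"UAE"},
    platforms={"Instagram", "YouTube"},
    paid_advertising=True,
    expires=date(2025, 12, 31),
)
print(licence_gaps(licence, {"UAE", "Saudi Arabia"}, {"Instagram"}, date(2025, 6, 30)))
# ['territory not covered: Saudi Arabia']
```

The example mirrors the case described above, where a licence for the UAE does not extend to Saudi Arabia.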

Voice Talent Rights:

Is there a signed voice talent agreement covering commercial use, territory, duration, and media?
Does the agreement allow AI voice cloning or modification? (Increasingly important with AI voice tools)
For celebrity or influencer voices: are likeness rights and endorsement terms clear?
For AI-generated voices: confirm the voice model was created ethically and legally (not cloned from a real person without consent)

Trademark and Brand References:

Does the script mention any competitor brands? If so, is the reference factual and non-disparaging?
Are all brand name pronunciations correct? (Especially important for Arabic transliteration of English brand names)

Cultural Validation (Critical for MENA):

Religious and Social Sensitivity:

Does the audio respect religious sensitivities? No music or content that could be considered disrespectful during religious observances
Ramadan-specific: audio tone should shift to reflective, community-focused, and generous during Ramadan. Avoid hard-sell urgency during the holy month
National Day celebrations: audio should feel patriotic and respectful. Avoid trivialising national identity
Gender representation in voiceover: ensure representation aligns with local norms and brand values

Language Quality Check:

Arabic grammar and diacritics (tashkeel) verified by a native speaker — not just a translator
Dialect consistency: if you chose Gulf Arabic, every word must be Gulf Arabic. One Egyptian word breaks immersion
Bilingual content: code-switching must feel natural. Forced Arabic-English mixing sounds amateur
Pronunciation of numbers, dates, and technical terms: verify these are natural in the chosen dialect

Regulatory Validation by Industry:

Financial Services (CBUAE, SAMA, CMA):

Risk disclaimers read at comparable speed and volume to main claims
"Past performance" disclaimers for investment products
Interest rate and fee disclosures must be complete and audible

Healthcare and Pharma (DOH, MOH, SFDA):

Side effects listed at the same pace and volume as efficacy claims
"Consult your doctor" statement required for OTC and prescription products
No audio that implies guaranteed outcomes

Real Estate (RERA, DLD):

Project registration numbers must be stated
"Prices subject to change" disclaimer required
Off-plan marketing regulations vary by emirate

Food and Beverage:

Health claims must be substantiated and qualified
"Part of a balanced diet" or equivalent qualifying statement
Halal certification references must be accurate

The Compliance Sign-Off: Before production begins, create a Compliance Sign-Off document:

Music licensing: confirmed ✓ (with licence reference numbers)
Voice talent agreement: signed ✓
Cultural review: approved by regional team ✓
Regulatory check: approved by legal/compliance team ✓
Disclaimer text: finalised and approved ✓
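The sign-off list above works as a gate: production starts only when every item is confirmed. A minimal sketch of that gate follows, with illustrative function names. It tracks only whether an item is confirmed and leaves details such as licence reference numbers to the document itself.

```python
# The five sign-off items listed above; all start as not yet confirmed.
SIGN_OFF_ITEMS = [
    "Music licensing",
    "Voice talent agreement",
    "Cultural review",
    "Regulatory check",
    "Disclaimer text",
]


def outstanding(confirmed):
    """Return sign-off items not yet confirmed, in checklist order."""
    return [item for item in SIGN_OFF_ITEMS if item not in confirmed]


def ready_for_production(confirmed):
    """Production may begin only when every item has been confirmed."""
    return not outstanding(confirmed)


confirmed = {"Music licensing", "Voice talent agreement", "Cultural review"}
print(outstanding(confirmed))           # ['Regulatory check', 'Disclaimer text']
print(ready_for_production(confirmed))  # False
```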

ZorgSocial Compliance Checker automates the regulatory validation step — it cross-references your audio content against industry-specific rules for your target markets and flags potential violations before production. Use it as the first pass; human compliance review remains the final authority.

Putting It All Together: Real-World Walkthroughs

Theory becomes clear through practice. Here are three real-world scenarios showing the Audio Decision Framework applied end-to-end.

Scenario 1: UAE Bank — New Savings Account Campaign

Step 1 — Emotional Goal: Trust (primary), Calm (secondary). Customers should feel their money is safe and their future is secure.

Step 2 — Audience: 30–50 year-old UAE residents (mix of Emirati nationals and long-term expats). High financial literacy. Primarily Arabic-speaking with English as second language. Platform habits: Instagram, YouTube, banking app push.

Step 3 — Format: 30-second Instagram video ad + 15-second YouTube pre-roll + 60-second podcast mid-roll on a popular UAE finance podcast.

Step 4 — Genre & Voice: Genre: Modern Classical (piano + subtle strings), which reads as sophisticated and trustworthy without being stuffy. Voice: Male, warm-authoritative, Gulf Arabic dialect. Moderate pace (140 WPM). Overall mix targets –14 LUFS, with the voice sitting 7 dB above the music bed.

Step 5 — Validate: CBUAE compliance for savings products — "Terms and conditions apply" disclaimer at comparable volume. Music licensed for UAE + broader GCC. Voice talent agreement for 12 months across digital platforms.

Result: Professional, trustworthy audio that feels premium without being cold. The Gulf Arabic dialect creates local connection. The modern classical genre signals sophistication without feeling old-fashioned.
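The speaking pace in this scenario translates directly into a script length. As a rough check, the sketch below computes the maximum word count at 140 WPM for each of the three formats in Step 3, assuming the voice speaks for the full duration. Real scripts need fewer words to leave room for pauses, music, and the disclaimer. The function name is illustrative.

```python
def word_budget(wpm, seconds):
    """Maximum whole words a voice can deliver at `wpm` if it speaks for the full duration."""
    return wpm * seconds // 60


# Scenario 1 pace (140 WPM) across its three formats.
for seconds in (15, 30, 60):
    print(seconds, word_budget(140, seconds))
# 15 35
# 30 70
# 60 140
```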

Scenario 2: Saudi E-Commerce — Ramadan Flash Sale

Step 1 — Emotional Goal: Urgency (primary), but tempered with cultural respect for Ramadan. Not aggressive urgency — more "generous opportunity you do not want to miss."

Step 2 — Audience: 18–35, Saudi Arabia. Digital-native, TikTok-first. Arabic-speaking. Price-conscious but brand-aware. Shopping behaviour peaks post-iftar (after sunset).

Step 3 — Format: 15-second TikTok ad (vertical, sound-on) + 6-second YouTube bumper + in-app push notification audio.

Step 4 — Genre & Voice: Genre: Modern Arabic Pop with Khaleeji influence — energetic but culturally appropriate for Ramadan. NOT heavy electronic beats. Voice: Female, young, enthusiastic Saudi dialect. Fast but clear (160 WPM). Includes countdown SFX ("3 days left") but NO aggressive alarm sounds.

Step 5 — Validate: No content that trivialises Ramadan. Sale messaging framed as "Ramadan generosity" not "panic buying." Music does not include inappropriate content. Saudi e-commerce disclosure requirements met. Scheduled for post-iftar time slots.

Result: Energetic and timely, but culturally respectful. The Saudi female voice feels authentic to the audience. Khaleeji pop genre signals "local brand that understands us."

Scenario 3: Global SaaS — Product Launch for MENA Market

Step 1 — Emotional Goal: Excitement (primary), Inspiration (secondary). This is a breakthrough product that will transform how businesses operate.

Step 2 — Audience: 28–45, C-suite and senior managers across GCC + Egypt. Bilingual (Arabic/English). LinkedIn-first, YouTube secondary. Tech-savvy, time-poor, sceptical of hype.

Step 3 — Format: 30-second LinkedIn video ad + 60-second YouTube explainer with audio narration.

Step 4 — Genre & Voice: Genre: Cinematic/Orchestral with modern electronic undertones — signals innovation and scale. NOT startup-quirky (this audience is corporate). Voice: Male, authoritative but approachable, MSA with slight Gulf inflection. English version also produced for the same campaign (bilingual A/B test per G-11).

Step 5 — Validate: No superlative claims ("the best," "number one") without substantiation. Tech terminology verified for accurate Arabic translation. Music licensed for pan-MENA + English-speaking markets. Both Arabic and English versions tested for brand name pronunciation.

Result: Professional and aspirational without being overhyped. Cinematic genre signals "this is a big deal" without the startup clichés. MSA voice reaches the broadest pan-Arab professional audience.

The Framework as a Living Document: Print or save the 5-step framework as a one-page reference. Tape it to the wall of your creative studio or pin it in your team's Slack channel. Run every campaign through it before production begins. Over time, your team will internalise the steps and make faster, better audio decisions instinctively.

Try This in ZorgSocial

Apply what you learned in ZorgSocial

1. Open a new campaign in ZorgSocial and navigate to Audio Strategy → Decision Framework

2. Step 1: Select your primary emotional goal from the six options (Trust, Joy, Excitement, Calm, Inspiration, Urgency)

3. Step 2: Fill in the Audience Audio Profile: age, culture, platform, sensitivities, and listening preferences

4. Step 3: Choose your ad format and platform. The system shows audio constraints and specs automatically

5. Step 4: Use the Genre Selector and Voice Matcher tools to find the best genre + voice combination for your inputs

6. Step 5: Run the Compliance Checker against your target markets and industry to flag any regulatory issues

7. Review the generated Audio Brief, a one-page summary of all five decisions ready for production

8. Share the Audio Brief with your production team or proceed directly to AI-assisted audio creation

In ZorgSocial

Build your Audio Brief with the Decision Framework

Every concept in this guide maps directly to ZorgSocial tools. Explore the step-by-step tutorials for hands-on application.
