Native speakers don't pronounce words separately. Explore the phonological rules that govern natural, connected English speech.
When native English speakers talk naturally, something remarkable happens: words blend, sounds disappear, and new sounds emerge in unexpected places. This phenomenon, called connected speech, explains why "What are you doing?" can sound like "Whatcha doin'?" and why "I'm going to go" becomes "I'm gonna go." Understanding connected speech is essential for achieving native-like fluency and comprehending fast, natural English.
These aren't random shortcuts or lazy speech—they're systematic phonological processes governed by precise rules. Mastering them transforms stilted, word-by-word pronunciation into the smooth, flowing speech that characterizes native speakers.
Connected speech processes exist because the human articulatory system seeks efficiency. When speaking quickly, our mouth, tongue, and lips minimize unnecessary movements. Instead of carefully pronouncing each sound in isolation, we allow adjacent sounds to influence each other, creating a more fluid articulation.
This isn't "incorrect" English—it's how the language actually functions in real-world communication. Even careful, formal speech employs these processes, though perhaps less dramatically than casual conversation.
Assimilation occurs when a sound changes to become more similar to an adjacent sound, making articulation easier and faster.
The most common type of assimilation changes where in the mouth a sound is produced so that it matches a neighboring sound, usually the one that follows:
| Phrase | Citation Form | Connected Speech | Explanation |
|---|---|---|---|
| "ten people" | /ten ˈpiːpəl/ | /tem ˈpiːpəl/ | /n/ (alveolar) becomes /m/ (bilabial) before /p/ |
| "bacon" | /ˈbeɪkən/ | /ˈbeɪkŋ̩/ | /n/ becomes /ŋ/ (velar) after /k/ |
| "green bag" | /ɡriːn bæɡ/ | /ɡriːm bæɡ/ | /n/ assimilates to /m/ before /b/ |
| "input" | /ˈɪnpʊt/ | /ˈɪmpʊt/ | /n/ becomes /m/ before /p/ |
This happens because bilabial sounds (/p/, /b/, /m/) require bringing the lips together, while alveolar sounds (/t/, /d/, /n/) require touching the tongue to the alveolar ridge. Changing /n/ to /m/ before bilabial sounds means one less rapid repositioning of articulators.
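The /n/ to /m/ rule in the table can be sketched as a small rewrite function. This is only an illustration under simplifying assumptions: words are given as IPA strings, the function name `assimilate_n` is hypothetical, and only the bilabial case described above is handled.

```python
# Bilabial consonants, which trigger the change described above.
BILABIALS = {"p", "b", "m"}

def assimilate_n(first: str, second: str) -> str:
    """Return `first` with a final /n/ changed to /m/ before a bilabial.

    Both arguments are IPA transcriptions of single words. Leading stress
    marks on the second word are skipped when looking at its first sound.
    """
    onset = second.lstrip("ˈˌ")[:1]
    if first.endswith("n") and onset in BILABIALS:
        return first[:-1] + "m"
    return first

def assimilate_phrase(words: list[str]) -> list[str]:
    """Apply the rule at every word boundary in a phrase."""
    result = []
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else ""
        result.append(assimilate_n(word, following))
    return result

print(" ".join(assimilate_phrase(["ten", "ˈpiːpəl"])))  # tem ˈpiːpəl
print(" ".join(assimilate_phrase(["ɡriːn", "bæɡ"])))    # ɡriːm bæɡ
```

Real speech applies this rule optionally and more often at faster rates, so the function shows the possible outcome rather than a guaranteed one.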
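Sounds can also assimilate in voicing (whether the vocal cords vibrate). In "have to", for example, /v/ often devoices to /f/ before the voiceless /t/, giving /ˈhæf tə/.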
When alveolar consonants (/t/, /d/, /s/, /z/) meet the /j/ sound (as in "you"), they often transform into palatal sounds:
| Phrase | Standard | Palatalized | Sound Change |
|---|---|---|---|
| "did you" | /dɪd juː/ | /dɪdʒuː/ | /d/ + /j/ = /dʒ/ |
| "would you" | /wʊd juː/ | /wʊdʒuː/ | /d/ + /j/ = /dʒ/ |
| "can't you" | /kɑːnt juː/ | /kɑːntʃuː/ | /t/ + /j/ = /tʃ/ |
| "miss you" | /mɪs juː/ | /mɪʃuː/ | /s/ + /j/ = /ʃ/ |
| "as you" | /æz juː/ | /æʒuː/ | /z/ + /j/ = /ʒ/ |
This process is so productive that it has produced familiar casual pronunciations: "What's your name?" can become /wɒtʃər neɪm/, and "Did you eat?" can become /dɪdʒuː ˈiːt/, often spelled colloquially as "didja eat?"
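The four sound changes in the table amount to a lookup applied at the word boundary. The sketch below assumes IPA strings as input; the names `COALESCE` and `palatalize` are hypothetical, chosen for this example.

```python
# Alveolar consonant + /j/ -> palatal consonant, as listed in the table above.
COALESCE = {"d": "dʒ", "t": "tʃ", "s": "ʃ", "z": "ʒ"}

def palatalize(first: str, second: str) -> str:
    """Join two transcribed words, merging a final alveolar with an initial /j/.

    If the rule does not apply, the words are returned separated by a space.
    """
    if first and second.startswith("j") and first[-1] in COALESCE:
        return first[:-1] + COALESCE[first[-1]] + second[1:]
    return f"{first} {second}"

print(palatalize("dɪd", "juː"))    # dɪdʒuː
print(palatalize("kɑːnt", "juː"))  # kɑːntʃuː
print(palatalize("mɪs", "juː"))    # mɪʃuː
print(palatalize("æz", "juː"))     # æʒuː
```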
Elision is the complete omission of sounds in connected speech. Certain sounds regularly vanish in predictable contexts, especially in rapid or casual speech.
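When three or more consonants cluster together, a middle consonant, most often /t/ or /d/, frequently disappears: "next day" can become /neks deɪ/, and "handbag" can become /ˈhænbæɡ/.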
Function words (articles, prepositions, pronouns) often lose sounds entirely in connected speech:
| Word | Full Form | Weak Form | Common Elision |
|---|---|---|---|
| "and" | /ænd/ | /ənd/ or /ən/ | /n/ - "fish 'n' chips" |
| "of" | /ɒv/ | /əv/ | /ə/ - "sort o' thing" |
| "them" | /ðem/ | /ðəm/ | /əm/ or /m/ - "give 'em here" |
| "him" | /hɪm/ | /ɪm/ | H-dropping: "tell 'im" |
| "her" | /hɜːr/ | /ər/ | H-dropping: "gave 'er" |
Initial /h/ frequently disappears from unstressed pronouns and auxiliaries, as in "tell 'im" and "gave 'er" in the table above.
Important note: This H-dropping occurs only in unstressed function words, not in content words. "He's a happy man" might lose the /h/ in "he's" but never in "happy."
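The restriction to unstressed function words can be made explicit in code. This is a minimal sketch: the word list `H_DROP_WORDS` is an assumed, incomplete sample of pronouns and auxiliaries, and the stress flag on each token is supplied by hand rather than computed.

```python
# Assumed sample of function words that can lose initial /h/ when unstressed.
H_DROP_WORDS = {"he", "him", "his", "her", "have", "has", "had"}

def drop_h(tokens: list[tuple[str, str, bool]]) -> list[str]:
    """Drop initial /h/ from unstressed function words only.

    Each token is (spelling, ipa, stressed). Content words such as "happy"
    are never in H_DROP_WORDS, so they always keep their /h/.
    """
    result = []
    for spelling, ipa, stressed in tokens:
        if not stressed and spelling.lower() in H_DROP_WORDS and ipa.startswith("h"):
            ipa = ipa[1:]
        result.append(ipa)
    return result

print(drop_h([("tell", "tel", True), ("him", "hɪm", False)]))  # ['tel', 'ɪm']
print(drop_h([("tell", "tel", True), ("him", "hɪm", True)]))   # ['tel', 'hɪm']
print(drop_h([("happy", "ˈhæpi", True)]))                      # ['ˈhæpi']
```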
English speakers avoid gaps between words, creating smooth transitions through various linking strategies.
When a word ends in a consonant and the next begins with a vowel, the consonant "links" directly to the following vowel, as if the consonant started the second word: "turn off" sounds like "tur noff", and "an apple" like "a napple".
When two vowel sounds meet across word boundaries, English inserts glide consonants to smooth the transition:
/w/ insertion (after /uː/, /ʊ/, /əʊ/, /aʊ/): "go away" → /ɡəʊ wəˈweɪ/
/j/ insertion (after /iː/, /ɪ/, /eɪ/, /aɪ/, /ɔɪ/): "see it" → /siː jɪt/
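The choice between the two glides depends only on how the first word ends, which makes it easy to sketch. The code below assumes IPA strings, uses the hypothetical name `link_vowels`, and writes the inserted glide directly between the two words; the example phrases are illustrative additions.

```python
# Word-final vowels that trigger each glide, as listed above.
W_TRIGGERS = ("uː", "ʊ", "əʊ", "aʊ")
J_TRIGGERS = ("iː", "ɪ", "eɪ", "aɪ", "ɔɪ")
# Symbols that can begin a vowel in the transcriptions used here.
VOWEL_STARTS = tuple("aeiouæɑɒɔəɛɜɪʊʌ")

def link_vowels(first: str, second: str) -> str:
    """Join two transcribed words, inserting /w/ or /j/ between two vowels."""
    onset = second.lstrip("ˈˌ")  # ignore a leading stress mark
    if onset.startswith(VOWEL_STARTS):
        if first.endswith(W_TRIGGERS):
            return first + "w" + second
        if first.endswith(J_TRIGGERS):
            return first + "j" + second
    return f"{first} {second}"

print(link_vowels("ɡəʊ", "əˈweɪ"))  # ɡəʊwəˈweɪ  ("go away")
print(link_vowels("siː", "ɪt"))     # siːjɪt     ("see it")
print(link_vowels("siː", "miː"))    # siː miː    (no vowel follows)
```

The inserted glides are usually much weaker than a full /w/ or /j/, which is why some transcriptions write them as small superscripts.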
In non-rhotic accents (British RP, Australian, etc.), an /r/ sound appears between vowels even where there's no 'r' in the spelling: "law and order" can become /lɔːr ən ˈɔːdə/, and "the idea of" can become /ði aɪˈdɪər əv/.
This process is called "intrusive" because the /r/ has no historical justification—it's purely a phonological strategy for linking vowels.
Most English function words have two distinct pronunciations: a strong form (used for emphasis or in isolation) and a weak form (used in connected speech). The weak form is far more common.
| Word | Strong Form | Weak Form | Example |
|---|---|---|---|
| can | /kæn/ | /kən/ | "I can help" /aɪ kən help/ |
| from | /frɒm/ | /frəm/ | "letter from home" /ˈletər frəm həʊm/ |
| to | /tuː/ | /tə/ | "go to school" /ɡəʊ tə skuːl/ |
| at | /æt/ | /ət/ | "look at me" /lʊk ət miː/ |
| was | /wɒz/ | /wəz/ | "he was tired" /hiː wəz ˈtaɪəd/ |
| some | /sʌm/ | /səm/ | "some people" /səm ˈpiːpəl/ |
| the | /ðiː/ | /ðə/ | "in the morning" /ɪn ðə ˈmɔːnɪŋ/ |
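The transformation typically involves reducing the vowel to schwa /ə/ and, for some words, dropping a consonant (the /h/ of "him", the /d/ of "and").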
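Consider: "I can understand that" in careful speech might be /aɪ kæn ˌʌndərˈstænd ðæt/, but in natural conversation it becomes /aɪ kən ˌʌndərˈstæn ðæt/. The vowel of "can" reduces to schwa and the final /d/ of "understand" is elided, while "that", used here as a demonstrative pronoun, keeps its full vowel.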
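Choosing between strong and weak forms is essentially a table lookup conditioned on emphasis. The sketch below copies the weak forms from the table above; the function name `reduce_function_words` and the idea of marking emphasis by position are assumptions made for illustration.

```python
# Weak forms copied from the table above (function word -> reduced IPA).
WEAK_FORMS = {
    "can": "kən", "from": "frəm", "to": "tə", "at": "ət",
    "was": "wəz", "some": "səm", "the": "ðə",
}

def reduce_function_words(tokens, emphasized=()):
    """Swap in weak forms for function words that are not emphasized.

    tokens: list of (spelling, strong_ipa) pairs.
    emphasized: 0-based positions that keep their strong form.
    """
    result = []
    for i, (spelling, strong_ipa) in enumerate(tokens):
        weak = WEAK_FORMS.get(spelling.lower())
        if weak is not None and i not in emphasized:
            result.append(weak)
        else:
            result.append(strong_ipa)
    return " ".join(result)

phrase = [("look", "lʊk"), ("at", "æt"), ("me", "miː")]
print(reduce_function_words(phrase))                  # lʊk ət miː
print(reduce_function_words(phrase, emphasized={1}))  # lʊk æt miː
```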
Grammatical contractions are formalized versions of connected speech processes that reduce auxiliary verbs and negatives.
Beyond standard contractions, informal speech creates additional reductions, such as "gonna" for "going to" and "whatcha" for "what are you".
These aren't separate words—they're faithful transcriptions of how the original phrases sound in rapid speech.
When identical or similar consonants meet at word boundaries, they merge into a single, slightly lengthened consonant rather than being pronounced twice: "bad day" becomes /bædːeɪ/, and "some money" becomes /sʌmːʌni/.
This process, called gemination, creates a held consonant rather than two separate articulations.
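The merging of identical consonants can be sketched as follows. This simplified version assumes IPA strings without stress marks, handles only identical (not merely similar) consonants, and marks the held consonant with the IPA length sign; `merge_boundary` is a hypothetical name.

```python
# Single-symbol consonants recognized by this sketch.
CONSONANTS = set("pbtdkɡfvszmnlrʃʒθð")

def merge_boundary(first: str, second: str) -> str:
    """Merge identical consonants across a word boundary into one long consonant."""
    if first and second and first[-1] == second[0] and first[-1] in CONSONANTS:
        # Keep one copy of the consonant and mark it as lengthened.
        return first + "ː" + second[1:]
    return f"{first} {second}"

print(merge_boundary("bæd", "deɪ"))   # bædːeɪ   ("bad day")
print(merge_boundary("sʌm", "mʌni"))  # sʌmːʌni  ("some money")
print(merge_boundary("bæd", "æpəl"))  # bæd æpəl (no identical consonants)
```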
English is a stress-timed language, meaning stressed syllables occur at roughly regular intervals, regardless of how many unstressed syllables fall between them. This rhythm pattern forces function words and unstressed syllables to compress dramatically.
Consider the sentence: "The cats sat on the mats."
In careful pronunciation: /ðiː kæts sæt ɒn ðiː mæts/
In natural speech: /ðə ˈkæts ˌsæt ən ðə ˈmæts/
The stressed syllables (CATS, SAT, MATS) receive roughly equal time, while the unstressed words between them compress to fit the rhythm. This is why learners who give equal time to each syllable sound mechanical—they're not following English's rhythmic pattern.
Connected speech processes occur on a spectrum based on speaking rate and formality:
Example: "I don't know what you're going to do about it."
Formal: /aɪ dəʊnt nəʊ wɒt juː ɑːr ˈɡəʊɪŋ tuː duː əˈbaʊt ɪt/
Casual: /aɪ dəʊnəʊ wɒtʃər ˈɡɒnə duː əˈbaʊɾɪt/
The casual version demonstrates palatalization ("what you're" → /wɒtʃər/), reduction ("going to" → /ˈɡɒnə/), weak forms throughout, and even flapping of the /t/ in "about it" to /ɾ/.
Understanding connected speech is crucial for comprehending native speakers. When learners can't understand fast speech, the problem often isn't vocabulary but unfamiliarity with how citation forms transform in context.
Connected speech transforms English from a sequence of discrete words into a flowing stream of sound. These processes aren't errors to avoid but patterns to embrace—they're how English actually works in the real world, revealing the hidden architecture that makes natural speech possible.