
Contact Dr. Aubrey Nunes

How I can help

You can contact me from the contact form on this site. I just need to know what you think the problem is with a child, how old the child is, your name, your relation to the child, and your email address.

Pop me an email, and I will tell you if I think I might be able to help.

My clinic is at 52, Bonham Road, London, SW2 5HG, just off Brixton Hill, a short walk or two stops on the bus from Brixton underground. It is a very child-friendly space with boxes and boxes of toys, a special hand-made doll's house with a secret, interesting pictures, and shelves of books.

I only give treatment and advice face to face.

Before I do any work I need to get a slightly more detailed idea of what the problem might be. For this I need to ask you some questions. But I prefer to keep the questioning to the minimum, so as to get down, as soon as possible, to a clinical investigation.

I offer a free initial half-hour consultation. From that point on, my fees are £125 per hour. But I only ask for fees if you can afford them. Don't worry if you can't.


I worked for ten years as a speech and language therapist in the British National Health Service. I then started doing some research. In 2002 I completed a PhD at the University of Durham. I described some patterns in children’s speech errors, which had not been described before. I asked: Why do errors pattern the way they do? It is odd that they pattern at all. And I described what I thought was a new therapy idea. More research has shown that what I thought was a new idea was in fact an update of an old one.

I now work on these issues as a clinical linguist. While I still don’t have a good answer to the question “Why do the errors pattern the way they do?”, I think the most plausible explanation involves whatever it is which makes it possible for most people to learn to talk in about ten years, relating the physical aspects of perception and articulation to the way words and sentences are organised with not just speech sounds but internal rhythm.

How this works is not simple or obvious. What children hear is talk, some addressed to them, some plainly not. But somehow most people learn what to say, what not to say, how to say the exact opposite of what one means for greater effect, how to play on words, how to create new words when necessary. And some things seem to get known without ever needing to be demonstrated or explained, as in “Who do you want to talk to?” and “Who do you want to talk to you?” where the subject of talk varies according to whether a would-be addressee is specified. There is thus a paradox of what is now known as ‘learnability’. It is only in the last 60 years or so that this paradox has been recognised for what it is.

A more technical version of this idea is often referred to as ‘the logical problem of language acquisition’.

I am working on an application of this thinking in relation to speech where the logical problem seems to me just as acute as it plainly is in relation to language. It seems to me not just possible, but likely, that the logical problem is related to the odd patterning of the various errors and difficulties that children have in learning to speak. There is, I believe, critical data in how children solve or don’t solve the logical problem for themselves. From this, there will, I believe, be useful and valuable clinical insights.

My work involves making on-line adjustments to the way the child is solving the logical problem – or not.

I have four degrees, in sociology, in theoretical linguistics, in speech and language therapy, and my PhD. I try to use all of them.


I can advise only on the area of my expertise: What is the essence of a given speech problem? How long might it take to fix? How likely would it be for it to fix itself? With what degree of confidence can answers be given?

While labels can be reassuring, I believe that they should be used with caution. I do not believe that research is yet at a point where it is useful or justifiable to attempt an exhaustive classification of speech disorders. But there is, I believe, some new theoretical apparatus, developed by a number of authors over recent years, which is very useful for clinical linguistics.

One to one

The sort of help which I can offer is almost necessarily one to one. Where the issues are relatively minor, group work can be both fun and – very importantly – confidence-building – especially in a group of five or six. But the more serious the issues, the more difficult it is to focus on them sufficiently sharply and adequately, other than one-to-one.


I was approached out of the blue outside my clinic by a stranger saying that his son, Jason, then aged seven, had had speech and language therapy and been discharged. The parents had been given the idea that any remaining problems should resolve on their own. There were no problems with Jason’s school work, but his father felt there were still issues with his speech. On assessment, Jason could say all the speech sounds of English in various positions in words, with no obvious problem in casual conversation. But he could say very few words of four syllables or more. His speech sounded immature. But what caused which errors? I took a word he couldn’t say, and made it a bit simpler. We continued this process, gradually moving closer to the target, until he could say it correctly. With no consultation and presumably no rehearsal, a week later Jason again said it correctly. Starting from a variation of a given target word that he could manage, I would give him around 100 nonsense words, making no corrections, just praising all of his efforts, but trying to ensure that every trial would be possible for him. My purpose was to define the point(s) at which the breakdown occurred and to find a way of working up to that point that would facilitate success. We continued like this until he could say all the long words I could think of, and Jason and his parents were not aware of any other problems with his speech.


At four and a half, Anna was mostly incomprehensible. For example, in cardigan one property of the G went to the final N and a different property of the N went to the position of the G. The effect of this sort of disruption in one syllable is to make the speech incomprehensible to everyone other than those who know the speaker well and want to understand. Her treatment consisted of 14 half-hour sessions, each focusing on a different aspect of her speech. Six months later she could say cardigan, except that in the G, the back of the tongue did not make a good contact with the roof of the mouth, in a way that is typical of Spanish, but not typical of children learning English. In one session close to the end of her treatment, when most of our goals had been achieved, we focused on this issue with G. After twenty trials, cardigan was now correct. By five and a half, Anna was fully comprehensible and within the normal range of speech development for a child of her age.


I was crossing a school hall when a child came up to me. “Do you help children with their speech?” he asked. I said: Yes. He explained that he could not say twenty, that he said the word as though it was quenty, and that whenever he was heard to do this, the other children in his class all laughed.

Most speech pathologists call this sort of thing a ‘process’. But this is an unusual process in this position in this word, what linguists call ‘dissimilation’ – making a sound more different from another sound in the same word – in this case making the first T more different from the second. This unusual process and the fact that it was obviously a problem for this uncommonly self-aware child was more than enough for me to initiate the normal process of referral, getting the approval of the parents, notifying the school, and so on. He was in fact the youngest self-referral that I have ever had. And I haven’t heard of any others this young.

Leo was just six. But he had the speech of a four-year-old. After eleven half-hour sessions over three months, with Leo saying around 1,000 pseudo-words in all, bearing on a wide variety of relations between the stress pattern and the sound-structure, his speech was age-appropriate, having made up two years in terms of developmental age, in less than six contact hours.

This seems to me a more-than-adequate rate of progress. But Leo was uncommonly self-motivated.

Not just Imitation

Once upon a time it was thought that children learnt to talk by just copying adults until finally they got the idea. Some learnt well, and became actors, or lawyers, or barristers, or salesmen, or politicians. Others did not learn so well. Of course children copy what they hear. That is often painfully obvious. But now it is known that imitation can’t be the whole story. Some things which native speakers of a language all agree about they can't plausibly have learnt just by listening to older speakers. It is as though there were a normal human destiny of being able to talk.

Destiny – Learning to talk

In relation to any difficulties which children have in learning to talk, it is as though human beings have a destiny of eventually being able to talk. Children are not told how their language works. What they happen to hear said, and what they are taught (if they are taught) vary widely and randomly. But despite this huge range of childhood experiences, people brought up in a given language community have a common understanding. If they didn’t, there would be a monstrous unfairness in the idea of law. Business contracts would be meaningless. And there would be no plays on words or wit.

Take the two questions: “Who do you want to talk to?” and “Who do you want to talk to you?” The subject of the verb talk varies according to whether there is a pronoun you at the end. But in both cases, who is understood in a position to the right of the position in which it is pronounced, as in simpler questions. Most children cope with this apparent dislocation without any problems. Or take what are sometimes called ‘multiple Wh questions’ like “Who ate what?” or “What did who say when?” or “Why did what happen where?” Such questions are very rare in everyday discourse and in children’s speech. But when children do ask such questions (even though they only do so very infrequently), they do not make mistakes like “Who what ate?” or “What who ate?” with both of the Wh words before any other words.

And there are what are sometimes called ‘echo questions’ where there is surprise in the voice, as in “He said what?” or “She went where?” with some stress on the Wh word which is understood in the position in which it is pronounced.

There are slight variations in these things across languages. In French, for example, the form of the Wh varies between que in “Qu’est-ce qu’elle dit?” (What’s she saying?) and quoi in “Elle dit quoi?” (She’s saying what?) But there is a general pattern across languages, with the English pattern typical, and no language with the equivalent of Wh forms on the right edge rather than the left edge of the sentence.

Why should these things be so? These rather subtle variations between languages have to be learnt. Children make all sorts of mistakes in learning to talk. But not quite all sorts. They seem to know what to do with words like who, what, when, where and why without being told or taught.

Crucially, children are never given what is known as 'privileged information' about how things work in a given target language. Nor would children understand if anyone ever tried to give it to them.

It seems to me that it’s really rather extraordinary that we can agree about what anything means or doesn’t mean. But for some children the process of speech and language acquisition is a difficult, bumpy road – as it was for me.


Processes are relations between different speech sounds or ‘phonemes’ or between variations of a phoneme, by different feature combinations in different positions in different words. Some processes are aspects of the language competently spoken by adults, and are not noticed or remarked upon. But speakers are often highly sensitive to any differences between the processes in their variety of a language and those of some other variety, often not noticing any such processes in their own variety, but disparaging the processes in other varieties as lazy, incompetent, or inferior. Such differences are particularly sharply felt in Britain. But they occur in virtually all languages, with speakers from one town or village or part of town recognising the differences between their speech and that of some neighbouring town, village or part of town. In the limit, these differences can apply on different sides of a street. These differences can apply to the articulation of one vowel or consonant. Or they can apply to processes. Most adult speakers are completely unaware of any processes in their speech, thinking only of the words as words, and not of how they pronounce them. And it's the same for children. It is sometimes suggested that at least some children with speech problems need to be made aware of one or more processes in their speech. While this may sometimes be necessary, I personally think that it risks doing more harm than good. The child may replace a relatively innocuous process with one that is more obvious and ‘unnatural’. In my view, for the sake of speed and efficiency, intervention in respect of children's speech should seek to remain as surreptitious as possible.

One of the best known examples of a process occurs in words like little, pronounced in at least three different ways in Cockney, in so-called Estuary English, and in what is quaintly known as ‘Received pronunciation’ or RP, as the speech of privilege in Britain. In Cockney the tongue tip articulation of the T is replaced by a brief but complete closure of the larynx, and the L is replaced by something like the vowel in full. In RP, the T is pronounced with a closure at the tongue tip which is not released until after the formation of the L sound. In Estuary English the processes affecting the T and the L are less distinctive.

In a less well-known way, something very similar to the Cockney process also happens in RP in words like huntsman and appointment. In these words, with a T between an N and an M (known as 'nasals' because they are pronounced with the airstream through the nose), the tongue tip articulation of the T is replaced by laryngeal closure, as in Cockney, but in a much smaller set of words.

In Southern varieties of English, Hosanna in excelsis Deo is said (or usually sung) with an R in between Hosanna and in. This insertion of R only occurs where the words are in the same phrase or where the second seems to follow from the first, as in “You can go to Australia R and India is on the way home” but not in “It’s hot in Australia. India has food that’s hot too.” This cavalier insertion of R is not permitted in Scottish English.

In at least most varieties of English, Good morning is often said as GOOB MORNING with the lip articulation of the M assimilated in the D of the preceding good. The domain of this is the phrase. This happens even in speech carefully and authoritatively addressed to children at school. In “Good. Monday will be the first day,” with exactly the same sounds again next to each other, there is much less of a tendency to assimilate the D because the adjacent M is not in the same phrase, but in a separate sentence.

In English, as in most languages, there are contrasts by ‘voicing’ and ‘aspiration’ according to the time relation between the release of a closure in the mouth and the vibration of the vocal cords, giving the differences between what are mostly known as voiced and voiceless ‘stops’ (because the airstream is momentarily stopped) in tie and die, coal and goal, pea and bee. But this contrast is lost where the stop is preceded by S. So what is represented by the letters T, K, and P is different in stop and top, skip and kip, spare and pear.

In the monosyllables tree, true, trap, the T and the R merge or ‘coalesce’ into another, with the tongue tip articulation of the T and the voicing of the R getting lost in the process. This is sometimes known as ‘mutual assimilation’. But this is particular to English. It does not happen when English train is loaned into Greek. Nor does it happen in the many other native Greek names and words beginning with TR. This is something which children learning English and Greek as first languages have to learn to do, or not to do, in opposite ways.

All these processes are characteristic of normally competent adult speech.

But some processes are characteristic of child speech. Often these result in the loss of contrasts which make a difference to word meanings. These processes are what clinical linguistics seeks to fix, what I mostly work on. In the speech of children learning English, with a relatively complex set of permitted syllable structures, there is a common tendency for assimilation to apply within the word so that tick, for example, is said as KIK with the back of the tongue articulator from the end of the word replacing the tongue tip at the beginning of the word. Or pat is said as PAP with a tongue tip T articulation at the end of the word assimilating to the P articulation with the lips at the beginning. As these examples suggest, tongue-tip articulations are particularly liable to assimilation in English child-speech. But in the limit, assimilations between tongue tip, back of the tongue and lip articulators can go in all six logically possible ways. In the speech of a given individual child at a single moment in time this is rare, but not unattested. In one case that I analysed, this was subject to a very complex set of interacting principles. The resulting speech was unintelligible, even to an attentive parent. Or the assimilation can extend across the sentence, as in “A magician is a kind of robot” as A MAGICIAM IB MA PIME ’O’ WOBOK, untypically affecting the words is and a, with the final T dissimilating to a K. Such speech is as unintelligible as speech with assimilation in more than one polarity between articulators.

Some processes in child speech involve the substitution of one sound for another, K as T, F as P. Others involve, at least to some degree, structure outside the boundaries of the individual speech sound. This is necessarily the case with assimilation, making a given sound more similar to another, and dissimilation, making a sound more different. Other child-speech processes involve a misconstrual of the relations between sounds, clusters of sounds, syllables, the foot-structure and the closely-related domain of word-stress. Others misalign the systems of speech sounds and word stresses with one another. And others re-order sounds and properties of sounds, not randomly, but one or at most two at a time. A sound is sometimes copied in an incorrect position or just inserted by a process known as ‘epenthesis’. Or, by a process known as ‘metathesis’, two sounds or properties of sounds are re-ordered with respect to one another.

Many children learning English and some adults have difficulties with S. There are two main substitutions, both involving different uses of the tongue tip, one much commoner than the other, but both shifting the noise frequency of the sound significantly downwards. The less common of these processes results in a substitution by what sounds like the sound occurring twice in the Welsh name ‘Llangollen’. But interestingly children in Welsh-speaking areas do not tend to select this particular substitution, as though the characteristically Welsh sound was reserved for another sort of purpose, as it in fact is.

In all cases, the range of outcomes is much narrower than might be expected if these errors were just due to mistakes in the ‘targeting’ – with a distribution likely to be random.

Let me give some examples of a range of processes, some common, some uncommon. Some child-speech processes are ‘natural’ in the sense that the output corresponds to some pattern which is widely observed in the world’s languages. Other processes are characteristic of child-speech and either rare or unattested in the competent speech of those languages which have been studied in detail. And others are decidedly ‘unnatural’ – not making the target word easier to say in any obvious sense.

tie as DIE, car as GAR, pea as BEE, with the loss of the voicing contrast in favour of voicing at the beginning of the word or syllable by what is known as ‘initial voicing’.

head as HET, egg as ECK, cab as CAP, with the loss of the voicing contrast in favour of devoicing at the end of the word or syllable by what is known as ‘final devoicing’ in a way that occurs in many languages – like German, Polish and Russian, with German roots like Rad and Tag sounding like RAT and TAK. The opposite process seems to be unattested.

key as TEA, as mentioned above, by a process known as ‘fronting’ because the T sound is articulated at the front of the mouth, whereas the K is articulated further back.

cardigan as KARDIDAN, seemingly only occurring at a level of structural complexity such that there are two unstressed syllables with onsets differing only in their articulator, in other words only in words of at least three syllables, and subject to other conditions, and not in the case of the simplest structures like cat and dog, normally amongst children's first words.

cardigan as KARDITAN by devoicing but seemingly only after an assimilation losing the back of the tongue articulation in the G. Here the devoicing is clearly ‘dissimilatory’ in its effect – making the sounds at the beginning of the final syllable more different from the preceding ‘onset’.

tea as KEY, by a process known as ‘backing’, much less common than fronting.

sea as TEA, also mentioned above, and horse as HORT, by ‘stopping’, reversing the property in a consonant known as ‘continuance’ which allows the airstream to pass through a partially constricted opening to one of ‘non-continuance’, momentarily, but completely, blocking or stopping the airstream, hence the name of the process.

tea as SEA and eat as EES, much less common in child speech than stopping, but attested in the speech and history of many languages, including English, and known as ‘spirantisation’.

Shane, train and chain all as SHAIN, losing the contrast between the simple segment in SH, the cluster in TR where the surface form coalesces properties from both the T and R, and what is known as the ‘affricate’ standardly written as CH, where the airstream is initially blocked and then partially released.

teddy as EDDI or tummy as UMMI, by the loss of the beginning of the stressed syllable, reducing the word to the stress domain (scanning the structure leftwards as far as the stressed vowel), thus treating the word and the stress domain as one and the same thing.

monopoly as OPOLI, losing all of the structure before the stress domain, again treating the word as the stress domain, but with more loss of structure.

teddy as E or tummy as U, misconstruing the stressed vowel as the word, and deleting both consonants and the unstressed vowel.

bottle as BO, losing all of the structure after the stress, misconstruing the stressed syllable as the word.

coat as COAK where the tongue tip T on the right assimilates to the back of the tongue articulator of the initial K sound.

park as PARP where the back of the tongue K on the right assimilates to the lips articulator of the initial P.

monkey as MUMPI, with all the consonants articulated with the lips.

milk as NILK with the lip articulation of the M assimilating to the underlyingly tongue tip articulation of the L.

finger as DINNA with a default tongue tip articulation in the nasal assimilating in the initial sounds of both syllables.

elephant as TELITANT with a multiple assimilation and copying of tongue tip T from one end of the word to the other.

glove as DUD by two successive assimilations, the first coalescing properties from the G and L in the initial D, the second assimilating the final V to this.

clouds as DOWDS where the initial D takes properties from the K sound and the L.

skeleton as SKELINTON with the N copied one syllable to the left.

donkey as DODONG and plastic as PLAPLAK, in each case replacing most or all of the unstressed syllable with corresponding elements from the stressed syllable by a process known as ‘reduplication’.

potato as POPOTAITOE, doubling an unstressed syllable before the stressed syllable, creating a sort of foot in the process, but without the hallmark alternation of stress.

string as DIN, preserving different qualities from the consonants before and after the vowel.

soldier as SHOELDA, as a soundalike of shoulder, with a quality from the right edge of the cluster in the middle shifted to the onset of the initial stressed syllable, but without disturbing the voicing in either case.

soldier as HOEWUF, with the beginning of the stressed syllable going to H, the structure in the middle replaced by a W, with the whole of the second unstressed syllable getting a lip articulation from the stressed vowel with this including a final consonant.

spoon as FOON with coalescence between the lip articulation of the P and the ‘fricative’ or continuous airstream quality of the S.

spaghetti as BASKETTI where the S migrates to the beginning of the stressed syllable, and the surfacing initial labial is voiced by default.

spaghetti as PSKETTI where by a process after the migration of the S, the resulting structure is realigned with the foot, without the initial unstressed syllable, with an initial cluster of PSK, which is not permissible in English (or most languages).

balloon as BOON, Jerusalem as JOOSALEM, Geronimo as JONIMO, at different stages of speech acquisition, but in all cases with the beginning of the initial unstressed syllable realigned as the beginning of the stressed syllable, losing an initial unstressed vowel and one consonant.

twenty as QUENTY by what is known as a ‘dissimilation’ with the initial T turning into a K to make the articulator contrast with the T in the final syllable.

monopoly as MONOKOLI where the lip articulator of the P dissimilates to a K between a stressed vowel also with a lip articulation and an L in the final unstressed syllable.

hospital as HOSTIPU with the lip and tongue tip articulations of the P and T reversed, with the effect that the final syllable now has a lip articulation at the beginning and in the way the L turns out in child speech.

me as EE but moo correct, and knee correct, but noo as OO, with the M and OO both using the lips, and the N and E not doing so, and thus with consonants articulated only where the articulator matches the articulation of the vowel, thus treating articulation as a single property of the syllable.
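As a rough illustration only, a few of the simpler processes listed above can be sketched as rewrite rules in Python. The one-letter sound symbols, the function names and the tiny rule tables are all assumptions made for this sketch. It is not a model of how any child actually arrives at these forms, and it ignores stress and syllable structure, which the longer examples depend on.

```python
# Toy sound symbols: one lowercase letter per sound, not real phonetic notation.
# All tables and function names here are hypothetical, chosen for illustration.
FRONTING = {"k": "t", "g": "d"}             # back of the tongue -> tongue tip
STOPPING = {"s": "t", "z": "d"}             # continuous airstream -> complete closure
DEVOICING = {"b": "p", "d": "t", "g": "k"}  # voiced stop -> voiceless stop
TIP_TO_BACK = {"t": "k", "d": "g"}          # same voicing, different articulator
BACK_STOPS = {"k", "g"}


def fronting(sounds):
    """key as TEA: every back-of-the-tongue stop becomes a tongue-tip stop."""
    return [FRONTING.get(s, s) for s in sounds]


def stopping(sounds):
    """sea as TEA: S-like sounds become stops."""
    return [STOPPING.get(s, s) for s in sounds]


def final_devoicing(sounds):
    """head as HET: only the last sound of the word is devoiced."""
    if not sounds:
        return []
    return sounds[:-1] + [DEVOICING.get(sounds[-1], sounds[-1])]


def tip_assimilates_to_back(sounds):
    """tick as KIK: an initial tongue-tip stop copies a later K or G articulator."""
    if sounds and sounds[0] in TIP_TO_BACK and BACK_STOPS & set(sounds[1:]):
        return [TIP_TO_BACK[sounds[0]]] + sounds[1:]
    return list(sounds)


key, sea, tea = ["k", "i"], ["s", "i"], ["t", "i"]

# Two different processes collapse two different words onto the same output,
# which is the loss of contrast described above.
print(fronting(key) == tea, stopping(sea) == tea)

print(final_devoicing(["h", "e", "d"]))               # head as HET
print(tip_assimilates_to_back(["t", "i", "k"]))       # tick as KIK
print(tip_assimilates_to_back(["d", "o", "g", "i"]))  # doggy as GOGGI
```

Each rule here looks at single sounds or at most one other sound in the word. Processes such as KARDIDAN, HOSTIPU or PSKETTI cannot be written this way without also representing syllables and stress.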

No listing of developmental pathological processes in child speech will ever be complete. There is always another process or combination of processes waiting to be revealed in the speech of some child. Some of the processes I have listed do not feature in lists of processes characteristic of child speech. But I have heard all of those listed above. One process which seems to me significantly underestimated is dissimilation. On my analysis of the data it is attested in the speech of most children, both those with problems and those developing normally.

But as well as taking account of unusual, exceptional or one-off processes, in my view it is also important to take account of what seems not to happen at all.

In normally competent adult speech there are small variations of which speakers and listeners are unaware. The variation is greater in child speech. While the normally developing child does not say every word the same all the time, the greater the inconsistency the more difficult it is for listeners to tune in and understand. The supportive listener can learn that such and such a word is said in such and such a way and adjust accordingly – like learning to understand a new and unfamiliar dialect. But this only works if there is at least some degree of consistency – even if this is limited. Very inconsistent speech is almost unintelligible.

Inconsistency can arise in several ways. One is by a misconstrual of what happens where. It is possible in principle that a child may misconstrue how and when a process should apply. A child may hear the assimilation in the ‘noun phrase’ in “Good morning”, and misapply this. This could appear as inconsistency. Some people think that this is indeed what happens.

The most interesting thing about child-speech processes, it seems to me, is the way they are not completely random. Some of the non-randomness may be due to the way humans hear or articulate words. But this is easily over-stated. Another part of the non-randomness is not properly explicable in sensori-motor terms.

For instance, as noted above, many children say see as TEE by ‘stopping’. This is often explained on the basis that a complete closure is easier for the novice speaker than a partial closure. But I see two things wrong with this explanation. First, the complete closure has to be much more precisely timed than the incomplete closure. And second, while stopping is common in child-speech it is rare in competent speech where precisely the opposite process of spirantisation is common. Spirantisation does occur in child speech, but only rarely. The fact that there is one process in child-speech and the opposite process in competent speech does not fit with the idea that just one of them is ‘natural’. There must be different factors biasing the output in opposite directions, when the mastery of speech is still incomplete and when it is complete.

In a slightly different way, many children say doggy as GOGGI, with the initial tongue tip D assimilating to the back-of-the-tongue G. But very few say cuddle as TUDDOO with the back of the tongue K sound assimilating to the tip of the tongue D. This polarity in the early pattern of assimilation contradicts the predominant pattern of early substitution in favour of tongue tip articulations in key as TEA by fronting. The substitution pattern favours the tongue tip articulator, and the assimilation pattern disfavours it.

Even more oddly, in children with normal speech development as they get close to the point of complete mastery, the assimilation pattern changes. Under very strong and particular word structure conditions, a tongue tip assimilation is developmentally normal in calculator as KALTALAITOR, hippopotamus as HITOPOTAMUS, archaeopteryx as ARTIOPTERIX, and cardigan as KARDIDAN. All of these words have at least three syllables. Only very seldom does a similar assimilation happen in crocodile. There is something about the structure of those four words which makes them particularly vulnerable to assimilation in favour of the tongue tip articulator. Very occasionally cardigan is said as KARDINTAN by what seems to be a succession of processes, with an assimilation in the G becoming D, a devoicing in the D thus becoming a T, and a copying of the N in the penultimate syllable, but not in all logically possible ways.

The assimilatory process which often changes the final G in cardigan to a D is at the very least rare, and possibly unattested, in simple monosyllables like dog and cat.

This is also the case with metathesis as exemplified in hospital as HOSTIPU and dissimilation in monopoly as MONOKOLI. Like assimilation in favour of the tongue tip, metathesis and dissimilation seem not to occur in the simplest cases. If these processes were all due to a natural preference for making words easier to say, it is not obvious why this should be so.

It is this significant degree of non-randomness across the history and polarities of processes which to my mind most strongly invites a linguistic explanation. On the basis of empirical evidence from clear cases like those listed above from children over a wide range of ages, both normally developing and with problems in their speech, I argue in my 2002 PhD thesis that the most plausible explanation is from a cognitive device making speech and language acquisition possible in around 10 years from the limited, potentially misleading, and often defective evidence that is available to the child learner.

Possible words

In almost all languages, words can be built up from a number of syllables, each built from a number of sounds or phonemes, in English with up to three consonants at the beginning, and a rime or rhyme, consisting of a vowel, and sometimes one or more consonants, or L or N as the whole of the rime in an unstressed syllable after the stressed syllable.

What are known as ‘syllabic consonants’ are relatively uncommon across the world's languages. Children learning English will certainly hear words like little, middle, wiggle, button early and often. But such syllables with just ‘syllabic’ N and L and without a ‘built in’ vowel are quite unusual across languages. In most languages, all syllables have a built in vowel. The fact that this is not so for English is plainly something which children learning English have to learn. And for most children learning English, this point seems to be quite hard. So many, perhaps most, children learning English go through a stage, often for two years or more, of saying little as LICKOO, middle as MIGGOO, and so on, but, revealingly I think, not tickle as TIDDOO or toddle as TOGGOO, with the opposite relations between T and K and between D and G. So syllabic consonants have to be put into a special mental box. Despite the seemingly obvious evidence, the English child has to learn that these rather wayward syllables are not just possible in English, but common.

Because languages and varieties of languages are always in contact with one another as people trade or partner up, the number of possible combinations is not fixed, but subject to gradual change. So in English we now have schlep, spiel and schmuck from Yiddish (with much inconsistency and confusion about the spellings) and tsunami from Japanese, all with consonant sequences before the vowel which are not standardly considered permissible.

In English, as in most languages, there is no limit to the number of syllables. As syllables are combined with one another the range of syllable structures which are actually combined gets smaller. Different native speakers make different judgements. For some, but not all speakers, caulk with an L sound in the spoken form is a possible (and actual, though now archaic) word. And similarly, only for some speakers are sclerosis and sclerotic possible with a K between the S and the L. Such differences make a factorial difference to the number of possible words. But taking account of all the relevant factors governing the different sorts of syllable which English permits, the number of possible words is still enormously greater than the number of actual words.

Now most children are interested in patterns in the sounds of words – in sentences like round the ragged rocks the ragged rascal ran and in the rhyme between five and alive in One, two, three, four, five, Once I caught a fish alive. But this relies on children being able to hear the different sorts of connections between the sounds of particular parts of the words, how words can begin or end with the same sounds, hearing the properties of words separately from the words themselves. This skill is known as metalinguistics. By this skill speakers can tell what counts as a possible word. But interestingly, children with speech problems often have limited skills in metalinguistics. Of course, it isn’t a problem in life, being hazy about what counts as a possible word. But this is obviously a cognitive issue. And if an obviously cognitive metalinguistic issue occurs characteristically in children with speech problems, this raises a question about how far speech problems are at least typically motoric.

Take the word spaghetti which is quite hard for a lot of children to say. Many children move the S to the beginning of the stressed syllable so that the word comes out like BASKETTI. But some leave out the initial unstressed vowel so that the word comes out like PSKETTI. This is not a possible word in English. Most children know this without needing to be told, as they also know that BASKOTTI and BASKATTO, both nonsense, of course, are also similar to one another. But almost all children with speech problems have only a poor grasp of such similarities.

This is why I believe that the three known pioneers of speech and language therapy were correct in their belief that working with possible words which happened to be nonsense was likely to be therapeutic. By the way they were presented, the nonsense words revealed the connections between their sound structures and thus the ‘shape’ of English as a language.

None of the pioneers actually uses the expression ‘possible words’, as it only came into use during the 1960s. But it is only on the basis of this idea that their practice actually makes sense.

Lips, tongue, brain

It is obvious that many, perhaps most, children go through a stage of saying little and middle with the L as something like OO and the T of little as K and the D of middle as G. But very few say hot and mud or time and door with the T and D replaced in anything like the same way. Somehow the L sound at the end of little and middle disrupts the previous sound, whether it is T or D. But it is not obvious what is going on here. Is this by mishearing the words? Or by getting the tongue in a wrong position for one or more of the sounds? Various analyses have been proposed.

In all human populations, a proportion of individuals have problems with speech. In some it is screamingly obvious. In others it often passes unnoticed.

Sometimes the parents of a child with a speech and language issue ask themselves: Is my child brain-damaged? That can be a tough question.

In my experience, if the question is about the working of the brain as a whole, and the child has no other problem or problems, the answer is usually: No. But that is often a rather incomplete answer, because it may not really get to the essence of the question. To answer this question in any particular case, it is necessary to think of the child as an individual in a way that can only be done face to face, and certainly not in the abstract on the internet. Any attempt to give a general answer would be quite unethical in my view.

And the notion of damage may be too strong. The human brain represents the most complex entity in biology with a large number of interacting parts. So any one part can work relatively well or poorly for any one individual. And if it is working poorly, it can often be helped to work better.

Over the past century it has become clear that speech involves a large number of muscles from the lungs to the lips. A muscle is activated from the brain by a signal passing along a pathway. The longer the pathway between the brain and the muscle, the longer this takes. Long pathways have to be initiated before short ones. Almost all speech is on breathing out. The muscles of breathing, mainly the diaphragm and the muscles between the ribs, known as ‘intercostals’, are controlled by a different part of the brain from the muscles of the tongue and lips. So the messages to start breathing out have to be sent out by the brain some time before the first message for the first full-of-breath word. But all of this musculature has to be co-ordinated.

Making things more complicated, some of the ordering is in reverse. In English, as in classical Latin and many, but not all, modern European languages, the stress system (or the metricality or prosody) works from right to left. If what is known as the ‘scansion’ for stress did not work from right to left, English speakers might be confused about how to say Austria, America, Amazonia, with the stress on the first, second and third syllables from the left. The last of these is not familiar to all English-speakers. But there is no doubt about how to say it, because in all of these cases, scanning from right to left, the stress falls on the third syllable from the right. And the same principle of right-to-left scansion applies to the names of newly created drugs. The stress doesn't need to be spelt out on the label. The same principle applies to every word in English apart from a very small number of words, mostly recent loans and names of revered foreign celebrities.
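The right-to-left count can be put as a small sketch in Python. It assumes that the syllable divisions are supplied by hand (the splits below are illustrative, not taken from a dictionary), and it encodes only the simplified pattern described above, with the stress on the third syllable from the right. The function name is hypothetical.

```python
def stressed_syllable(syllables):
    """Return the syllable picked out by the simplified rule above:
    count from the right-hand end and take the third syllable.
    Words with fewer than three syllables fall back to the first."""
    index = max(len(syllables) - 3, 0)
    return syllables[index]

# Hand-made syllable divisions, for illustration only.
words = {
    "Austria": ["Aus", "tri", "a"],
    "America": ["A", "me", "ri", "ca"],
    "Amazonia": ["A", "ma", "zo", "ni", "a"],
}

for word, syllables in words.items():
    print(word, "->", stressed_syllable(syllables))
```

Counting from the left, the three answers sit in different positions. Counting from the right, they sit in the same one, which is the point of the right-to-left scansion.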

The system at play here has been exploited by poets writing in metre, from Chaucer, Shakespeare and Dryden, to the present.

In large measure, the scansion for stress works independently from the speech sound system. But not completely, as in the case of words like little and middle. The T of little and the D of middle are pronounced in particular ways because of the unstressed L sound on the right which is unstressed because it amounts to a syllable in its own right. In English only unstressed syllables work like this. And this is presumably a difficult thing to learn because so many children have difficulties on this point.

But the musculature of speech is not controlled just by instruction. There are also feed-back processes, checking the effects of the instructions, by listening to these effects and by feeling what is happening, where the tongue is in the mouth, whether it is touching the roof of the mouth, and so on. Fractional adjustments are made to the instructions according to the results of the feedback. The co-ordination of instructions and feedback is obviously intricate. The effect of any disruption of this feedback is obvious after a pain-killing injection by the dentist, or when trying to talk while the sound of the speech is artificially delayed, as sometimes happens on a mobile phone. Speech gets disrupted, or becomes impossible.

In some cases, the chain of instructions is disrupted by a medically well-defined condition like cerebral palsy or physical trauma. Here there may be no linguistic issue of any sort.

And some children’s issues do seem to be exclusively motoric, as, for example, where the activity of one invisible muscle in the tongue triggers a corresponding activity of a highly visible muscle in the face. But on the basis of my experience and research, where there is no discoverable medical factor, such exclusively motoric issues are rare.

So there is an obvious question about how far speech is just by instructions and adjustments, as is sometimes suggested, and how far there is also an irreducible linguistic aspect. To my way of thinking, some processes are forcibly linguistic rather than physical and perceptual. For instance, want to shortens to wanna only if there is a following word. “Do you wanna go? Do you really want to?” Like D and T before a syllabic L, the articulation of the T at the end of want and at the beginning of to looks forward to what is coming next. Such looking forward can't be perceptual. And it isn't plausibly motoric. On such grounds, it seems to me that linguistic factors are forcibly involved, and that children are normally learning about these seemingly small details at a very early stage in their acquisition of speech and language.
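The condition on wanna can be written down as a rule that looks ahead. The sketch below, in Python, encodes only the condition as stated above, that another word must follow. It is an illustration under that assumption, not a full account of the contraction, and the names are hypothetical.

```python
import re

# "want to" followed by whitespace and the start of another word.
# The lookahead checks for the following word without consuming it.
WANT_TO = re.compile(r"\bwant to\b(?=\s+\w)")

def contract(sentence):
    """Replace 'want to' with 'wanna' only where another word follows."""
    return WANT_TO.sub("wanna", sentence)

print(contract("Do you want to go? Do you really want to?"))
```

The rule cannot apply by looking only at the sounds already produced. It has to inspect what comes next, which is the sense in which the contraction looks forward.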

Children hear words for interesting things like animal, excavator, helicopter, kindergarten, hippopotamus. And they want to say these words too. The rhythm in all English words of two syllables or more is built about what were nicely called ‘feet’ in the ancient world. In English, as in classical Latin, a foot typically has a stressed syllable followed by an unstressed syllable. This is separate from the idea of a word.

Listening to children with speech disorders, it is obvious that for some of them at least the issue is not just one of individual speech sounds. For instance, some children have difficulty putting the initial P in a word like pea or the initial F in a word like four together with what comes after. So from the end of the 1950s the idea started to grow that in order to properly understand speech disorders it was necessary to go beyond the notion of ordered speech sounds, to describe the integration of a larger system as a whole. So the idea developed of extending the neurological notion of apraxia, a sort of disorder which can occur after a stroke, by which someone can brush their teeth but not raise their hand to their mouth. A milder form of neurological apraxia is known as dyspraxia. By applying this notion of neurological apraxia or dyspraxia to speech, the idea developed of ‘Developmental dyspraxia’ or ‘Childhood Apraxia of Speech’ or CAS, as this is mostly known nowadays by speech pathologists. By this theory in one commonly developed form, it is now assumed that the disorder is in the motor-planning of speech – in the sequencing of the instructions.

It is obvious and incontestable that there is a motoric aspect of speech. The whole, incredibly complex apparatus has to be moved, co-ordinated, and continually adjusted with extreme precision. The engineering is extraordinary.

But for children who cannot say pea or four, it seems to me more economical to hypothesise that their problem is in not connecting up the parts of the syllable, the beginning (or what linguists call the ‘onset’) and what linguists call the ‘rime’ or ‘rhyme’ (the vowel and any following consonants), than to hypothesise an entirely separate notion of ‘motor planning’, which does not throw any clear light on the considerable complexities of either word stress and its effects or on the derivation of contraction in wanna from want to.

Nor does the notion of motor planning throw any light on the range of characteristic ‘co-morbidities’, or the way that apparently separate issues with clarity of speech, using the grammar of complex sentences, learning to read and write, and being able to recognise similarities in the sounds of words, often cluster together. This is to say that speech and language problems tend to be complex. For instance, most dyslexic children have also had problems learning to talk. Most children who are hard to understand or have difficulties with literacy are not good at telling what counts as a possible word. If speech problems are characteristically motoric, how is it that they so often co-occur with a poor awareness of words and sounds, an issue which is plainly not motoric, but strictly cognitive?

Some languages, but not all, have a complex rhythm inside the word. So in English there are two stresses in words like hippopotamus, with the stress in the rightmost foot stronger than the stress in the leftmost, and with the final syllable discounted in the computation of this. These are things which children learning English normally start learning between one and a half and two and a quarter – that features, phonemes, the syllable, the word, the stress pattern, and the foot structure, are all different things. For some children, this learning is not easy.
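The computation described above can be sketched in Python. The sketch assumes hand-supplied syllable divisions and a deliberately simplified procedure: set the final syllable aside, then group the remaining syllables into pairs from the right, each pair stressed on its first syllable. It makes no claim to cover short words or the many English words which need a fuller theory. The function name is hypothetical.

```python
def parse_feet(syllables):
    """Group syllables into two-syllable feet, scanning right to left.

    The final syllable is set aside first. A single syllable left over
    at the left-hand edge stays outside the feet.
    Returns (feet, main_stress, secondary_stresses).
    """
    body = syllables[:-1]            # discount the final syllable
    feet = []
    end = len(body)
    while end >= 2:                  # take pairs from the right-hand end
        feet.insert(0, (body[end - 2], body[end - 1]))
        end -= 2
    # Each foot is stressed on its first syllable; the rightmost foot is strongest.
    main = feet[-1][0] if feet else None
    secondary = [foot[0] for foot in feet[:-1]]
    return feet, main, secondary

feet, main, secondary = parse_feet(["hip", "po", "pot", "a", "mus"])
print("feet:", feet)
print("main stress:", main)
print("secondary stress:", secondary)
```

For hippopotamus the sketch gives two feet, with the stronger stress in the rightmost one. The syllable, the foot, and the word come out as three different groupings of the same material.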

In my experience, unless there is a clear medical diagnosis suggesting otherwise, the overwhelming majority of speech disorders can be described in terms of linguistic categories, independently well-defined by normal speech, rather than inferred from disorder. (In science it is generally thought better to define categorisations by normal and correct function rather than by dysfunction.)

To throw away all of the categorisations by normal and correct function in favour of the much vaguer idea of a failure of praxis or motor-planning seems to me both an error in science and a therapeutic disadvantage.

So I am what is sometimes referred to as a ‘CAS skeptic’. But to my mind the real skepticism is not about CAS, but about linguistics.

I personally think that CAS (also often referred to as ‘dyspraxia’) is unfortunately somewhat over-diagnosed, and that well-evidenced linguistic categories provide better, more precise guidelines for effective treatment.

Pseudo words

There is a tradition going back to the 1660s of using pseudo-, non-, or nonsense-words to treat children’s speech sound disorders. I rediscovered this by chance in 1983, not knowing that the clinical idea went back more than 300 years.

I was just applying two main threads from Chomsky and Halle’s 1968 Sound Pattern of English: what have become known as ‘distinctive features’ and what is now often known as ‘metricality’, as in the two ways of saying words like project and survey, with the stress sometimes on the first syllable and sometimes on the second.

In the 1990s I did some experiments with normally developing children. These I describe in my 2002 PhD thesis. I would get children to say words like hippopotamus which I knew were hard to say. If the child made a small mistake like turning the first P into a T, I would turn the word into a series of simpler pseudo words and ask the child to say them one by one. My chaperone and I figured out a neat way of turning this into a fun game. Then a few days later I would check to see if this had made any difference. It usually did.

The startling difference between normally developing children and those with clear and obvious problems with speech is that the normally developing children are much more aware of the relation between the real word and the corresponding pseudo-word. I personally think that this difference is highly significant.


In a working life from 1849 to 1905, Alexander Melville Bell developed what he called ‘visible speech’. This was a first version of what we now call phonetics. Bell was building on a long tradition going back to the 1669 work of William Holder of breaking the speech sounds down into their constituents. For example, vowels sound different to one another partly by the position of the tongue in the mouth and partly by the shape of the lips. Bell drew a V-shaped chart with the vowel in tar at the bottom and the vowels in tea and two at the top corners, and other vowels at positions in between. The arm of the V with tea at the top represented vowels with the tongue towards the front of the mouth. The other arm with two at the top represented vowels with the tongue more to the back. He called these and all the positions in between ‘cardinal’ positions.

Then in 1917 Daniel Jones, Bell’s academic descendant at University College London, proposed what he called the ‘cardinal vowels’, arranging these on a quadrilateral, with two variants of the AH sound at the bottom and the vowels in tea and two at the top corners, but without mentioning Bell.

In 1913 it was clear to George Bernard Shaw, as an acute and perceptive observer, that Jones had edged Bell out of his place in history. So Shaw dramatised the relation between Bell and Jones in the characters of Colonel Pickering and Henry Higgins in ‘Pygmalion’, pointedly mentioning visible speech in the preface. Eliza, as the main character in the play, is clearly named after Bell’s wife, Eliza, who would seem to have been as feisty as the character who was named after her.

In a rather similar way, there is a recent claim from 2010 by some people that they have invented something which they call ‘Rapid Syllable Transition Treatment’, or ReST. They say that it involves ‘intensive practice in producing multisyllabic pseudo-words’. But they don’t mention any of their predecessors. The idea of therapy with multisyllabic pseudo-words is one of the main ideas in my 2002 thesis where I credit all of those predecessors I was able to find. The ReST authors seem to have first made their claim in 2010. So their claim of invention does not stand up to any very close scrutiny.

To my mind, the ReST authors make three mistakes. First, they say that ReST is for Childhood Apraxia of Speech or CAS. They characterise ReST in terms of ‘Principles of Motor Learning’. This overlooks the fact that a word is not just a series of muscular articulations or transition between syllables, but a structure in a series of larger structures including phrases and sentences. As I argue under the heading of Lips, tongue, brain, the notion of CAS fails to take account of both work in linguistics since 1968 and the co-morbidity evidence in speech and language pathology. Second, they characterise their pseudo-words exclusively in terms of speech sounds. They thus ignore Holder's big discovery that speech sounds have constituent parts, what are now, since the 1968 work of Chomsky and Halle, known as ‘distinctive features’. Third, they insist on giving feedback trial-by-trial, in case of doubt, erring on the side of correction, making no allowance for what is going on in the mind of the child, and thus ignoring the original logic of pseudo-words – by definition, possible words.

It is standard in science and health-care to make no more than one or two changes to an existing protocol. Only on this basis is it possible to evaluate the effects of whatever changes are made. Multiple changes to a protocol like those made by ReST make it difficult or impossible to determine which changes are useful and which ones are not. I personally think it unlikely that any of the changes by ReST will ultimately prove beneficial.

I started rediscovering and updating the pseudo-words idea, actually from the pioneers Holder and Bell, back in 1983. But at the time I knew only of the 1968 work of Chomsky and Halle, which, like my own, was developed without any knowledge of Holder and Bell.

While it is obviously useful to take account of all advances in knowledge and scientific technique and technology, it is also important in my view not to lose sight of the original discoveries. The ReST authors may see the fact that they are applying the pseudo-words idea to CAS as excusing them from any requirement to disclose any prior publications. But this has the effect of concealing what may be disadvantageous changes to older protocols and of exaggerating their originality. And the theory of CAS sits on what seem to me somewhat insecure foundations.

In my opinion, these shenanigans of concealment and disingenuousness have blighted the history and development of both clinical linguistics and speech and language pathology in the English-speaking world.


About one in three of all children with a problem with speech or language has a close relative who is also affected. Just as close relatives often look like one another, it is sometimes observed that a child sounds like a cousin or uncle or aunt.

The awesome foursome

The idea of taking the needs, interests and activity of the child as a starting point is often attributed to Jean-Jacques Rousseau, writing in the late 18th Century. But there is little or no evidence of him trying out any of his ideas in practice.

Two hundred years before Rousseau, Roger Ascham wrote, “The Schoolhouse should be indeed, as it is called by name, the house of play and pleasure, and not of fear and bondage.” Ascham was the last tutor of Queen Elizabeth I, chosen by her for this role, when she was 15. Ascham, then the Orator of Cambridge, had himself taught Elizabeth’s previous tutor.

Mindful of the fate of Socrates, Ascham was clearly aware that his point might be seen as subversive. Attitudes to discipline in education could be a coded reference to politics. He left the responsibility of publication to his widow. She only published the book two years after his death. The effect of Ascham’s skills as a teacher was evident in the accomplishments of his student, Elizabeth, who became on her own merits a significant scholar. In 1593, at the age of 60, she translated into English the Consolation of Philosophy by Boethius, from about 524, long one of the most influential books in the world. It provided a moral philosophy inspired by humanism, setting out appropriate limits on power and privilege. Boethius wrote this work in prison, knowing that he would soon be painfully and bloodily executed for a treason of which he was not guilty. Her choice of this work was doubtless meant partly to communicate her own personal views. Boethius' work, about 40,000 words long, had been translated into English by King Alfred and Geoffrey Chaucer. Elizabeth made her translation in twelve two-hour sessions over three weeks. The first paragraph is written in her own hand. Such a translation at almost 1,700 words an hour is no mean feat.

Ascham’s pioneering tradition was reflected in the beginning of speech and language therapy and pathology and modern linguistics 100 years later. This legacy was buried and lost by some shenanigans in the early 1900s. Rockey (1977, 1979, 1980) and Judith Duchan (1984, 2010, personal communication) recovered some of this lost legacy.

In the 1660s, the Reverend William Holder was working with a deaf boy, Alexander Popham. From what Holder did it would seem that Alexander must have had a severe conductive deafness (where the defect is in the mechanism transmitting sound to the inner ear). Holder proposed that ‘letters’, or speech sounds, or phonemes as we would now say, should be viewed, not as such, but as the effect of particular combinations of ‘matters’ or what are now known as ‘distinctive features’, in other words, where the sounds are pronounced in the mouth, how the airflow is shaped, whether the airflow is allowed to pass through the nose, what is going on in the larynx (otherwise known as the voice-box, the visible bump in the throat), and so on. These combinations gave both the actual words in the lexicon and the much larger set of possible words. Seemingly this was the first ever such practice anywhere in the world.

Holder makes a three-way distinction between the larynx as a source of ‘voice’, the palate like the ‘shell of a lute’, and the nose which ‘gives a material discrimination’. His reference to the body of the lute makes it clear that he is thinking of resonance in what he calls the ‘tract of Speech’, what we now call the ‘vocal tract’, seemingly the first ever reference to this.

It would seem that Holder would ask Alexander to repeat one by one carefully structured series of minimally different nonsense forms or pseudo-words. The forms did not mean anything, but they were nevertheless possible words. Holder would start with a form containing the vowel AH with the tongue lowest in the mouth which Alexander would probably have been able to perceive most accurately, then the vowels at the front of the mouth from the lowest to the highest, and then the vowels at the back of the mouth, BAH, BAY, BEE, BOE, BOO. “When you require one vowel of him, he will sometimes stumble on another... And when you have made him perfect at Syllables, then you may reckon that you have taught him all pronunciation of Language, since all words are onely some of these Syllables, or else Syllables compounded of these.”

Holder drew a table showing his features in a matrix. He calls this the ‘true Alphabet of Nature… out of which all languages are made’. He notes that “the French write some consonants which they do not pronounce to be Indices of the Derivations of their words.” By Holder’s notion of derivation, “The number of Letters in Nature is equal to the number of Articulations severally applying to every distinct matter of Sound”. There is a problem in the fact that the cross-multiplication is not complete. This is what is now known as the problem of ‘over-generation’. Holder tries to solve this by claiming that the sounds which do not occur are ‘harsh and troublesome’, and are therefore likely to be left out. This is a first step towards two later-developed theories, what are known as the theories of ‘lenition’ and ‘markedness’.
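The problem of over-generation can be seen in a small sketch in Python. The feature labels are illustrative modern ones, not Holder's own terms or his full table, and the inventory is limited to the stops and nasals of English. Cross-multiplying the feature values gives every logically possible combination, and the sketch then lists the combinations which English does not use.

```python
from itertools import product

# Illustrative labels only; these are not Holder's own terms.
places = ["lips", "tongue tip", "back of tongue"]
manners = ["stop", "nasal"]
voicing = ["voiceless", "voiced"]

# Complete cross-multiplication of the feature values: 3 x 2 x 2 combinations.
possible = list(product(places, manners, voicing))

# The combinations English uses, with a rough spelling for each sound.
english = {
    ("lips", "stop", "voiceless"): "P",
    ("lips", "stop", "voiced"): "B",
    ("lips", "nasal", "voiced"): "M",
    ("tongue tip", "stop", "voiceless"): "T",
    ("tongue tip", "stop", "voiced"): "D",
    ("tongue tip", "nasal", "voiced"): "N",
    ("back of tongue", "stop", "voiceless"): "K",
    ("back of tongue", "stop", "voiced"): "G",
    ("back of tongue", "nasal", "voiced"): "NG",
}

# Combinations the matrix generates but English leaves unused.
gaps = [combo for combo in possible if combo not in english]

print(len(possible), "combinations,", len(english), "used,", len(gaps), "unused")
for combo in gaps:
    print("unused:", combo)
```

In this small inventory the unused combinations are all voiceless nasals. A theory of the matrix has to say something about why the gaps fall where they do, which is the question Holder was starting to ask.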

Holder's hypothesis about derivation gives an order of definition – from features to phonemes rather than the other way round. This provides a guide for the teacher and the basis of an alphabet “according to the Number of Letters and their Natural order”.

Holder stresses the importance of encouraging children by ‘sweetness’. He writes, “Their eyes are the more vigilant, attent and heedful, which... gives a delight and encouragement to those who teach such apprehensive scholars.... Of those who are deaf and dumb, I say they are Dumb by consequence from their Deafness.”

Holder made his proposal at one of the early meetings of the Royal Society in front of an illustrious audience including the architect and polymath Christopher Wren, whom he had previously tutored, and Isaac Newton, with Bishop Wilkins in the chair. Wilkins was working on his Lingua Universalis, intended as an unambiguous language of science.

Lacking the notion of the phoneme which would not be developed for another 200 years, Holder complains about “that faulty way of writing which they call Orthography”. Talking of the use of the letter H in digraphs, he writes, “The mistake, I guess, lies in this.... prejudice taken in with our first ABC.”

Holder’s notion of derivation would lie buried until in 1968 it was vastly exceeded by the work of Chomsky and Halle, using the term, derivation, in the same sense, but refining it on the basis of a vast amount of research over the intervening 300 years.

It would seem reasonable to regard Holder as the first clinical and theoretical linguist and the first speech therapist / pathologist.

In 1814 John Thelwall opened the first instance of what we would now call a residential speech and language unit for children. It would seem that he took care of the children’s speech and language and his wife took care of everything else. For a man of his time he was uncommonly well aware of gender issues.

Thelwall developed an early version of what is now often referred to as the ‘metrical theory’ of word stress to describe the characteristic English alternation between stressed and unstressed syllables. Thelwall couches his theory in terms of musical cadences.

Making distinctive feature theory more complete, Alexander Melville Bell (1849) showed that English vowels could be defined by height and frontness / backness, with roundness as a secondary feature.

Ascham, Holder, Thelwall and Bell all describe their practice in recognisably modern terms, laying a foundation for clinical linguistics. But this was hugely undermined by the shenanigans.

The Sound Pattern of English

The Sound Pattern of English, to use its full title, (Chomsky and Halle, 1968) customarily referred to by linguists – admirers and detractors alike – in print and in conversation by the acronym of its title, SPE, has the unusual, perhaps unique, distinction of having been first published in paperback long after most of its main ideas had been abandoned by both of its authors.

Just one of the major achievements of SPE was to give the first full account of English word stress. Before SPE, it was often said that English stress varied randomly, as a property of the individual word, like the sequence of the speech sounds, because of variations like those between canopy, spaghetti and vindaloo, with the stress on the first, second and final syllables respectively. But despite such appearances of random variation, there are strong indications of something like a rule from the way loans are treated in English. The Russian place-names, Vladivostok and Borodino, the second the site of the famous battle commemorated in Tchaikovsky’s 1812, both have final stress in Russian. But this would seem to conflict with a deeply learned principle for English speakers which ensures that as Russian loans in English, both of these names are almost invariably pronounced with penultimate stress. So the idea of a rule was long suspected. But until SPE no such rule had been elaborated. SPE provided the first account of how a rule worked, of how and why the stress varies in photograph, photographer, and photographic and between survey as a noun with initial stress and as a verb with final stress, and a great deal more.

Another SPE achievement was the notion of speech sounds being derived from more elementary properties, which SPE called ‘features’, defining where they were articulated in the mouth, what was happening in the larynx, whether the airstream was allowed to pass through the nose, and so on. All sounds – or ‘phonemes’ – are treated as stacks of unordered features. Features are either on or off in the same way that nerves are either activated, or they are not. This was, as it remains, a quite reasonable assumption given that this is how the nervous system is generally thought to work. The linguistic pioneer, William Holder, had suggested that the same features were used in all languages, with just the combinations language-specific. SPE made this idea vastly more precise on the basis of new evidence from a large number of languages and their histories.
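The idea of a phoneme as a stack of on/off features can be sketched in Python. The three features below are a small illustrative subset, with simplified values for six English stops, and are not the full SPE inventory. The function names are hypothetical. The last line models an assimilation like the one in cardigan said as KARDIDAN, where the G takes over the place features of the D.

```python
# Simplified feature values for six English stops. True = on, False = off.
# The three feature names are a small illustrative subset, not the full set.
PHONEMES = {
    "P": {"coronal": False, "anterior": True,  "voice": False},
    "B": {"coronal": False, "anterior": True,  "voice": True},
    "T": {"coronal": True,  "anterior": True,  "voice": False},
    "D": {"coronal": True,  "anterior": True,  "voice": True},
    "K": {"coronal": False, "anterior": False, "voice": False},
    "G": {"coronal": False, "anterior": False, "voice": True},
}

def differing_features(a, b):
    """Return the sorted names of the features on which two sounds differ."""
    return sorted(f for f in PHONEMES[a] if PHONEMES[a][f] != PHONEMES[b][f])

def copy_features(target, source, features):
    """Model an assimilation: the target takes the source's values for the
    named features, and the result is looked up among the known sounds."""
    changed = dict(PHONEMES[target])
    for f in features:
        changed[f] = PHONEMES[source][f]
    for name, values in PHONEMES.items():
        if values == changed:
            return name
    return None  # the combination is not one of the six sounds listed

print(differing_features("T", "D"))   # the pair differs in voicing only
print(differing_features("T", "K"))   # the pair differs in place features only
print(copy_features("G", "D", ["coronal", "anterior"]))
```

On this view T and K are not simply two different sounds. They share a value for voicing and differ in place, so a change from one to the other is a change in a small part of the stack.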

Another achievement was a theory which was added after the rest of the book had been written when the authors realised that they needed to take account of the fact that while some combinations of feature values were common across the world’s languages, other combinations were exceedingly uncommon. And this needed to be explained. For instance, the vowels in Southern British English bud and bad are both cross-linguistically uncommon. But nearly all languages have vowels like those in tea and two and the stressed vowel in father. So SPE added a theory of markedness by which common vowels were relatively unmarked and uncommon vowels more marked. And sounds like the initial TH sounds in this and thin were more marked than sounds which are common across the world's languages, like the initial S, SH and F in sin, shin, and fin.

But markedness is not just a new word for cross-linguistic rarity. It explains how and why the process which once turned the K sound in electric into the S sound in electricity, works this way in many languages, but the opposite way hardly at all.

These were only some of SPE’s achievements. But one by one, most of the main ideas of SPE were abandoned during the 1970s and 1980s. Why bother now with what might seem to be dead ideas? The point is that SPE represents a widely agreed point of departure, setting new standards, and remaining a benchmark against which other analyses can be, and are being, judged. And precisely because of its detail and explicitness, it laid the basis for corrections and improvements on all points.

One of the first SPE ideas to go was the model of stress. According to the SPE model, stresses could, in principle, go anywhere. But now, after a large number of languages have been examined, it is clear that in languages with systems of word stress, this often works like a beat, with stressed and unstressed syllables alternating with one another, in what are now often referred to as ‘feet’ (translating a terminology from classical Greece). A number of models have now been developed, each seeking to capture the seemingly significant alternation.

Similarly, it soon came to be realised that the syllable, not recognised by SPE, needed to be recognised as a category. There seems to be a markedness variation across languages, with English allowing highly marked sorts of syllable structure, including very uncommonly complex syllables in words like scrimp, strength and strange. At the opposite end of the scale of syllabic markedness there are languages, such as those characteristic of Pacific islands including New Zealand, which allow only one consonant before one vowel.

But little of this discussion has attracted the attention of English-speaking speech pathologists who mostly favour what is treated as a well-established standard theory with features as ways of classifying phonemes, rather than the other way round with the phonemes derived from the feature combinations. The order of definition here is not an academic quibble, but something that makes a lot of difference to the practical task of helping children with speech problems. It is hard to find a single reference to SPE in any of the clinical literature. What this misses is that the supposedly standard theory has stolen the clothes of an older, clinically-based tradition by a series of shenanigans.

Because of the shenanigans, Chomsky and Halle did not know anything of the work of William Holder or the other pioneers. But thanks to SPE, there is now lively research in all of its main areas, including word-stress, feature organisation, markedness, and now the syllable, all matters of great concern for clinical linguistics.

M, N, B, D

About 2,600 years ago, someone in what is now Greece or Italy had the clever idea of making letter shapes fit the sounds they represented, with points at the top and downward facing openings for the sounds which are said through the nose, M and N, two points where the mouth is closed by the lips and one where it is closed by the tongue tip, and rightwards facing arcs and a bar closure on the left, for sounds where the closure is complete, B and D, two arcs where the closure is by the lips, and one where the closure is by the tongue tip.

Two and a half thousand years later we still use this design principle for four capital letters. The system is simple, elegant, graphical, easily understood, and appropriate. But for some unknown reason, this elegant matrix with two rows and two columns was only applied to those four letters, M, N, B and D. Possibly that ancient scholar died before being able to explain the idea to contemporaries.
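The matrix with two rows and two columns can be written out as a small Python table. This is a sketch of the description given here, not a historical reconstruction: the labels ("nasal", "closure", "lips", "tongue tip") and the function name are chosen only for the example.

```python
# Rows: how the sound is made (through the nose, or with complete closure).
# Columns: where the mouth is closed (by the lips, or by the tongue tip).
MATRIX = {
    ("nasal", "lips"): "M",
    ("nasal", "tongue tip"): "N",
    ("closure", "lips"): "B",
    ("closure", "tongue tip"): "D",
}


def describe_shape(manner, place):
    """Describe the letter shape: points for nasals, arcs for closures;
    two marks for the lips, one for the tongue tip."""
    mark = "point" if manner == "nasal" else "arc"
    count = 2 if place == "lips" else 1
    return f"{count} {mark}{'s' if count > 1 else ''}"


for (manner, place), letter in MATRIX.items():
    print(letter, "-", manner, "/", place, "-", describe_shape(manner, place))
```

Each letter is fixed by just two choices, one for the row and one for the column, which is what makes the design so economical.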

This cleverness was not developed any further until the system was extended by William Holder in 1669 to cover the whole alphabet. Holder actually shows the idea by a matrix. He clearly conceived of the idea as applying to all languages – as a programmatic suggestion. He initiated a pioneering British tradition of research which was then lost by some shenanigans. Holder's idea has been taken a great deal further by modern linguistics, with on-going work to try and apply it to all the sounds in all of the world’s languages, with the key initiative by Chomsky and Halle in 1968. But M, N, B and D are a natural point of departure – where the fundamental principles are easiest to detect.

Not everybody agrees with the idea that a four-celled matrix was identified in the ancient world. The original idea may, of course, have been by happy coincidence. But I prefer to believe that if we can see genius in the ancient world, we should recognise it, and respect our predecessors by assuming that when they got things right, this was not by chance, even if we have no way of ever finding out who this was.

Someone special

Keeping a diary of your child learning to talk

You are special to your child, and your child is special to you.

The most important single thing that you can do to help your child learn to talk is just to enjoy his or her company in conversation, using whatever language you feel most comfortable using.

A diary provides a uniquely useful record of language-development. If your child needs help, your diary will help to show this. But it is also a fun thing to keep. An interesting job.

New and interesting

Pick and choose what you write down. Perhaps once a day, perhaps more often, perhaps less often, you hear your child say something that sounds new and interesting. Write it down as soon as possible. If you possibly can, make a note about what was happening at the time. And always remember the date.

25 December

"Big Sock"

(On seeing Christmas stocking)

Many words may not come out quite right.

The most important things to get down are the words and bits of words that make a difference to the meaning. And you’re the best judge of what these are. Try not to put in any extra bits. Notice whether there are any words or bits of words being left out or said in the wrong order. It is important to record both what a child gets wrong and what he or she gets right.

Keeping a diary should not get in the way of fun conversation. It can increase the fun, like taking pictures when you’re out with friends. But as with anything else, if you’re enjoying your diary, you’ll keep it that much better.

Step by step and on the spot

Your diary should be as accurate as possible. It is always best to write things down immediately. You may find that the easiest way to do this is to keep a pocket notebook with you while you’re with your child.

It’s your privilege to be in the right place at the right time to hear your child talking about whatever is, for him or for her, the most interesting thing in the world. This can be anything at all – teddy, a treat, or something on television. What you’re doing is describing a process which naturally happens in stages. One important stage is putting two words together – like "Big sock".

Answers to questions

Some of the most interesting things a child says are in answers to questions.

Someone asks where the child is, or what the child is doing. And the child answers with a single appropriate word. Write down the conversation. Show who is saying what.

Me: "What are you doing?"

Child’s name: "Paint"

Such answers are important because they show that the child has correctly understood the question.


Sometimes someone says something or asks a question which you wouldn’t expect your child to understand. But your child answers in a way which seems to show that he or she has understood.

Suppose your child has a brother called John, and your mother asks what he is doing. John is playing a guitar. But your child has never previously commented on what John is doing. Record the words you think you hear.

Grandma: "What’s John doing?"

Child’s name: "Play guitar"

Here your child seems to have both understood the question and given an appropriate two-word answer.

Two-word sentences and answers of one or two words to simple questions are typical of children learning English between eighteen months and two years old.

Some months later, suppose your child has an older sister called Angelitsa, often known as Litsa, who plays drums in a group, and tomorrow you're planning to take Angelitsa and her drum kit in the car. You hear something, but you can't tell what it was supposed to be. Use a capital letter X.

You might write down the utterance as:

"Us going take Litsa drum X car tomorrow"

You're better than anyone else at guessing what your child means, working out which words are missing, etc. But later it can be hard to remember.


(After watching Angelitsa sorting out her drum kit in the living room the night before her group's first gig)

Questions by a child

If your child asks a question, record, not just the question, but the answer which seems to satisfy the child's curiosity. Make this clear by adding a note about what seems like the end of the conversation.

Child’s name: "Why birdies fly?"

Me: "To get to their nests."

Child’s name: "Why, Mummy?"

Me: "By flapping their wings."

(child seemed satisfied)

Your child is probably confusing why and how. Your note helps to show how far your child has got in the process of learning what words like how and why mean.

Other questions, of course, ask just what the words say. When these questions are new they are especially interesting.

Some things to think about


    • Parent(s), guardians
    • Grandparents
    • Siblings
    • Language(s)
    • Big events

  • The big picture

    • With speech? The sounds, the rhythm, the intonation?
    • Putting words into sentences?
    • Voice? Hoarseness?
    • Fluency? Stammering? Stuttering?
    • Any stops, slow downs, reversals in the acquisition process?
    • Beginning of the problem – from the beginning?

  • Speech and Language

    • First single words? Putting words together?
    • Consistency in the pronunciation of particular words?
    • Understandability, to whom?
    • Any signs of frustration at not being understood?
    • Play with others?
    • Understanding and responses to questions?
    • Liking to talk or tell stories?
    • Difficulty moving tongue up and down, round, and side to side?
    • Difficulty with long words?
    • Ideas, creativity, curiosity?
    • Any assessments by a speech and language therapist?
    • Diagnosis?
    • Treatment, by whom, for how long, with what effect?

  • Reading and Writing

    • Interest in letters or words?
    • Confusion between similar-looking letters: d – b, u – n, m – n?
    • S back to front?
    • Confusion between similar-sounding letters: v – f, s – z?
    • Confusion between mathematical symbols (plus – minus, etc.)?
    • Letter reversals in words: was – saw, now – won, left – felt?
    • Inconsistent reading – correct one time, wrong the next?
    • Gets the order of words wrong?
    • Confuses small words: a, the, of, for, from?
    • Difficulty in keeping the correct place on a line?
    • Reads the words, but does not understand?
    • Capital letters omitted or in the wrong places?
    • Dots on ‘i’s, crossed ‘t’s omitted?
    • Difficulty forming the letters and numbers?
    • Difficulty getting letters and lines pointing the right way?
    • Difficulties with punctuation and paragraphs?
    • Tilts head while reading or writing?
    • Squints while reading or copying things from the board?
    • Confusion between left – right, east – west?
    • Holds a pen, pencil, brush too tight? With an odd grip?
    • Difficulty copying?
    • Difficulty sequencing the alphabet, days, months, tables?
    • Obvious 'good' and 'bad' days, for no apparent reason?
    • Confusion between directional words, e.g. up/down, in/out?
    • Difficulty learning nursery rhymes?
    • Difficulty with rhyming words, e.g. 'cat, mat, sat'?
    • Problems understanding what he/she has read?
    • Generally slow writing?

  • Other Indicators

    • Accusations of not listening or paying attention.
    • Difficulty with clapping a simple rhythm.
    • Lack of self-confidence / poor self-image.
    • Need for repeated instructions, orders, telephone numbers.
    • Difficulty with planning and writing essays.
    • Difficulty processing complex language
    • Difficulty understanding the words of songs
    • Has difficulty saying words in a new / second language
    • Understands text only as a continuous narrative
    • Explains things only as narrative
    • Gets confused by diagrams with labels
    • Remembers what to say only in fragments
    • Loses track of changes of focus in conversation

  • History

    • Pregnancy and birth
    • Birth weight?
    • Any complications?
    • Infancy and early childhood?
    • Any infectious diseases, major injuries, or surgery?
    • Any hospitalisations?
    • Any relevant / significant diagnoses?
    • General health?
    • Ongoing treatment of any sort?
    • Any family history of issues with speech/language/literacy/hearing?

  • Personality and General Skills

    • Personality and general behaviour?
    • Difficulty getting dressed, putting shoes on correctly?
    • Hearing? Testing? When? With what results?
    • Sensitivity to noise? What sort(s) of noise?
    • Difficulty with lists, ordered instructions?
    • Understanding what is said in noisy places?
    • Feeding, eating, and drinking?
    • Likes and dislikes in food and drink?
    • Any diagnosis in relation to the anatomy of the mouth?
    • Dummy? Thumbsucking?
    • Anything inappropriate getting put into the mouth?
    • Brushing teeth?
    • Strength and endurance?
    • Co-ordination and balance?
    • Hopping, skipping, running, jumping, kicking, catching?
    • Knowing which hand to use?
    • Drawing/colouring/cutting?
    • Buttons/zips/laces?
    • Understanding pictures?
    • Using two hands together?
    • First crawling, sitting, standing, walking?
    • Excitability?
    • Attention and concentration?
    • Any problems telling the time?
    • No fixed preference for one hand or eye in relevant tasks?
    • Fussiness about clothes?
    • Reactions to noise, pain and discomfort?
    • Rocking, swinging, cuddling, bouncing?