Bilabial Substitution Patterns during Consonant Production in a Case of Congenital Aglossia

Purpose: Congenital aglossia is a rare syndrome in which an individual is born without a tongue. The present paper examines articulatory details of the production of multiple consonants by an aglossic speaker. Method: Real-time magnetic resonance imaging data of the upper airway were collected from the aglossic speaker. Air-tissue boundaries were determined from the video sequences using a segmentation algorithm, and dynamics of vocal-tract constrictions and cross-dimensions were calculated. Results: The aglossic speaker produced the consonants /t, d, th, l, r, f, v, s, sh/ with a bilabial closure instead of a normal lingua-alveolar closure; however, in /t/ and /d/ the overall vocal-tract configuration presented a cavity anterior to the constriction, which filtered transient and frication sources in a manner similar to normal alveolar production. Conclusion: The aglossic speaker, lacking a tongue apex, has developed a bilabial compensatory strategy to produce multiple consonants with her lips.

McMicken and her co-authors [13][14][15][16][17][18][19][20] have reported extensively on research from cinefluorographic films and audio-visual (AV) recordings collected in 1986 on a 16-year-old female PwCA. This research included perception of vowel production [14], perception of consonant production [15], and cinefluorographic examination of articulation [16], in addition to current re-examination of the PwCA, now in her 40s, with electropalatography [17], videofluorography of articulatory movements of the pseudo-tongue, hyoid, and mandible [18], and speech and swallowing kinematics [19]. It should be noted that the PwCA was a co-author of previous articles and is the subject of this paper. Because of the attention research on this topic has garnered, McMicken and her co-authors have become aware of more individuals with congenital aglossia, and these individuals present opportunities for future research. As such, the subject of this article will henceforth be referred to as PwCA I, and future subjects will be numbered sequentially as research on this narrow topic expands.
The above investigators are consistent in their comments that the spoken output of congenital aglossic speech was intelligible, but demonstrated some vowel and consonant distortions. Such findings would support theories of the speech motor control system as being engaged during even the simplest speech tasks, such as babbling or imitating sounds, and hence would support speech production in which acoustic targets could be achieved with different articulatory strategies to realize intelligibility through compensation [20,21]. Rosenthal [9] commented that speech in the congenital aglossic may improve considerably when the child learns to use other muscles or structures to substitute for the missing tongue. There have been several cases in which investigators report that absence of a tongue, or the presence of a rudimentary one, may be compensated for by hypertrophy of the floor of the mouth. Salles et al. [10] described a case of congenital aglossia in which a female Brazilian speaker elevated the posterior portion of the floor of the mouth to contact the palate, allowing her to develop speech and swallowing functions. They suspected the mylohyoid was the primary muscle of movement. However, multiple consonant distortions were reported in this case. McMicken et al. [14,15] reported, from the 1986 AV tapes of PwCA I, an overall intelligibility of 78.5% for vowels and 77.3% for initial consonants, with considerable variability depending on context.

Background
The PwCA I was re-examined by McMicken et al. [16][17][18][19] and articulatory movements for speech and swallowing were studied using electropalatography (EPG) and videofluorographic films. These results suggested PwCA I may present with the capacity to capitalize on a variety of actions during eating or swallowing to optimize the more refined movements of speech. There were highly predictable correlations of muscles for deglutition and speech, which suggested that this speaker used the muscular actions developed for deglutition to enhance speech and resonance. The 2014a [17] EPG study noted bilateral lip electrode activation in 7 out of 8 trials of /t,d/ productions as well as other consonants. The 2015b [19] videofluorographic study suggested production of the lingua-alveolar stop consonants /t/ and /d/ may be possible through a substitution pattern of lower incisors, lower lip, and mylohyoid to constrict the anterior oral cavity for intelligible production.
In 2015, and again in 2016, a collection of real-time MRI (rtMRI) recordings was acquired from PwCA I. These recordings allowed for an in-depth analysis and a greater understanding of the importance of a pattern of bilabial constriction in consonant production for PwCA I. Since it was evident from the rtMRI data that bilabial constriction was utilized for multiple consonants, the question arises: what physiological differences between the consonant productions allow for intelligible perception?

Research Questions
1. Is there an explanation that would account for intelligible perception despite the physiologically similar production of /p, b/ and /t, d/?
2. What are the degree of constriction and the duration characteristics of multiple consonant productions?

Methods
Subject
PwCA I is a 46-year-old woman born without a tongue. She presented with micrognathia (severe Class II malocclusion) and absence of the tongue. Intraoral inspection revealed a wart-like tongue rudiment in the region of the tongue root. She also presented with additional anomalies related to the development of the middle and lower thirds of the face; their detailed description is beyond the scope of the present work and will be dealt with in a separate article that specifically discusses the case.
The absence of the tongue was compensated for by the fact that the floor of the mouth (mylohyoid) and base of the tongue were hypertrophied and could independently and symmetrically be elevated to contact the palate during speech and swallowing [18,19]. There was no evidence of hyper- or hyponasality as a resonatory characteristic [14,15]. In oral-peripheral assessment, range of motion of the jaw and lips demonstrated what would be considered normal mobility. She reported never receiving oral medical, surgical, or therapeutic intervention, and additionally reported that all speech and feeding milestones were met without assistance other than the use of a bottle with a longer stem and wider nipple during infancy. The following link demonstrates the intelligibility of the subject while reading the "Rainbow Passage:"

Stimuli, data acquisition and analysis
Speech stimuli consisted of multiple phoneme combinations, words, phrases, and sentences read by PwCA I. Speech responses were acquired using an rtMRI protocol developed specifically for the dynamic study of upper airway movements [22,23]. Experiments were performed on a GE Signa Excite 1.5T scanner with a custom eight-channel upper airway receiver coil. The airway was imaged in the midsagittal plane using a multishot short spiral readout spoiled gradient echo pulse sequence (flip angle: 15 degrees; slice thickness: 6 mm; readout time: 2.5 ms; repetition time: 6.004 ms). The rtMRI data were reconstructed from two anterior coil elements, which had the most sensitivity to the upper airway. Video sequences were formed at a frame rate of 83 fps (spatial resolution was 2.4 mm/pixel) using a view-sharing reconstruction scheme. Audio was simultaneously recorded at a sampling frequency of 20 kHz inside the MRI scanner while the subject was imaged, using a custom fiber-optic microphone system. Audio recordings were subsequently noise-canceled [24], then reintegrated with the reconstructed video.
The reconstructed videos were subjected to a modified version of the segmentation algorithm by Bresch and Narayanan [25]. A midsagittal morphological template of the upper airway was constructed. A hierarchical gradient descent procedure then registered this template to each rtMRI video frame to approximate the sagittal air-tissue boundaries as polylines. The minimal distances (constriction degrees) between six regions on the outer vocal-tract walls (passive articulators) and the opposing inner wall surface (active articulators) were measured. This involved manually annotating the speaker's places of passive articulation as polylines along the air-tissue boundaries. For each frame, the constriction degrees (bilabial, alveolar, palatal, velar, velopharyngeal, pharyngeal) were then computed as the minimum distances between the outer and inner wall polylines. The sagittal air-tissue boundaries of each frame were also subjected to a modified version of the algorithm proposed by Maeda and Laprie [26], which, by progressively fitting circles along the vocal-tract length from the glottis to the lip opening, finds a midline of the vocal tract as a set of points that are equidistant from its inner and outer walls. A by-product of the algorithm is a set of lines that are perpendicular to the midline and cross it at the midpoints. This set of midsagittal cross-dimensions, and their positions along the midline, can be regarded as a valid two-dimensional counterpart of the concept of area function, which is known to determine the acoustics of a given vocal-tract configuration. Figure 1 below demonstrates methods of acquiring air-tissue boundaries on a normal speaker and on PwCA I, as well as measured constrictions on PwCA I. Analysis for the current paper was limited to VCV production with the vowel /a/. The accompanying videos and charts demonstrate the subject producing VCVs, computed bilabial consonant constriction, vocal tract shape at constriction and midsagittal cross-dimensions at

Results
Initial visual inspection of the video sequences focused on production of /t/ and /d/. PwCA I forms a bilabial closure during production of /t/ and /d/, rather than the apicoalveolar closure that would be expected in normal speech production. To verify this point, time series describing the progression of the bilabial and alveolar constriction degrees during the production of the VCVs /ata/ and /ada/ by PwCA I were examined. It was confirmed that the bilabial constriction degree reaches a minimum value near zero for the consonant production, while there is no change in the alveolar constriction that can be associated with the consonant production (Figure 3).
A similar pattern of bilabial production was noted during VCV production for the consonants /f, v, th, l, r, s, sh/. In each case, lip closure was visible in the rtMRI video at the moment of consonant constriction, and analysis of the data revealed zero or near-zero bilabial constriction degrees at that moment. The duration and degree of constriction of the lip closure differed across consonants (Table 1). These differences indicate varied articulatory movements, which were the hallmarks of the bilabial compensatory patterns.
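The constriction-degree measure used for this verification (the minimum distance between an outer-wall and an inner-wall polyline in each frame) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the polylines are assumed to be given as arrays of (x, y) pixel coordinates that do not cross one another, and only the 2.4 mm/pixel scale is taken from the Methods.

```python
import numpy as np

MM_PER_PIXEL = 2.4  # in-plane resolution reported in the Methods


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b (2-D)."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    if denom == 0.0:  # degenerate segment: a and b coincide
        return float(np.linalg.norm(p - a))
    # Projection parameter, clamped so the nearest point stays on the segment.
    t = np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))


def constriction_degree(outer, inner):
    """Minimum distance between two polylines (N x 2 and M x 2 arrays).

    Assumes the polylines do not cross; in that case the minimum is always
    attained at a vertex of one polyline against a segment of the other.
    """
    outer = np.asarray(outer, dtype=float)
    inner = np.asarray(inner, dtype=float)
    best = np.inf
    for points, line in ((outer, inner), (inner, outer)):
        for p in points:
            for a, b in zip(line[:-1], line[1:]):
                best = min(best, point_to_segment_distance(p, a, b))
    return best


def constriction_series(outer_frames, inner_frames, scale=MM_PER_PIXEL):
    """Per-frame constriction degree, converted from pixels to millimetres."""
    return np.array([
        scale * constriction_degree(o, i)
        for o, i in zip(outer_frames, inner_frames)
    ])
```

Applied separately to the bilabial and alveolar regions of a VCV token, such a series would show the pattern described above if the bilabial values dip toward zero during the consonant while the alveolar values stay roughly flat.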
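The duration and degree of lip closure for a given consonant could be read off a bilabial constriction time series along the following lines. This is a hedged sketch rather than the procedure used to build Table 1: the closure threshold and the function names are assumptions introduced for illustration, and only the 83 fps frame rate comes from the Methods.

```python
import numpy as np

FRAME_RATE_FPS = 83.0  # video frame rate reported in the Methods


def closure_intervals(series_mm, threshold_mm):
    """Return (start, end) frame index pairs, end exclusive, where the
    constriction degree is at or below the (assumed) closure threshold."""
    closed = np.asarray(series_mm, dtype=float) <= threshold_mm
    # Pad with False so closures touching either end are still delimited.
    padded = np.concatenate(([False], closed, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[0::2].tolist(), changes[1::2].tolist()))


def closure_summary(series_mm, threshold_mm, fps=FRAME_RATE_FPS):
    """Duration (ms) and minimum constriction (mm) of the longest closure.

    Returns None when the series never reaches the threshold.
    """
    intervals = closure_intervals(series_mm, threshold_mm)
    if not intervals:
        return None
    start, end = max(intervals, key=lambda se: se[1] - se[0])
    segment = np.asarray(series_mm, dtype=float)[start:end]
    return {
        "duration_ms": 1000.0 * (end - start) / fps,
        "min_constriction_mm": float(segment.min()),
    }
```

At 83 fps each frame spans about 12 ms, so durations obtained this way are quantized to that step; the choice of threshold would also need to reflect the 2.4 mm/pixel resolution.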

Discussion
The rtMRI data showed that the articulatory positions for /t, d, f, v, th, l, r, s, sh/ were formed by the lips in PwCA I. One question is how a bilabial closure gives rise to a sound that is perceived as a /t/ (or /d/) rather than a /p/ (or /b/). To answer this, the midsagittal cross-dimensions of the vocal tract, as found by the midline derivation algorithm, were examined at the exact times when maximum consonantal constriction takes place. The particular configuration of the lips of PwCA I creates an additional cavity length anterior to the identified closure, which is actually located at the posterior area of the lips. The anterior cavity resembles that formed by normal speakers during the apicoalveolar closure. As Stevens reported in 1993, for transient and frication sources during the release of a plosive, the transfer function is dominated by the cavity anterior to the constriction [27]. There is no such cavity in normal bilabial production, while the cavity for alveolar and velar consonants can be in the range of 1.5-7 cm. The cavity formed by PwCA I during intended /t/ production has a length of 1.3 cm (similar results were derived for /d/). Seen from this viewpoint, the specific production by PwCA I is closer to a normal speaker's alveolar than to a normal speaker's bilabial, and is thus perceived as an alveolar.
The cavity dimensions for /f, v, th, l, r, s, sh/ consonant production will be investigated in future work. It was evident from the analysis that each consonant is made with differences in lip closure and duration as well as in degree of constriction. The explanation will require further analysis of the unique models for consonant production of PwCA I.
These results are consistent with findings by McMicken et al. in a study investigating listener confusion and the intelligibility of PwCA I as a function of semantic and phonemic variables [15]. In that investigation, researchers found: (1) there was a confusion of the alveolar stop /d/ for bilabial productions about 50% of the time; (2) while the /d/ was well recognized by listeners, it was acoustically different from the other sounds produced by PwCA I; (3) the coefficient of /d/ regression was higher than that of typical speakers' /d/; and (4) 30% of bilabial stops preceding the vowel /iy/ were perceived as alveolar stops. All of these acoustic and perceptual findings are highly consistent with the rtMRI findings in the current study. Another notable point is that the authors in 2013 prematurely concluded that PwCA I was not using potentially available articulatory maneuvers such as lip movement to generate high intelligibility. Rather, the authors concluded that the /d/ production may have been dental-alveolar, which would account for a close consonant constriction. The new findings run counter to that suggestion, notably as a result of the type of imaging employed in the current study (rtMRI). These new findings therefore support theories suggesting that speakers will make adaptive movements to make their acoustics more typical.
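As a rough illustration of why the length of the anterior cavity matters acoustically, the cavity can be idealized as a uniform tube closed at the constriction and open at the lips, whose lowest resonance is F = c / (4L). This quarter-wavelength idealization and the speed of sound of 35,000 cm/s are assumptions introduced here, not values or a model taken from the present study; only the cavity lengths come from the text above.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, moist air


def front_cavity_resonance_hz(length_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """Lowest resonance of an idealized tube closed at one end and open
    at the other (quarter-wavelength resonator): F = c / (4 * L)."""
    if length_cm <= 0:
        raise ValueError("a front-cavity resonance needs a positive length")
    return c / (4.0 * length_cm)


if __name__ == "__main__":
    # 1.3 cm: cavity measured for PwCA I; 1.5 and 7 cm: range quoted in
    # the text for normal alveolar and velar consonants.
    for length_cm in (1.3, 1.5, 7.0):
        print(f"{length_cm:.1f} cm -> {front_cavity_resonance_hz(length_cm):.0f} Hz")
```

Under this simplification, a closure with no anterior cavity has no front-cavity resonance at all, whereas a 1.3 cm cavity yields a resonance close to that of the shortest cavities in the quoted 1.5-7 cm range. The real lip geometry is not a uniform tube, so the numbers are indicative only.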
The application of this research to rehabilitation of the individual with oral-facial involvement should be obvious in that we have clearly demonstrated unique compensatory mechanisms, which allow for intelligible speech. The importance of an acoustic rather than visual model for stimulation of the client cannot be overemphasized. A visual model may confuse the potential compensatory abilities of individuals with aglossia [28].