The acoustic features that characterize basic speech sounds have been debated for many years. Our understanding of normal-hearing listeners' responses to nonsense consonant-vowel (CV) sounds in noise is extremely limited. The key to this problem is to correlate confusion patterns with the acoustic cues available at a given signal-to-noise ratio (SNR), using model neurograms. We show that listeners use spectro-temporal, across-frequency timing cues to discriminate sounds within confusion groups; for example, recognition of the consonant /t/ in noise depends critically on a ~20 ± 5 ms high-frequency (3-8 kHz) burst. Adding masking noise can remove such features, leading to morphing, in which one sound is perceived as another. Masking or truncating this burst typically leads to the recognition of /p/ or /k/. Our analysis of /ma/ and /na/ confusion patterns shows that an across-frequency timing difference between high and mid frequencies is responsible for the discrimination between the two sounds. We will play examples of the plosives /p/, /t/, and /k/ and the nasals /m/ and /n/, followed by several vowels, in which the primary consonant cue has been modified, leading to morphing.