Recording and processing vocals (for both music and dialogue)

Hey all,

I’m working on vocals for a game project at the moment, so my take and questions on this subject will be a bit tilted in that direction. But vocals are always an interesting subject for me in the context of music as well, so all comments on processing vocals are welcome here.

For music, I usually glitch the hell out of vocals, using pitching, spectral transforms and morphing, repeater and stutter effects, envelope-based volume shaping and filtering, reversing parts or applying strange delays, you name it. This is partly because I have rarely recorded original vocals for music so far.

My situation right now with the game-related vocals is naturally a bit different from that. I have multiple tasks:

  • selecting best takes or combining different takes
  • some cutting, speed changes and warping to make it sound more natural (basically fixing it in post instead of re-recording, even though making some new takes would probably be the better option in any case)
  • basic treatment like gating, compression, EQing, De-Essing, saturation and using delays and reverbs to add some space (might delete that part and do it in-game though…)
  • some pitching, spectral morphing and transforming for some specific characters
  • additional effects such as radio interference for specific circumstances
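Since the last bullet point mentions radio interference, here's a minimal sketch of how that kind of effect is commonly built: band-limit the voice to a narrow telephone-style band, overdrive it a little, and add a noise bed. This is just an illustration in Python/NumPy with made-up parameter values, not a recipe from any particular plugin:

```python
# Toy "radio voice" effect: band-pass + soft clipping + static.
# Assumes a mono float signal in [-1, 1]; all parameter values
# here are illustrative guesses, not settings from the thread.
import numpy as np

def radio_voice(x, sr, low_hz=300.0, high_hz=3400.0,
                drive=4.0, noise_level=0.02, seed=0):
    # 1) Crude band-pass via FFT masking to the narrow "radio" band.
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spec[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    y = np.fft.irfft(spec, n=len(x))
    # 2) Soft clipping for the overdriven, crunchy character.
    y = np.tanh(drive * y) / np.tanh(drive)
    # 3) A bed of white noise suggests static/interference.
    rng = np.random.default_rng(seed)
    y = y + noise_level * rng.standard_normal(len(x))
    return np.clip(y, -1.0, 1.0)

# Stand-in for a vocal take: one second of a 440 Hz tone.
sr = 44100
t = np.arange(sr) / sr
dry = 0.5 * np.sin(2 * np.pi * 440 * t)
wet = radio_voice(dry, sr)
```

In practice you'd build this with an EQ, a saturator and a noise layer in the DAW (or in the game's audio middleware), but the signal chain is the same idea.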

Any experiences with any of these aspects might be helpful if you want to share something.

My main problem right now is more basic, though. My wife and I are recording the vocals with our own voices, and we are struggling a bit with making the dialogue sound natural and making the characters sound different. Any tips or ideas to improve stuff like that?

I’d also be interested in experiences with good voice-changing software.

In general, feel free to discuss anything related to processing and recording vocals in this thread.

Thanks for your interest! <3

The one thing I will say is to make sure to learn the lines, so you’re not reading any of it. That may sound obvious, but if you’re not a voice actor it’s too easy to screw up the timing. Then practice the lines while doing something else. If you think of most conversations you have, you’re often doing something apart from just sitting face to face. This will change the cadence of your speech, as well as the time between replies to each other. As for the other stuff, that will really depend on where the characters are in the game and what they are doing when the conversations take place.


I was going to say something similar to Teknomage. Whenever I have to do public speaking, I practice the speech beforehand looking at myself in a mirror. I do this until I can deliver the speech/lines 2-3 times in a row in the same amount of time. So, it’s not just memorization of the lines; you have to get to a point where you can deliver them consistently. Once you have that down, if you listen to a take, you can hear something and say “oh, I didn’t like the way I delivered the last 3rd of that”, and you’ll have the capability on the acting side to go do a new take where you get the first 2/3 the same as before and you only change the last 1/3.

At first it’s hard, but progress will come quickly with practice. It used to take me 2-3 hours to get a 5 minute speech down and now I can do more like a 15 minute speech in one hour.


Thank you both very much for the comments! You have convinced me to record more takes until it works instead of trying to fix everything in post… We have planned another recording session (again for the very first two dialogues lol). Gonna let you know if we get better at all hehe…


De-essing is shit. Seriously. You have to automate it so much so that it doesn’t f-up the parts that don’t need it that you might as well…
…automate the volume anyway :smiley:

That said, I still use it. But in very small amounts.

Something like Melodyne lets you reduce the volume level on parts very specifically, so I find I can target those nassssty sssibilantsssss easily and accurately reduce their nasssstiness. By the time I’ve done that, I very rarely need to apply any plugins to take care of it, because it’s been done directly on the source. This takes time, though, and depending on the project may not be worth your time. If it’s a 3-minute song, sure, have at it. If it’s 30 minutes of talking, then bollocks to that. Set the plugins loose and let’s slip the dogs of… erm, ok, going too far now.

I also like Melodyne for removing breathing. These types of product are usually regarded as tuning tools, which of course they are, but that’s not really what I love it for. Half a bar of you intaking breath before you blurt your awesomeness? No problem. Snip. Gone. Then the last part, just before the proper voice thing occurs, can be reduced in volume so all you get is the start of the first word, no preamble. This almost removes the need for gating, or where gating is still needed, reduces how much of it you need.


Both my wife and I were sick for a while, so we couldn’t record anything new, but I was listening a bit more to the vocals we’ve already recorded, and there definitely needs to be some manual volume editing, for breath removal and in general, so doing the de-essing at the same time might be a solution. I’m not sure if I’ll do it completely without a de-esser, though. I have mixed experiences with it: sometimes it destroyed the sound, but depending on the source material and the specific plugin used (how many bands, threshold and so on) it can make sense in moderation imho.

I still have to test it against volume automation in the DAW or in Melodyne, but I can imagine that for sibilants it doesn’t matter much that the volume is simply reduced, in contrast to dynamic EQing, multiband compression or whatever is used for de-essing, since the sounds just have to be tamed a bit in general. Automating the de-esser would be another solution; it might be the best of both worlds, since it doesn’t lower the overall volume but only specific frequencies. Really have to do some checks soon.
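For what it's worth, the "just automate the volume" approach to de-essing can be roughed out programmatically too: flag frames where high-frequency energy dominates and dip only those. A toy sketch in Python/NumPy, with guessed frame sizes and thresholds rather than anything from a real de-esser:

```python
# Rough sketch of "volume-automation de-essing": find frames where
# high-frequency energy dominates and dip the gain there, instead of
# compressing broadband. Split frequency, threshold and dip amount
# are illustrative guesses, not settings from a real plugin.
import numpy as np

def deess_by_gain(x, sr, split_hz=5000.0, ratio_thresh=0.5,
                  dip_db=-6.0, frame=512):
    gain = np.ones(len(x))
    dip = 10 ** (dip_db / 20)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    hi_bins = freqs >= split_hz
    win = np.hanning(frame)
    for start in range(0, len(x) - frame + 1, frame):
        spec = np.abs(np.fft.rfft(x[start:start + frame] * win)) ** 2
        total = spec.sum() + 1e-12
        if spec[hi_bins].sum() / total > ratio_thresh:
            gain[start:start + frame] = dip  # sibilant frame: pull level down
    return x * gain, gain

# Stand-in material: 0.1 s of a low 200 Hz tone (voiced), then
# 0.1 s of an 8 kHz tone (pretend sibilant).
sr = 44100
n = 4410
t = np.arange(n) / sr
voiced = 0.5 * np.sin(2 * np.pi * 200 * t)
sibilant = 0.5 * np.sin(2 * np.pi * 8000 * t)
y, gain = deess_by_gain(np.concatenate([voiced, sibilant]), sr)
```

A real de-esser would smooth that gain curve and act only on the high band, but this shows why plain volume dips can get you most of the way on isolated sibilants.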

Thanks for your input in any case, much appreciated!