Saturday, February 19, 2011

WIP Animation - 11 Sec Club Feb

It seems this week has been a bit quiet around here... that's because I've been busy hacking away on Blender, attacking my stash of todos. But (and there is a but), I've also been chipping away at this month's 11 Second Club sound clip. What you're about to see is still very much WIP animation (playblast quality) of just part of the clip...

11 Sec Feb 2011 - WIP 01

This is the first lipsync animation I've attempted in quite a while, and my first in 2.5. When I stumbled across the clip while doing some routine bookmark probing, I figured it'd be perfect for playing with the rig I've been doing various animation tests with.

Having gone through this process a bit, I can say that for anyone trying to do lipsync in 2.5 at the moment, it's not really smooth sailing yet. Some things to watch out for:
  • Scrubbing sound in the new sound system doesn't work that well yet, and has occasional glitches. 
    • It's nice that the new sound system plays sounds at startup without needing any special workarounds to get it going, but...
    • Scrubbing doesn't seem to take the right-sized sound chunks, resulting in stuttering and overlapping sounds when trying to scrub. Either that, or it's too eager to respond to minor mouse movements and start playing again.
    • Initially, there were a few times when this system would bug out completely and play through the entire sound clip on the slightest frame change, with no way to stop it once it started (though you could restart the sound by changing the frame again while it played, and you could also restart Blender to get the playback to stop).
    • The "mute sound" option in the Timeline menus seems to be broken. This looks like it might be an easy fix, so I'll try and have a look at it. 
  • Not all FFMPEG formats seem to be able to actually include sound (at least for playblasts; I've yet to confirm this for proper renders).
    • For all other animation tests, I've been using OGG video. However, when trying to use it for this animation, I found that I couldn't get any sound output, even after trying the various sound codecs. At this point, I thought that perhaps OpenGL renders skipped sound-rendering, while full renders wouldn't.
    • On the advice of some experienced artists, I gave XVID + MP3 a go. However, on Windows at least, the XVID option consistently refused to work ("Failed to initialise stream").
    • After trying out many combinations (some of which, as is customary for FFMPEG, crashed) I finally found one that worked: H.264 + MP3 in AVI container. Of course, only VLC will play it (and I hope Vimeo will gracefully accept it), but at least it works on my computer :)
    • Yay for the crappiness of FFMPEG yet again as an encoder... hardly anything works!
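For reference, the scrubbing problem above largely comes down to chunk arithmetic: each scrubbed frame should map to exactly one frame's worth of samples. Here's a minimal sketch of that mapping (purely illustrative; this is not Blender's actual audio code, and the function name is my own):

```python
def scrub_chunk(frame, fps, sample_rate):
    """Return (start, end) sample indices covering exactly one frame.

    If the chunk handed to the mixer is larger than sample_rate / fps
    samples, consecutive scrub steps overlap and you hear the stuttering
    described above. Illustrative only -- not Blender's actual code.
    """
    samples_per_frame = sample_rate / fps
    start = int(round(frame * samples_per_frame))
    end = int(round((frame + 1) * samples_per_frame))
    return start, end

# At 24 fps and 48 kHz, each frame covers exactly 2000 samples:
print(scrub_chunk(10, 24, 48000))  # -> (20000, 22000)
```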
On the topic of sound, some people have complained that the lack of waveform display in the animation editors is a prohibitive factor for doing lipsync successfully. Personally, I don't have much use for it yet, as I still haven't found the trick for interpreting where words fall relative to the waveform shapes (if I had, I might've started work on automated lipsync from wav files already). Anyways, I think it's doable by reusing the waveform drawing code from the sequencer's sound strips, though the big issue is really how we obtain the waveform to show: potentially, more than one sound strip could be combined to produce a single "line" of dialog. Suggestions welcome.
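To give an idea of the drawing side, here's a minimal sketch of the usual min/max peak reduction, assuming the hard part above is already solved (i.e. we have one mixed-down sample array for the line of dialog; the function name is my own):

```python
def peak_envelope(samples, width):
    """Reduce raw samples to `width` (min, max) pairs, one per pixel
    column, which is roughly what sequencer-style waveform drawing
    needs. Illustrative sketch only."""
    n = len(samples)
    env = []
    for col in range(width):
        lo = col * n // width
        hi = max(lo + 1, (col + 1) * n // width)  # at least one sample
        chunk = samples[lo:hi]
        env.append((min(chunk), max(chunk)))
    return env

# Eight samples reduced to four columns:
print(peak_envelope([0, 1, 2, 3, 4, 5, 6, 7], 4))  # -> [(0, 1), (2, 3), (4, 5), (6, 7)]
```

Combining several strips into one "line" of dialog would just mean mixing (summing) their sample arrays before this reduction step.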

Anyways, back to the actual animating.
I'm mostly happy with the way this current snippet of the clip is working now.

IMO, one of the main issues right now is still the transitions between the main poses. Part of this is probably down to my workflow: I keyed the main poses first with constant interpolation (retiming them until they fit nicely with the audio), then went in and started keying the mouth shapes, with keyframes added on all channels (not just the mouth controls). As a result, when I changed the interpolation to smooth, some of the transitions came out a bit rough/too snappy.

Another issue that I ran into was the nasty "timing" beast. A principle I've picked up (though perhaps not totally absorbed yet) through various attempts at animating over the years is to really try not to cram too many different poses into too short a space (especially when doing your first pass over the shot, when it's really easy to overdo things). Not only would a real body need time to actually perform the movements implied by the keyframes, but the audience also needs time to comprehend that something happened. If you have too many poses going on, you all too easily end up with "jittering mush": a puppet having a short-lived and meaningless spasm, with all those details that you thought would make a difference going unnoticed or seeming surreal.

With regards to taming this beast, one guideline I once read was to avoid having important poses hang around for less than 5 frames, or else they won't "read". I've been experimenting with this a bit, but so far it seems that even 5 frames is sometimes too little.
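That guideline is easy enough to check mechanically. A quick sketch (the name and threshold are mine) that flags poses held for too few frames, assuming constant interpolation so that each pose holds until the next key:

```python
def short_holds(key_frames, min_hold=5):
    """Given sorted keyframe positions, return the frames whose pose is
    held for fewer than `min_hold` frames before the next key.
    The default of 5 comes from the rule of thumb above."""
    return [f for f, nxt in zip(key_frames, key_frames[1:])
            if nxt - f < min_hold]

# Keys at frames 1, 4, 12, 15: the poses at 1 and 12 hold only 3 frames.
print(short_holds([1, 4, 12, 15]))  # -> [1, 12]
```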

Then again, what happens if you're animating a fast-talking speaker? Obviously, enunciating every syllable as you would if the words were spoken slowly would look a bit extreme, if not downright "wrong" (search for "say it at full speed"), as it'd look like a jittering robotic mess. It'd also be a lot of work to go through keying all those poses (and later tweaking them) ;)

To investigate this matter, I decided to try and google for some "video reference". One particular example I came across was YouTube - fast talking girl!. You can see that her mouth is having quite a workout, clearly forming every word so that it can be heard clearly (and probably lip-read too). However, admittedly, this does look quite over-the-top, especially for more "normal" conversational quick-speech (as in this sound clip). Frame-stepping through this, I noticed that she seemed to be holding each pose for about 2 frames, with maybe a 3rd "transitional" frame between poses. Does anyone have any guidelines for how best to approach the timing of these things?
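As a back-of-envelope check on that observation (assuming the video plays at 24 fps, which I haven't verified):

```python
def frames_per_pose(fps, poses_per_second):
    """How many frames each mouth pose gets at a given speaking pace."""
    return fps / poses_per_second

# ~2 held frames + 1 transition frame per pose works out to roughly
# 8 poses per second at 24 fps:
print(frames_per_pose(24, 8))  # -> 3.0
```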

Overall, it seems that animating lipsync is actually non-trivial, especially when trying to get more convincing results. It also takes time: time to "become one with the clip" and get a good feel for its timing, so that you can fit some convincing actions to it (though I wonder how much Softimage's FaceRobot and similar truly automated lipsync tools can alleviate at least part of this).

Now, off to fix more bugz, and to attempt the "middle section". Having nearly completed this first section, I think I finally have some ideas for what this part should look like (though the end of this section really suggested only one of a few poses right from the beginning :)

Hasta luego!


  1. Hi ali, good job. Maybe we can provide the animation editors with a metastrip for audio display instead of only a single audio strip? That would solve your issue. The audio display is key for re-timing shots: when you work on a clip for a long time, you end up recognizing these patterns; it's your visual clue. It's not like you're gonna see the waveform and guess what facial pose to put there... that's silly. It's just the closest thing you have to a visual clue to work with, and it greatly reduces the number of play-tests.

  2. hey aligorith, I've been doing lipsync as well this week, so I'm glad you changed the marker system to allow easy renaming - it makes putting in markers for phonemes much easier! I still find it difficult to read the markers in the Dope Sheet though, as in this pic: especially when they sit on top of keys, the text is either obscured by the little triangle, or sits on top of the marker diamond, and it's hard to pick out one black line from another.

    Really hope waveform display comes in as well, esp. for long passages - it makes it slightly easier to find where each phrase is, because I can scan the timeline visually instead of having to read all my markers. Agree with ZanQdo.

  3. hey ali
    Interestingly, in this clip of Eric Goldberg talking about lipsync, he has the waveforms printed on his old-school paper dope sheet (the lipsync discussion starts at 3 mins in).