|Video to audio synchronization within a Pro Tools workflow|
(c)2005 by Richard Fairbanks
Please do not quote or reprint without acknowledgement
With all the recent discussion about video monitor latencies, firewire video delays, Pro Tools’ movie sync offsets, and session delay compensations, have you been wondering if your video and audio are REALLY in synchronization with each other? I think it is worthwhile to look at some of the strengths and weaknesses of Pro Tools’ supported video options. Some are better than others. Most of the concepts I discuss are relevant to any system but some specific details are intrinsic to Pro Tools and Apple. In particular, Apple’s Quicktime technology has become a commonly used set of tools to interface video pictures easily within the Pro Tools application. It comes as a free component of Apple’s operating systems and has become wildly popular as the low-cost entry level solution to the problem of working with audio and video simultaneously. Pro Tools has maintained Quicktime support for many years while Apple has done a good job of providing forward and backward compatibility for nearly all aspects of Quicktime-enabled media. Quicktime solutions would seem to be ideal for adding video to a Pro Tools session but there are hidden compromises to synchronization quality. “You don’t get something for nothing”, the old adage goes. A few specific results are noted at the end.
During the fall of 2004 I began feeling strong frustration with Digidesign’s choice to suspend forward development of their top-of-the-line Avoption video solutions, with no replacement or upgrade path made available to Apple users. I cherished my AVXL. Users like me were forced to use older software versions, where bugs abound, or find alternate methods. As an interim solution I turned to Quicktime DV movies which promise reasonably good images with inexpensive firewire DV-to-NTSC converters, only to discover that there were synchronization problems. I began to suspect that the problems were not occurring in a predictable way and therefore could not be easily solved. My inability to quantify the problem was directly due to my lack of a method to measure the relationship of video to audio across time. Most people simply line up audio beeps or clicks to video frames, such as count down sequences, and rely on their senses to tell them when audio and video are perfectly in sync. This is a highly unreliable method to say the least, and trying to see drifting over time is nigh impossible! I had been using a kludge-y system which included video tapes as references, but it was a sort of “go/no go” system. If synchronization was not perfect I could not exactly say how much error there was. Out of these frustrations I designed a solution to detect audio and video automatically and display timing differences between them. My method is independent of system hardware and software and can examine any system on any platform.
I measured several Pro Tools systems and one Avid Express Pro, with different hardware and software combinations. My results are presented here in hopes that YOU will find them informative and perhaps useful for making your own decisions and video hardware choices. This project is not a paid endeavor, nor do I or my company profit from revealing the results of my tests. My system results may not match your system, but reasons behind them will apply to you and everybody else.
I now offer for sale a limited number of handheld units, called Syncheck™, which make it easy for you to conduct your own tests.
If you want a quick recommendation, I have two. Use a Quicktime window on your CINEMA DISPLAY (set movie sync offset for 3 quarter frames). This is a low latency, surprisingly accurate solution. If you do not want a movie window floating on your desktop (I certainly do not!) you must buy hardware, specifically Digidesign's V10 or MOJO, Doremi V1, or Gallery Virtual VTR. With ANY of these hardware choices, black burst genlock is mandatory for both audio (via SyncIO) and video (via reference fed directly to the video hardware). DV stream converters I have tested (Canopus ADVC100, Dazzle Hollywood DV Bridge, Miglia Director's Cut) are NOT ACCEPTABLE solutions because they can be out of sync by up to two frames, depending on session and automation complexity. This sync error is in addition to any delays by a video converter, monitor, and buffer circuits, and cannot be compensated for because it happens at unpredictable times.
|Some synchronization errors and possible causes|
For my tests, video was always played from an NTSC DV stream except for one test performed through a Miro DC30+ with motion-jpegA compression (the Miro was also tested with the DV stream). The type of video display being measured and its refresh rate are important. Analog CRT (“tube”) video monitors have the shortest latency (the amount of time after a video signal is received until it is displayed), but their latency is not “zero” as is commonly thought. Some latency is designed into our interlaced video system because a CRT must “scan” an image onto the screen one thin horizontal line at a time, starting at the upper left corner and eventually finishing at the lower right. This top-to-bottom scanning happens twice per video frame, once per video field. The process which requires two top-to-bottom scans to display a full video frame is called interlacing. An entire NTSC video frame takes approximately 33ms (1 second divided by 29.97 frames per second), which means a full top-to-bottom scan, a field, requires about half of that.
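The arithmetic above can be written out as a minimal Python sketch. The function names are illustrative assumptions, and the scan-position estimate ignores the vertical blanking interval, so treat its output as an approximation only.

```python
# Frame and field timing for interlaced video, using the figures quoted above.
NTSC_FRAME_RATE = 29.97  # frames per second


def frame_duration_ms(frame_rate=NTSC_FRAME_RATE):
    """Duration of one full video frame in milliseconds."""
    return 1000.0 / frame_rate


def field_duration_ms(frame_rate=NTSC_FRAME_RATE):
    """Duration of one field, i.e. one top-to-bottom scan (two per frame)."""
    return frame_duration_ms(frame_rate) / 2.0


def scan_delay_ms(vertical_fraction, frame_rate=NTSC_FRAME_RATE):
    """Approximate delay before a point on a CRT screen is drawn.

    vertical_fraction is 0.0 at the top of the picture and 1.0 at the bottom.
    Assumption: the scan moves down the screen at a constant rate and the
    vertical blanking interval is ignored.
    """
    if not 0.0 <= vertical_fraction <= 1.0:
        raise ValueError("vertical_fraction must be between 0.0 and 1.0")
    return vertical_fraction * field_duration_ms(frame_rate)


if __name__ == "__main__":
    print(f"frame:      {frame_duration_ms():.2f} ms")   # about 33.37 ms
    print(f"field:      {field_duration_ms():.2f} ms")   # about 16.68 ms
    print(f"mid-screen: {scan_delay_ms(0.5):.2f} ms")    # about 8.34 ms
```

The same functions work for PAL by passing a frame rate of 25.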
Display monitor refresh rate is also an important factor. The slowest refresh rates occur with standard definition PAL and NTSC monitors, which are locked by design to the video field rate. NTSC monitors refresh at 59.94 hertz, which is exactly twice the NTSC video frame rate. Most computer monitors have a faster refresh rate (80Hz or greater is common), although some multimedia monitors with video inputs are designed for only 59.94 to 60Hz. Unless a monitor’s refresh rate is a precisely-locked multiple of the video frame rate there will be additional timing errors introduced by computer and display software and buffer components. No computers (except for a select few expensive graphics systems) have a provision to lock the monitor circuit’s refresh rate to the video signal’s frame rate, so you can expect to find errors with almost all computer system monitors. These timing errors have an interesting effect on video images displayed on them. Without going into the myriad reasons why (some of which I do not understand), let me observe that video played on a computer’s monitor through Quicktime has video-to-audio sync errors that tend to drift back and forth between two values. The maximum amount of error is either one half frame (17ms) or one whole frame (33ms), depending on display type, refresh rate, and perhaps buffer circuits. The drifting can be a slow back and forth shift between upper and lower limits, or a single direction drift from one limit to the other with a “jump” at the cycle point, or a combination of both.
I can sum up the two preceding paragraphs by saying that video signals are best presented by video monitors through a system which is locked perfectly in step with the video frame rate. Any attempt to view video on a non-synchronous display, such as your computer’s monitor, will introduce timing errors. The amount of error with very low latency and very fast refresh displays is small, thankfully. Apple’s Cinema Displays are good examples and tend to have a latency less than a frame with only one half frame drift. In contrast, I have a Viewsonic multimedia LCD display connected as a second monitor on my G5, through a DVI cable. It refreshes at only 60Hz. I measured Quicktime video displayed on it to lag behind audio by 1 frame of latency, plus another 0 to 1 frame of drift. In other words, Quicktime video displayed on it is always one to two frames out of sync!
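One way to picture the one-directional drift with a “jump” is a deliberately simplified model, sketched below in Python. It assumes each video frame simply waits for the display's next refresh; this is an assumption made for illustration, not a description of how Quicktime or any particular display actually buffers video, and the function name is hypothetical.

```python
import math


def refresh_wait_ms(frame_rate, refresh_hz, n_frames):
    """Wait, per video frame, until the display's next refresh.

    Simplified model (an assumption, not a measurement): frame k becomes
    available at k / frame_rate seconds and is shown at the first refresh
    tick at or after that moment. Ticks occur every 1 / refresh_hz seconds.
    """
    waits = []
    for k in range(n_frames):
        t_ready = k / frame_rate
        # Small tolerance so rounding noise does not push an exact
        # coincidence on to the following tick.
        tick = math.ceil(t_ready * refresh_hz - 1e-9)
        t_shown = tick / refresh_hz
        waits.append(max(0.0, (t_shown - t_ready) * 1000.0))
    return waits


if __name__ == "__main__":
    locked = refresh_wait_ms(29.97, 59.94, 600)    # refresh locked to video
    unlocked = refresh_wait_ms(29.97, 60.0, 600)   # free-running 60Hz display
    print(f"locked:   {min(locked):.2f} to {max(locked):.2f} ms")
    print(f"unlocked: {min(unlocked):.2f} to {max(unlocked):.2f} ms")
```

In this model the locked display adds no variation at all, while the free-running 60Hz display adds a wait that creeps across a range just under one refresh period (about 16.7ms, roughly half an NTSC frame) and then jumps back. Real systems add buffering on top of this, so measured figures can differ.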
Pro Tools can introduce other variables which are added on top of those just discussed. As I’ve already said, Pro Tools is able to send video data through Quicktime, a set of media presentation tools built into Apple’s operating systems. Most low cost video solutions use Quicktime in one way or another, thus Quicktime serves as a bridge between Pro Tools and third-party video solutions. It should come as no surprise that a certain amount of processing time is spent by Quicktime and these third party devices, which inevitably results in video being displayed later than intended. Digidesign has thoughtfully included a “movie sync offset” parameter to compensate for these processing delays. The parameter is adjustable in quarter frame increments, instructing Pro Tools how far in advance to output video data so that, after processing delays, the video can be viewed at just the right moment. In fact, this works well to compensate for much of the potential error. It does NOT compensate accurately for variable errors. I have already discussed the drift variable, which occurs with mismatched frame rates and refresh rates. It cannot be corrected by the “movie sync offset” setting, merely changed to happen earlier or later.
There are two more variable errors introduced by Pro Tools using Quicktime which cannot be effectively compensated for either. One of these variable errors occurs as soon as you press the play button. I call this start error. Pro Tools will begin to play audio as soon as possible after you press play. Under most circumstances there appears to be no attempt made by Pro Tools to ensure that the audio timeline’s frame grid boundaries are aligned to the video signal’s frame boundaries. Pro Tools assumes that the video will be pulled along in perfect sync by Quicktime, even if Quicktime is unable to do so. In other words, if you place a “blip” at precisely 1:00:00:00 on the Pro Tools timeline, there is no guarantee that it will play beginning precisely on a video frame. The result is that each time you press play your absolute audio and video sync relationship will likely change. It might be exactly correct, but it will most likely be out of sync by any amount up to a full video frame (for hardware converters). Once play has begun and an absolute start error is established, the relationship tends to hold until the next time play is pressed. In practice there is almost no drift between the two, so that if playback started with a ¾ frame start error, for example, that amount remains.
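To put quarter-frame offsets into milliseconds, and to see why a fixed offset only moves a drifting error rather than shrinking it, here is a small Python sketch. The function names and the example drift figures are hypothetical and chosen only for illustration.

```python
NTSC_FRAME_RATE = 29.97  # frames per second


def quarter_frames_to_ms(quarter_frames, frame_rate=NTSC_FRAME_RATE):
    """Convert an offset expressed in quarter frames to milliseconds."""
    return quarter_frames * 1000.0 / (frame_rate * 4.0)


def shift_error_range_ms(lag_min_ms, lag_max_ms, offset_quarter_frames,
                         frame_rate=NTSC_FRAME_RATE):
    """Video lag range that remains after applying a fixed offset.

    Positive values mean video is still late, negative values mean it is
    now early. The width of the range is unchanged by the offset.
    """
    advance = quarter_frames_to_ms(offset_quarter_frames, frame_rate)
    return (lag_min_ms - advance, lag_max_ms - advance)


if __name__ == "__main__":
    print(f"3 quarter frames = {quarter_frames_to_ms(3):.2f} ms")  # about 25.03 ms

    # Hypothetical display whose lag drifts between 10 ms and 27 ms.
    low, high = shift_error_range_ms(10.0, 27.0, 2)
    print(f"after offset: {low:.2f} to {high:.2f} ms, width {high - low:.2f} ms")
```

Whatever offset is chosen, the width of the range stays at 17ms in this example; only its position relative to zero changes.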
“Doesn’t house sync take care of that?” Not entirely. Your SyncIO/USD will lock your session’s sample rate to house sync, which is a good thing. The only times Pro Tools attempts to align its audio frame grid to the video frame boundaries is when Pro Tools is online and generating time code through a SyncIO/USD, or Pro Tools is online and chasing external time code. Otherwise Pro Tools seems to ignore the actual video frame boundaries whether a SyncIO or USD is present or not! As I said, Pro Tools usually assumes that Quicktime will pull the video along in perfect sync, which it does fairly well in a floating window. Video from a DV stream converter or PCI accelerator card is not as lucky, possibly due to an extra buffering stage that is locked to the video output circuit. You should also remember that because nearly all affordable DV stream converters do NOT genlock to a black burst reference, you must lock your Pro Tools to them to stand any chance of aligning audio and video frame boundaries.
In my tests, with Pro Tools HD locked to a DV converter’s output, I could not get perfect results. The random nature of audio-video frame edge sync was eliminated, but replaced by a potential full frame error! That’s right. No matter how I tweaked the movie sync offset parameter, I could not eliminate the possibility that video might be out of sync by a full frame. I found no solution to the problem.
Unfortunately, this is not the end of our troubles with DV movies. Over and over with a couple of my mixing sessions, I saw video SLIP later by one and sometimes two frames during playback, but never more than two frames. This is disturbing because when it happened, and it happened a lot, it was during times of only moderate automation use with moderate to busy track activity, circumstances that become more likely as we build work in a session. If I temporarily suspended automation and played the same session there was no slippage, nor did I see slippage with heavy automation on a small number of audio tracks. When slippage occurred, in both 16 bit and 24 bit sessions, never was I close to the supported track count limits. My use of automation was not always heavy, rarely extreme. Once Pro Tools allowed the video to slip behind, it stayed behind until playback was stopped and started again. After starting again, the video would begin in sync (within the above mentioned start error window) but drop behind again within seconds, if the session was still busy. I was also surprised to discover that this slippage did NOT happen when a DV stream movie was played in a floating Quicktime window, but only when using a DV converter to output to an NTSC monitor. Changing Pro Tools' Quicktime priority setting did not affect things. I do not understand why slippage was limited to only two frames. Is this possibly a Pro Tools bug?
I also tried a Miro DC30+ with OSX drivers. Once again there was that +/- half frame random start error at each playback. I tried both DV stream and motion-jpegA codecs. DV stream was good. (I have previously noted a very hefty cpu hit with this combination, which can limit larger track counts.) Motion-jpegA was a different story. One or two frames per second were misplaced by a frame! This might be a byproduct of the compression codec's ability (lack of, rather) to deal with my test video signal. I recommend using higher resolution video playback if at all possible.
Finally, there are more possible sync errors introduced by the video display's actual latency, which varies by manufacturer, type, and model, and by delays in the audio monitor path (including distance to the listener). These are mostly fixed delays that can be accurately compensated for once they are known. You can examine audio and video simultaneously on a 'scope, which is a bit inconvenient to set up, or use my method if it becomes commercially viable (that is, if I do not lose money). Judging the delays by relying on your unaided eyes and ears may be wildly inaccurate.
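The distance-to-listener part of the audio delay is easy to estimate. The Python sketch below assumes a speed of sound of about 343 metres per second (dry air at roughly 20 degrees C); the 3 metre listening distance is a hypothetical example, and the function names are illustrative.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C
NTSC_FRAME_RATE = 29.97         # frames per second


def acoustic_delay_ms(distance_m, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Time for sound to travel from the loudspeaker to the listener."""
    if distance_m < 0:
        raise ValueError("distance_m must not be negative")
    return distance_m / speed_m_per_s * 1000.0


def net_video_lag_ms(display_latency_ms, audio_path_delay_ms):
    """Fixed video lag relative to audio as perceived at the listening seat.

    Positive means the picture arrives after the sound, negative means the
    sound arrives after the picture.
    """
    return display_latency_ms - audio_path_delay_ms


if __name__ == "__main__":
    delay = acoustic_delay_ms(3.0)  # hypothetical 3 m listening distance
    frames = delay / (1000.0 / NTSC_FRAME_RATE)
    print(f"3 m of air: {delay:.2f} ms ({frames:.2f} NTSC frames)")

    # A display with 38 ms of latency, heard from 3 m away.
    print(f"net video lag: {net_video_lag_ms(38.0, delay):.2f} ms")
```

Because these delays are fixed, the net figure is the kind of value a constant offset can correct, unlike the variable errors described earlier.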
I did not specifically look for variations between different computers or operating system versions, if there are any. This was not a test of drive performance. While audio generally played from 10K SCSI (two cases were also tried with firewire 400), I tried both SCSI and ATA/SATA drives for video, with seemingly no differences. You should only consider my results as a starting point for your own tests. Movie sync offset values I give are a “best compromise”. I tried to split any drift equally ahead and behind the zero point. Start errors were only seen with video converter hardware such as Miro DC30plus and DV stream converters.
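Splitting drift equally ahead and behind the zero point amounts to aiming the offset at the middle of the measured lag range. The Python sketch below shows that idea; the function name is hypothetical, and the example uses the one to two frame lag reported in this article for the Viewsonic display on DVI.

```python
def best_compromise_offset(min_lag_frames, max_lag_frames):
    """Offset, in quarter frames, that centers the remaining error on zero.

    min_lag_frames and max_lag_frames are the smallest and largest measured
    video lag, in frames. The midpoint is rounded to the nearest quarter
    frame, since that is the step size of the setting.
    """
    if max_lag_frames < min_lag_frames:
        raise ValueError("max_lag_frames must not be less than min_lag_frames")
    midpoint_frames = (min_lag_frames + max_lag_frames) / 2.0
    return round(midpoint_frames * 4)


if __name__ == "__main__":
    # Lag of 1 frame latency plus 0 to 1 frame of drift.
    offset = best_compromise_offset(1.0, 2.0)
    advance_frames = offset / 4.0
    print(f"offset: {offset} quarter frames")
    print(f"remaining error: {1.0 - advance_frames:+.2f} "
          f"to {2.0 - advance_frames:+.2f} frames")
```

For that display the sketch gives an offset of 6 quarter frames, leaving the picture wandering half a frame either side of sync rather than always late.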
MOJO and Avid Express Pro
CRT NTSC interlaced display, no drift or wobble. Video measures 2ms behind audio (as it should be, due to scanning across the tube’s screen as I explained above, and accounting for the monitor’s overscan). It is worth noting that the simultaneous picture in the application’s monitor window on a 22” cinema display is ¾ frame AHEAD of audio, with ½ frame “drift”. This appears to be built in, not adjustable.
MOJO and Pro Tools 6.6r2
CRT NTSC interlaced display, no drift or wobble. Video measures 2ms behind audio (as to be expected, due to scanning across the tube’s screen as I explained above, and accounting for the monitor’s overscan). Movie sync offset setting not available with MOJO.
Doremi V1m and Pro Tools 6.4.1/USD, serial time code
NTSC on a CRT display was just as accurate as with Mojo. Movie sync offset=0
Apple Cinema Display, Quicktime floating window, PT 6.7 and 6.6r2
Two systems checked: G5/dual2.5G with 22” cinema display, and G4/dual1G with 20” cinema display. ½ frame “drift”. No change with movie window in different screen locations, but there is a top-to-bottom scanning delay within the movie window. Movie sync offset=3
Miro DC30plus and OSX driver, and Pro Tools 6.4.1
DV movie file, NTSC on CRT display, no appreciable drift, up to 1 frame start error. Movie sync offset=3.
Miro DC30plus and OSX driver, and Pro Tools 6.4.1
Motion-jpegA 215 Kb/sec movie file, NTSC on CRT display, no appreciable drift, up to 1 frame start error, additional 1 frame timing errors noted (see text). Movie sync offset=3.
Canopus ADVC100, Hollywood Dazzle DV Bridge, Miglia Director’s Cut
3 to 4 frame latency with up to 1 frame start error. Additional slippage of 1 or 2 frames possible during moments of heavy audio processing demands. Movie sync offset 15 to 17, system dependent.
Monitor Displays Tested
ALL CRT “Tube” video displays
Zero latency plus scanning delay from top to bottom (up to ½ frame). These serve as our “standard”.
Apple Cinema Displays
Tested as part of a system, above
Mitsubishi X390 LCD projector, NTSC component from MOJO
38 ms latency plus scanning delay from top to bottom (up to ½ frame). No drift.
Viewsonic VX900 19” multimedia monitor, DVI connection, Quicktime floating window
1 frame latency plus 1 frame drift. No difference from top to bottom of picture. Movie sync offset=6
Viewsonic VX900 19” multimedia monitor, NTSC composite from MOJO
2 frame latency plus scanning delay from top to bottom with ½ frame drift. This drift is not from Pro Tools or MOJO. I believe the monitor does not lock its refresh rate to the incoming video, unlike the Mitsubishi projector above, and the drift is caused by a mismatched video frame rate to monitor refresh rate.
Thank you ForianE and Chief Technician!