New Ekho technology from MIT and Microsoft Research syncs video and audio for cloud gaming.
Inter-device synchronisation is a persistent problem in cloud gaming because video, audio, and haptic feedback are streamed from one central source to multiple devices, such as a player’s screen and controller, which typically operate on separate networks. These networks are not synchronised, leading to a lag between the separate streams. A player might see something happen on the screen and then hear it on their controller half a second later.
Scientists from MIT and Microsoft Research therefore created Ekho to synchronise the streams that travel over these separate networks. Ekho uses the mismatch between noise sequences to continuously measure and compensate for inter-stream delay.
In cloud gaming, the microphone on the player’s controller records the audio in the room, including game audio played by the speakers on the screen, and sends that recording back to the server. Using this recording for synchronisation has so far been unreliable because the room audio contains background noise.
To counteract this, Ekho adds identical sequences of low-volume white noise, known as pseudo-noise, to the game audio before it is streamed to the player’s screen. It uses these pseudo-noise segments as markers for synchronisation. The player cannot hear the pseudo-noise.
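The idea of hiding a known noise marker in audio and finding it again can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ekho's implementation: the function names, the seed, the marker length, and the gain are all hypothetical, and the brute-force correlation search is a generic textbook technique chosen for clarity.

```python
import random

def make_pseudo_noise(length, seed=7):
    """Reproducible +/-1 sequence: the same seed yields the same marker everywhere."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed_marker(audio, marker, offset, gain=0.05):
    """Return a copy of `audio` with the marker mixed in quietly, starting at `offset`."""
    mixed = list(audio)
    for i, m in enumerate(marker):
        mixed[offset + i] += gain * m
    return mixed

def find_marker(recording, marker):
    """Slide the marker along the recording and return the best-correlating offset."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(recording) - len(marker) + 1):
        score = sum(recording[offset + i] * m for i, m in enumerate(marker))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset

# Toy demo: random samples stand in for game audio (an assumption for illustration).
rng = random.Random(1)
game_audio = [rng.uniform(-0.2, 0.2) for _ in range(3000)]
marker = make_pseudo_noise(1024)
streamed = embed_marker(game_audio, marker, offset=300)
found = find_marker(streamed, marker)
print(found)
```

Because the marker is known in advance and is long relative to its volume, its correlation peak stands out even though each marker sample is much quieter than the audio around it.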
The Ekho-Estimator module adds the pseudo-noise sequences. When it receives the recorded game audio from the controller, it listens for those markers and lines up the streams, which enables it to calculate the inter-stream delay. The Estimator sends that information to the Ekho-Compensator module, which either skips a few milliseconds of sound or adds a few milliseconds of silence to the game audio being sent by the server, bringing the streams back into sync.
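The compensation step described above, skipping sound or adding silence, might look like the following minimal Python sketch. The function name `compensate`, the sign convention for `offset_ms`, and the 48 kHz default sample rate are assumptions made for illustration and are not taken from Ekho.

```python
def compensate(audio, offset_ms, sample_rate=48000):
    """Shift an audio buffer to cancel a measured inter-stream offset.

    Assumed convention (hypothetical):
      offset_ms > 0  -> audio runs ahead, so prepend that much silence
      offset_ms < 0  -> audio lags, so skip that much sound from the front
    """
    n = round(abs(offset_ms) * sample_rate / 1000)  # milliseconds -> samples
    if offset_ms > 0:
        return [0.0] * n + list(audio)
    return list(audio[n:])

# Usage with a tiny buffer and a 1 kHz sample rate, so 1 ms equals 1 sample.
buffer = [0.1, 0.2, 0.3, 0.4, 0.5]
delayed = compensate(buffer, 2, sample_rate=1000)    # two samples of silence added
advanced = compensate(buffer, -2, sample_rate=1000)  # first two samples skipped
```

A production system would presumably apply such corrections in small steps so the listener does not notice them; this sketch applies the whole correction at once.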
Krishna Chintalapudi, co-author and principal researcher at Microsoft Research, said: “The traditional way of doing this, which involves trying to measure the synchronization error using the underlying network, the errors are significantly larger. When we started this project, we weren’t sure whether this could even be done. But the accuracy we can get down to with Ekho, at sub-millisecond levels, it is unheard of.”
This technique could be used more broadly to synchronise media streams travelling to different devices, such as in training situations that use multiple augmented or virtual reality headsets.