Research

Cutting-edge computer vision and deep learning research aimed at evolving the ways we understand and interact with videos.

IMMERSION OF 2D GRAPHICS IN 2D SCENES

3D understanding of visual scenery combined with automatic immersion of 2D graphics

Uru's Immersions technology uses Deep Learning to automatically predict a depth map and surface normals for each scene in an input video. This understanding allows us to automatically augment surfaces and regions in those scenes with graphics in a realistic way. By additionally tracking the motion and pose of those surfaces or regions, and then accounting for occlusion by other objects as well as shadows and lighting changes, our augmentations attain a high level of photorealism.
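
To illustrate the occlusion step described above, here is a minimal sketch, not Uru's actual implementation. It assumes a graphic has already been warped into the frame, and that per-pixel depth is available both for the scene and for the target surface. The function name `composite_with_occlusion` and all of its parameters are hypothetical.

```python
import numpy as np


def composite_with_occlusion(frame, graphic, alpha, scene_depth, surface_depth, eps=1e-3):
    """Blend `graphic` into `frame` wherever the target surface is visible.

    frame, graphic: float arrays of shape (H, W, 3) with values in [0, 1]
    alpha:          (H, W) coverage of the graphic, in [0, 1]
    scene_depth:    (H, W) depth of the nearest scene content at each pixel
    surface_depth:  (H, W) depth of the target surface at each pixel
    eps:            tolerance for depth noise (an assumed value)
    """
    # A pixel is occluded when some scene content is closer than the surface.
    visible = scene_depth >= surface_depth - eps
    # Zero out the graphic where the surface is hidden, then alpha-blend.
    weight = (alpha * visible)[..., None]
    return weight * graphic + (1.0 - weight) * frame
```

Shadows and lighting changes are not modeled in this sketch; it only shows how a depth comparison can keep foreground objects in front of an inserted graphic.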

This technology can be used to automatically (but realistically) augment videos with graphics (such as ads or emojis) or to generate realistic synthetic data for training Deep Learning models (e.g. logo recognition models).

IMMERSION OF 3D OBJECTS IN 2D SCENES

3D understanding of scenes combined with automatic immersion of 3D objects

Uru's Immersion technology can also be used to realistically immerse 3D models of objects in the scenes inside a video. Since Uru's deep learning models predict the depth map, normals, and focal length of any visual content, we can automatically predict the pose and planar structure of a given surface or region, allowing the realistic placement of a 3D model or object on that surface or region.
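
As a rough sketch of how depth and focal length can yield the planar structure of a region, the code below back-projects pixels into 3D with a pinhole camera model and fits a plane by least squares. This is an illustrative baseline under stated assumptions, not Uru's method: the function names `backproject` and `fit_plane` are hypothetical, and the principal point is assumed to sit at the image center unless given.

```python
import numpy as np


def backproject(depth, focal, cx=None, cy=None):
    """Turn an (H, W) depth map into an (H*W, 3) array of camera-space points.

    Assumes a pinhole camera with focal length `focal` in pixels.
    """
    h, w = depth.shape
    cx = (w - 1) / 2.0 if cx is None else cx
    cy = (h - 1) / 2.0 if cy is None else cy
    v, u = np.mgrid[0:h, 0:w]  # v: row index, u: column index
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def fit_plane(points):
    """Least-squares plane through `points`; returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    # Orient the normal so that it faces the camera at the origin.
    if normal @ centroid > 0:
        normal = -normal
    return centroid, normal
```

The centroid and normal give a position and orientation at which a 3D model could be anchored; a production system would also need to handle noisy depth, for example with a robust fit.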

This technology can be used to automatically (but realistically) augment videos with 3D models or to generate realistic synthetic data for training Deep Learning models (e.g. object recognition models).

SCENE AND SHOT SEGMENTATION USING DEEP LEARNING

Automatic multimodal segmentation of the individual stories inside a video

Our StoryBreak technology uses Computer Vision, Deep Learning and Time Series Analysis to automatically segment a video into shots and scenes. A shot is a continuous stream of video coming from a single camera, without any cuts. A scene is a collection of one or more related shots that, together, tell a cohesive story.
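
For a sense of what shot segmentation involves, the sketch below detects cuts with a classical histogram-difference baseline. It is not the StoryBreak approach, which the text above describes as combining Deep Learning and Time Series Analysis; the function name `shot_boundaries`, the bin count, and the threshold are all assumptions chosen for illustration.

```python
import numpy as np


def shot_boundaries(frames, bins=16, threshold=0.5):
    """Return the indices of frames that start a new shot.

    frames:    array of shape (T, H, W), grayscale values in [0, 1]
    bins:      number of intensity histogram bins (assumed value)
    threshold: minimum histogram distance that counts as a cut (assumed value)
    """
    hists = []
    for frame in frames:
        counts, _ = np.histogram(frame, bins=bins, range=(0.0, 1.0))
        hists.append(counts / counts.sum())  # normalize to a distribution
    hists = np.asarray(hists)

    # Total variation distance between consecutive histograms, in [0, 1].
    dists = 0.5 * np.abs(hists[1:] - hists[:-1]).sum(axis=1)
    # A large jump between frame i-1 and frame i marks frame i as a new shot.
    return [int(i) + 1 for i in np.flatnonzero(dists > threshold)]
```

Grouping the resulting shots into scenes is a harder problem, since it depends on whether shots are related in content, and is not covered by this baseline.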

This technology can be used to identify correlated shots, which facilitates automatic metadata creation (content recognition) and reduces the noise caused by false predictions. It can also be used to segment a video into smaller independent sub-videos or to find optimal insertion times for mid-rolls, commercials, and more.