Ramblr Technologies and Research
Activity Prediction with XGBoost
Ramblr's XGBoost-based activity predictor combines a powerful video encoder with a lightweight boosted trees classifier, enabling fast, high-performance activity prediction for domain-specific applications.
Key Contributions and Technical Innovations
Hybrid Architecture for Activity Prediction
We introduce a simple architecture that couples a state-of-the-art video encoder (InternVideo2) with a lightweight XGBoost classifier. This approach efficiently classifies video features into domain-specific activities, minimizing training time and data requirements.
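A minimal sketch of that pipeline, with a stand-in for the encoder: in practice, features from a (lightly fine-tuned) InternVideo2 would replace the placeholder `encode_clip`, and the synthetic data and hyperparameters below are illustrative only.

```python
# Sketch of the encoder + boosted-trees pipeline (illustrative, not Ramblr's code).
# `encode_clip` fakes a frozen/fine-tuned video encoder so the script runs end to end.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
FEATURE_DIM = 768                      # assumed embedding size
FRAME_DIM = 32 * 32 * 3                # toy flattened-frame size
PROJ = rng.standard_normal((FRAME_DIM, FEATURE_DIM)) / np.sqrt(FRAME_DIM)

def encode_clip(clip: np.ndarray) -> np.ndarray:
    # Placeholder for the real encoder: mean-pool frames, then a fixed projection.
    return clip.mean(axis=0) @ PROJ

# Synthetic data: 200 clips of 8 flattened frames each, 5 activity classes.
clips = [rng.standard_normal((8, FRAME_DIM)) for _ in range(200)]
labels = rng.integers(0, 5, size=200)

X = np.stack([encode_clip(c) for c in clips])
X_train, X_test = X[:160], X[160:]
y_train, y_test = labels[:160], labels[160:]

# Lightweight boosted-trees head on top of the video features.
clf = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
clf.fit(X_train, y_train)
print("held-out accuracy:", (clf.predict(X_test) == y_test).mean())
```

Because only the small tree ensemble (and optionally a light encoder fine-tune) is trained, turnaround stays short even with limited labelled clips.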
Efficient Fine-Tuning for Domain-Specific Tasks
Our method enables quick-turnaround training for closed-vocabulary activity prediction. By lightly fine-tuning the encoder and training a simple classifier, we achieve competitive performance in data-starved environments, making it ideal for industrial applications.
Significant Improvement in Activity Prediction
The proposed model achieves an 85.8% F1 score on a closed-domain benchmark, improving over the baseline method by more than 40%, demonstrating the method's effectiveness for training efficient, specialized activity predictors.

GLOBE
Ramblr's GLobal OBject Embedder (GLOBE) leverages binary masks to isolate and encode object regions, enabling robust global identification across time and transformations.
Key Contributions and Technical Innovations
Object-Centric Fingerprints for Deep Video Understanding
We design GLOBE to learn temporally consistent object-centric embeddings by extracting features precisely aligned to object regions using binary masks. This region-guided approach enables the model to capture identity-preserving, stable object representations from video.
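One way to read "features precisely aligned to object regions" is mask-averaged pooling over a dense feature map; the sketch below assumes that mechanism, and its function names and shapes are illustrative rather than GLOBE's actual code.

```python
# Illustrative mask-guided pooling: average a dense feature map over a binary
# object mask to obtain one embedding per object (assumed mechanism).
import torch
import torch.nn.functional as F

def masked_object_embedding(feat_map: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """feat_map: (C, H, W) dense features; mask: (H0, W0) binary object mask."""
    c, h, w = feat_map.shape
    # Resize the mask to the feature-map resolution.
    m = F.interpolate(mask[None, None].float(), size=(h, w), mode="nearest")[0, 0]
    weights = m / m.sum().clamp(min=1.0)           # normalized spatial weights
    emb = (feat_map * weights).sum(dim=(1, 2))     # (C,) mask-averaged embedding
    return F.normalize(emb, dim=0)                 # unit norm for cosine comparisons

# Toy usage: a random 256-channel feature map and a square object mask.
feat = torch.randn(256, 32, 32)
mask = torch.zeros(128, 128)
mask[40:90, 30:80] = 1
obj_emb = masked_object_embedding(feat, mask)
print(obj_emb.shape)  # torch.Size([256])
```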
Generalized Contrastive Learning for Temporal Object Consistency
We extend SimCLR with a multi-positive contrastive loss that clusters multiple views of the same object across time, promoting identity-preserving, temporally robust representations despite occlusions, motion, and appearance changes.
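A minimal sketch of such a multi-positive loss in the SupCon style, where every embedding sharing an object identity is treated as a positive for the others; the exact formulation GLOBE uses may differ.

```python
# Illustrative multi-positive contrastive loss: all views of the same object id
# are positives for each other (SupCon-style); assumed, not GLOBE's exact loss.
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(emb: torch.Tensor, obj_ids: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
    """emb: (N, D) object embeddings; obj_ids: (N,) integer identity labels."""
    z = F.normalize(emb, dim=1)
    sim = z @ z.t() / temperature                      # (N, N) scaled cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))    # drop self-pairs from the softmax
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (obj_ids[:, None] == obj_ids[None, :]) & ~self_mask
    # Average the log-probability over every positive of each anchor that has one.
    per_anchor = -(log_prob.masked_fill(~pos, 0.0)).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return per_anchor[pos.any(dim=1)].mean()

# Toy usage: 8 embeddings covering 3 object identities seen at different times.
emb = torch.randn(8, 128)
ids = torch.tensor([0, 0, 1, 1, 1, 2, 2, 0])
print(multi_positive_contrastive_loss(emb, ids))
```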
Significant Gains in Temporal Object Re-identification
GLOBE achieves a 15.13% improvement in F1 score over the DINOv2 baseline in a realistic object matching and clustering task, demonstrating strong performance for re-identification and tracking applications.
Research Projects
Show2Instruct envisions AI systems that respond to spoken, context-specific queries – such as “Do all the windows in this room meet the requirements of the BIM specifications and accessibility standards?” – with real-time, visually grounded answers from an AI-powered device.
In the context of a construction site, this could mean smart glasses providing real-time answers to complex compliance questions. In collaboration with industry and academic partners, we are developing multimodal AI agents that integrate large language models with computer vision to enable real-time, context-aware reasoning – demonstrating how generative AI can transform decision-making and workflows in the construction industry.
Our focus is on developing a new generation of human-machine interfaces by architecting a scalable platform for context-aware, multimodal instruction-following systems that perform reliably in high-noise, high-variability environments like construction sites.
Key Contributions and Technical Innovations:
Multimodal Fusion
Integration of LLMs with visual scene understanding models to enable grounded, contextual responses (see the fusion sketch after this list).
Real-Time Interaction
Development of low-latency inference pipelines suitable for edge deployment (e.g., AR glasses).
Domain-Specific Reasoning
Fine-tuning models on construction-specific tasks (e.g., BIM compliance checks, safety standard verification).
Human-Centered Interfaces
Design of natural language-driven workflows to reduce cognitive load and streamline decision-making.
Scalable Architecture
Modular system design enabling deployment across diverse sites and tasks.
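As a hedged sketch of the multimodal fusion idea above: detector output is serialized into a prompt alongside the relevant specification excerpt, and an LLM answers the grounded question. The dataclass, helper, and stub `llm` callable are illustrative assumptions, not the project's API.

```python
# Hypothetical fusion sketch: serialize visual scene understanding output into an
# LLM prompt to answer a grounded compliance question. All names are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    label: str          # e.g. "window"
    width_m: float      # measured width in metres
    height_m: float     # measured height in metres

def answer_compliance_query(question: str, detections: List[Detection],
                            spec: str, llm: Callable[[str], str]) -> str:
    # Ground the question in the detected scene and the relevant spec excerpt.
    scene = "\n".join(f"- {d.label}: {d.width_m:.2f} m x {d.height_m:.2f} m"
                      for d in detections)
    prompt = (f"Site specification excerpt:\n{spec}\n\n"
              f"Detected objects in view:\n{scene}\n\n"
              f"Question: {question}\nAnswer concisely, citing the objects above.")
    return llm(prompt)

# Toy usage with a stub LLM.
detections = [Detection("window", 0.90, 1.20), Detection("window", 0.75, 1.20)]
spec = "Accessible windows must be at least 0.80 m wide."
print(answer_compliance_query("Do all windows meet the width requirement?",
                              detections, spec, llm=lambda p: f"(stub LLM, {len(p)} chars)"))
```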
MuvAko is a forward-looking research initiative based in Saxony, Germany, and focused on leveraging generative AI and spatial computing to develop multimodal, context-aware assistance systems for virtual and mixed reality.
At its core, MuvAko explores how these cutting-edge technologies enhance the user experience and add measurable value across digital sales processes. In collaboration with our partners, we are addressing key research challenges such as:
- How can spatially-aware AI systems capture and interpret multimodal context in dynamic real-time e-commerce scenarios?
- How can recommendation engines be personalized using real-time user data?
- How can context-aware AI assistants provide meaningful support within immersive shopping environments?
Beyond technical exploration, MuvAko aims to push the boundaries of usability, personalization, and accessibility in AI-driven product presentation and delivery – laying the foundation for the next generation of immersive, intelligent e-commerce platforms.
Key Contributions and Technical Innovations:
Real-Time Multimodal Context Sensing
Fusion of spatial, visual, and behavioral data to guide AI decision-making.
Adaptive Personalization Engines
Dynamic tailoring of recommendations based on user interactions and spatial cues (see the re-ranking sketch after this list).
Intelligent Virtual Assistants
Context-aware agents that assist users throughout immersive shopping journeys.
Enhanced Usability in XR Interfaces
Streamlined interaction models for intuitive engagement in mixed reality.
Scalable Architecture
Modular system design for integration into diverse digital commerce platforms.
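As an illustration of the adaptive personalization described above, the following sketch re-ranks a product list by blending interaction-history affinity with a spatial cue such as gaze dwell time; the field names and weights are assumptions for the sketch, not the MuvAko system.

```python
# Illustrative context-aware re-ranking: blend interaction history with spatial
# cues (e.g. gaze or proximity in the XR scene). All names/weights are assumed.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Product:
    pid: str
    category: str

def rerank(products: List[Product],
           interaction_affinity: Dict[str, float],   # per-category affinity from past behavior
           gaze_dwell_s: Dict[str, float],           # seconds the user looked at each product
           w_affinity: float = 0.6, w_gaze: float = 0.4) -> List[Product]:
    def score(p: Product) -> float:
        return (w_affinity * interaction_affinity.get(p.category, 0.0)
                + w_gaze * gaze_dwell_s.get(p.pid, 0.0))
    return sorted(products, key=score, reverse=True)

# Toy usage: the rug the user keeps looking at moves to the top of the list.
catalog = [Product("sofa-01", "sofa"), Product("lamp-07", "lighting"), Product("rug-03", "rug")]
ranked = rerank(catalog,
                interaction_affinity={"sofa": 0.8, "lighting": 0.3},
                gaze_dwell_s={"rug-03": 2.5, "sofa-01": 0.4})
print([p.pid for p in ranked])
```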
REACT
What if machines could truly understand their surroundings – and respond in real time?
Large Language Models (LLMs) have revolutionized human-computer interaction by making complex knowledge accessible through natural language, bridging the gap between humans and machines. Yet current multimodal large language models (MLLMs) still fall short when it comes to real-world awareness: they often lack an understanding of when and where something is happening, context that is essential for interacting reliably with the real world. That’s why we’re building REACT: a fine-tuned MLLM designed to understand spatial and temporal context by analyzing video and sensor data. Our goal is to enable machines to follow and understand complex human instructions in dynamic environments.
We’re putting REACT to the test with a real-world scenario: a mobile robot that takes a coffee order directly from a person, operates the coffee machine, and serves the drink – even as conditions around it change. By combining cutting-edge AI with real-time contextual understanding, REACT takes human-machine interaction to the next level.
Key Contributions and Technical Innovations
Spatiotemporal Contextualization
Fine-tuning MLLMs to interpret spatiotemporal sequences from multimodal inputs.
Real-Time Reasoning
Enabling fast, context-aware decision-making in dynamic environments.
Grounded Instruction Following
Translating natural language into executable steps tied to the environment.
Sensor + Video Fusion
Integrating first- and third-person video with real-world sensor data for richer scene understanding (see the fusion-and-planning sketch after this list).
Autonomous Task Execution
Demonstrating end-to-end autonomy in a physical human-interaction scenario.
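A hypothetical end-to-end sketch of the fusion-and-planning loop referenced above: timestamped frame descriptions and sensor readings are merged into one prompt for an MLLM, whose reply is parsed into executable steps. The `mllm` callable, the JSON plan format, and the step names are assumptions, not REACT's actual interface.

```python
# Hypothetical sketch: fuse timestamped video-frame descriptions and sensor readings
# into an MLLM prompt, then parse the reply into executable robot steps.
import json
from typing import Callable, Dict, List

def build_context(frame_notes: List[Dict], sensor_log: List[Dict]) -> str:
    # Merge visual and sensor events into a single time-ordered scene timeline.
    events = sorted(frame_notes + sensor_log, key=lambda e: e["t"])
    return "\n".join(f"[t={e['t']:.1f}s] {e['source']}: {e['event']}" for e in events)

def plan_steps(instruction: str, context: str, mllm: Callable[[str], str]) -> List[Dict]:
    prompt = (f"Scene timeline:\n{context}\n\n"
              f"Instruction: {instruction}\n"
              'Reply with a JSON list of steps, e.g. [{"action": "...", "target": "..."}].')
    return json.loads(mllm(prompt))

# Toy usage with a stubbed model reply standing in for the fine-tuned MLLM.
frame_notes = [{"t": 1.2, "source": "ego-camera", "event": "person points at coffee machine"}]
sensor_log = [{"t": 0.8, "source": "water-level", "event": "tank 80% full"}]
stub_reply = ('[{"action": "navigate", "target": "coffee machine"}, '
              '{"action": "press", "target": "espresso button"}]')
steps = plan_steps("Make an espresso for the person who just ordered.",
                   build_context(frame_notes, sensor_log), mllm=lambda p: stub_reply)
for step in steps:
    print(step["action"], "->", step["target"])
```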