How Databricks is turning video into searchable, actionable intellige…

Utility companies, police departments, and urban planning teams generate terabytes of video every day from sources such as drone inspections and traffic cameras, yet almost none of it is analyzed because combing through unstructured video is slow and expensive. Databricks treats video as a data engineering problem, letting users run natural language queries against video content at scale.

In a Databricks app, a user uploads a video or points to one in a Databricks Volume, enters a natural language prompt (e.g., white box trucks, security guards, solar panels), and starts processing. Databricks Serverless GPU Compute triggers a Lakeflow job, which uses Meta's SAM3 segmentation model to identify objects matching the prompt in each frame. The video is then cut down to only the relevant moments; for example, a 26-minute traffic camera video was reduced to one minute and 55 seconds of relevant footage, with the original timestamps preserved. Each resulting clip is passed to a foundation model via the Databricks Foundation Model API for AI-generated analysis.
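
The truncation step can be pictured as turning per-frame matches into time ranges on the original timeline. The sketch below is illustrative only and is not Databricks' implementation: it assumes the segmentation stage has already produced one boolean per frame (true when the prompt matched), and the names `Segment`, `frames_to_segments`, `format_timestamp`, and the `max_gap_s` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Segment:
    """A kept clip, expressed as offsets into the ORIGINAL video (seconds)."""
    start_s: float
    end_s: float


def frames_to_segments(
    hits: Iterable[bool], fps: float, max_gap_s: float = 1.0
) -> List[Segment]:
    """Merge matching frames into clips, bridging gaps of up to max_gap_s.

    hits[i] is assumed to be True when the model found the prompted object
    in frame i. Offsets are computed from the frame index, so every clip
    keeps its position on the source timeline.
    """
    segments: List[Segment] = []
    for index, hit in enumerate(hits):
        if not hit:
            continue
        start = index / fps
        end = (index + 1) / fps
        if segments and start - segments[-1].end_s <= max_gap_s:
            # Close enough to the previous clip: extend it.
            segments[-1] = Segment(segments[-1].start_s, end)
        else:
            segments.append(Segment(start, end))
    return segments


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS."""
    whole = int(seconds)
    return f"{whole // 60:02d}:{whole % 60:02d}"
```

Because each `Segment` stores offsets from the source video rather than from the shortened output, a reviewer can jump from any clip back to the same moment in the full recording. Applied to the figures in the example above, 1 minute 55 seconds out of 26 minutes is 115 of 1,560 seconds, or roughly 7% of the original footage.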