QuickCut: A Speech and Gesture Driven Framework for Automated Video Editing

by Martinraj Nadar | Friday, Mar 27, 2026

Abstract: The growing popularity of social media and short-form video platforms has increased the demand for simple, efficient editing tools. Traditional editors like Adobe Premiere Pro require technical expertise and remain tedious for casual creators. This paper presents QuickCut, a multimodal, speech- and gesture-driven automatic video editing framework that streamlines post-production through AI-based automation. Built on an open, modular Flask pipeline, QuickCut integrates Faster-Whisper, WhisperX, MediaPipe, YOLOv8-Face, OpenCV, and FFmpeg to interpret verbal and visual cues directly from recorded footage. The system performs operations such as cutting, zooming, captioning, face blurring, and visual enhancement without manual intervention. Experimental evaluation shows that QuickCut delivers comparable output quality while reducing editing time and cost by up to 97%, demonstrating its potential to make video editing faster, more accessible, and cost-efficient for creators at all skill levels.

Authors: Christina Pappachan, Ashan Perera, Velibor Adzic & Hari Kalva

Conference / Journal: 2026 IEEE International Conference on Consumer Electronics (ICCE)
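
To make the speech-driven cutting idea in the abstract more concrete, here is a minimal sketch of how word-level timestamps from a transcription step might be turned into FFmpeg trim commands. It is not the authors' implementation: the abstract does not describe QuickCut's command vocabulary or internal data structures, so the `Word` record, the marker words "cut" and "resume", and the helper names `keep_segments` and `ffmpeg_trim_args` are all hypothetical. The transcription step itself is left out, and the words are assumed to arrive already timestamped.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One transcribed word with timestamps in seconds (hypothetical record)."""
    text: str
    start: float
    end: float


# Hypothetical spoken markers; the abstract does not list the real ones.
CUT_START = "cut"
CUT_END = "resume"


def keep_segments(words: List[Word], duration: float) -> List[Tuple[float, float]]:
    """Return the (start, end) spans to keep.

    Everything from a CUT_START word through the next CUT_END word is dropped,
    including the marker words themselves.
    """
    segments: List[Tuple[float, float]] = []
    cursor = 0.0      # start of the span currently being kept
    cutting = False   # True while inside a dropped region
    for w in words:
        token = w.text.strip(".,!?").lower()
        if not cutting and token == CUT_START:
            if w.start > cursor:
                segments.append((cursor, w.start))
            cutting = True
        elif cutting and token == CUT_END:
            cursor = w.end
            cutting = False
    # Keep the tail only if the recording did not end inside a dropped region.
    if not cutting and cursor < duration:
        segments.append((cursor, duration))
    return segments


def ffmpeg_trim_args(src: str, dst: str, start: float, end: float) -> List[str]:
    """Build an FFmpeg argument list that re-encodes one kept span."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ss", f"{start:.3f}", "-to", f"{end:.3f}",
        "-c:v", "libx264", "-c:a", "aac",
        dst,
    ]


# Example with made-up timestamps; file names are placeholders.
sample = [
    Word("hello", 0.0, 0.5),
    Word("cut", 1.0, 1.3),
    Word("oops", 1.5, 2.0),
    Word("resume", 2.5, 3.0),
    Word("bye", 3.2, 3.6),
]
for i, (s, e) in enumerate(keep_segments(sample, duration=4.0)):
    print(" ".join(ffmpeg_trim_args("input.mp4", f"part_{i}.mp4", s, e)))
```

The sketch only builds the argument lists and does not run FFmpeg; a real pipeline would also need to join the kept parts and handle the gesture cues, which are outside this example.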