High speed ffmpeg cluster encoding with Python and avidemux

When it comes to clustered video codec conversion there are two general scenarios:

Scenario 1: Encoding many videos across many computers
Scenario 2: Encoding a single video across computers

Scenario 1 is ubiquitous: most encoding clusters are likely running at full steam with a backlog of videos waiting in the queue. Scenario 2 is less common but useful when there is a deadline, because spreading the conversion of a single video across the whole cluster can cut the turnaround time dramatically.

I searched Google for scenario 2 and didn't find any existing ffmpeg cluster implementations, so I spent my Sunday afternoon writing a Python script to do just that. Now, using the 4 PCs at home, I'm converting a single video 300% faster. So how does it work? In a sentence, I split the encoding into ffmpeg tasks (using -ss and -t), distribute the tasks to my cluster, and copy the parts into the final version using avidemux (--append and --rebuild-index).

Is it perfect? Probably far from it. But as a first draft it worked great. I tested several sources and formats, and the video and audio merged seamlessly and in sync. The code has no error catching, and you may need to massage it to work in your setup. I'll work on a second draft that converts to h.264 instead of flv.
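
As a rough illustration of the splitting step, here is a minimal sketch of how one video's duration might be divided into one -ss/-t window per host. The names plan_segments and ffmpeg_command are hypothetical and are not part of the script below. The sketch also lays the windows end to end, whereas the script moves each start forward by one extra frame.

```python
def plan_segments(total_seconds, hosts):
    """Return one (host, start, duration) tuple per host, covering the whole video."""
    duration = total_seconds / float(len(hosts))
    return [(host, index * duration, duration) for index, host in enumerate(hosts)]


def ffmpeg_command(source, output, start, duration):
    """Build the ffmpeg argument list for one window (hypothetical helper)."""
    # -ss and -t are placed before -i, in the same order the script uses
    return ["ffmpeg", "-ss", "%f" % start, "-t", "%f" % duration, "-y", "-i", source, output]


# Example: a 120 second video split across four hosts gives 30 second windows.
segments = plan_segments(120.0, ["one", "two", "three", "four"])
commands = [ffmpeg_command("in.avi", "in_run%d.flv" % (n + 1), start, duration)
            for n, (host, start, duration) in enumerate(segments)]
```

Each command would then be sent to its host, for example over ssh as the script does.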

# Version 0.1
# Big todo is adding error catching

import sys
import os
from re import search
from subprocess import PIPE, Popen

#configure the two parameters below
#1. The names of all the hosts in the cluster that will participate
hostList = ['one', 'two', 'three', 'four']
#2. The NFS mounted dir which contains the video you need encoded
encodeDir = "/net/ffcluster"

#Function definitions
def getDurationPerJob(totalFrames, fps):
    #seconds of video that each host encodes
    return totalFrames / float(fps) / len(hostList)

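def getFps(file):
    #ffmpeg prints stream information to stderr; universal_newlines returns it as text
    information = Popen(("ffmpeg", "-i", file), stdout=PIPE, stderr=PIPE, universal_newlines=True)
    #fetching tbr (1), but can also get tbn (2) or tbc (3)
    #examples of fps syntax encountered are 30, 30.00, 30k
    fpsSearch = search(r"(\d+\.?\w*) tbr, (\d+\.?\w*) tbn, (\d+\.?\w*) tbc", information.communicate()[1])
    fps = fpsSearch.group(1)
    #a trailing k means thousands, which float() cannot parse directly
    if fps.endswith("k"):
        return float(fps[:-1]) * 1000
    return float(fps)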

def getTotalFrames(file, fps):
    information = Popen(("ffmpeg", "-i", file), stdout=PIPE, stderr=PIPE, universal_newlines=True)
    #match the "Duration: HH:MM:SS.cc" line; the fraction is in hundredths of a second
    timecode = search(r"Duration: (\d+):(\d+):(\d+)\.(\d+)", information.communicate()[1])
    hours, minutes, seconds, hundredths = [float(g) for g in timecode.groups()]
    return ((hours * 60 + minutes) * 60 + seconds + hundredths / 100) * float(fps)

def clusterRun(file, fileName, durationPerJob, fps):
    start = 0.0
    end = durationPerJob
    runCount = 0
    jobList = []
    #submits equal conversion portions to each host
    for host in hostList:
        runCount += 1
        runFfmpeg = "ssh %s 'cd %s;ffmpeg -ss %f -t %f -y -i %s %s </dev/null'" % (host, encodeDir, start, end, file, fileName + "_run" + str(runCount) + ".flv")
        #the next portion starts one frame after this one ends
        start += end + 1/float(fps)
        jobList.append(Popen(runFfmpeg, shell=True))
    #wait for all jobs to complete
    for job in jobList:
        job.wait()
    #append/rebuild final from parts and rebuild index
    avidemuxHead = "avidemux2_cli --autoindex --load %s_run1.flv --append %s_run2.flv " % (fileName, fileName)
    avidemuxTail = "--audio-codec copy --video-codec copy --save %sFinal.flv" % (fileName)
    #add --appends for each additional host above the first 2
    for i in range(len(hostList) - 2):
        avidemuxHead = "%s --append %s_run%d.flv " % (avidemuxHead, fileName, i+3)
    runAvidemux = "%s %s" % (avidemuxHead, avidemuxTail)
    Popen(runAvidemux, shell=True).wait()

#Main begin
sourceFile = sys.argv[1]
fps = getFps(sourceFile)
totalFrames = getTotalFrames(sourceFile, fps)
durationPerJob = getDurationPerJob(totalFrames, fps)
fileName = os.path.splitext(sourceFile)[0]

clusterRun(sourceFile, fileName, durationPerJob, fps)
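
The big todo is error catching. One possible starting point is to collect the exit status of every job before merging the parts, sketched here under the assumption that a non-zero exit status means the part failed. run_all is a hypothetical helper, and the commands below are harmless stand-ins rather than real ssh lines, so the sketch runs on any machine with Python.

```python
import sys
from subprocess import Popen


def run_all(commands):
    """Start every command in parallel, wait for all, return the ones that failed."""
    jobs = [(command, Popen(command)) for command in commands]
    failed = []
    for command, job in jobs:
        # wait() blocks until the process exits and returns its exit status
        if job.wait() != 0:
            failed.append(command)
    return failed


# Stand-in commands: one exits cleanly, one exits with status 3.
ok = [sys.executable, "-c", "pass"]
bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
failed = run_all([ok, bad])
```

In the cluster script, the natural use would be to skip the avidemux merge, or resubmit the failed portions, whenever the returned list is not empty.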