Extracting images from an MHT file

posted on 2013-04-16

If you have to work with Windows and need to take screenshots, you might end up using the Problem Steps Recorder (Run → psr). After recording your steps, you can save a ZIP file that contains an MHT file.

Because I'm using Linux and did not have an MHT viewer, I decided to write a small program to grab the screenshots out of the MHT file for me. The script does not use Python's MIME support (the standard library's email package), which would make it even better; it is really only there to grab the screenshots. If you decide to hack together a better version, please leave a comment.

You can download the source script here. Here is a general overview of what the source does.

mhtFileName = args[1]
with open(mhtFileName, "r") as fp:
    contents = fp.read()

This reads the complete MHT file into memory, so make sure your MHT file is not too large.
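If memory is a concern, the parts can also be collected one line at a time. The following is only a sketch, not part of the original script: the helper name iter_parts is made up for illustration, and it assumes every boundary line starts with "--=_NextP", which is slightly stricter than the regular expression the script uses.

```python
def iter_parts(lines, marker="--=_NextP"):
    """Yield the text between boundary lines, one part at a time.

    `lines` is any iterable of lines (for example an open text file).
    Assumption: a boundary line always starts with `marker`.
    """
    current = []
    for line in lines:
        if line.startswith(marker):
            # A boundary closes the part collected so far.
            yield "".join(current)
            current = []
        else:
            current.append(line)
    # Whatever follows the last boundary (often nothing).
    yield "".join(current)
```

Passing an open file object as lines, for example iter_parts(fp), keeps only one part in memory at a time.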

boundry = re.compile("--=_NextP.*\n")
locationFieldRe = re.compile("Content-Location: (.*)\n")

Yes, I use regular expressions to find the MIME boundaries and pull out the filename of each screenshot.
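To see what the two patterns do, here they are applied to a tiny hand-written fragment. The sample text is hypothetical and heavily shortened; it is not real recorder output, and the boundary string and filename in it are made up.

```python
import re

# The same two patterns as in the script.
boundry = re.compile("--=_NextP.*\n")
locationFieldRe = re.compile("Content-Location: (.*)\n")

# Hypothetical, shortened MHT fragment used only for illustration.
sample = (
    "--=_NextPart_SMP_example\n"
    "Content-Type: image/jpeg\n"
    "Content-Transfer-Encoding: base64\n"
    "Content-Location: screenshot0001.JPEG\n"
    "\n"
    "aGVsbG8=\n"
    "--=_NextPart_SMP_example--\n"
)

# Splitting on the boundary lines leaves the text between them.
parts = boundry.split(sample)

# Collect the Content-Location value of every part that has one.
locations = []
for part in parts:
    match = locationFieldRe.search(part)
    if match:
        locations.append(match.group(1).strip())

print(locations)
```

Note that the split also produces empty strings before the first and after the last boundary, which is why the script checks each part before using it.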

os.mkdir("out")

I try to create a directory named out to hold the files. If it already exists, os.mkdir raises an error and the script simply fails at this point.
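If failing on an existing directory is not what you want, a small variation avoids it. This is a sketch rather than what the script does: the helper name ensure_out_dir is made up, and it assumes Python 3.2 or newer for the exist_ok parameter.

```python
import os

def ensure_out_dir(path="out"):
    """Create the output directory if it is missing and return its path."""
    # exist_ok=True means an already existing directory is not an error.
    os.makedirs(path, exist_ok=True)
    return path
```

The trade-off is that files from an earlier run stay in the directory and may be overwritten.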

parts = boundry.split(contents)
for part in parts:
    # Only the base64-encoded parts hold screenshots.
    if "Content-Transfer-Encoding: base64" in part:
        location = locationFieldRe.search(part).group(1).strip()
        # Everything after the first blank line is the base64 body.
        body = None
        for line in part.splitlines():
            if body is not None:
                body += line
                continue
            if line.strip() == "":
                body = ""
        # Binary mode, so the decoded image bytes are written unchanged.
        of = open(os.path.join("out", location), "wb")
        value = base64.b64decode(body)
        of.write(value)
        of.close()

Split the file into parts using the boundary pattern. If a part is base64 encoded, extract its location, collect everything after the first blank line as the body, and write the decoded bytes to a file using b64decode.
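For anyone who wants to try the "better version" with Python's MIME support, here is a rough sketch built on the standard library's email package. It is not the original script: the function name extract_images is made up, and it assumes Python 3 and an MHT file that starts with regular MIME headers (including a multipart Content-Type with a boundary), which I have not verified for every recorder output.

```python
import email
import os

def extract_images(mht_path, out_dir="out"):
    """Write every image part of an MHT file into out_dir; return the paths."""
    with open(mht_path, "rb") as fp:
        message = email.message_from_binary_file(fp)

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for part in message.walk():
        # Skip the container itself, the HTML page, stylesheets and so on.
        if part.get_content_maintype() != "image":
            continue
        location = part.get("Content-Location")
        if not location:
            continue
        # Keep only the last path component so nothing lands outside out_dir.
        name = os.path.basename(location.replace("\\", "/"))
        if not name:
            continue
        # decode=True undoes the Content-Transfer-Encoding (base64 here).
        payload = part.get_payload(decode=True)
        target = os.path.join(out_dir, name)
        with open(target, "wb") as of:
            of.write(payload)
        written.append(target)
    return written
```

Compared to the regular expression approach, this lets the parser deal with boundaries and transfer encodings, and it selects parts by content type instead of by the base64 header.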

If you call the script with an MHT file as the first argument, you should get an out directory with the screenshots in it:

python mht_extract.py Problem_20130403_1119.mht