Extracting images from an MHT file
If you have to work with Windows and need to take screenshots, you might end up using the Problem Steps Recorder (Run → psr). After recording your steps you can save a ZIP file, which contains an MHT file.
Because I'm using Linux and did not have an MHT viewer, I decided to write a small program to grab the screenshots out of the MHT file for me. The script does not use the Python MIME extensions (which would make it even better) and is really only there to grab the screenshots. If you decide to hack a better version, please leave a comment.
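For comparison, here is a minimal sketch of what a version built on Python's standard email package might look like. It is not the script described in this post: it assumes Python 3, the function name extract_images is made up for illustration, and it picks parts by their image content type rather than by their base64 encoding.

```python
import email
import os


def extract_images(mht_bytes, out_dir):
    """Write every image part of an MHT document into out_dir.

    Returns the list of file names that were written.
    """
    # An MHT file is a MIME multipart message, so the email parser can read it.
    msg = email.message_from_bytes(mht_bytes)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        location = str(part.get("Content-Location", ""))
        # Keep only the last path component so a part cannot write outside out_dir.
        name = os.path.basename(location.replace("\\", "/"))
        if not name:
            continue
        with open(os.path.join(out_dir, name), "wb") as fp:
            # decode=True undoes the Content-Transfer-Encoding (base64 here).
            fp.write(part.get_payload(decode=True))
        written.append(name)
    return written
```

The parser takes care of the boundaries and the decoding, so no regular expressions are needed in this variant.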
You can download the source script here. Here is a general overview of what the source does.
    mhtFileName = args
    fp = open(mhtFileName, "r")
    contents = fp.read()
This reads the complete MHT file into memory, so make sure your MHT file is not too large.
    boundry = re.compile("--=_NextP.*\n")
    locationFieldRe = re.compile("Content-Location: (.*)\n")
Yes, I use a regular expression to find the MIME boundaries and pull out the filename of the screenshot.
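To see what the two patterns do, here is a small experiment on a made-up fragment. The sample text is invented for illustration and is much shorter than a real MHT file.

```python
import re

# Same patterns as in the script (including the original spelling "boundry").
boundry = re.compile("--=_NextP.*\n")
locationFieldRe = re.compile("Content-Location: (.*)\n")

# A tiny, invented fragment of an MHT file.
sample = (
    "------=_NextPart_000_0000\n"
    "Content-Type: image/jpeg\n"
    "Content-Transfer-Encoding: base64\n"
    "Content-Location: screenshot_0001.JPEG\n"
    "\n"
    "aGVsbG8=\n"
    "------=_NextPart_000_0000--\n"
)

parts = boundry.split(sample)
# Collect the file name from every part that has a Content-Location header.
names = [m.group(1).strip() for m in map(locationFieldRe.search, parts) if m]
print(names)
```

Because the boundary pattern is not anchored to the start of a line, any dashes in front of `--=_NextP` are left behind at the end of the preceding part.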
I try to create a directory named out to place the files in. If it already exists, the script will simply fail at this point.
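If failing on an existing directory is not what you want, one possible variation (not what the script does) is to let os.makedirs tolerate it. This sketch assumes Python 3.2 or later, and the helper name ensure_out_dir is made up for illustration.

```python
import os


def ensure_out_dir(path="out"):
    """Create the output directory unless it is already there."""
    # exist_ok=True suppresses the error for an existing directory.
    os.makedirs(path, exist_ok=True)
    return path
```

Keep in mind that reusing an existing directory means files from an earlier run with the same names get overwritten.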
    parts = boundry.split(contents)
    for part in parts:
        if "Content-Transfer-Encoding: base64" in part:
            location = locationFieldRe.search(part).group(1).strip()
            # Everything after the first blank line is the base64 body.
            body = None
            for line in part.splitlines():
                if body is not None:
                    body += line
                    continue
                if line.strip() == "":
                    body = ""
            # Open in binary mode, because the decoded value is raw image data.
            of = open(os.path.join("out", location), "wb")
            value = base64.b64decode(body)
            of.write(value)
            of.close()
Split the file into parts using the boundry pattern and, if a part contains something that is base64 encoded, extract the location, decode the body, and store the binary version under that name in the out directory.
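The header/body split for a single part can also be written more compactly. This is only a sketch under a few assumptions: Python 3, a part that actually contains a blank line, and a helper name (decode_part) that is made up for illustration.

```python
import base64


def decode_part(part):
    """Return the decoded bytes of one base64-encoded MIME part.

    The headers end at the first blank line; everything after it is
    treated as the base64 body.
    """
    lines = part.splitlines()
    # Index of the first blank line, which separates headers from body.
    # Raises StopIteration if the part has no blank line at all.
    blank = next(i for i, line in enumerate(lines) if line.strip() == "")
    # Join the body lines once instead of growing a string line by line.
    body = "".join(line.strip() for line in lines[blank + 1:])
    return base64.b64decode(body)
```

Called with one of the parts from the split, it returns the bytes that would be written to the file.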
If you call the script with an MHT file as the first argument, you should get an out directory with screenshots in it:
python mht_extract.py Problem_20130403_1119.mht