Database with all XKCD comics
I've decided to take a large collection of images and create a mega image to scroll through using LeafJS. The plugin is extremely simple and if I create files that mimic the OpenStreetMap API I should be fine with just adding the correct URL. One problem, I need a large collection of images.
One thing I could come up with is downloading a comic that doesn't really mind being downloaded and republished with the right attribution: XKCD.
The images have random names, but you can find the name using the json metadata they host for every comic. The following Haskell script downloads the metadata and outputs the filename to the stdout.
#!/usr/bin/runhaskell
{-# LANGUAGE OverloadedStrings #-}
--From http://xkcd.com/about/ http://xkcd.com/614/info.0.json
import Network.HTTP
import Data.Aeson ((.:), (.:?), decode, FromJSON(..), Value(..))
import Control.Monad (mzero)
import Control.Applicative ((<$>))
import qualified Data.ByteString.Lazy.Char8 as BS
data XkcdMeta = XkcdMeta { img :: String } deriving (Show)
instance FromJSON XkcdMeta where
parseJSON (Object v) = XkcdMeta <$>
v .: "img"
-- A non-Object value is of the wrong type, so fail.
parseJSON _ = mzero
getUrlOrEmptyJsonString url = do
result <- Network.HTTP.simpleHTTP (getRequest url)
case result of
Left _ -> return "{\"img\": \"\"}"
Right response -> return (rspBody response)
decodeOrEmpty json = case decode (BS.pack json) :: Maybe XkcdMeta of
Just xkcd -> img xkcd
Nothing -> ""
jsonUrls = map (\x -> "http://xkcd.com/" ++ (show x) ++ "/info.0.json") [1,2..1169]
downloadAndShowImageUrl url = do
json <- getUrlOrEmptyJsonString url
putStrLn $ (decodeOrEmpty json)
download = mapM_ downloadAndShowImageUrl
main = download jsonUrls
Second part of the equation is downloading all the files and extracting their size to a json database. Because it dependens on libgd for graphics, I've created a small cabal project with the the dependencies described in it. The only source file extract the size data of each image and outputs a json record.
module Main (main) where
import System.Environment (getArgs)
import Graphics.GD (loadPngFile, loadJpegFile, imageSize)
import Data.Aeson (ToJSON(toJSON), encode, object, (.=))
import qualified Data.ByteString.Lazy.Char8 as BS
import qualified Data.Text as T
import Data.List (isSuffixOf)
data ImageMeta = ImageMeta {
filename :: String,
index :: Int,
pixelWidth :: Int,
pixelHeight :: Int
} deriving (Eq, Show)
instance ToJSON ImageMeta where
toJSON (ImageMeta filename index pixelWidth pixelHeight) = object [
(T.pack "index") .= index,
(T.pack "filename") .= filename,
(T.pack "pixelWidth") .= pixelWidth,
(T.pack "pixelHeight") .= pixelHeight ]
loadImage filename = if isSuffixOf ".png" filename
then loadPngFile filename
else loadJpegFile filename
dumpMetaJsonOfFileAt (index, filename) = do
img <- loadImage filename
size <- imageSize img
let metadata = ImageMeta filename index (fst size) (snd size)
putStrLn $ BS.unpack (encode metadata)
-- Each commandline argument is considered a png file to load and dump about
main = do
args <- getArgs
mapM_ dumpMetaJsonOfFileAt (zip [1,2..] args)
The complete source and database with all images can be downloaded here: xkcd_comics_db.tar.xz.
It contains XKCD 1 to XKCD 1169. The comics.json
file only contains PNG and JPEG images, I've decided to drop all GIF images at that point.
Next step will be to create a program to implement any kind of Packing algorithm. As soon as I have, I'll write another post. In the meantime, I hope somebody can enjoy the database and the code snippets and Learn You a Haskell for Great Good!