Clean up your projects before backup

posted on 2013-04-30

Everybody needs to make backups and do so regularly (I use Back in Time). But you don't need to back up your temporary build files and project build artifacts.

Wanting to try out more Haskell, I decided to use it to recursively walk my project directories and clean each one based on the files it contains.

I have decided to call the tool Cyps (Clean your projects); you can find the source in a GitHub repository. You can check out the code and run the command to have it traverse the subdirectories and run clean commands in each directory. Feel free to fork the repo and add commands and suggestions. The README mentions a few things I would still like to see in the future.
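To give an idea of what "cleaning based on available files" can look like, here is a minimal sketch. The marker files, the commands, and the names cleanRules and cleanCommandsFor are assumptions made for illustration; they are not taken from the Cyps source.

```haskell
import Control.Monad (filterM)
import System.Directory (doesFileExist)
import System.FilePath ((</>))

-- | Hypothetical rules: a marker file and the clean command it implies.
cleanRules :: [(FilePath, String)]
cleanRules =
    [ ("Makefile", "make clean")
    , ("pom.xml", "mvn clean")
    , ("build.xml", "ant clean")
    ]

-- | Return the clean commands whose marker file exists in the directory.
cleanCommandsFor :: FilePath -> IO [String]
cleanCommandsFor directory = do
    matching <- filterM markerExists cleanRules
    return (map snd matching)
  where
    -- A rule matches when its marker file is present in this directory.
    markerExists (marker, _) = doesFileExist (directory </> marker)
```

Each returned command would then be run with the directory as its working directory. A directory without any marker file simply yields an empty list.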

Because it's a simple tool, there is not much more to say about it. However, as I also used it as a Haskell learning experience, I'll dive into the code.

Recursively walk directories in Haskell

I wanted to use Haskell's lazy evaluation to build the list of directories lazily, in depth-first order. The first code I found to walk directories was from Real World Haskell. Quoting the code here:

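import Control.Monad (forM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do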
  names <- getDirectoryContents topdir
  let properNames = filter (`notElem` [".", ".."]) names
  paths <- forM properNames $ \name -> do
    let path = topdir </> name
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path
      else return [path]
  return (concat paths)

The forM above runs inside IO, so every directory in the tree is listed before the function returns; there is not much lazy Haskell going on there. Using only the first result

main = do
    contents <- getRecursiveContents "."
    print (take 1 contents)

will still take a long time to evaluate, because the full list is built before the take 1 is applied. Furthermore, I only wanted directories. In short: a function where the first element of the list is found and returned before the rest of the recursion happens. I ended up with the following function:

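import Control.Monad (filterM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))
depthFirstDirectories :: FilePath -> IO [FilePath]
depthFirstDirectories directory = do
    contents <- getDirectoryContents directory
    -- Drop "." and "..", then prefix each name with its parent directory.
    let paths = map (directory </>) (filter (`notElem` [".", ".."]) contents)
    directories <- filterM doesDirectoryExist paths
    return (directory : directories)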

Calling the function with a take 1 on the result

main = do
    contents <- depthFirstDirectories "."
    print (take 1 contents)

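will only read the first directory listing, and return very quickly. As written, the function returns the directory and its immediate subdirectories; it does not descend any further.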
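The function above stops at the immediate subdirectories. One way to get a recursive walk that still hands back the first element before reading anything else is to defer the IO with unsafeInterleaveIO from System.IO.Unsafe. The following is only a sketch under that assumption, not the code used in Cyps; the names lazyDepthFirstDirectories and concatLazily are made up for this example.

```haskell
import Control.Monad (filterM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))
import System.IO.Unsafe (unsafeInterleaveIO)

-- | List a directory and all directories below it, depth first.
-- The directory itself is returned immediately; its contents are
-- only read when the tail of the list is demanded.
lazyDepthFirstDirectories :: FilePath -> IO [FilePath]
lazyDepthFirstDirectories directory = do
    rest <- unsafeInterleaveIO $ do
        names <- getDirectoryContents directory
        let paths = map (directory </>) (filter (`notElem` [".", ".."]) names)
        subdirs <- filterM doesDirectoryExist paths
        concatLazily (map lazyDepthFirstDirectories subdirs)
    return (directory : rest)

-- | Concatenate the results of the actions, running each later
-- action only when the list is consumed that far.
concatLazily :: [IO [a]] -> IO [a]
concatLazily [] = return []
concatLazily (action : actions) = do
    xs <- action
    rest <- unsafeInterleaveIO (concatLazily actions)
    return (xs ++ rest)
```

With this version, take 1 on the result of lazyDepthFirstDirectories "." does not read any directory at all. The price is that IO errors (for example an unreadable directory) show up when the list is consumed rather than when the function is called, and a symbolic link that points back up the tree would make the walk loop, since the sketch does not guard against that.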

If you have any comments or questions, feel free to post them below.