File rotation in Bash

posted on 2013-10-23

This blog post explores the ways I tried to do file rotation in Bash and the solution I finally found.

Problem introduction

The problem I have is quite specific. I've written a program that collects URLs from Twitter messages. Now I want to devote about 100MB of my disk space to keeping track of the latest URLs streaming by in the Twitter status sample.

Having written a simple program to do that, I do not want to build rotation into the program as a feature, but rather solve it more generally with a Bash wrapper script (because that would be a quick solution).

Because I can't expect everybody to compile twitterLinkScraper from source, I'll replace it with strings /dev/urandom in this post.
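
strings /dev/urandom produces an endless stream of short printable strings, which makes it a decent stand-in for a stream of log lines. You can get a feel for it with:

strings /dev/urandom | head -n 5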

Attempt 1: GNU coreutils split

The first idea was to use split (GNU coreutils 8.20), which seems like the obvious choice:

strings /dev/urandom | split --suffix-length=1 --bytes=100 - urls

This will create the files urlsa through urlsz, but then stop with the error split: output file suffixes exhausted

I would like split to overwrite the first part if the suffixes are exhausted, but there doesn't seem to be an option for that.
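
For completeness: numeric suffixes run out just the same, and as far as I can tell there is no wrap-around flag at all:

strings /dev/urandom | split --numeric-suffixes --suffix-length=1 --bytes=100 - urls

This creates urls0 to urls9 and then stops with the same error.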

Attempt 2: Apache rotatelogs

Apache utils (2.2.22) contains a program called rotatelogs, described as a "piped logging program to rotate Apache logs". It allows you to create files with a given name pattern and rotate at a given time or size. Let's try that with a file for every weekday:

strings /dev/urandom | rotatelogs %w.urls 1M

This will rotate at every megabyte. However, rotatelogs does not truncate the output file when it opens it, which means that after the first week you will be appending extra links to the previous week's files. Because I need something that will not grow above a limit, rotatelogs will not do.
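
As an aside: if I remember correctly, the rotatelogs that ships with Apache 2.4 has a -n option that cycles through a fixed number of files and truncates each file when it reopens it, which sounds like exactly what I need:

strings /dev/urandom | rotatelogs -n 10 urls.log 1M

The 2.2 version I have doesn't support it, so I kept looking.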

Attempt 3: redirect Bash output half-way

Then I thought: maybe I can change the output halfway through a Bash script. For example:

#!/bin/bash
exec > a.txt
echo a
exec > b.txt
echo b
echo c

will create two files. a.txt will contain the output of echo a and b.txt will contain the output of both echo b and echo c.
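
You can verify this yourself (redirect.sh is just a name I picked for the script above):

bash redirect.sh
cat a.txt   # a
cat b.txt   # b and c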

Now let's try to start a program and switch halfway:

#!/bin/bash
maxSize=10
index=0
exec > "${index}.urls"
strings /dev/urandom &

#Rotate the output whenever the current file reaches maxSize bytes
for index in {0..2}; do
    currentDumpFile="${index}.urls"
    exec > "${currentDumpFile}"
    while [[ "$(stat -c %s "${currentDumpFile}")" -lt "${maxSize}" ]]; do
        sleep 1
    done
done
wait

Note: if you run this script, make sure you run killall strings after hitting Ctrl-C, because the background process will keep running otherwise.
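
A cleaner alternative (a sketch, not in the script above) is to remember the PID that Bash stores in $! right after spawning the background job, and kill it from a trap:

strings /dev/urandom &
stringsPid=$!
trap 'kill "${stringsPid}" 2>/dev/null' INT TERM EXIT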

First we redirect the output to 0.urls. We then start a loop that checks its size, and when it reaches the maximum size, we try to change the output file using exec > ${currentDumpFile}.

This will create the second output file, but because the stdout of the strings process spawned earlier is still tied to the file opened by exec > 0.urls, its output never shows up in the second output file. A real shame, but obvious when you think about it: the child inherited a copy of the parent's stdout when it was spawned, and re-execing in the parent does not change the child's copy.
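
On Linux you can watch this happen by inspecting the child's file descriptors under /proc. A minimal sketch, with sleep standing in for strings:

#!/bin/bash
exec > 0.urls
sleep 30 &                       # child inherits stdout pointing at 0.urls
childPid=$!
exec > 1.urls                    # only changes the parent's stdout
ls -l "/proc/${childPid}/fd/1"   # the symlink still points at 0.urls
wait

The ls output itself ends up in 1.urls, because by that point the parent's stdout has been re-execed.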

I gave it another shot by trying things like job control to switch back and forth, but to no avail.

Success 4: Use Bash to read each line and then rotate

Looping through lines is possible in Bash using read when piping the output to a while loop:

#!/bin/bash
maxLineCount=10
i=0
lineCount=0
strings /dev/urandom | \
while IFS= read -r LOGLINE; do
    currentDumpFile="${i}.txt"
    #Switch to new output?
    if [[ "${lineCount}" -ge "${maxLineCount}" ]]; then
        i=$(( (i + 1) % 4 ))   # cycle through 0.txt to 3.txt
        lineCount=0
        currentDumpFile="${i}.txt"
        truncate --size 0 "${currentDumpFile}"
    fi
    echo "$LOGLINE" >> "${currentDumpFile}"
    lineCount=$(( lineCount + 1 ))
done

We run the strings command and pipe its output to the while loop on the next line (hence the trailing | \). The loop uses read -r to read a single line into the LOGLINE variable, which we then append to the current output file. When we switch files, we make sure to truncate the new file first (otherwise you would get the rotatelogs behavior again).

You could also do size-based checks, like in attempt 3; a rough sketch of that variant follows.
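
Here is a minimal, untested sketch of the size-based variant. It assumes the lines are plain ASCII, so that ${#LOGLINE} counts bytes:

#!/bin/bash
maxSize=1048576   # 1MB per file, arbitrary
i=0
byteCount=0
strings /dev/urandom | \
while IFS= read -r LOGLINE; do
    currentDumpFile="${i}.txt"
    #Switch to new output?
    if [[ "${byteCount}" -ge "${maxSize}" ]]; then
        i=$(( (i + 1) % 4 ))
        byteCount=0
        currentDumpFile="${i}.txt"
        truncate --size 0 "${currentDumpFile}"
    fi
    echo "$LOGLINE" >> "${currentDumpFile}"
    byteCount=$(( byteCount + ${#LOGLINE} + 1 ))   # +1 for the newline
done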

Success 5: Use mkfifo to create a fifo file

FIFO stands for first in, first out. The concept is to create a special file (using mkfifo) that blocks writers until somebody reads from it. This means that we don't have to spawn a process for each and every log line we write, only a single head process at each rotation.
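
A quick demonstration of the blocking behavior (run the commands in this order):

mkfifo demo.fifo
echo hello > demo.fifo &   # blocks until a reader opens the fifo
cat demo.fifo              # prints hello, unblocking the echo
rm demo.fifo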

#!/bin/bash
maxLineCount=100
#Create the fifo if it does not exist yet
if [[ ! -e "logfifo" ]]; then
    mkfifo "logfifo"
fi

#Connect the writer's stdout to the fifo
strings /dev/urandom > logfifo &

#Rotate through 1.txt to 10.txt forever
while true; do
    for i in {1..10}; do
        currentFile="${i}.txt"
        head -n "${maxLineCount}" logfifo > "${currentFile}"
    done
done
wait

First we create the fifo file if it's not there, then we spawn a background process whose stdout we connect to that fifo. Then we simply use head to take the first maxLineCount lines from the fifo and put them in the current log file; the > redirection truncates each file for us, so the files never grow past the limit.
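
One thing I would add in practice (a sketch, not part of the script above): clean up the fifo and the background writer when the script exits:

stringsPid=$!   # right after starting strings in the background
trap 'kill "${stringsPid}" 2>/dev/null; rm -f logfifo' INT TERM EXIT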