Early in September, I wrote Quick Storage Stats, a small post about how I wrote a quick dumpy little script to generate stats for the file host that I frequently pretend I do not run.

The script was nice and succinct, which meant, obviously, that I had to make it "worse".

I love graphs. Graphs are great! So, I decided I needed some nice little sparkline style infographs from the stats I can "easily" generate. I also figured it might be nice to have information for the last 30 days, instead of just info from the "all time" end of the timescale.

I had to awk some things out of other programs' outputs, and I openly admit that I grabbed some of the more esoteric awk stuff out of various StackExchange answers. The additions are below.

ls -Atrl /path/to/files | grep "^-" | awk '{
    key=$6" "$7
    freq[key]++
}
END {
    for (date in freq)
        printf "%s\t%d\n", date, freq[date]
}' | sort -k1M -k2n | awk '{ p=$3","; print getline == 0 ? $3 : p ORS $3","}' ORS=""

and

find /path/to/files -type f -mtime -30 -exec ls -lt {} \; | awk '{
    key=$6" "$7
    freq[key]++
}
END {
    for (date in freq)
        printf "%s\t%d\n", date, freq[date]
}' | sort -k1M -k2n | awk '{ p=$3","; print getline == 0 ? $3 : p ORS $3","}' ORS=""

Let's break the initial commands down, first, shall we?

Before I pipe anything into anywhere, I have to get a list of files. Since I want both everything ever and the last 30 days, I get to run two different commands. First things first, let's get everything, since we were already doing that.

ls -Atrl /path/to/files | grep "^-" lists everything in the directory, dotfiles included but minus the . and .. entries, in an information-packed table format, sorted by last-modified date and then reversed so the oldest entries come first. Flag-wise, A drops . and .., t does the time sort, r reverses the results, and l makes the awk-able table format. The grep pipe is what actually pares the results down to just files: in ls -l output a regular file's permissions look like -rw-rw-rw-, whereas a directory leads with a d, as in drw-rw-rw-. grep only returns lines that match a given pattern, and ^- tells grep to look specifically at the beginning of each line and only return those that start with the minus character. That also conveniently drops the "total" line ls -l prints at the top.
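To see the filtering in action, here is a minimal sketch run against a throwaway directory. The directory and the file names are made up for the demo and are not part of the original script.

```shell
# Build a scratch directory: two files, one dotfile, one subdirectory.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/.hidden"
mkdir "$demo_dir/subdir"

# -A includes .hidden but skips . and ..
# grep -c "^-" counts only the lines for regular files, so the "total"
# line and the "d..." line for subdir are left out.
file_count=$(ls -Atrl "$demo_dir" | grep -c "^-")
echo "$file_count"    # prints 3

rm -r "$demo_dir"
```

The dotfile is counted and the subdirectory is not, which shows that grep, not ls -A, is the part doing the "files only" work.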

In my other command, we want only the most recent 30 days of files to be returned. find /path/to/files -type f -mtime -30 -exec ls -lt {} \; does just what I wanted, probably. find /path/to/files -type f gives me a list of what's on that path, but specifically only files. -mtime -30 does the 30 day breakdown (the minus sign matters: it means "modified less than 30 days ago", where a bare 30 would only match files modified 30 days ago), and -exec ls -lt {} \; does... something. It definitely formats my list of files the same way ls -l does (find runs ls once per file here, so the t has nothing to sort), and I openly admit that I wasn't 100% on the {} \; bit ... or even really like 5%, but per an explanation in an IRC chatroom somewhere in space, {} is whatever stuff find found and \; is to end the -exec. So, now we both know that.
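A small sketch of the -mtime -30 behaviour, assuming a scratch directory with one fresh file and one that has been backdated with touch -t. Both file names are hypothetical.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/new.txt"
# Backdate a second file to 1 Jan 2000 so it falls well outside 30 days.
touch -t 200001010000 "$demo_dir/old.txt"

# -mtime -30 keeps files modified less than 30 days ago.
# {} is replaced by each path find matched, and \; ends the -exec command.
recent=$(find "$demo_dir" -type f -mtime -30 -exec ls -lt {} \;)
printf '%s\n' "$recent"

rm -r "$demo_dir"
```

Only the line for new.txt should come back, in the same table format ls -l produces.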

Soooo, now we move on, pretending that that last bit made some sense, ok? Both commands are now piped into awk, so we can do things at and to them. I can't graph stuff without some sort of count of the stuff to be graphed, so this bit of awking does that.

awk '{
  key=$6" "$7
  freq[key] ++
}
END {
  for(date in freq)
    printf "%s\t%d\n", date, freq[date]
}'

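OK! So this awk glues fields 6 and 7 (the month and the day) together with a space and uses that as a key into an array called freq, bumping the count every time it sees the same date again. Once every line has been read, the END block loops over the array and, using some voodoo witchcraft magic, prints each date, a tab character, and a count of how many files carry that date. This works the same way for BOTH commands. The dates come out in whatever order awk feels like, so I get a list that looks something like this: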

Oct 8   3
Oct 9   1
Oct 24  6
Oct 15  9
Oct 16  1
Nov 1   1
Oct 26  2
Oct 17  1
Oct 19  5

I get this kind of result for both the full and the 30 day list of files. That allows me to pipe the data along to sort, using sort -k1M -k2n to sort first on column one as a month name, then on column two as a number, and since that is the day of the month, that works. The third column now gives me nice, clean, date-sorted counts, so I can keep piping this stuff down the line.
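Here is a quick sketch of that sort on four rows borrowed from the sample list. LC_ALL=C is my addition, on the assumption that the month names are the English three-letter abbreviations.

```shell
# Unsorted date/count rows, tab between the date and the count.
sorted=$(printf 'Oct 24\t6\nNov 1\t1\nOct 8\t3\nOct 15\t9\n' |
    LC_ALL=C sort -k1M -k2n)   # month name first, then day as a number

printf '%s\n' "$sorted"
```

The rows should come back as Oct 8, Oct 15, Oct 24, Nov 1. Sorting the day as a number is what puts Oct 8 ahead of Oct 15, where a plain text sort would put 15 first.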

awk '{print $3}' would normally output everything I need, as all my numeric data is in column three of the sorted output, but, unfortunately, it prints one number per line, and I need a comma separated list for use in a very nice SVG script phuu on GitHub put together for makin' the old sparklines. Ok! Let's get some CSV action going on.

awk '{ p=$3","; print getline == 0 ? $3 : p ORS $3","}' ORS=""

I understand most of this awk, so here goes. The p=$3"," starts off my results by taking the count from the current line piped to awk by the sort, followed by a comma and VERY SPECIFICALLY NO SPACES EVER. Then, thanks to the semicolon, we start doing more things: a ternary if-type statement that calls getline, which tries to read the next line, and checks what it returns. getline can return one of three things: 0 if there was no next line to read (so we were already on the last one), 1 if it read one, and -1 if something went very wrong. Unless we're on the last line, or one of the previous pipes is sending the wrong data, we should get a 1. If, as we hope, we do, $3 now holds the count from the line getline just read, so we print p (the previous count and its comma) followed by the new count and another comma. At the very end, when getline comes back with 0, we print JUST the data in the third column, no comma. And thanks to ORS="", which tells awk what to put after each thing it prints (normally a newline), everything lands on one line with no line breaks.
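Because getline eats a second line on every pass, the command above works through the input in pairs, and as far as I can tell it leaves a trailing comma when the number of lines is even. A minimal sketch of a different technique (not the StackExchange one) that does not care about line counts: print a comma before every value except the first. The sample rows are hypothetical.

```shell
# Date-sorted rows with the count in column three (an even number of rows).
sample=$(printf 'Oct 8\t3\nOct 9\t1\nOct 15\t9\nOct 16\t1\n')

# NR is the current line number, so only line 1 skips the leading comma.
# printf adds no newline, so everything lands on one line.
csv=$(printf '%s\n' "$sample" |
    awk '{ printf "%s%s", (NR > 1 ? "," : ""), $3 }')

echo "$csv"    # prints 3,1,9,1
```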

My data now looks something like this: 3,3,6,2,2,1,1,1,1,1,7,2,1. I pulled down a copy of the SVG files from that GitHub repo so I don't have to make calls to a third party service (sparksvg.me). The "call", as I've started calling it, goes to https://dumbfileho.st/bar.svg?DATA, and that is why I needed my data in CSV without spaces. The output of my horrible spaghetti code is something like this, when combined with the last blog post about this:
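Putting the pieces together, a rough sketch of a helper that goes from a directory to the finished URL might look like this. The function name sparkline_url and its directory argument are hypothetical, LC_ALL=C is my assumption to keep the month names in English, and the last awk is the comma-before-each-value variant rather than the getline one.

```shell
# Hypothetical helper: prints the bar.svg URL for the files in directory $1.
sparkline_url() {
    csv=$(LC_ALL=C ls -Atrl "$1" | grep "^-" | awk '{
        freq[$6 " " $7]++                 # count files per "Mon D"
    }
    END {
        for (date in freq)
            printf "%s\t%d\n", date, freq[date]
    }' | LC_ALL=C sort -k1M -k2n |
        awk '{ printf "%s%s", (NR > 1 ? "," : ""), $3 }')
    printf 'https://dumbfileho.st/bar.svg?%s\n' "$csv"
}
```

Calling sparkline_url /path/to/files would then print one URL with the per-day counts after the question mark.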

some honestly really shitty stats.