Image by Gratisography

I host a useless file uploader that I absolutely do not want associated with my real name, so I am not linking it on this blog. Still, my last several small projects have been devoted to making the thing do what I ask it to do, and to making sure I can host it somewhat safely.

As such, this is a post about how I get clamav, a nice (read: free) antivirus, to do virus scans on a Linux box with minimal fuss.

The final outcome of this little project looks like this, followed by a somewhat line-by-line explanation:

#!/bin/bash
# Remove the viruses from {DIRECTORY} and
# report them to {FILE}

FILE_HOME="/path/to/file"

# {FILE} lives in a directory above where my {DIRECTORY} stuff is,
# so I pushd there and use relative references to get into {DIRECTORY}
pushd "$FILE_HOME"
clamscan "$FILE_HOME/things/to/scan" > virus.recent
cat virus.recent | cat - virus.all | sponge virus.all
cat virus.recent | grep FOUND | sed -E 's/{DIRECTORY}(\w+\.\w+)/\1/' | cat - virus.found | sponge virus.found
cp virus.found $FILE_HOME/path/to/report/removed.txt
cp virus.recent $FILE_HOME/path/to/report/scan.txt
cp virus.all $FILE_HOME/path/to/report/all.txt

popd

So let's imagine everything that I am trying to scan and display lives under /home/example/dir (that is what FILE_HOME would be set to), which is why we pushd /home/example/dir. The clamscan $FILE_HOME/things/to/scan > virus.recent command lets me run the clamav virus scanner, then redirect the resulting scan report into a text file called virus.recent.
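If you want the script to know how the scan went, and not just save the report, here is a minimal sketch. It assumes the exit codes listed in the clamscan man page (0 for no virus found, 1 for virus(es) found, 2 for errors). The describe_scan_status helper and the SCAN_DIR variable are hypothetical names that I made up for this example; they are not part of my actual script.

```shell
#!/bin/bash
# Hypothetical helper: turn clamscan's exit status into a readable word.
# Assumes the codes from the clamscan man page:
#   0 = no virus found, 1 = virus(es) found, 2 = some error(s) occurred.
describe_scan_status() {
    case "$1" in
        0) echo "clean" ;;
        1) echo "infected" ;;
        *) echo "error" ;;
    esac
}

# Usage sketch: only runs when SCAN_DIR (an assumed variable) is set
# and clamscan is actually installed.
SCAN_DIR="${SCAN_DIR:-}"
if [ -n "$SCAN_DIR" ] && command -v clamscan >/dev/null 2>&1; then
    # -r makes clamscan descend into subdirectories
    clamscan -r "$SCAN_DIR" > virus.recent
    # $? still holds clamscan's exit status here
    describe_scan_status "$?"
fi
```

The redirect does not change the exit status, so checking `$?` right after the clamscan line is enough.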

From here, we prepend that virus scan result file to a secondary file I have named virus.all, which, as you might suspect, contains all of my virus scan reports... hence the all. The cat - file | sponge file part (sponge comes from the moreutils package) will be explained soon. Maybe.
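The prepend step on its own can be sketched like this. The prepend_to_log function is a hypothetical wrapper I am making up for illustration, and the fallback branch (for boxes without moreutils installed) is my own assumption about a reasonable substitute, not something the script above does.

```shell
#!/bin/bash
# Hypothetical helper: put the contents of NEW in front of the contents of LOG.
prepend_to_log() {
    local new="$1" log="$2"
    touch "$log"    # first run: make sure the log exists so cat does not complain
    if command -v sponge >/dev/null 2>&1; then
        # "-" tells cat to read stdin first, then the log file.
        # sponge soaks up all of its input before it writes the output file,
        # so the log is fully read before it gets overwritten.
        cat - "$log" < "$new" | sponge "$log"
    else
        # Assumed fallback without moreutils: stage the result in a temp file,
        # then write it back (keeps the log's existing permissions).
        local tmp
        tmp="$(mktemp)"
        cat "$new" "$log" > "$tmp" && cat "$tmp" > "$log"
        rm -f "$tmp"
    fi
}
```

Calling prepend_to_log virus.recent virus.all would then do the same job as the cat and sponge line in the script.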

After I get done with the busy work, I make some regrettable choices and perform some trashy regex, using sed... specifically, I pipe the contents of the virus.recent file into grep, which I have searching for the word "FOUND". If you check out this pastebin, you can see why: the virus scanner output when there is a virus found looks something like filename.ext: virusname FOUND. So I look for that FOUND (rather than the removed, which wouldn't tell me what the reason for the removal was, so...) and do some quick sed themed magic to it to remove the file path... because I know where my files live and, really, no one else needs to know that.
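Here is a small sketch of the grep and sed step in isolation. The found_without_paths function name, the /srv/uploads path, and the Example-Signature virus name are all made up for the example; the pattern is also a more generic one than the one in my script (it strips whatever directory comes before the file name instead of one specific directory).

```shell
#!/bin/bash
# Hypothetical helper: keep only the FOUND lines and drop the directory part.
found_without_paths() {
    # grep keeps lines that end in FOUND.
    # sed -E (extended regex) deletes everything up to the last "/" that
    # comes before the "name: " part; "|" is the delimiter so the slash
    # in the pattern needs no escaping.
    grep 'FOUND$' | sed -E 's|^.*/([^/:]+: )|\1|'
}

# Made-up sample lines in the "path: result" shape described above.
printf '%s\n' \
    '/srv/uploads/cat.png: OK' \
    '/srv/uploads/bad.exe: Example-Signature FOUND' | found_without_paths
# prints: bad.exe: Example-Signature FOUND
```

Note that sed is reading from the pipe here, so it must not be given -i, which only makes sense when editing a named file in place.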

OK, so I have all of the most recently found virus files, what the virus was, no path (thanks to sed's voodoo witchcraft magic time) and now we are ready to vomit that nonsense into a file... except I want ALL of the viruses that I have found, so far, not just the viruses that I just found. So, using cat, I push the contents of my sed output, using the - part of my cat - virus.found, into STDOUT, along with the existing contents of the file I created while you weren't looking, virus.found. Then, using sponge, I "sponge" up all of what just got hurled into STDOUT -- my newest virus files and all of the files that were already listed in virus.found!

My copies just move my three virally themed files into easily-accessed locations (read: web facing), so I can quickly check to see if there is any weirdness going on with my uploader... which there usually is because people are dumb and I keep telling myself to make it private but private file hosts make for boring blog posts.

Here is a more commented copy of my file (imaginatively named virus.sh). It is literally the same code as above, so just... I don't know, don't complain to me when you realize that, I guess.

#!/bin/bash
# Remove the viruses from {DIRECTORY} and
# report them to {FILE}

FILE_HOME="/path/to/file"

# {FILE} lives in a directory above where my {DIRECTORY} stuff is,
# so I pushd there and use relative references to get into {DIRECTORY}
pushd "$FILE_HOME"

# Do the virus scannin' and then redirect the report into virus.recent
clamscan "$FILE_HOME/things/to/scan" > virus.recent

# I would like ALL of my virus reports in the same place, so we put the info in 
# virus.recent into STDOUT, followed by the contents of virus.all, then suck all
# of that data up into virus.all again (with a sponge).
cat virus.recent | cat - virus.all | sponge virus.all

# OK. So remember that time we pushed stuff from a file to STDOUT? First we have 
# to do some things to it so we get only the stuff we want... specifically, we
# only want lines from virus.recent that have the word "FOUND", because that line
# will tell me about the virus. SED out the filepath, because SECURITY BY OBSCURITY,
# then do the same thing I did with virus.all, above.
cat virus.recent | grep FOUND | sed -E 's/{DIRECTORY}(\w+\.\w+)/\1/' | cat - virus.found | sponge virus.found

# This is just me moving files to a web facing directory so I can double check them 
# without remoting into my server.
cp virus.found $FILE_HOME/path/to/report/removed.txt
cp virus.recent $FILE_HOME/path/to/report/scan.txt
cp virus.all $FILE_HOME/path/to/report/all.txt

# Just in case, always `popd` back. The `d` is silent. :)
popd

As usual, the image(s) used in this post (if there were any) are from Gratisography.