Simple Ruby log splitter

The other day I needed to split a large log file into smaller, more manageable pieces. Long ago I wrote a Java-based file splitter, and long before that I wrote one in C++ (woohoo iostreams!). Unfortunately, I did not have either of these programs with me.

So, I wrote it in Ruby:

    #
    # Read from $stdin and split it into multiple files of at most MAX_LINES lines each.
    #

    FILE_SUFFIX = ARGV[0] || "split.log"
    MAX_LINES = ARGV[1] ? ARGV[1].to_i : 500_000
    lines = 0
    file_count = 0
    current_file = nil

    $stdin.each_line do |line|
      # Open the next output file lazily, only when there is a line to write.
      if current_file.nil?
        file_name = "#{file_count}_#{FILE_SUFFIX}"
        puts "Writing file #{file_name}"
        current_file = File.open(file_name, "w")
        file_count += 1
        lines = 0
      end

      current_file.write(line)
      lines += 1

      # Close the file once it holds MAX_LINES lines (>= avoids one extra line per file).
      if lines >= MAX_LINES
        current_file.close
        current_file = nil
      end
    end

    # Close the last, partially filled file, if any.
    current_file.close if current_file

Usage:

ruby split.rb < file

ruby split.rb mylogfilename < file

ruby split.rb mylogfilename maxlines < file

This is by far the smallest (and simplest) version of the file splitter I have written, at least compared with a Java version that uses no extra jars. There are a number of reasons for this, not least of which is Ruby's elegant use of blocks (closures): $stdin.each hands my code one line at a time, so I never have to do the reading directly in my code.
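To illustrate the block-passing idea a bit further, here is a minimal sketch of one possible alternative that leans on Ruby's enumerators to group the lines instead of counting them by hand. The helper name split_lines and its dir parameter are hypothetical (they are not part of the script above), and the demonstration reads from an in-memory StringIO rather than $stdin so that it runs on its own.

```ruby
require "stringio"
require "tmpdir"

# Hypothetical helper (not from the original script): writes every
# max_lines lines of io to a new file named "<index>_<suffix>" in dir.
# Returns the list of file paths written.
def split_lines(io, max_lines, suffix, dir = ".")
  io.each_line.each_slice(max_lines).each_with_index.map do |chunk, index|
    path = File.join(dir, "#{index}_#{suffix}")
    File.write(path, chunk.join) # chunk is an array of up to max_lines lines
    path
  end
end

# Small demonstration using an in-memory input and a temporary directory.
Dir.mktmpdir do |dir|
  input = StringIO.new("a\nb\nc\nd\ne\n")
  paths = split_lines(input, 2, "split.log", dir)
  paths.each { |path| puts "#{File.basename(path)}: #{File.read(path).inspect}" }
end
```

The trade-off is that each_slice holds up to max_lines lines in memory before writing them out, whereas the loop above writes each line as soon as it is read. For very large chunk sizes the original streaming approach is probably the safer choice.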

But really, I’m putting this out here to see what ideas or approaches other people have for doing this better or more efficiently. So please post your thoughts and comments. I would love to see a more elegant approach.