The other day I needed to split a large log file into smaller, more manageable pieces. Long ago I wrote a Java-based file splitter, and long before that I wrote one in C++ (woohoo iostreams!). Unfortunately, I did not have either of these programs with me.
So, I wrote it in Ruby:
#
# Read from $stdin, splitting it into multiple files of at most MAX_LINES lines each.
#

FILE_SUFFIX = ARGV[0] ? ARGV[0] : "split.log"
MAX_LINES = ARGV[1] ? ARGV[1].to_i : 500000
lines = 0
fileCount = 0
currentFile = nil

$stdin.each do |line|
  # Open the next output file lazily, only when there is a line to write.
  if (currentFile == nil)
    fileName = "#{fileCount}_#{FILE_SUFFIX}"
    puts "Writing file " + fileName
    currentFile = File.open(fileName, "w")
    fileCount+=1
    lines = 0
  end

  currentFile.write line
  lines+=1

  # Use >= so each file holds exactly MAX_LINES lines (> would write one extra).
  if (lines >= MAX_LINES)
    currentFile.close
    currentFile = nil
  end
end

# Close the last file if the input ended partway through it.
currentFile.close if currentFile
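Usage: pipe the log into the script on standard input. The optional first argument is the suffix for the output file names (default split.log), and the optional second argument is the maximum number of lines per file (default 500000).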
This is by far the smallest (and simplest) I have ever gotten the file splitter code, without using extra jars in Java. There are a number of reasons for this, not least of which is the elegant use of closures in Ruby to pass each line, instead of having to do all of the reading directly in my code.
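To make the closure point a bit more concrete, here is a minimal sketch of the same idea pushed one step further: the grouping logic lives in a helper, and the caller's block decides what to do with each group of lines. The helper name each_chunk and the in-memory StringIO input are my own assumptions for illustration; they are not part of the script above.

```ruby
require "stringio"

# Hypothetical helper (not from the original script): groups the lines of
# any IO-like object into arrays of at most max_lines lines and yields
# each group, together with its index, to the caller's block.
def each_chunk(io, max_lines)
  io.each_line.each_slice(max_lines).with_index do |chunk, index|
    yield chunk, index
  end
end

# Example: split a small in-memory "log" into groups of two lines.
input = StringIO.new("a\nb\nc\nd\ne\n")
chunks = []
each_chunk(input, 2) { |chunk, index| chunks << [index, chunk] }
```

To write files instead of collecting arrays, the block could call something like File.write("#{index}_split.log", chunk.join). One trade-off to keep in mind: each_slice holds a whole group of lines in memory at once, so with a large MAX_LINES the line-at-a-time loop above is probably the gentler choice.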
But really, I’m putting this out here to see what ideas or approaches other people might have for doing this better or more efficiently. So, please post your thoughts and comments. I would love to see a more elegant approach.
Stephen | 25-Aug-06 at 11:51 pm | Permalink
cat foo.txt | split -l 100
thoughts.on.code :: A Ruby DSL for splitting files | 27-Aug-06 at 11:02 pm | Permalink
[...] Tonight Blaine sent me his version of my Simple Ruby Log splitter, which took it a giant step further and turned it into a reusable library for splitting files. He used an object-oriented approach and provided extensibility by allowing the caller to pass in a block that determines when to split the file. Hopefully he will post the code either on my site or his. Good stuff. [...]
Groovy Log Splitter « Groovy Bar & Grill | 29-Aug-06 at 8:46 pm | Permalink
[...] Matt and Blaine are having fun writing Ruby log splitters. I couldn’t pass up the opportunity to write one in Groovy, so here it is: [...]
Jeff | 29-Aug-06 at 8:58 pm | Permalink
My humble Groovy version here:
https://groovybargrill.wordpress.com/2006/08/29/groovy-log-splitter/
erik scheirer | 06-Jun-08 at 11:41 am | Permalink
Thanks indeed, what a lovely and useful post for those of us who ‘code by google’ !