Recently, I had to generate some box-and-whisker plots to present the results of my work on my master's thesis. A Google search turned up no complete solution that fit my expectations and needs. My colleague, Stefan, presented a solution on his blog, but that script still wasn't what I was looking for.
I wanted a script that, naturally, presents a set of results of an experiment. Stefan used quartiles and the median; I preferred the standard deviation and average values. Moreover, I wanted the script to be configurable, at least by passing a path to the directory where the result files are located.
And that's how my funky-fresh-and-supercool-Ruby-script was born. Let me guide you through what I think are the most interesting parts of it. If you only need to see the whole script, scroll down to the end of this post, where you'll find a link to download it :-).
I decided to let the user pass two arguments: the first is the directory (explained earlier), and the second is an optional filename pattern (if it's not present, the default is "*.txt"). The script simply picks only those files from the given directory that match the specified filename pattern. Thanks to Ruby's magic, it's as simple as that:
# changing directory to given
Dir.chdir(directory)
# finding
files = Dir.glob(filename_pattern)
if files.length == 0
  puts 'There are no files matching specified filename pattern'
  Process.exit!
end
...
files.each do |file|
  ...
end
Well, I know that Process.exit! may not be the nicest way to exit a script, but it works like a charm ;-).
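For the curious, here is a minimal sketch of how the two arguments could be handled before the globbing step. It is not taken from the downloadable script: the helper names parse_arguments and matching_files are hypothetical, and unlike the snippet above the sketch uses the block form of Dir.chdir, so the working directory is restored afterwards.

```ruby
# Hypothetical helpers, not part of the original script.

# Returns [directory, pattern]; the pattern falls back to "*.txt".
def parse_arguments(argv)
  directory = argv[0]
  raise ArgumentError, 'usage: script.rb DIRECTORY [FILENAME_PATTERN]' if directory.nil?

  pattern = argv[1] || '*.txt'
  [directory, pattern]
end

# Returns the sorted names of the files in `directory` matching `pattern`.
# The block form of Dir.chdir restores the previous working directory.
def matching_files(directory, pattern)
  Dir.chdir(directory) { Dir.glob(pattern).sort }
end
```

In a script, this would be called as directory, pattern = parse_arguments(ARGV), followed by matching_files(directory, pattern).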
Result files that this script processes have to be in a specific format:
iteration_number value
for every line. For example:
1 234.6
2 324.3
3 4.55
and so on.
My algorithm generates results in an unusual way: the result files may not all have the same number of lines. That's why I count the lines in every file and take the minimum of these counts; that way we're sure that every iteration has the same number of values to process.
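A rough sketch of that line-count check, assuming the "iteration_number value" format shown above. The helper names minimum_line_count and read_values are mine and purely illustrative; the actual script may do this differently.

```ruby
# Hypothetical helpers, not part of the original script.

# Smallest number of lines found among the given result files.
def minimum_line_count(files)
  files.map { |file| File.foreach(file).count }.min
end

# Reads the value column (second field) of the first `limit` lines of a file.
def read_values(file, limit)
  File.foreach(file).first(limit).map { |line| line.split[1].to_f }
end
```

Every file is then read only up to minimum_line_count(files) lines, so each iteration ends up with exactly one value per file.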
Next, the script calculates the mean and standard deviation, and keeps track of the global minimum and maximum of the results, to be used and explained later.
Two interesting things here:
1) because of using
Dir.chdir(directory)
earlier, we're still inside this directory, so invoking
output_file = File.new('output.dat', 'w')
will result in creating a file inside this directory.
2) thanks to Drew Olson's "5 things you can do with a Ruby array in one line" I came up with these lines:
sum = tab.inject { |s, item| s + item }
for summing up the array and
variance = ((tab.map { |item| (item - average) ** 2 }).inject { |s, item| s + item }) / file_counter
for computing variance. Nice, huh? :)
I decided to print out the last two values: average and standard deviation. I needed them to compare my algorithm's best results for different parameters. Nevertheless, comment them out or just delete them if you don't need them.
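Put together, the two one-liners could be wrapped like this. It's only a sketch: the method name mean_and_std is hypothetical, the array is assumed to be non-empty, and the variance is divided by the number of values (the population variance), the same way the line above divides by file_counter.

```ruby
# Hypothetical helper, not part of the original script.
# Returns [average, standard_deviation] for a non-empty array of numbers.
def mean_and_std(values)
  count = values.length
  sum = values.inject { |s, item| s + item }
  average = sum / count.to_f
  # Population variance: the squared deviations are divided by the count.
  variance = values.map { |item| (item - average)**2 }.inject { |s, item| s + item } / count
  [average, Math.sqrt(variance)]
end
```

For example, mean_and_std([2, 4, 4, 4, 5, 5, 7, 9]) gives an average of 5.0 and a standard deviation of 2.0.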
Now, to create a nice plot (I needed a logarithmic scale for the lowest, latest values), I wrote a simple piece of code to find the two consecutive powers of ten that enclose a given value. OK, not so obvious, I know, but I think this code explains what I did:
first_bigger = 10.0
first_smaller = 1.0
# assumes global_minimum > 0 (a log scale needs that anyway)
# <= and >= let the loop stop when global_minimum is itself a power of ten
while !(first_bigger > global_minimum && first_smaller <= global_minimum)
  if global_minimum >= first_bigger
    first_bigger *= 10.0
    first_smaller *= 10.0
  else
    first_bigger /= 10.0
    first_smaller /= 10.0
  end
end
As you can see, if the global minimum is, for example, 0.5, this code produces a range of (0.1, 1), and if the global minimum is 45, it gives (10, 100). Currently, I'm only using the first_smaller value, but first_bigger may also be useful for modifications, so I left it there.
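The same range can also be found without a loop, using the base-10 logarithm. This is an alternative sketch, not what the script does; the method name power_of_ten_range is hypothetical, and it assumes a positive value.

```ruby
# Hypothetical alternative to the loop above, not part of the original script.
# Returns [first_smaller, first_bigger]: the neighbouring powers of ten
# with first_smaller <= value < first_bigger (up to floating-point rounding).
def power_of_ten_range(value)
  raise ArgumentError, 'value must be positive' unless value > 0

  exponent = Math.log10(value).floor
  [10.0**exponent, 10.0**(exponent + 1)]
end
```

With 0.5 this yields roughly (0.1, 1), and with 45 it yields (10, 100), matching the examples above.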
Finally, the script produces the gnuplot script file, which looks like this inside:
set terminal png size 1280,1024
set output "output.png"
set boxwidth 0.2 absolute
set yrange [ #{first_smaller} : #{global_maximum.ceil} ]
set xrange [ 0 : #{min_lines + 10} ]
set log y
plot 'output.dat' every 5 using 1:3:2:6:5 with candlesticks lt 3 lw 2 notitle whiskerbars, '' using 1:4:4:4:4 with candlesticks lt -1 lw 2 notitle
Of course, the #{value} placeholders are replaced with the proper values. For an explanation of all these enigmatic gnuplot options (maybe except the output image resolution ;-)), I have to refer you to the gnuplot documentation page.
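One way to produce such a file from Ruby is a heredoc with string interpolation. The sketch below is my guess at the shape of that step, not a copy of the real script: the method name gnuplot_script is hypothetical, and the squiggly heredoc (<<~) needs Ruby 2.3 or newer.

```ruby
# Hypothetical helper, not part of the original script.
# Builds the gnuplot script text; the #{...} parts are filled in by Ruby.
def gnuplot_script(first_smaller, global_maximum, min_lines)
  <<~GNUPLOT
    set terminal png size 1280,1024
    set output "output.png"
    set boxwidth 0.2 absolute
    set yrange [ #{first_smaller} : #{global_maximum.ceil} ]
    set xrange [ 0 : #{min_lines + 10} ]
    set log y
    plot 'output.dat' every 5 using 1:3:2:6:5 with candlesticks lt 3 lw 2 notitle whiskerbars, '' using 1:4:4:4:4 with candlesticks lt -1 lw 2 notitle
  GNUPLOT
end
```

Writing it to disk is then a one-liner such as File.write('gnuplot_script', gnuplot_script(first_smaller, global_maximum, min_lines)).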
And that's it! The script assumes that the user has read and write permissions for the given directory, so be sure you set them. It produces two files: output.dat and gnuplot_script. The format of the first one is:
iteration_number minimum_value average-standard_deviation standard_deviation average+standard_deviation maximum
for every line.
To make gnuplot create the output.png file with the plot, simply go to the directory you passed to the script and type
?> gnuplot gnuplot_script
You can download this script from here. Whatever you want to do with it, you can. The only thing I ask for is a note saying where you got it.
Voila! :)