Pitfalls of Ruby's memoization pattern
20 Feb 2017@something ||= calculate_it
is so common in Ruby code. You use it often to store something "heavier". But many times it leads to sub optimal performance. You'd expect the value to be cached no matter what. But reality is different as we recently learned in our code, run this and see:
class AdvancedService
def self.something_big
@local_cache ||= begin
puts "Here I am, surprised?"
false
end
end
end
AdvancedService.something_big
AdvancedService.something_big
||=
is so simple to write that often you don't give much thought to it. I'm guilty of this as well. The fix is simple for that:
class AdvancedService
def self.something_big
return @local_cache if defined?(@local_cache)
@local_cache ||= begin
puts "Here I am, surprised?"
false
end
end
end
How many of you remember about adding it there?
Now, what's still wrong with this code? There's a big problem there. You see that's a service.
Check it out:
AdvancedService.something_big
2.times do
Thread.new do
puts AdvancedService.something_big
end
end
What happened here? As AdvancedService
is shared between threads the initialization ran only once (sometimes it can run twice when you delete first line as there's no synchronization between threads). Is that OK?
Sometimes, but I guess not that often. I recently found a code that's supposed to cache this value for the current request. But guess what? It cached it for the whole application. As most of the servers you use with Ruby/Rails are multi-threaded this will be happily shared between all requests.
To solve it you can use Thread.current[:local_cache]
(but remember it's not thread-local but fiber-local, WTF?).
But hey, remember, multi-threaded server - one that re-uses threads? So this fails if you want to have a per-request cache, but don't worry there's a gem for that (and probably dozen more).
I personally cache as a last resort and only if I ran out of different options. But some architectures may require it.
I remember a time when I was working on large Java application with a multitude of DAOs (data access objects) that were querying the database to get single pieces of information (each DAO is responsible for one table usually), the problem was that different parts of the application needed the same information over and over. On the other hand the data was request specific, didn't make sense to keep it always in memory. In cases like that per-request caching is the easiest and safest to use.