Alternative title: “ResourceT considered harmful”
Summary: ResourceT is a great tool, used to solve real problems when dealing with constrained resources and runtime exceptions. However, in the wild, it is often overused for situations where its full power isn’t needed. If you want more information on ResourceT, check out its README.md.
How do you copy a file in Haskell? Let’s ignore the obvious answer (System.Directory.copyFile) and the cheeky answer:
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import System.Exit
import System.Process
main = rawSystem "cp" ["src", "dest"] >>= exitWith
We’ll want to use binary I/O functions of course. One idea would be to use strict ByteString versions of readFile and writeFile:
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import qualified Data.ByteString as B
main = B.readFile "src" >>= B.writeFile "dest"
Unfortunately, this has the potential to use unbounded memory for large input files. So instead we use lazy I/O:
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import qualified Data.ByteString.Lazy as BL
main = BL.readFile "src" >>= BL.writeFile "dest"
Unfortunately, this has a different problem: non-deterministic resource usage. You see, if there’s some kind of an exception thrown when writing to dest, we do not get any guarantees about when the file descriptor for src will be closed. In a program this small, it makes no difference. In a long lived, multithreaded application, this has the potential to take down your entire process with file descriptor exhaustion.
All of this is old news to people familiar with streaming data libraries. And as such, you probably won’t be surprised to see me offer another solution to the problem, based on a library I wrote (conduit):
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Conduit
main = runConduit $ sourceFile "src" .| sinkFile "dest"
That looks all well and good, but we unfortunately get a compilation failure:
• No instance for (MonadResource IO)
arising from a use of ‘sourceFile’
• In the first argument of ‘(.|)’, namely ‘sourceFile "src"’
In the second argument of ‘($)’, namely
‘sourceFile "src" .| sinkFile "dest"’
In the expression: runConduit $ sourceFile "src" .| sinkFile "dest"
With some squinting and brain power, this starts to make sense. The strict I/O version above avoided a potential file descriptor leak by using potentially unbounded memory. This allowed the file descriptors to be closed promptly. Lazy I/O fixes the memory issue by keeping the file descriptors open longer, possibly leaking them. Conduit is forcing us, at the type level, to solve both. Conduit itself addresses memory usage, but relies on something else—ResourceT—to guarantee that the file descriptors get closed in the case of exceptions.
Fortunately, solving this problem is pretty straightforward: just use runResourceT:
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Conduit
main = runResourceT
     $ runConduit
     $ sourceFile "src" .| sinkFile "dest"
Or, since this pattern is so common in conduit, we have a built in helper function:
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Conduit
main = runConduitRes $ sourceFile "src" .| sinkFile "dest"
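For the curious, runConduitRes is (roughly speaking) nothing more than the composition of the two functions from the previous example:

-- Roughly how the helper is defined in conduit:
runConduitRes :: MonadUnliftIO m => ConduitT () Void (ResourceT m) r -> m r
runConduitRes = runResourceT . runConduit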
You’ll see this kind of code all over the place in the conduit world, often in documentation written by me! I’m trying to atone for that sin today.
I had a bit of a sleight of hand above. I told you that the types forced us to use ResourceT, and that’s true. But why, logically, do we need this concept? The reason is as follows: reading from src and writing to dest each allocate a scarce resource (a file descriptor); those allocations happen inside the conduit pipeline; and conduit by itself does not guarantee that cleanup happens if an exception is thrown mid-pipeline. Something else has to provide that guarantee, and that something is ResourceT, which lets you register cleanup actions that should be run even in the case of exceptions.
Alright, so obviously we need to use ResourceT in order to use sourceFile and sinkFile. And those functions need to use ResourceT in order to allocate a file descriptor inside the conduit pipeline, since they cannot guarantee that cleanup actions will occur otherwise. Sounds legit.
But ResourceT is a powerful tool. It allows you to dynamically register new cleanup actions at will. In our situation we don’t actually need such power! Let me demonstrate (note: I’ll show you an easier way to do the same thing a bit later):
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Conduit
import System.IO
main =
  withBinaryFile "src" ReadMode $ \src ->
  withBinaryFile "dest" WriteMode $ \dest ->
  runConduit $ sourceHandle src .| sinkHandle dest
You see, there’s nothing actually dynamic about our resource allocations. We need to open up two files, one for reading, and one for writing. We need to guarantee that both of those file descriptors will be closed in the event of an exception (or normal termination, for that matter). This kind of workflow is well known, understood, and used in the Haskell world, and that’s why we have standard functions like withBinaryFile that perform all of this. More generally, we refer to it as “the bracket pattern”, based on the underlying bracket function which is used in implementing functions like withBinaryFile.
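To make the pattern concrete, here’s a sketch of how a withBinaryFile-style function can be built on top of bracket. This is an illustration, not the actual library source, so I’ve called it withBinaryFile':
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Control.Exception (bracket)
import System.IO

-- Acquire the Handle, hand it to the inner action, and close the Handle
-- afterwards, even if the inner action throws an exception.
withBinaryFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withBinaryFile' fp mode = bracket (openBinaryFile fp mode) hClose

main :: IO ()
main = withBinaryFile' "src" ReadMode hFileSize >>= print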
Of course, the handle-based copy code above is not only somewhat tedious, but it’s error-prone. It’s easy to accidentally swap ReadMode with WriteMode. If that sounds contrived, well, ahem, I’m guilty of it. That was a good motivation for me to use the ResourceT-based approach in tutorials until now. However, conduit now boasts some helper functions that make this much easier and less error-prone:
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Conduit
main =
  withSourceFile "src" $ \src ->
  withSinkFile "dest" $ \dest ->
  runConduit $ src .| dest
It’s still more wordy than the sourceFile/sinkFile approach, but I’d argue that it’s worth the cost to avoid introducing people to heavyweight approaches they don’t need. I’ll be trying to move in this direction with future writing and training, not to mention my own coding.
Alright, so I’ve thrown around that ResourceT is “heavyweight.” But is this actually a problem? I’m going to argue that it is, for multiple reasons:
Performance: There is a negligible performance overhead to the bookkeeping required for ResourceT. In general, this hit is small enough to not be that important. However, I’m including it as the first bullet since, small as it is, it’s still a real cost; the points below are the more significant ones.
Complexity: ResourceT works as a monad transformer, which many people know is a topic I’ve been becoming increasingly leery of. I’ve also seen confusion about the lifetime of values inside ResourceT, which is a point of confusion I haven’t really seen with the bracket pattern.
Overlived resources: I’ve seen many bugs in production code pop up because people have used values created from ResourceT which have already been freed. While this is possible with the bracket pattern too, for whatever reason it seems like ResourceT hides that away from people better. As a contrived example, consider this code:
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Conduit
import Control.Monad.Trans.Resource
import System.IO
main = do
  -- Each runResourceT block runs its cleanup (hClose) before returning,
  -- so both handles are already closed by the time we use them below.
  (_, src) <- runResourceT $ allocate (openFile "src" ReadMode) hClose
  (_, dest) <- runResourceT $ allocate (openFile "dest" WriteMode) hClose
  runConduit $ sourceHandle src .| sinkHandle dest
In this case, both src and dest are opened with allocate, which registers hClose as their cleanup action. Each runResourceT block finishes running immediately, causing the cleanup to run and the handle to be closed. By the time runConduit is called, both handles have already been closed, and the pipeline is working with dead resources.
And as a less contrived example, I’ve seen many bugs pop up around how to do this correctly with transPipe, e.g.:
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Conduit
import Control.Monad.Trans.Resource
import System.IO
main = runConduit
  -- BROKEN: transPipe applies runResourceT to each action in the conduit,
  -- so each file is closed as soon as the action that opened it completes,
  -- before the data has been streamed.
  $ transPipe runResourceT (sourceFile "src")
 .| transPipe runResourceT (sinkFile "dest")
This last example also demonstrates part of why I shy away from transformers these days too.
There is a type-based approach that solves these problems quite well: regions. It was (of course) invented by Oleg. While it works, the idea never really caught on, in my opinion because the cost of juggling the types was too high.
Interestingly, the regions approach isn’t too terribly different in concept from lifetimes in Rust. And perhaps more interestingly, I believe this is an area where the RAII (Resource Acquisition Is Initialization) approach in both C++ and Rust leads to a nicer solution than even our bracket pattern in Haskell, by (mostly) avoiding the possibility of a premature close.
I’ve seen ResourceT advocated as a great way to avoid asynchronous exception bugs in Haskell. The theory seems to be: if you use ResourceT, you don’t even need to think about async exceptions, just use allocate appropriately and you’re all set!
I disagree with this. In practice, I think you’ll end up with resources far overliving where they’re needed. And if you’re avoiding learning about async exceptions, I can almost certainly guarantee you’re not handling them correctly. My recommendation is: learn how asynchronous exceptions work, and reach for bracket and the bracket-style helpers built on it whenever they fit.
I hope this is enough motivation: don’t use resourcet if you don’t have to. That, of course, leaves one important question.
This blog post is kind of weird. I wrote a library. I maintain the library today. And I’m telling people not to use it. What gives?
ResourceT is an absolutely necessary tool in some cases. My point here is: if you’re not in one of those cases, don’t use it. If you can see a way to solve the problem with bracket-like functions, do that.
The general rule for when you need ResourceT is for dynamic resource usage. This means that, before you begin processing, you don’t know how many resources, or which exact resources, you’re going to need. The best example I know of is a memory-efficient deep directory traversal. Let’s write a naive program that will get a list of all files ending in .hs in a directory tree.
CHALLENGE See if you can spot the memory-inefficient part of the code below before reading my explanation.
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import System.Directory
import System.FilePath
import Data.Foldable (for_)
main :: IO ()
main = start "."
start :: FilePath -> IO ()
start dir = do
  rawContents <- getDirectoryContents dir
  let contents = map (dir </>)
               $ filter (not . hidden) rawContents
      hsfiles = filter (\fp -> takeExtension fp == ".hs") contents
  for_ hsfiles putStrLn
  for_ contents $ \fp -> do
    isDir <- doesDirectoryExist fp
    if isDir
      then start fp
      else pure ()
hidden :: FilePath -> Bool
hidden ('.':_) = True
hidden _ = False
The problem here is the call to getDirectoryContents. It will read into memory all of the entries for the given directory. If there are 1,000,000 files in a directory, it will take up a few megabytes of memory in filenames alone. Instead, we’d want an approach where we stream the directory entries one at a time, recursing into subdirectories as we reach them, and printing each entry with a .hs file extension as we see it.
The thing is, we need to ensure that each time we open a directory, we also close it. And we don’t know how many layers deep we will be opening directories, or the names of those directories, before we begin. This is a use case where ResourceT usage is a must, and conduit provides some built in functions for performing this task.
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Conduit
import System.FilePath
main :: IO ()
main = runConduitRes
     $ sourceDirectoryDeep False "."
    .| filterC (\fp -> takeExtension fp == ".hs")
    .| mapM_C (liftIO . putStrLn)
NOTE Astute readers may note that this problem also has unbounded resource usage: we will keep open, at most, one file descriptor for each level of directory nesting. I’m aware of no algorithm that will avoid this cost.
There are certainly other cases of dynamic resource usage that pop up in the wild. To put things in perspective, however, some months back I refactored the Stack codebase to remove all usages of ResourceT. Even a codebase performing as many different I/O heavy activities as Stack seems to be free of dynamic resource allocation.
I debated including this section. Feel free to consider it “extra credit” and skip it.
One of my points against ResourceT is the complexity of using a monad transformer. However, this is a bit of a red herring. You could easily come up with a non-monad-transformer API. For example, consider an API where you explicitly create and share some CleanupRegistry:
withCleanupRegistry $ \registry ->
  runConduit
    $ sourceFile "src" registry
   .| sinkFile "dest" registry
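To give a feel for what such an API could look like, here’s a hypothetical, minimal sketch of a CleanupRegistry. This is not a real library, and a production implementation would need to be far more careful about masking and async exceptions:
#!/usr/bin/env stack
-- stack --resolver lts-12.10 script
import Control.Exception (bracket, uninterruptibleMask_)
import Data.IORef
import System.IO

-- Cleanup actions are collected in an IORef and are all run when the
-- registry's scope exits, even if an exception is thrown.
newtype CleanupRegistry = CleanupRegistry (IORef [IO ()])

withCleanupRegistry :: (CleanupRegistry -> IO a) -> IO a
withCleanupRegistry = bracket
  (CleanupRegistry <$> newIORef [])
  (\(CleanupRegistry ref) -> uninterruptibleMask_ (readIORef ref >>= sequence_))

register :: CleanupRegistry -> IO () -> IO ()
register (CleanupRegistry ref) action = modifyIORef' ref (action :)

main :: IO ()
main = withCleanupRegistry $ \registry -> do
  -- Note the window between openFile and register: a real implementation
  -- would need masking here, which is exactly the kind of async exception
  -- detail you still have to understand.
  h <- openFile "src" ReadMode
  register registry (hClose h)
  hFileSize h >>= print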
One potential downside of this registry-passing style is that it’s somewhat verbose. But that’s the constant debate around implicit arguments via ReaderT versus explicit arguments. There’s a more fundamental problem here: this API tends to encourage even more usage of outlived resources.
Above, I demonstrated how transPipe is often used in practice to use closed resources. That’s true, but for the most part the monad transformer nature of ResourceT prevents that specific problem. However, explicitly passing around registry values has a high likelihood of encouraging bad coding.
I don’t have even anecdotal evidence to back this claim up, since I never wrote the resourcet library with that usage in mind. It’s just a suspicion. But it’s a strong enough suspicion that I’ve avoided advertising such an alternative API to resourcet.
ResourceT remains a good tool, and one I’ll recommend, where warranted. However, since writing it, I’ve discovered that most use cases don’t actually need its full power, and that with a few helper functions (like withSourceFile above), using the bracket pattern instead is not particularly difficult.
If you’ve got use cases that you’re unsure really require ResourceT, feel free to drop a comment below or ping me on Twitter to discuss it. I hope this was helpful!