Happstack.State – the basics

One of the surprising things about using Happstack for developing web apps is that, unlike most other frameworks out there, it doesn’t use a conventional database to save information. Instead, it implements something it calls a MACID store: an in-memory user-defined data store that provides ACID guarantees (atomicity, consistency, isolation, and durability).

There are several advantages to using MACID instead of a relational database with perhaps an object-relational mapper on top of it. The biggest is that the program can use whatever data structures and data types are most convenient, instead of being forced to stick with things that map nicely to database tables. It also simplifies setup of the application: there’s no need to configure a separate database server to let the application talk to it.

As an example of using Happstack’s MACID store, implemented in the Happstack.State module, the following is a simplified version of the user account directory that I wrote as part of The Button.

Before we get to that, though, let’s take a moment to consider how the program is going to store passwords. Being security-conscious, we aren’t going to store the passwords themselves — if an attacker were to get access to the files backing the MACID store, he would learn the passwords for all users in the system. Instead, we’ll store a one-way hash of each user’s password. Given a password, it’s easy to compute the corresponding hash, but extremely difficult to take a hash and compute the password it was derived from. That way, the program can check if a user’s password matches the one for the account, without ever having to remember what the password actually is.

Actually, it’s not quite that simple. Just using a hash still lets an attacker perform a dictionary attack, perhaps using a rainbow table, if he gets a copy of the hashed passwords. The basic idea is that the attacker would guess a password, hash it, and see if the hash matches the password hash for any users. To foil this, we salt passwords before hashing them, appending some random data before running the hash function. The salt values don’t need to be secret; they just need to be different for each user, to guarantee that even if two users have the same password, the hashes will be different.

And just to ruin the attacker’s day even more, we do key strengthening by running the hash function over and over instead of just once, to increase the amount of work an attacker needs to do to brute-force a password. Since a hash can be computed fairly quickly, the user doesn’t notice if we run the function one time or 100 times. The extra work only becomes noticeable if you’re trying to guess lots of passwords, which slows down an attacker.

OK, enough about password security. Here’s how we’ll represent a hashed password in our program:

data PasswordHash = PasswordHash { pwHash :: [Word8]
                                 , pwSalt :: [Word8]
                                 }
    deriving (Typeable, Data)
 
instance Version PasswordHash
$(deriveSerialize ''PasswordHash)

A PasswordHash is the hash itself, along with the salt used when computing it. The deriving (Typeable, Data) and the rest is part of Happstack.State’s deep magic, which we don’t need to worry about for this example.

The actual hashing of a password is performed by this function, which takes a salt value and the password, and computes its hash:

hashPassword :: [Word8] -> String -> [Word8]
hashPassword salt password =
        let passBytes = listToOctets $ map ord password
        in  (iterate step passBytes) !! iterationCount
    where iterationCount = 100
          step chain = SHA512.hash (chain ++ salt)

That function converts the password into a series of bytes and repeatedly appends the salt and runs it through the SHA-512 hash function, returning the result after doing that 100 times.

That’s enough about passwords for now. Next, let’s consider what information we want to store about each user. For this example, we’ll only care about the user’s password (hashed, of course), along with the time they first registered an account:

data UserInfo = UserInfo { usPassword :: PasswordHash
                         , usJoined   :: ClockTime
                         }
    deriving (Typeable, Data)
 
instance Version UserInfo
$(deriveSerialize ''UserInfo)

The user directory itself will just be a map from user names to the information for each user:

newtype UserDirectory = UserDirectory (M.Map String UserInfo)
    deriving (Typeable, Data)
 
instance Version UserDirectory
$(deriveSerialize ''UserDirectory)
 
instance Component UserDirectory where
    type Dependencies UserDirectory = End
    initialValue = UserDirectory M.empty

Since the user directory is the thing we’re actually going to put into the MACID store, there’s a few extra lines of boilerplate at the bottom to say what other MACID stores it depends on (none) and what the contents of a brand-new store is (empty).

That’s about all there is to defining what goes into the MACID store itself. Next, we need to define the operations that either query the store without changing it, or update the store to a new value.

Let’s start with an easy one: a query that returns a list of users, paired with the date their account was created:

listUsers :: Query UserDirectory [(String, ClockTime)]
listUsers = do
    UserDirectory dir <- ask
    return $ M.toList $ M.map usJoined dir

The type signature is perhaps the most interesting part: Query UserDirectory [(String, ClockTime)]. This is a read-only query (Query) that operates on a UserDirectory and returns a list of (String, ClockTime) pairs. The implementation of the function is simple: get the current value of the store using ask, then iterate over the map to get the user names and dates.

Here’s a more sophisticated query, one that takes a user name and password and sees if the password is correct:

checkPassword :: String -> String -> Query UserDirectory Bool
checkPassword name pass = do
    UserDirectory dir <- ask
    case M.lookup name dir of
        Nothing       -> return False
        Just userInfo -> let PasswordHash hash salt = usPassword userInfo
                         in  return $ hash == hashPassword salt pass

Again, the type signature (String -> String -> Query UserDirectory Bool) shows this is a query on our UserDirectory, this time taking two strings (user name and password) as arguments and returning a boolean value. The code looks up the UserInfo, if any, associated with the user, gets the salt, hashes the password provided as an argument to the query, and returns whether the two match.

Of course, it would help if we had a way to add users to the store. That’s what this update operation does:

addUser :: String -> String -> Update UserDirectory Bool
addUser name pass = do
    UserDirectory dir <- get
    if M.member name dir
        then return False
        else do salt <- newSalt
                now <- liftM fixEventClockTime getEventClockTime
                let passwordHash = PasswordHash (hashPassword salt pass) salt
                let userInfo = UserInfo passwordHash now
                put $ UserDirectory $ M.insert name userInfo dir
                return True

Since this is an update, the type signature has Update in it instead of Query. This means instead of using ask to get the current state, we use get to get the current state and put to set it to a new value. The code itself first checks if a user by the name already exists, and if not, creates a new UserInfo structure with the user’s hashed password and the time that the update was made.

A couple things are worth pointing out. First, inside a Query or Update, we can use the MACID store as a random number generator, which is how a new salt is generated for the user:

instance Random Word8 where
    randomR (lo, hi) rng = let (val, rng') = randomR (fromIntegral lo, fromIntegral hi) rng
                               val :: Int
                           in  (fromIntegral val, rng')
    random rng = randomR (minBound, maxBound) rng
 
newSalt :: AnyEv [Word8]
newSalt = sequence $ take saltLength $ repeat getRandom
    where saltLength = 10

AnyEv is a monad that both Query and Update work with. The newSalt function generates 10 random bytes by first generating an infinite list of monadic computations that return random numbers (repeat getRandom), throwing away all but the first ten of them (take saltLength), and finally combining them into a single monadic computation that returns a list of random bytes (sequence) (instead of a list of monadic computations that each returns one random byte).

Annoyingly, Haskell doesn’t provide a direct way to generate random bytes (of type Word8), so we have to manually specify how Word8 implements the Random typeclass. The implementation is straightforward: to generate a random byte, first generate a random integer, and then convert it to a byte.

Returning to the update function, it’s also worth noting that getEventClockTime returns the time at which the update was logged, not the time the update is executing! This is because the MACID store doesn’t save a new copy of the state to disk every time it changes. Instead, it saves a log of what updates were made. If the program terminates abnormally before the state can be checkpointed, the MACID store recovers by loading the most recent checkpoint and re-executing all the updates since then. This is how the MACID store provides durability without completely thrashing the disk.

The astute reader will note that, if we ever do need to replay the addUser update, newSalt will be executed again! Does this mean that an update that uses the RNG could get different values during recovery? As it turns out, the MACID store also saves the state of the RNG, so replaying the update during recovery will generate the same random numbers that were obtained from the original execution of the update. The particularly astute reader will conclude that the MACID store’s RNG is unsuitable for generating cryptographic keys, since an attacker with access to the MACID store’s backing files can predict what the RNG will return. Fortunately, we don’t care if the salt is predictable, just that it’s different for each user.

OK, enough of that. There’s just one little piece of deep magic we need to convert those functions into honest-to-goodness transaction handlers for the MACID store:

$(mkMethods ''UserDirectory ['addUser, 'checkPassword, 'listUsers])

That bit of magic generates a type constructor for each transaction function. When invoking a transaction, instead of calling the functions themselves, we create an object to represent the request, and then pass it to query or update as appropriate.

Here’s an example of using our new MACID store. The following implements a simple command interpreter that lets us manipulate our user directory:

commandLoop :: MVar TxControl -> IO ()
commandLoop state = do
        putStr "> "
        hFlush stdout
        command <- liftM words getLine
        processCommand state command
    where processCommand state ["list"] = do
                    people <- query ListUsers
                    mapM_ (putStrLn . showUser) people
                    commandLoop state
          processCommand state ["add", user, pass] = do
                    success <- update $ AddUser user pass
                    putStrLn $ if success then "User added" else "User already exists"
                    commandLoop state
          processCommand state ["login", user, pass] = do
                    success <- query $ CheckPassword user pass
                    putStrLn $ if success then "Success" else "Bad account or password"
                    commandLoop state
          processCommand state ["checkpoint"] = do
                    createCheckpoint state
                    commandLoop state
          processCommand _     ["quit"] = return ()
          processCommand state _        = do
                    putStrLn "Unrecognized command"
                    commandLoop state
          showUser (name, joined) = name ++ " (joined " ++ show joined ++ ")"
 
macidProxy :: Proxy UserDirectory
macidProxy = Proxy
 
main :: IO ()
main = startSystemState macidProxy >>= commandLoop

Note how, for example, instead of calling addUser user pass, we do update $ AddUser user pass, where the AddUser type constructor was generated automatically and takes the same arguments as addUser. The update function does all the work of invoking addUser and making sure the result gets saved to the MACID store which enforcing all the ACID guarantees.

Instead of walking through how the command interpreter works, here’s a sample session, which should give you the idea:

> list
> add paul swordfish
User added
> list
paul (joined Sun Apr  5 20:00:01 EDT 2009)
> add cowboy sourMilk7
User added
> list
cowboy (joined Sun Apr  5 20:00:13 EDT 2009)
paul (joined Sun Apr  5 20:00:01 EDT 2009)
> login paul notHisPassword
Bad account or password
> login nobody foobar
Bad account or password
> login paul swordfish
Success
> quit

The files backing the MACID store are, by default, in the _local directory under where you ran the program from: current-0000000000, the initial checkpoint of the (empty) user directory we started with; and events-0000000000, the log of all the updates that were applied. If we had created a checkpoint, it would be in the directory too.

Now, quiz time. Not counting all the security problems I already mentioned in The Button that apply equally well to this example, there is an additional security flaw in the above code that could let an attacker steal passwords with little effort on his part. Your quiz has two questions:

  1. How can the attacker steal passwords from the MACID store?
  2. How can we change the program to prevent him from doing so?

For your convenience, here is Vulnerable.hs, the complete example described in the above post. Assuming you’ve installed GHC 6.10 and Happstack on your computer, just do “runghc Vulnerable.hs” to try the program out yourself.

Next time: the answers to the quiz.

Comments are closed.