Philipp Kant, IOHK

Bobkonf 2017, Berlin

typical dichotomy:

- performance
- work close to bare metal, use c
- safety
- work with abstractions, use haskell

sometimes you need **both**

example: medical application

- need timely answer
- errors can cause real harm

one solution: *write c in haskell*

haskell does expose "the metal":

`Ptr`

,`plusPtr`

,`moveBytes`

, …typically not used in application code

reserved for high-performance libraries, with a high-level API

- as long as the libraries are correct, there will be no buffer overflow

- static types go a long way
- write small functions that fit in your head
- use
`QuickCheck`

when you need **proof**: `LiquidHaskell`

*refinement types*, extension of haskell's type system- statically ensure relations between possible values
- optional step at compile time – zero overhead for library users

Amdahl's Law: \(t_n = t_{0,\text{seq}} + \tfrac{t_{0,\text{par}}}{n}\)

serialisation significant overhead in sequential part

- serialisation: represent data as sequence of bytes
- to save it to files
- to send it to another computer/process

- possible features
- versioning, backwards compatibility
- architecture independence
- cross-language compatibility
- incremental (de)serialisation
- easy to use
- speed

- use case: distributed high performance computing
- typical data: vectors of simple data types, fits in memory
- design goal: speed
- no versioning, fixed architecture, no partial deserialisation and backtracking

store: serialisation to/from strict ByteStrings

efficiency: one allocation per serialisation, no partial results

networking: data arrives in chunks, need streaming

add thin streaming layer on top of store

-- | Copy the contents of a 'ByteString' to a 'ByteBuffer'. copyByteString :: MonadIO m => ByteBuffer -> ByteString -> m () -- | Try to get a pointer to @n@ bytes from the 'ByteBuffer'. -- -- If there are not enough bytes in the ByteBuffer, indicate -- how many bytes are needed consume :: MonadIO m => ByteBuffer -> Int -> m (Either Int ByteString)

type ByteBuffer = IORef BBRef data BBRef = BBRef { size :: !Int , contained :: !Int , consumed :: !Int , ptr :: !(Ptr Word8) }

- best case: segmentation fault
- worst case: data corruption anywhere in the same program

- best case: segmentation fault
- worst case: heartbleed

prove correctness via `LiquidHaskell`

refine data types

type Nat = Int

{-@ type Nat = {v:Int | 0 <= v} @-}

measures

[a]

{-@ measure len :: [a] -> Int len [] = 0 len (x:xs) = 1 + (len xs) @-}

refined function types

head :: [a] -> a tail :: [a] -> [a]

{-@ head: {xs:[a] | len xs >= 1} -> a @-} {-@ tail: xs:[a] -> {xs':[a] | len xs' <= len xs} @-}

`LiquidHaskell`

optional step at compile time

Î»> stack exec -- liquid src/System/IO/ByteBuffer.hs LiquidHaskell Copyright 2009-15 Regents of the University of California. All Rights Reserved. **** DONE: A-Normalization **************************************************** **** DONE: Extracted Core using GHC ******************************************* **** DONE: Transformed Core *************************************************** **** DONE: Uniqify & Rename *************************************************** Working 72% [================================================.................] Done solving. **** DONE: annotate *********************************************************** **** RESULT: SAFE **************************************************************

data BBRef = BBRef { size :: !Int , contained :: !Int , consumed :: !Int , ptr :: !(Ptr Word8) }

{-@ data BBRef = BBRef { size :: Nat , contained :: { v: Nat | v <= size } , consumed :: { v: Nat | v <= contained } , ptr :: { v: Ptr Word8 | (plen v) = size } } @-}

new :: MonadIO m => Maybe Int -> m ByteBuffer new maybel = liftIO $ do let l = fromMaybe (4*1024*1024) maybel newPtr <- Alloc.mallocBytes l newIORef BBRef { ptr = newPtr , size = l , contained = 0 , consumed = 0 }

{-@ mallocBytes :: l:Nat -> IO ({v:Ptr a | plen v == l}) @-}

**** RESULT: UNSAFE ************************************************************ /home/philipp/clones/store/src/System/IO/ByteBuffer.hs:181:15-33: Error: Liquid Type Mismatch 181 | newPtr <- Alloc.mallocBytes l ^^^^^^^^^^^^^^^^^^^ Inferred type VV : {VV : Int | VV == l} not a subtype of Required type VV : {VV : Int | VV >= 0} In Context l : Int

new :: MonadIO m => Maybe Int -> m ByteBuffer new maybel = liftIO $ do let l = max 0 (fromMaybe (4*1024*1024) maybel) newPtr <- Alloc.mallocBytes l newIORef BBRef { ptr = newPtr , size = l , contained = 0 , consumed = 0 }

**** RESULT: SAFE **************************************************************

{-@ unsafeConsume :: MonadIO m => ByteBuffer -> n:Nat -> m (Either Int ({v:Ptr Word8 | plen v >= n})) @-} unsafeConsume :: MonadIO m => ByteBuffer -> Int -> m (Either Int (Ptr Word8)) unsafeConsume bb n = liftIO $ do bbref <- readIORef bb let available = contained bbref - consumed bbref if available < n then return $ Left (n - available) else do writeIORef bb bbref { consumed = consumed bbref + n } return $ Right (ptr bbref `plusPtr` consumed bbref)

{- plen ptr == size >= 0 contained <= size consumed <= contained -}

\begin{align}
&\texttt{available}\; \geq \texttt{n}\\
\Leftrightarrow\; & \texttt{contained}\; - \texttt{consumed}\; \geq \texttt{n}\\
\Rightarrow\; & \texttt{plen p} - \texttt{consumed}\; \geq \texttt{n}\\
\end{align}

copyByteString :: MonadIO m => ByteBuffer -> ByteString -> m () copyByteString bb bs = bbHandler "copyByteString" bb go where go bbref = do let (bsFptr, bsOffset, bsSize) = BS.toForeignPtr bs -- if the byteBuffer is too small, resize it. let available = contained bbref - consumed bbref bbref' <- if size bbref < bsSize + available then enlargeBBRef bbref (bsSize + available) else return bbref -- if it is currently too full, reset it bbref'' <- if bsSize + contained bbref' > size bbref' then resetBBRef bbref' else return bbref' -- now we can safely copy. withForeignPtr bsFptr $ \ bsPtr -> copyBytes (ptr bbref'' `plusPtr` contained bbref'') (bsPtr `plusPtr` bsOffset) bsSize writeIORef bb $ Right BBRef { size = size bbref'' , contained = contained bbref'' + bsSize , consumed = consumed bbref'' , ptr = ptr bbref''}

{-@ enlargeBBRef :: b:BBRef -> i:Nat -> IO {v:BBRef | size v >= i && contained v == contained b && consumed v == consumed b} @-} enlargeBBRef :: BBRef -> Int -> IO BBRef enlargeBBRef bbref minSize= do let getNewSize s | s >= minSize = s getNewSize s = getNewSize $ (ceiling . (*(1.5 :: Double)) . fromIntegral) (max 1 s) newSize = getNewSize (size bbref) ptr' <- Alloc.reallocBytes (ptr bbref) newSize return bbref { size = newSize , ptr = ptr' }

{-@ reallocBytes :: Ptr a -> l:Nat -> IO ({v:Ptr a | plen v == l}) @-}

{-@ resetBBRef :: b:BBRef -> IO {v:BBRef | consumed v == 0 && contained v == contained b - consumed b && size v == size b} @-} resetBBRef :: BBRef -> IO BBRef resetBBRef bbref = do let available = contained bbref - consumed bbref moveBytes (ptr bbref) (ptr bbref `plusPtr` consumed bbref) available return BBRef { size = size bbref , contained = available , consumed = 0 , ptr = ptr bbref }

{-@ moveBytes :: p:Ptr a -> q:Ptr a -> {v:Nat | v <= plen p && v <= plen q} -> IO ()@-}

- safety and performance
- you can have both!
- write c in Haskell
- hide details behind high-level API
`LiquidHaskell`

- prove invariants of your code at compile time
- does not change the code at all
- no run time overhead
- compile time overhead optional
- no need to change client code