Thanks to Alexander Sotir... (15 Mar 2009)
Thanks to Alexander Sotirov to pushing me to check that the carry chains in donna-c64 were sufficient. I don't know if I realised something when I wrote it which I'm currently missing, or if I just screwed up, but I now believe that they're wrong.
I wrote this Haskell code to check it:
This Haskell code has been written to experiment with the carry chains incurve25519-donna-c64. It's a literate Haskell program, one can load it into
GHCI and play along.
> module Main where
>
> import Data.Bits (shiftR, (.&.))
There are two constants that we'll need.
Our five limbs are, nominally, 51 bits wide, so this is the maximum value of
their initial values.
> twoFiftyOneMinusOne = (2 ^ 51) - 1
2^128 - 1 is the limit of the range of our temporary variables. If we exceed
this at any point, our calculations will be incorrect.
> two128MinusOne = (2 ^ 128) - 1
Now we define a type which mimics our 128-bit unsigned type in C. It's a
disjuction of an Integer and the distinguished value 'Overflow'. 'Overflow' is
contagious: if we try to perform any operations where one or both of the
operands is 'Overflow', then the result is also 'Overflow'.
> data U128 = U128 Integer
> | Overflow
> deriving (Show, Eq)
We make U128 an instance of Num so that we can perform arithmetic with it.
> instance Num U128 where
> (U128 a) + (U128 b) = mayOverflow (a + b)
> _ + _ = Overflow
> (U128 a) * (U128 b) = mayOverflow (a * b)
> _ * _ = Overflow
> (U128 a) - (U128 b) = mayOverflow (a - b)
> _ - _ = Overflow
> negate _ = Overflow
> abs a@(U128 _) = a
> abs _ = Overflow
> signum (U128 _) = 1
> signum _ = 0
> fromInteger = mayOverflow
> instance Ord U128 where
> compare (U128 a) (U128 b) = compare a b
> compare _ _ = EQ
This function lifts an Integer to a U128. If the value is out of range, the
result is 'Overflow'
> mayOverflow :: Integer -> U128
> mayOverflow x
> | x > two128MinusOne = Overflow
> | x < 0 = Overflow
> | otherwise = U128 x
Our field elements consist of five limbs. In the C code, these limbs are
actually uint64_t's, but we keep them as U128's here. We will convince ourselves
that we don't hit any 64-bit overflows later.
> data FieldElement = FieldElement { m0 :: U128, m1 :: U128, m2 :: U128,
> m3 :: U128, m4 :: U128 }
> deriving (Show, Eq)
Now, two helper functions:
This function takes only the bottom 51-bits of a value
> clamp :: U128 -> U128
> clamp (U128 a) = U128 $ a .&. 0x7ffffffffffff
> clamp _ = Overflow
This function drop the bottom 51-bits of a value
> topBits :: U128 -> U128
> topBits (U128 a) = U128 $ a `shiftR` 51
> topBits _ = Overflow
This function simulates the 'fsquare' function in donna-c64, including its carry
chain. If the carry chain is sufficient, then iterating this function for any
valid initial value should never overflow.
> square :: FieldElement -> FieldElement
> square e = result where
> t0 = m0 e * m0 e
> t1 = m0 e * m1 e +
> m1 e * m0 e
> t2 = m0 e * m2 e +
> m2 e * m0 e +
> m1 e * m1 e
> t3 = m0 e * m3 e +
> m3 e * m0 e +
> m1 e * m2 e +
> m2 e * m1 e
> t4 = m0 e * m4 e +
> m4 e * m0 e +
> m3 e * m1 e +
> m1 e * m3 e +
> m2 e * m2 e
> t5 = m4 e * m1 e +
> m1 e * m4 e +
> m2 e * m3 e +
> m3 e * m2 e
> t6 = m4 e * m2 e +
> m2 e * m4 e +
> m3 e * m3 e
> t7 = m3 e * m4 e +
> m4 e * m3 e
> t8 = m4 e * m4 e
>
> t0' = t0 + t5 * 19
> t1' = t1 + t6 * 19
> t2' = t2 + t7 * 19
> t3' = t3 + t8 * 19
>
> t1'' = t1' + topBits t0'
> t2'' = t2' + topBits t1''
> t3'' = t3' + topBits t2''
> t4' = t4 + topBits t3''
> t0'' = t0' + 19 * topBits t4'
> t1''' = clamp t1'' + topBits t0''
At this point, we implement two carry chains. If 'currentChain' is true, then we
implement the carry chain as currently written in donna-c64. Otherwise, we
perform an extra step and carry t1 into t2.
> result = if currentChain
> then FieldElement (clamp t0'') t1''' (clamp t2'') (clamp t3'')
> (clamp t4')
> else FieldElement (clamp t0'') (clamp t1''') t2''' (clamp t3'')
> (clamp t4') where
> t2''' = clamp t2'' + topBits t1'''
This is the maximum initial element: an element where all limbs are 2^51 - 1.
Inspection of the 'fexpand' function should be sufficient to convince oneself of
this.
> maxInitialElement :: FieldElement
> maxInitialElement = FieldElement twoFiftyOneMinusOne twoFiftyOneMinusOne
> twoFiftyOneMinusOne twoFiftyOneMinusOne
> twoFiftyOneMinusOne
This function takes two field elements and returns the worst case result: one
where the maximum of each limb is chosen.
> elementWiseMax :: FieldElement -> FieldElement -> FieldElement
> elementWiseMax x y = FieldElement (f m0) (f m1) (f m2) (f m3) (f m4) where
> f :: (FieldElement -> U128) -> U128
> f accessor = max (accessor x) (accessor y)
We now define a series of values generated by squaring the previous element and
setting any limb that is less than the maximum to the maximum value.
> maxSeries = iterate (elementWiseMax maxInitialElement . square)
> maxInitialElement
This value controls which carry chain is used in 'square', the current one or
the one with the extra carry
> currentChain = True
By running this, we can see that the current carry chain is insufficient for
this simulation:
ghci> maxSeries !! 4
FieldElement {m0 = Overflow, m1 = Overflow, m2 = Overflow, m3 = Overflow,
m4 = Overflow}
The series overflows after only four iterations. However, if we use the
alternative carry chain, the series is stable far beyound the requirements of
the Montgomery ladder used in donna-c64:
ghci> maxSeries !! 100000
FieldElement {m0 = U128 2251799813685247, m1 = U128 2251799813685247,
m2 = U128 2251799813685247, m3 = U128 2251799813685247,
m4 = U128 2251799813685247}
Additionally, these values are small enough not to overflow the 64-limb limbs.