Monday, August 21, 2023

What does it mean for a monad to be strong?

This is something I put on github years ago but I probably should have put it here.


Here's an elementary example of the use of the list monad:


> test1 = do
>   x <- [1, 2]
>   y <- [x, 10*x]
>   [x*y]


We can desugar this to:


> test2 = [1, 2] >>= \x -> [x, 10*x] >>= \y -> [x*y]


It looks like we start with a list and then apply a sequence (of length 2) of functions to it using bind (>>=). This is probably why some people call monads workflows and why the comparison has been made with Unix pipes.


But looks can be deceptive. The operator (>>=) is right associative and test2 is the same as test3:


> test3 = [1, 2] >>= (\x -> [x, 10*x] >>= \y -> [x*y])


You can try to parenthesise the other way:


> -- test4 = ([1, 2] >>= \x -> [x, 10*x]) >>= \y -> [x*y]


We get a "Variable not in scope: x" error. So test1 doesn't directly fit the workflow model. When people give examples of how workflow style things can be seen as monads they sometimes use examples where later functions don't refer to variables defined earlier. For example at the link I gave above the line m >>= x-> (n >>= y-> o) is transformed to (m >>= x-> n) >>= y-> o which only works if o makes no mention of x. I found similar things to be true in a number of tutorials, especially the ones that emphasise the Kleisli category view of things.


But we can always "reassociate" to the left with a little bit of extra work. The catch is that the function above defined by y-> ... "captures" x from its environment. So it's not just one function, it's a family of functions parameterised by x. We can fix this by making the dependence on x explicit. We can then pull the inner function out as it's no longer implicitly dependent on its immediate context. When compilers do this it's called lambda lifting.


Define (the weirdly named function) strength by


> strength :: Monad m => (x, m y) -> m (x, y)
> strength (x, my) = do
>   y <- my
>   return (x, y)


It allows us to smuggle x "into the monad".


And now we can rewrite test1, parenthesising to the left:


> test5 = ([1, 2] >>= \x -> strength (x, [x, 10*x])) >>= \(x, y) -> [x*y]


This is much more like a workflow. Using strength we can rewrite any (monadic) do expression as a left-to-right workflow, with the cost of having to throw in some applications of strength to carry along all of the captured variables. It's also using a composition of arrows in the Kleisli category.


A monad with a strength function is called a strong monad. Clearly all Haskell monads are strong as I wrote strength to work with any Haskell monad. But not all monads in category theory are strong. It's a sort of hidden feature of Haskell (and the category Set) that we tend not to refer to explicitly. It could be said that we're implicitly using strength whenever we refer to earlier variables in our do expressions.


See also nlab.


> main = do
>   print test1
>   print test2
>   print test3
>   -- print test4
>   print test5

Sunday, March 05, 2023

Constructing Clifford Algebras using the Super Tensor Product

Some literate Haskell but little about this code is specific to Haskell...


> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE TypeOperators #-}
> {-# LANGUAGE UndecidableInstances #-}
> 
> import GHC.TypeLits



Introduction

This is a followup to Geometric Algebra for Free and More Low Cost Geometric Algebra.


In those articles I showed how you could build up the Clifford algebras like so:


type Cliff1  = Complex R
type Cliff1' = Split R
type Cliff2  = Quaternion R
type Cliff2' = Matrix R
type Cliff3  = Quaternion Cliff1'
type Cliff3' = Matrix Cliff1
type Cliff4  = Quaternion Cliff2'
type Cliff4' = Matrix Cliff2
type Cliff5  = Quaternion Cliff3'
...


I used CliffN as the Clifford algebra for a negative definite inner product and CliffN' for the positive definite case. It's not a completely uniform sequence in the sense that CliffN is built from CliffN' for dimension two lower and you use a mix of Matrix and Quaternion.


The core principle making this work is that for type constructors implemented like Matrix, Quaternion etc. we have the property that



eg. Matrix (Quaternion Float) is effectively the same thing as Matrix Float Quaternion Float.


But John Baez pointed out to me that you can build up the CliffN algebras much more simply enabling us to use these definitions:


> type Cliff1 = Complex Float
> type Cliff2 = Complex Cliff1
> type Cliff3 = Complex Cliff2
> type Cliff4 = Complex Cliff3
> type Cliff5 = Complex Cliff4


...


Or even better:


> type family Cliff (n :: Nat) :: * where
>   Cliff 0 = Float
>   Cliff n = Complex (Cliff (n - 1))


But there's one little catch. We have to work, not with the tensor product, but the super tensor product.


We define Complex the same way as before:


> data Complex a = C a a deriving (Eq, Show)


Previously we used a definition of multiplication like this:


instance Num a => Num (Complex a) where
  C a b * C c d = C (a * c - b * d) (a * d + b * c) 


We can think of C a b in Complex R as representing the element . The definition of multiplication in a tensor product of algebras is defined by . So we have .


This means that line of code we wrote above defining * for Complex isn't simply a definition of multiplication of complex numbers, it says how to multiply in an algebra tensored with the complex numbers.



Let's go Super!

A superalgebra is an algebra graded by where is the ring of integers modulo 2. What that means is that we have some algebra that can be broken down as a direct sum (the subscripts live in ) with the property that multiplication respects the grading, ie. if is in and is in then is in .


The elements of are called "even" (or bosonic) and those in "odd" (or fermionic). Often even elements commute with everything and odd elements anticommute with each other but this isn't always the case. (The superalgebra is said to be supercommutative when this happens. This is a common pattern: a thing X becomes a superX if it has odd and even parts and swapping two odd things introduces a sign flip.)


The super tensor product is much like the tensor product but it respects the grading. This means that if is in and is in then is in . From now on I'm using to mean super tensor product.


Multiplication in the super tensor product of two superalgebras and is now defined by the following modified rule: if is in and is in then . Note that the sign flip arises when we shuffle an odd left past an odd .


The neat fact that John pointed out to me is that .


We have to modify our definition of * to take into account that sign flip.


I initially wrote a whole lot of code to define a superalgebra as a pair of algebras with four multiplication operations and it got a bit messy. But I noticed that the only specifically superalgebraic operation I ever performed on an element of a superalgebra was negating the odd part of an element.


So I could define SuperAlgebra like so:


class SuperAlgebra a where
  conjugation :: a -> a


where conjugation is the negation of the odd part.


(I'm not sure if this operation corresponds to what is usually called conjugation in this branch of mathematics.)


But there's a little efficiency optimization I want to write. If I used the above definition, then later I'd often find myself computing a whole lot of negates in a row. This means applying negate to many elements of large algebraic objects even though any pair of them cancel each other's effect. So I add a little flag to my conjugation function that is used to say we want an extra negate and we can accumulate flips of a flag rather than flips of lots of elements.


> class SuperAlgebra a where
>   conjugation :: Bool -> a -> a


Here's our first instance:


> instance SuperAlgebra Float where
>   conjugation False x = x
>   conjugation True x = negate x


This is saying that the conjugation is the identity on Float but if we want to perform an extra flip we can set the flag to True. Maybe I should call it conjugationWithOptionalExtraNegation.


And now comes the first bit of non-trivial superalgebra:


> instance (Num a, SuperAlgebra a) => SuperAlgebra (Complex a) where
>   conjugation e (C a b) = C (conjugation e a) (conjugation (not e) b)


We consider to be even and to be odd. When we apply the conjugation to then we can just apply it directly to . But that flips the "parity" of (because tensor product respects the grading) so we need to swap when we use the conjugation. And that should explain why conjugation is defined the way it is.


Now we can use the modified rule for defined above:


> instance (Num a, SuperAlgebra a) => Num (Complex a) where
>   fromInteger n = C (fromInteger n) 0
>   C a b + C a' b' = C (a + a') (b + b')
>   C a b * C c d = C (a * c - conjugation False b * d)
>                     (conjugation False a * d + b * c) 
>   negate (C a b) = C (negate a) (negate b)
>   abs = undefined
>   signum = undefined


For example, conjugation False is applied to the first on the RHS because implicitly represents an term and when expanding out the product we shuffle the (odd) in left of . It doesn't get applied to the second because and remain in the same order.


That's it!



Tests

I'll test it with some examples from Cliff3:


> class HasBasis a where
>   e :: Integer -> a


> instance HasBasis Float where > e = undefined


> instance (Num a, HasBasis a) => HasBasis (Complex a) where > e 0 = C 0 1 > e n = C (e (n - 1)) 0


> make a b c d e f g h = > C (C (C a b) (C c d)) > (C (C e f) (C g h))


> e1, e2, e3, e21, e31, e32, e321 :: Cliff 3 > e1 = e 0 > e2 = e 1 > e21 = e2 * e1 > e3 = e 2 > e31 = e3 * e1 > e32 = e3 * e2 > e321 = e3 * e2 * e1


> main = do > print (e1 * e1 + 1 == 0) > print (e31 * e31 + 1 == 0) > print (e3 * e3 + 1 == 0) > print (e21 * e21 + 1 == 0) > print (e2 * e2 + 1 == 0) > print (e32 * e32 + 1 == 0) > print (e321 * e321 - 1 == 0) > print (e3 * e2 * e1 - e321 == 0) > print (e2 * e1 - e21 == 0) > print (e3 * e1 - e31 == 0) > print (e3 * e2 - e32 == 0) > print (e21 * e32 - e31 == 0)



Observation

The implementation of multiplication looks remarkably like it's the Cayley-Dickson construction. It can't be (because iterating it three times gives you a non-associative algebra but the Clifford algebras are associative). Nonetheless, I think comparison with Cayley-Dickson may be useful.



Efficiency

As mentioned above, before I realised I just needed the conjugation operation I wrote the above code with an explicit split of a superalgebra into two pieces intertwined by four multiplications. I think the previous approach may have a big advantage - it may be possible to use variations on the well known "speed-up" of complex multiplication that uses three real multiplications instead of four. This should lead to a fast implementation of Clifford algebras.


Also be warned: you can kill GHC if you turn on optimization and try to multiply elements of high-dimensional Clifford algebras. I think it tries to inline absolutely everything and you end up with a block of code that grows exponentially with .


Note also that this code translates directly into many languages.