A Neighborhood of Infinity

Wednesday, May 21, 2008

Life in a Lazy Universe

I've seen lots of articles on simulated reality recently. Supposing for a bit that this is a sensible hypothesis to entertain and not just a science fictional conceit, what might be the consequences of supposing that such a 'reality' is running on a machine with lazy evaluation?

If your task were to simulate a universe well enough for its inhabitants to be convinced it was seamless, then there is an obvious optimisation that could be made. You could simulate just those parts that could be perceived by the inhabitants. But the catch is that if an inhabitant were to explore a new region, the simulation would be required to fill in that region. Just creating a blank new area wouldn't do, it's current state would need to be consistent with having had a plausible history and so the simulation would be required to fill in not just its current state, but also its past. This is precisely what is provided by lazy evaluation - demand driven evaluation that takes place only when results are observed. It seems natural that such a simulation should make use of laziness.

But such optimisation doesn't need to be limited to lazily computing 'new regions' where by 'region' we mean a spatially localised volume. We could also imagine implementing level of detail. If we don't look closely at an object in our grasp, there's no need to compute every single detail of it. We need only provide detail at the resolution at which it can be perceived. We'd like this built into every aspect of the simulation so that anything within it is computed only to a desired accuracy. I've spoken a few times about such a data structure in posts like this. This is precisely what the real numbers give us. Computationally, a real number is an object which when given a degree of accuracy, returns a rational number (or some other finitely manipulatable structure) that represents the real to this accuracy. The wikipedia article suggests that 'the use of continua in physics constitutes a possible argument against the simulation of a physical universe'. This is diametrically opposed to what I might argue: in fact the presence of continua suggests the existence of an efficient demand driven simulation with level of detail. The problem is that the people who have been thinking about these issues have been thinking in terms of traditional imperative style programming where a real number is typically represented to some fixed finite precision. But in fact there are infinitely many real numbers that can be represented exactly if we think in terms of lazy data structures.

But this is all idle speculation if we don't make predictions. So here is one. In a computer simulated universe, all physics must obviously be computable. But all computable functions are continuous. So in a simulated universe we'd expect the physical laws involving real numbers to make use of them only in a continuous way. Sound familiar?



The next section is an aside and I need to give a spoiler alert as I'll be mentioning some aspects of Greg Egan's Permutation City.

What exactly what is meant by 'demand driven' above? If we have implemented a simulated universe I expect we would be interested in looking in on it from time to time. So this is what is usually meant by 'demand'. Whenever the program performs I/O to show us what is going on, it would trigger the evaluation of the lazy thunks that had been sitting around. But consider how the universe might appear to its inhabitants. Whether or not we look in on it us surely irrelevant to the inhabitants, assuming we just look and don't try to communicate. But if I/O isn't performed, functional programmers don't make much of a distinction between an unevaluated thunk and an evaluated one. They are observationally equivalent. So do we in fact need to run the simulation for its inhabitants to consider themselves to exist? Anyway, I won't say anything more about this because Permutation City deals with this issue at length. I'm just rephrasing it as a lazy evaluation issue.

My next post will be on life in a monadic universe.


Update: I'm going to add some detail to the above. Suppose we represent real numbers as infinite sequences of digits. Suppose also that we have some module in a simulation that, let's say, resolves collisions between particles. You might ask it "what is the velocity, to 20 digits, of particle A after the collision". It must then look at all of the relevant inputs, decide how accurately it needs them, and then propagate new requests. For example it might work like this "if I need the velocity of A to 20 digits I need the mass of particle B to 25 digits and the velocity of particle C to 26 digits..." and so on. The idea of demand driven modules that produce outputs to a given accuracy by making requests of their inputs to a given accuracy exactly parallels the subject of analysis in mathematics. There you ask questions like "if I need to know f(x,y) with an error of no more than δ, what ε of error can I tolerate in x and y?". Functions that allow this kind of analysis are precisely the continuous ones. In other words, a simulated universe that is built from continuous functions lends itself to a demand driven implementation. One might argue that the whole of the fields of analysis (and topology) are the study of functions that can be evaluated in this way. And of course, in our universe, the functions we study in physics are typically the continuous functions. So contrary to the claim that the existence of continua argues against the idea that the universe is a simulation, I'd like to point out that they might actually make it more convenient to simulate our universe. In fact, we use this fact all the time in ordinary everyday physics simulations because we know we only need to work to a certain level of accuracy to get accurate results.

The claim that this universe can be simulated, and the claim that this universe is a simulation, are philosophical claims that I don't like to waste time on. But the claim that the existence of continua blocks the notion that the universe is a simulation is a fairly precise statement of logic, mathematics and physics. I claim it is wrong, and that's the only claim I'm making here.

Saturday, May 17, 2008

The Interchange Law

I find it amazing the way you can take a bunch of abstract nonsense and turn it into concrete Haskell code for doing something mundane. Having down-to-earth examples really does help with getting your head around the abstractions.

Anyway, on page 43 of Categories for the Working Mathematician (2nd Ed.) is the 'interchange law' for horizontal and vertical composition of natural transformations. Not only can we find a nice mundane example to help thinking about it, we can even test it using Quickcheck.

For convenience we can define a natural transformation type:


> {-# OPTIONS -fglasgow-exts -fno-warn-missing-methods #-}

> import Test.QuickCheck
> import Control.Monad.Writer

> type Natural f g = forall a.f a -> g a

Where f and g are intended to be instances of Functor.

If we have a natural transformation from f to g, and another from g to h, then we can compose them using the ordinary Haskell composition operator (.). In Haskell, functors are frequently containers, so we can use containers as a guiding example. If we know how to map a bag of stuff to a sack of stuff, and we know how to map a sack of stuff to a box of stuff, then we just perform the two operations in sequence and we can map a bag of stuff to a box of stuff. This is known as vertical composition.

But there is also another way to compose natural transformations known as horizontal composition. Again we can think about containers. Suppose we know how to map a bag of stuff to a sack of stuff, and we know how to map a box of stuff to a crate of stuff, then we also know how to map a bag of boxes of stuff to a sack of crates of stuff. There are two obvious ways to do this: we could convert the bag of boxes of stuff to a sack of boxes, and then convert each box to a crate. Or we could convert the boxes to crates first, and then map the resulting bag of crates to a sack of crates. We can define two binary operators `o` and `o'` to perform each of these tasks:

> o,o' :: (Functor f,Functor f',Functor g,Functor g') => Natural f' g' ->
> Natural f g -> (forall c.f' (f c) -> g' (g c))
> o s t x = s (fmap t x)
> o' s t x = fmap t (s x)

Intuitively we'd expect these things to be equal, and in fact they always are. (Exercise: write a quickcheck for this based on the code below.)

Now we need some functorial containers to play with:

> data Pair x = Pair x x deriving (Eq,Show)
> instance Functor Pair where
> fmap f (Pair a b) = Pair (f a) (f b)

> newtype Id x = Id x deriving (Eq,Show)
> instance Functor Id where
> fmap f (Id x) = Id (f x)

Now we can define a bunch of natural transformations mapping between these containers and some others:

> alpha :: Natural Pair []
> alpha (Pair x y) = [x,y]

> beta :: Natural [] Maybe
> beta [] = Nothing
> beta (x:xs) = Just x

> alpha' :: Natural ((,) a) Id
> alpha' (a,x) = Id x

> beta' :: Natural Id (Either b)
> beta' (Id x) = Right x

On page 43 is the interchange law. Superficially it looks a lot like abiding. For any α, β, α' and β' with compatible types, we have

(β . α) o (β' . α') = (β o β') . (α o α')


This is the identity I want to check for the special case of the types I've chosen above. So we can define the left and right hand sides:

> lhs = (beta . alpha) `o` (beta' . alpha')
> rhs = (beta `o` beta') . (alpha `o` alpha')

> type From = Pair (Float,Integer)
> type To = Maybe (Either String Integer)

And here we go:

> test1 = quickCheck (\x -> lhs (x :: From) == (rhs x :: To))

Just type test1 in ghci to hear the sweet sound of 100 tests passing.

Anyway, we're just a hair's breadth away from defining 2-categories now. But I'll leave that for another day.

Thinking about it, I gave slogans for the previous theorems so why don't I give one for the interchange law. Take a deep breath. If you know how to convert a bowl of stuff into a box of stuff, and a box into a bag, and an urn into a crate, and a crate into a sack, then there are two equivalent ways to convert a bowl of urns into a bag of sacks: we can either construct methods going from bowl to bag and from urn to sack and combine them to go from bowl of urns to a bag of sacks, OR, we can go from a bowl of urns to a box of crates to a bag of sacks. I'm sure it's all a lot clearer now. ;-)

Well that's enough abstract nonsense. Time to get away from it all and watch some youtube videos. Eh? Oh well. Just wish me luck for tomorrow when I try to beat my fastest time in Bay to Breakers. Maybe some other bayfpers will be there too.



> instance Arbitrary a => Arbitrary (Id a) where
> arbitrary = liftM Id arbitrary

> instance Arbitrary a => Arbitrary (Pair a) where
> arbitrary = liftM2 Pair arbitrary arbitrary

Saturday, May 10, 2008

Desingularisation and its applications


> {-# OPTIONS -fno-warn-missing-methods #-}

Here's a neat trick you can carry out with the Dual class that I've talked about many times before. Suppose you define a function like

> sinc x = sin x/x

Clearly this function has a singularity at x=0 where the division isn't defined. But this is an example of a removable singularity because it is possible to replace this function with another that is continuous (in fact, analytic) everywhere and is well defined at x=0. We could do this simply by defining:

> sinc' 0 = 1
> sinc' x = sin x/x

but it'd be cool if we could automate the process. Here's a function that does exactly that:

> desingularise f x = re (f (D x 1))

As if by magic, we can now compute

> ex1 = desingularise sinc 0

We can try other functions too:

> ex2 = desingularise (\x -> x/(exp x-1)) 0

So how does it work? Well I've modified my usual definition of (/) so that it is able to divide one infinitesimal number by another:

> instance Fractional a => Fractional (Dual a) where
> D 0 a'/D 0 b' = D (a'/b') 0
> D a a'/D b b' = D (a/b) ((b*a'-a*b')/(b*b))

desingularise 'perturbs' the value of its argument infinitesimally so that when we pass in an argument of zero we end up evaluating our function not at zero, but somewhere infinitesimally close. So our division by zero becomes division of infinitesimals and the above definition comes into play. The first clause of the definition can also be viewed as an application of l'Hôpital's rule.

But it doesn't solve all of our problems. For example

> ex3 = desingularise (\x -> sin x^2/x^2) 0

doesn't quite work. You need two applications of l'Hôpital's rule. We can implement this easily with

> ex4 = desingularise (desingularise (\x -> sin x^2/x^2)) 0

That's fine if you know how bad your singularity is in advance. If you don't, you can use power series instead. None of these methods will work for this function however:

> horrible x = exp (-1/x^2)

Anyway, the fact that it doesn't work for all functions doesn't detract from the fact that it's a pretty approach from solving the problem when you have a fairly well behaved singularity.

I've not actually seen anyone else use dual numbers to smooth over singularities in this way, but a few years ago I did come across something very closely related that apparently caused a little controversy in the computational geometry world.

Suppose you want to determine whether or not a point lies within a possibly non-convex and arbitrarily complicated polygon. One algorithm is this: follow a line in any direction from the point to infinity (though just outside the bounding box of the polygon would do) and count how many times you cross the boundary of the polygon. If you count an odd number of crossings you're in the polygon, otherwise you're outside. It's beautifully simple and if nobody had thought of it before it'd be worthy of publication.

But like the methods described in many computational geometry papers I've read, this one often fails in practice. There are at least two sources of difficulty. Firstly, computational geometry algorithms are subject to numerical errors. You often find yourself having to decide whether or not a point is on one side or another of a line, say, and getting inconsistent results because two different ways of computing the same thing result in different decisions being made because of rounding. That can be fixed by using something like arbitrary precision rational arithmetic. But even if you do this, there is still a problem.

When you follow a line from the point you're testing you may find that it doesn't simply cross one of the sides of the polygon, it may travel precisely through a vertex. If your code isn't written carefully you may accidentally count that as zero, one or two crossings. It's easy to write sloppy code that gets it right most of the time, but it's hard to work out every possible case where some kind of degeneracy might happen. What happens if the outward ray is actually collinear with a polygon face? Do vertex intersections and collinearities cover everything?

An ugly kludge is this: at the first sign of trouble, restart the algorithm but jiggle all the inputs randomly a bit. For a large class of algorithms this will work tolerably well, depending on your definition of tolerable. (Compare with Sard's theorem in Morse theory.) But it really is ugly, especially if you've gone through the hassle of using arbitrary precision arithmetic. But maybe you can see what's coming...

Back in '94 Edelsbrunner and M√Âșcke came up with the idea of deforming the geometry infinitesimally, giving their method the utterly bizarre name "Simulation of Simplicity". I think it took people a while to realise that they were actually doing the same thing as people doing automatic differentiation and so the whole notion of an infinitesimal perturbation left people wondering what in fact they were doing. More seriously was this: writing robust computational geometry code that catches all of the degenerate cases is hard. How is it that such a simple algorithm could be slotted into a sloppily written polymorphic algorithm and turn into a watertight one? Was everyone in computational geometry going to be put out of work?

Luckily for the computational geometers it still takes quite a bit of work to get the details exactly right, and even when you do, the result isn't necessarily an efficient solution to the problem. I know that when I had a messy computational geometry problem to solve when I was working on MI:2 I just wrote code that turned a blind eye to degeneracies and aggressively pruned away anything that looked suspicious: like near zero length lines and near zero area faces. If a stray pixel poked through somewhere that it shouldn't, and tweaking the tolerance didn't fix it, an artist found a way to hide it. Sometimes beauty and elegance come second place to getting the job done on time.



> data Dual a = D a a deriving (Show,Eq)

> re (D a _) = a
> im (D _ b) = b

> instance Num a => Num (Dual a) where
> fromInteger i = D (fromInteger i) 0
> (D a a')+(D b b') = D (a+b) (a'+b')
> (D a a')-(D b b') = D (a-b) (a'-b')
> (D a a')*(D b b') = D (a*b) (a*b'+a'*b)

> instance Floating a => Floating (Dual a) where
> cos (D a a') = D (cos a) (-sin a*a')
> sin (D a a') = D (sin a) (cos a*a')
> exp (D a a') = D (exp a) (exp a*a')

Saturday, May 03, 2008

You Could Have Defined Natural Transformations

I'll assume you know what a functor is (and optionally a bit of Haskell):

Consider these two functions


f:Z x Z -> Z    f(x,y) = x+y
g:Z x Z -> Z    f(x,y) = x

There's a fundamental difference between them. In some sense f "makes use" of the fact that we're applying functions to integers, but even though g is acting on integers, it's not using the fact that we're acting on integers. But this is a vague idea. How can we make it precise?

Consider the same in the context of computer programs. This applies in many programming languages, but I'll use Haskell here

> f (x,y) = x+y
> g (x,y) = x

When we ask ghci for the types of these it tells us that f is of type (Num t) => (t, t) -> t but that g is of type (t, t1) -> t. ghci has spotted that f is making use of an interface exposed by integer types (the type class Num) but that g makes no assumptions at all about its arguments. So in a programming language (with static types) we can make the notion above precise by having a type signature that explicitly says something like "I'm not going to make any assumptions about my arguments". But how can we say something like this in the language of set theory, say? Set theory doesn't come with a mechanism for making such promises. (Well, ZF out of the box doesn't, anyway.)

So consider a set-theoretical function like this

h:Z x Z -> Z x Z x Z
h(a,b) = (b,a,a)

What does it mean to say that h doesn't use specific facts about its arguments? It clearly doesn't mean that h doesn't depend on its arguments because the right hand side clearly depends on a and b. We need to say that the right hand side depends on the arguments, but that it somehow doesn't make specific use of them.

Consider an example:

h(1,2) = (2,1,1).

Part of what we mean by not using specific knowledge is that if we were to replace one argument by another value we expect to perform the same substitution on the right. For example, replacing 1 by 3 should give:

h(3,2) = (2,3,3)

So for any 'substitution' function k:Z -> Z we want

h(k(1),k(2)) = (k(2),k(1),k(1))

More generally, we want:

h(k(a),k(b)) = (k(b),k(a),k(a))

Note that we've made use of the functions (a,b) -> (k(a),k(b)) and (a,b,c) -> (k(a),k(b),k(c)) to implement the idea of applying the same substitution to each of the elements. We might want to work with a wide variety of mathematical 'containers' like tuples and sequences as well as more abstract objects. Can we generalise to all of these in one go?

The function that coverts a set X to X x X, or X x X x X forms part of a functor. The other part maps functions between sets to functions between containers. So we can work with the usual ordered pair functor L with LX = X x X and Lf(x,y) = (f(x),f(y)). Similarly we can use MX = X x X x X and Mf(x,y,z) = (f(x),f(y),f(z)). We can stop talking about 'containers' now and talk about functors instead. We can rewrite our rule above as

h(Lk(a,b)) = Mk(h(a,b)).

I've been working with h acting on containers of integers, but if h truly doesn't rely on specific properties of its arguments then there is a version of h for any set X, ie. for any X we have a function

hX : X x X -> X x X x X
hX(a,b) = (b,a,a)

and when we use a function k to substitute values in the arguments there's no reason to substitute values from the same set. For example, if we let A be the set of the 26 letters of the alphabet (using your favourite way to encode letters as sets), then we could let k be the function that maps n to the nth letter (modulo 26) and still expect

hA(Lk(1,2)) = Mk(hZ(1,2))

to hold. For example

hA('a','b') = ('b','a','a')

More generally, h is a family of functions such that for any k:X -> Y we have:

hY o Lk = Mk o hX

Note that there's nothing set-specific about this, it's a statement that could hold in any category.

We can take this as a definition. A natural transformation between functors L,M:C -> D is a family of morphisms hX, one for each object X in C, such that for any morphism k:X -> Y in C we have:

hY o Lk = Mk o hX

And there we have it. Starting with the idea of a function that doesn't 'peek inside' its arguments, we're led inexorably to the idea of substitutability of arguments and from there to a categorical definition of a natural transformation.

As the definition has been generalised from Set to any category you can have fun unpacking the definition again in other categories. For Set-like categories with functions as morphisms you'll often find that natural transformations are still functions that don't make too many assumptions about their arguments, but in each case the rule about what exactly constitutes 'too many assumptions' will be different. The category of vector spaces is a good place to start.

One last point: Category theory was more or less invented for natural transformations. And it's curious how it fits so nicely with the idea of abstraction in computer science, ie. the idea of a function that is limited to using a particular interface on an object.

One interesting result relating to natural transformations is the Yoneda lemma. Note also that the Theorems for Free are also largely about similar kinds of substitutions on function arguments.

Sunday, April 27, 2008

Infinitesimal rotations and Lie algebras

A little while back I tried to give a rough idea of what Lie algebras were about. What I want to do now is show how with a bit of Haskell code we can directly get our hands on one. I'm going to assume you know how to make matrices for rotations, and multiply them, and some knowledge from previous posts, for which I'll give links.

Firstly a bit of Haskell administration:


> {-# OPTIONS -fno-warn-missing-methods #-}

Now we need some quick and dirty matrix code:

> data Matrix a = M [[a]] deriving (Eq,Show)

> instance Functor Matrix where
> fmap f (M a) = M $ map (map f) a

> instance Num a => Num (Matrix a) where
> M a * M b = M $ mult a b

A Lie Group


What I'm going to do is start by constructing elements of the group of 3D rotations, otherwise known as SO(3), and show how there's another algebraic structure hidden inside it. So let's make some rotation matrices:

> rx theta = M $ [[1,0,0],
> [0,cos theta,-sin theta],
> [0,sin theta,cos theta]]

> ry theta = M $ [[cos theta,0,sin theta],
> [0,1,0],
> [-sin theta,0,cos theta]]

> rz theta = M $ [[cos theta,-sin theta,0],
> [sin theta,cos theta,0],
> [0,0,1]]

These are the three rotations around the x-, y- and z-axes. It's traditional to build arbitrary rotations through the use of Euler angles:

> euler [a,b,c] = rx a*ry b*rz c

The 3D rotations form an example of a Lie group. (A Lie group is essentially a group where the operations like multiplication are differentiable.)

Any 3D rotation can be constructed from a single application of euler. But notice how there's a bit of ugliness in this function. I've made an arbitrary choice of which order to apply the rotations in. I could have defined:

> euler' [a,b,c] = rz c*ry b*rx a

And it's easy to show that eulereuler'. This is because rotations don't commute. In other words, for rotations, a*b≠b*a.

We can measure the non-commutativity. Remember that any rotation has an inverse. For example rx theta*rx (-theta) gives us the identity matrix because one of these two rotations 'undoes' the other. Given any two rotations we can construct what is known as their commutator:

> commutator a b = inverse a*inverse b*a*b

The idea is that we first perform b, then a, then undo b and then undo a. If a and b commute then this expression can be rearranged to inverse a*inverse b*b*a and then the inverses cancel leaving us with the identity matrix. If they don't commute then we end up with a non-identity matrix. So the comuutator measures the extent to which matrices don't commute.

As I'm feeling lazy, I don't feel like writing inverse. Instead, as I'm only going to work with rotations, I'll use the fact that the inverse of a rotation matrix is the transpose and define:

> inverse = transpose

Try playing with expressions like commutator (rx 1) (ry 2). Note how the numbers quickly get messy. Try to write down closed form expressions for applications of euler and you'll see how complex things can get.

A Lie Algebra


But there's another way to approach the rotation group - through 'infinitesimal' rotations. In my earlier article I just talked about infinitesimal group operations in a hand-wavey way. Now I'm going to make the notion more rigorous. We just need to introduce an infinitesimal number, d, whose square is zero. I've talked about this a lot before so I'm borrowing my earlier code and defining:

> d :: Num a => Dual a
> d = D 0 1

If you try it you'll see that d*d is zero.

Now we can try making infinitesimal rotations:

> rot1 = euler [d,2*d,3*d]

Note how when we evaluate this we get 'nice' numbers. No need to worry about those trig functions any more. And if you look closely at rot1 you'll see that it's essentially the identity matrix plus an infinitesimal part. We can pull the infinitesimal part out using fmap im rot1. You may be able to guess how to build it from the arguments to euler. But first, try evaluating rot2:

> rot2 = euler' [d,2*d,3*d]

It's the same! Try other infiitesimal rotations. When dealing with infinitesimal rotations it doesn't matter what order you apply them in, you get the same result. Working with infinitesimal rotations is looking much easier than working with full=size rotations. In fact, it gets better. Try defining

> rot3 = euler [5*d,-d,2*d]

Now look at fmap im (rot1*rot3) and compare with fmap im rot1 and fmap im rot3. We can multiply infinitesimal rotations simply by adding their infinitesimal parts. In fact, we can define

> star [a,b,c] = M $ [[0, -c, b],
> [c,0,-a],
> [-b,a,0]]

So we have:
fmap im (euler [a*d,b*d,c*d]) == star [a,b,c]

and
fmap im (euler [a*d,b*d,c*d]*euler [u*d,v*d,w*d]) == fmap im (euler [(a+u)*d,(b+v)*d,(c+w)*d])


Not a single trig expression anywhere!

So we have a simplified way of viewing rotations by looking at infinitesimal rotations. A triple [a,b,c] can be thought of as representing an infinitesimal rotation through star and instead of multiplying matrices we just add the triples elementwise. These triples, together with the binary operation of addition form an example of a Lie algebra. But there's a piece missing. We have lost all information about the non-commutativity of the rotations. It's one thing to simplify, but it's another to lose an important feature of what you're looking at.

The problem is that d is 'too small'. We need an infinitesimal that doesn't go to zero the moment you square it, but is still, in some sense, infinitesimally small. We could rewrite the Dual type. But there's a trick. Define:

> e :: Num a => Dual (Dual a)
> e = D (D 0 1) (D 1 0)

(If you think of Dual as forming a tensor product as I described here then e=1⊗d+d⊗1.)

You can check that e^2 is non-zero but e^3 is zero.

Now when we compute a commutator we get something a little different:

> comm1 = commutator (euler [e,0,0]) (euler [0,e,0])

fmap re comm1 is essentially the identity as before. But if we look at fmap im comm2 there's a non-zero infinitesimal piece which is different from what we had when we worked with d. This infinitesimal piece is in fact proportional to e^2. As (im . im) (e^2) is a half, we can extract the coefficient of e^2 from comm1 using

> im2 x = im (im x)/2

In fact, we have fmap im2 comm1 == star [0,0,1]. So by choosing infinitesimals that aren't too small, we haven't lost information about non-commutativity. In fact, a bit of experimentation may convince you that with:

> shrink u = fmap (e*) u

we have:
fmap im2 (commutator (euler (shrink u)) (euler (shrink v))) == star (u `cross` v)


So let's step back a bit and recapitulate all this in something approaching English:

Given a tiny rotation we can represent it as three Euler angles a, b, c, all of which are tiny. We can think of a, b and c as forming a vector [a,b,c]. When we do this, apart from an even smaller error, multiplication of rotations becomes ordinary addition of vectors and the order of rotations isn't significant. But if we choose not to ignore this small error we see that a rotation represented by u and a rotation represented by v don't quite commute and the order does matter. The size of this error is measured by the cross product of u and v. This is intuitively plausible, we'd expect that rotations defined by vectors in a similar direction would be closer to commuting, and this is reflected in the fact that the cross product is zero for parallel vectors.

So now I can fill in the missing piece from the description of the Lie algebra I gave above. The Lie algebra so(3) consists of the 3d vectors (representing infinitesimal rotations), addition of vectors (representing multiplcation of infinitesimal rotations) and the cross product (measuring the amount by which small rotations fail to commute). This picture doesn't just apply to the rotations, a similar one applies for any Lie group. We get the same pattern of a simplified form of multiplication and a binary operation that measures non-commutativity.

But you might still ask if there's something missing. What if we defined f so that f^4 is zero but f^3 isn't? Would we extract more information about the non-commutativity of rotations? Well the interesting fact, which I won't prove here, is that it doesn't. Almost everything you need to know about a Lie group can be extracted from its Lie algebra. In fact, from a group's Lie algebra you can almost recover the original group. (In fact, what you recover is its universal cover.) But notice how there are no trig formulae involved when talking about Lie algebras. Lie algebras give a really nice way to study Lie groups without getting your hands too dirty. But this isn't the only reason to study Lie algebras, many physical properties arising from symmetries are more naturally studied through the Lie algebra because Lie algebras arise from Lie groups as soon as you start differentiating things. In particular, Lie algebras play a major role in Noether's theorem, one of the cornerstones of modern theoretical physics.

In summary then, I hope I've given some flavour of Lie algebras. Using infinitesimals you can get your hands on them directly without the use of a large amount of mathematical machinery. There has been some difficult stuff here, but I'm hoping that the freedom you now have at the Haskell prompt to play with the things I've been talking about will make up for the inadequacies of my explanations.

One last word: in principle we could do the same with E8. But we'd need 248 by 248 matrices, and the equivalent of euler would need 248 parameters. (That's a coincidence, in general the number of parameters needed to define an element of a Lie group isn't equal to the dimension of the matrix, but it is for SO(3) and E8).


Appendix (the bits of code I left out above)



Defining the infinitesimals:

> data Dual a = D a a deriving (Show,Eq)

Extract the 'full-size' and 'infinitesimal' parts of a number:

> re (D a _) = a
> im (D _ b) = b

> instance Num a => Num (Dual a) where
> fromInteger i = D (fromInteger i) 0
> (D a a')+(D b b') = D (a+b) (a'+b')
> (D a a')-(D b b') = D (a-b) (a'-b')
> (D a a')*(D b b') = D (a*b) (a*b'+a'*b)

> instance Floating a => Floating (Dual a) where
> cos (D a a') = D (cos a) (-sin a*a')
> sin (D a a') = D (sin a) (cos a*a')

> instance Fractional a => Fractional (Dual a)

Some useful matrix and vector operations:

> mult a ([]:_) = map (const []) a
> mult a b = zipWith (:) (map (dot (map head b)) a) (mult a (map tail b))

> transpose (M a) = M $ transpose' a where
> transpose' [] = repeat []
> transpose' (xs : xss) = zipWith (:) xs (transpose' xss)

Some simple vector operations

> dot a b = foldr (+) 0 $ zipWith (*) a b
> cross [a,b,c] [d,e,f] = [b*f-c*e,c*d-a*f,a*e-b*d]

Saturday, April 12, 2008

Negative Probabilities

I'm always interested in new ways to present the ideas of quantum mechanics to a wider audience. Unfortunately the conventional description of quantum mechanics requires knowledge of complex numbers and vector spaces making it difficult to avoid falling back on misleading analogies and metaphors. But there's another approach to quantum mechanics, due to Feynman, that I have never seen mentioned in the popular science literature. It assumes only basic knowledge of probability theory and negative numbers to get a foot on the ladder.

We start with tweaking probability theory a bit. One of the axioms of probability theory says that all probabilities must lie in the range zero to one. However, we could imagine relaxing this rule even though on the face of it it seems meaningless. For example, suppose we have a coin that has a 3/2 chance of landing heads and a -1/2 chance of landing tails. We can still reason that the chance of getting two heads in a row is 3/2×3/2=9/4 by the usual multiplication rule. But obviously no situation like this could ever arise in the real world because after tossing such a coin 10 times we'd expect to see -5 tails on average.

But what if we could contrive a system with some kind of internal state governed by negative probabilities even though we couldn't observe it directly? So consider this case: a machine produces boxes with (ordererd) pairs of bits in them, each bit viewable through its own door. Let's suppose the probability of each possible combination of two bits is given by the following table:







First bitSecond bitProbability
00-1/2
011/2
101/2
111/2

Obviously if we were able to look through both doors we'd end up with the meaningless prediction that we'd expect to see 00 a negative number of times. But suppose that things are arranged so that we can only look through one door. Maybe the boxes self-destruct if one or other door is opened but you still get enough time to see what was behind the door. Now what happens?

If you look through the first door the probability of seeing 1 is P(10)+P(11)=1. We get the same result if we look through the second door. We only get probabilities in the range zero to one. As long as we're restricted to one door we get meaningful results.

If we were to perform this experiment repeatedly with different runs of the machine, each time picking a random door to look through, we'd eventually become very confident that every box contained 11. After all, if we freely choose which door to look through, and we always see 1, there's no place 0's could be 'hiding'.

But now suppose a new feature is added to the box that allows us to compare the two bits to see if they are equal. It reveals nothing about what the bits are, just their state of equality. And of course, after telling us, it self-destructs. We now find that the probability of the two bits being different is P(01)+P(10)=1. So if we randomly chose one of the three possible observations each time the machine produced the box we'd quickly run into the strange situation that the two bits both appear to be 1, and yet are different. But note that although the situation is weird, it's not meaningless. As long as we never get to see both bits at the same time we never directly observe a paradox. If we met such boxes in the real world we'd be forced to conclude that maybe the boxes knew which bit you were going to look at and changed value as a result, or that maybe you didn't have the free will to choose door that you thought you had, or maybe, even more radically, you'd conclude that the bits generated by the machine were described by negative probabilities.

That's all very well, but obviously the world doesn't really work like this and we never see boxes like this. Except that actually it does! The EPR experiment has many similarities to the scenario I described above. The numbers aren't quite the same, and we're not talking about bits in boxes, but we do end up with a scenario involving observations of bits that simply don't add up. If we do try to explain what's going on using probability theory, we either conclude there's something weird about our assumptions of locality or causality or we end up assigning negative probabilities to the internal states of our systems. In fact, you can read the details in an article by David Schneider. Being forced to conclude that we have negative probabilities in a physical system is usually taken as a sign that we have a contradiction. In the case of Bell's theorem it shows that we can't interpret what we see in terms of probability theory and hence that the weirdness of quantum mechanics can't be explained in terms of some hidden random variable that we can't see. QM simply doesn't obey the rules you'd expect of hidden variables we can't see.

But in a paper called Negative Probability, Feynman tried taking the idea of negative probabilities seriously. He showed that you could reformulate quantum mechanics completely in terms of them so that you no longer needed to think in terms of the complex number valued 'amplitudes' that physicists normally use. This means the above isn't just an analogy, it's actually a formal system within which you can do QM, although I haven't touched on the bit that refers to the dynamics of quantum systems. So if you can get your head around the ideas I've talked about above you're well on your way to understanding some reasons why quantum mechanics seems so paradoxical.

At this point you may be wondering how nature contrives to hide these negative probabilities from direct observation. Her trick is that making one kind of observation disturbs up the state of what you've observed so that you can't make the other kind of observation on a pristine state. You have to pick one kind of observation or the other. Electrons and photons really are a lot like the boxes I just described.

So why don't physicists use this formulation? Despite the fact that negative numbers seem simpler to most people than imaginary numbers, the negative number formulation of QM is much more complicated. What's more, because it makes exactly the same predictions as regular QM there's no compelling reason to switch to it. And anyway, it's not as if directly observing negative probabilities is any more intuitive or meaningful than imaginary ones. Once you've introduced negative ones, you might as well go all the way!

This all ties in with what I said a while back. The important thing about QM is that having two ways to do something can make it less likely to happen, not more.

For a different perspective this is an interesting comment.

Footnote: We can embed QM in negative probability theory. But can we do the converse? Can every negative probability distribution be physically realised in a quantum system? I've a hunch the answer is obvious but I'm too stupid to see it.

Sunday, March 23, 2008

Transforming a comonad with a monad

I can't yet read most of this paper on Combining a Monad and a Comonad because I've no experience with 2-categories. Nonetheless, there's something useful I can extract from the bits that do make sense - the idea of combining a monad with a comonad in the way that we already combine monads using monad transformers. In my previous post I hinted at the idea that the cobind operation for a comonad was a bit like an operation on a SIMD computer that allows every processor to get access to information local to every other processor. The idea now will be to allow those processors to also exploit monads to perform operations like IO.

First let me first reintroduce the code for the array comonads I considered before so that this post again becomes a self-contained literate Haskell post:



> {-# OPTIONS -fno-monomorphism-restriction #-}
> {-# LANGUAGE MultiParamTypeClasses,FlexibleInstances,Arrows #-}

> import Data.Array
> import Data.Foldable as F
> import Control.Monad
> import Control.Arrow

> data Pointer i e = P { index::i, array::Array i e } deriving Show

> bind f x = x >>= f

> class Functor w => Comonad w where
> (=>>) :: w a -> (w a -> b) -> w b
> x =>> f = fmap f (cojoin x)
> coreturn :: w a -> a
> cojoin :: w a -> w (w a)
> cojoin x = x =>> id
> cobind :: (w a -> b) -> w a -> w b
> cobind f x = x =>> f

> instance Ix i => Functor (Pointer i) where
> fmap f (P i a) = P i (fmap f a)

> instance Ix i => Comonad (Pointer i) where
> coreturn (P i a) = a!i
> P i a =>> f = P i $ listArray bds (fmap (f . flip P a) (range bds))
> where bds = bounds a


Remember how things work with monads. If m is a monad, then given a function a -> m b we can lift it to a function m a ->m b. Similarly, with a comonad w we can lift a function w a -> b to a function w a -> w b. In fact, I wrote about all this a while back.

Just like before, we can write a simple 'blur' function, except this time it also performs some I/O.


> wrap i = if i<0 then i+5 else if i>4 then i-5 else i

> blur (P i a) = do
> let k = wrap (i-1)
> let j = wrap (i+1)
> let s = 1.0*a!k + 2.0*a!i + 1.0*a!j
> print $ "sum = " ++ show s
> return s


Notice that blur is of the form w a -> m b. So how can we compose these things?

Consider g :: w a -> m b and f :: w b -> m c. Using bind and cobind we can 'lift' the head and tail of these functions:

bind g :: w a -> w (m b) and cobind f :: m (w b) -> m c.

We can almost compose these functions. But the catch is that we need a function from w (m b) to m (w b). Conveniently, there's a prototypical example of such a thing in the ghc libraries, the function sequence which distributes any monad over []. As we can easily move back and forth between arrays and lists we can write:


> class Distributes m w where
> distribute :: w (m a) -> m (w a)

> instance (Monad m,Ix i) => Distributes m (Pointer i) where
> distribute (P i ma) = do
> let bds = bounds ma
> a <- sequence (elems ma)
> return $ P i (listArray bds a)


And now we can compose these funny dual-sided Kleisli/coKleisli arrows. But before writing a composition function, note what we have. We're looking at functions that can make a-ish things to b-ish things, and we can compose them like functions directly from a to b. This is precisely the design pattern captured by arrows. So we can write:


> data A m w a b = A { runA :: w a -> m b }

> instance (Distributes m w,Monad m,Comonad w) => Arrow (A m w) where
> A g >>> A f = A $ bind f . distribute . cobind g
> first (A f) = A $ \x -> do
> u <- f (fmap fst x)
> return $ (u,coreturn (fmap snd x))
> pure f = A $ return . f . coreturn


Now check out page 1323 of Signals and Comonads by Uustalu and Vene. Honestly, I found that paper after I wrote the above code. It's almost line for line identical!

Anyway, now we can try an example. First some data to work on:


> x = P 0 $ listArray (0,4) (map return [0.0..4.0])


This is like last post's except that as I've now used a non-normalised blur operation I can show how arrow notation makes it easy to fix that:


> g = proc a -> do
> b <- A blur -< a
> n <- A blur -< 1
> returnA -< a-b/n


That's a very simple (and inefficient) high pass filter by the way. At the end of the day, however, we still need to be able to lift our arrows to act on x:


> liftCM :: (Distributes m w, Monad m, Comonad w) => A m w a b -> w (m a) -> w (m b)
> liftCM (A f) = cobind (\x -> distribute x >>= f)

> y = liftCM g x


And convert the result to something we can look at:


> result = sequence (toList (Main.array y))


It might not be quite the result you expected because we see a lot of I/O. Each computation that is tainted with I/O will carry that I/O with it. So every bit of I/O that was computed by any value that that went into the final result remains hanging round its neck until the bitter end, with duplications if the value was duplicated.

Of course there are lots of other monad/comonad pairs to try, so maybe there are some other interesting combinations lurking around.

Exercise: liftCM lifts an arrow A m w a b to a function w (m a) -> w (m b). You can also implement a lift to m (w a) -> m (w b). Implement it and figure out what it does.

Blog Archive

About Me