In Memory of Dr Purchase Mathematics Teacher at Dulwich College 2002-2017
It was with great sadness that the College heard last November that Dr Purchase had died. Dr Purchase joined the College in 2002, having taught briefly at Hills Road Sixth Form College and at Eton College beforehand. Each year he oversaw the writing of this magazine, DC Mathematica, helping to organise the articles submitted by boys and staff and arranging the printing in time for Founder’s Day. Dr Purchase’s love of Mathematics was evident to all those with whom he came into contact. The subject of his PhD thesis was galactic clusters, solving the n-body problem using computational numerical methods. In his memory, the prize for the best article in DC Mathematica will from now on be named the Purchase Prize. Dr Purchase will be much missed by his colleagues at the College, both in the Mathematics Department and more widely.
Dr Purchase was also widely respected by his students, one of whom, Theo Podger (13MF3), writes:
While many knew Dr Purchase as a Mathematics teacher, anyone in his form would tell you that he was far more than that. In addition to being my form tutor, he was a mentor, especially to me but, I am certain, to many others as well, and in my time under his tutelage he taught me far more than just Pure Mathematics. Dr Purchase was always willing to listen to any concerns I had, academic or not, and it seemed to me that he would always do his absolute best to find a solution. While he was light years ahead of me in terms of academic knowledge, we always spoke as equals; I never felt talked down to. I know that he will be sorely missed by his students and all those who had the privilege to know him.
‘The Art of Indifference’ – Boris Ter-Avanesov (12FM)
‘Hawking’s Tea’ – Mr Ottewill
‘Chinese Remainder Theorem’ – Hin Chi Lee (12JLG)
‘The Importance and Role of India in Maths’ – Saajid Khan (7W)
‘The Birthday Paradox’ – James Storey (8R)
‘Music is Maths’ – Samuel Smith (12AC)
‘Prime Numbers’ – Jeremy Samoogan (7R)
‘The Mechanics of Snowboarding’ – Toby Evans (10P)
‘Parallax: Applications’ – Bryan Tan (11T), Jensen Tong (11F)
‘A Physical Approach to The Basel Problem’ – Jinkai Li (12JAR), Bryan Lu (12JLG)
‘Conway’s Soldiers’ – Lunzhi Shi (11H)
‘Frequency Extracting, Sound Editing and The Fourier Transform’ – Simon Mengzhe Xu (JLG)
Boris Ter-Avanesov, Simon Mengzhe Xu (formatting), Ezana Adugna (design)
The Art of Indifference by Boris Ter-Avanesov (12FM)
This article will discuss the issue of multiple contradictory solutions arising in both pure and applied probability problems as a consequence of even the subtlest variations in the interpretation of the phrase “at random”. Instrumental to this discussion are the principle of indifference, the principle of maximum ignorance and the principle of transformation groups. I explore some proposed solutions, with particular focus on the work of E.T. Jaynes, using Bertrand’s Paradox and other similar problems to illustrate some of the central issues in probability theory.
In the physical world, we can categorise systems as deterministic, chaotic or random. With deterministic systems, provided that we can measure the initial conditions with sufficient accuracy, we can predict the state of the system at any point in the future. Chaotic systems, though believed to follow deterministic physical laws, are particularly sensitive to their initial conditions: whilst we might be able to predict the evolution of the system quite reliably in the short term, over longer time periods the effects of errors in the measurement of the initial conditions accumulate and give rise to unpredictable behaviour. We use chaotic systems such as the flip of a coin, the roll of a die or future weather patterns as sources of chance because they are effectively random. In truly random systems, however, no amount of accuracy in the measurement of initial conditions makes us any better equipped to forecast the state of the system, even at the very next instant. We apply the concept of chance to situations where we lack enough information to say for certain what the outcome will be, irrespective of which of the three categories the system falls into. When we have absolutely no information, then as far as we are aware we are dealing with the third type of system, and our best guess is to attribute equal likelihood to all possible outcomes.
For this reason we use the uniform distribution, or more generally the principle of indifference, as the starting point in such analyses. The principle of indifference, sometimes called the principle of insufficient reason, is a very old rule of thumb that has been in use since the earliest writers on probability, such as Jacob Bernoulli and Pierre Laplace. It states that if you have n options which are indistinguishable except by name, then you should assign each of them an equal probability of $\frac{1}{n}$. The principle has had success in both abstract and applied mathematics, for example in James Clerk Maxwell's predictions of the behaviour of gases. Unfortunately, there are cases in which it seems to lead to incorrect results. Even in deterministic physical systems we are quite prepared to see a small amount of variation in the results of experiments, but we do not expect to see variation in the results of purely mathematical investigations. Inconsistent answers to a mathematical question therefore seem contradictory, and hence paradoxical, since we imagine these rigorous formal systems to preclude the kinds of subtle variation that lead to the spread of physical results. However, there are numerous examples in probability in which perfectly valid alternative methods for the same problem give rise to contradictory results.
A classic example is Bertrand’s paradox, posed by Joseph Bertrand in his Calcul des Probabilités (1889): ‘Consider an equilateral triangle inscribed in a circle. Suppose a chord of the circle is chosen at random. What is the probability that the chord is longer than a side of the triangle?’ Bertrand presents three arguments to answer this problem: the ‘random endpoints method’, the ‘random radius method’ and the ‘random midpoint method’, all three of which seem valid yet yield different numerical values.
The random endpoints method entails choosing two random points on the circumference and joining them to form a chord of the circle. The triangle can then be rotated so that one of these points coincides with one of its vertices. As visible in figure 1, a chord that passes through the angle at that vertex is longer than a side, whilst a chord that doesn't is shorter. Since this angle is necessarily $60^\circ$, out of a $180^\circ$ range of possible directions for the chord, the probability of the chord exceeding the sides in length is $\frac{60^\circ}{180^\circ}$, or more simply $\frac{1}{3}$. Notice that this is equivalent to picking, uniformly at random, an angle between $0^\circ$ and $180^\circ$ that the chord will make with the tangent at that vertex.
The random radius method requires a radius of the circle to be drawn and the triangle rotated so that the radius is perpendicular to one of the sides of the triangle. Next, a point is chosen at random on the radius, and the chord parallel to that side is drawn through it. From figure 2 we can see that such a chord is longer than the side if the point where it meets the radius is closer to the centre than the point where the radius intersects the side. The side of the triangle bisects the radius exactly, so this corresponds to $\frac{1}{2}$ of such chords.
Finally, the random midpoint method involves the following procedure: a point within the circle is selected at random, and the chord that has this point as its midpoint is drawn. A concentric circle with $\frac{1}{2}$ the radius is drawn as shown in figure 3. If the chosen point lies within the smaller circle, then the chord drawn through it is longer than a side of the triangle. The area of the smaller circle is $\frac{1}{4}$ that of the larger circle, so the probability of the chord being longer than a side of the triangle is $\frac{1}{4}$.
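The three procedures can be checked by simulation. The Python sketch below is a minimal Monte Carlo experiment, assuming a unit circle (so the inscribed triangle has side $\sqrt{3}$) and Python's standard `random` module; the function names are illustrative, not Bertrand's. Each function generates one chord by its own method and returns its length, and `estimate` counts how often that length exceeds $\sqrt{3}$.

```python
import math
import random

# Unit circle centred at the origin; the inscribed equilateral triangle has side sqrt(3).
SIDE = math.sqrt(3)


def endpoints_chord(rng):
    """Method 1: join two independent uniform points on the circumference."""
    a = rng.uniform(0, 2 * math.pi)
    b = rng.uniform(0, 2 * math.pi)
    return math.dist((math.cos(a), math.sin(a)), (math.cos(b), math.sin(b)))


def radius_chord(rng):
    """Method 2: pick a uniform point on a radius; the chord is perpendicular to it there."""
    d = rng.uniform(0, 1)  # distance of the chord from the centre
    return 2 * math.sqrt(1 - d * d)


def midpoint_chord(rng):
    """Method 3: pick the chord's midpoint uniformly over the area of the disc."""
    d = math.sqrt(rng.random())  # square root makes the point uniform by area, not by radius
    return 2 * math.sqrt(1 - d * d)


def estimate(method, trials=100_000, seed=0):
    """Fraction of simulated chords longer than a side of the triangle."""
    rng = random.Random(seed)
    longer = sum(method(rng) > SIDE for _ in range(trials))
    return longer / trials


if __name__ == "__main__":
    for name, method in [("endpoints", endpoints_chord),
                         ("radius", radius_chord),
                         ("midpoint", midpoint_chord)]:
        print(f"{name:>9}: {estimate(method):.3f}")
```

The three estimates should land close to $\frac{1}{3}$, $\frac{1}{2}$ and $\frac{1}{4}$ respectively. Each function is a perfectly reasonable reading of "a random chord", which is exactly the paradox.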
Furthermore, a similar problem can be found in ‘Fifty Challenging Problems In Probability’ (Frederick Mosteller, 1965), in which the question asks ‘if a chord is randomly drawn in a circle, what is the probability that its length will exceed the radius?’. This particular problem also has three solutions.
The first method: ‘Assume that the distance of the chord from the centre of the circle is evenly (uniformly) distributed from 0 to r. Since a regular hexagon of side r can be inscribed in a circle, to get the probability, merely find the distance d from the center and divide by radius. Note that this is the altitude of an equilateral triangle of side r. Therefore from plane geometry we get $d = \sqrt{r^2 - \frac{r^2}{4}} = \frac{\sqrt{3}}{2}r$ and consequently, the desired probability is $\frac{\sqrt{3}}{2} \approx 0.866$.’
The second method: ‘Assume that the midpoint of the chord is evenly distributed over the interior of the circle. Consulting the figure again, we see that the chord is longer than the radius when the midpoint of the chord is within d of the centre. Thus all points in the circle of radius d, concentric with the original circle, can serve as midpoints of the chord. Their fraction, relative to the area of the original circle, is $\frac{\pi d^2}{\pi r^2} = \frac{d^2}{r^2} = \frac{3}{4} = 0.75$. This probability is the square of the result we got in method 1.’
The third method: ‘Assume that the chord is determined by two points chosen so that their positions are independently evenly distributed over the circumference of the original circle. Suppose that the first point falls at A in this figure. Then, for the chord to be shorter than the radius, the second point must fall on the arc ABC, whose length is $\frac{1}{3}$ the circumference. Consequently, the probability that the chord is longer than the radius is $1 - \frac{1}{3} = \frac{2}{3}$.’
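Mosteller's three answers can be checked without any randomness by sweeping each "uniform" variable over a fine, evenly spaced grid and counting the fraction of chords longer than the radius. The sketch below assumes a unit circle (r = 1) and samples the midpoints of N equal sub-intervals of $[0, 1]$; the grid size and function names are illustrative choices. For the third method it fixes the first point at A and sweeps the second point round the circumference, as in Mosteller's argument.

```python
import math

N = 100_000  # number of evenly spaced sample points


def grid(n=N):
    """Midpoints of n equal sub-intervals of [0, 1]."""
    return ((i + 0.5) / n for i in range(n))


def method_distance(n=N):
    """Method 1: distance d of the chord from the centre uniform on [0, 1]."""
    return sum(2 * math.sqrt(1 - d * d) > 1 for d in grid(n)) / n


def method_midpoint(n=N):
    """Method 2: midpoint uniform over the disc, so u = d**2 is uniform on [0, 1]."""
    return sum(2 * math.sqrt(1 - u) > 1 for u in grid(n)) / n


def method_endpoints(n=N):
    """Method 3: first point fixed at A, second at angle 2*pi*t; chord length 2*sin(pi*t)."""
    return sum(2 * math.sin(math.pi * t) > 1 for t in grid(n)) / n


if __name__ == "__main__":
    print(f"method 1: {method_distance():.4f}  (exact sqrt(3)/2 = {math.sqrt(3) / 2:.4f})")
    print(f"method 2: {method_midpoint():.4f}  (exact 3/4)")
    print(f"method 3: {method_endpoints():.4f}  (exact 2/3)")
```

Each grid estimate agrees with the exact value to within roughly $\frac{1}{N}$, and squaring the first reproduces the second, as Mosteller observes.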
There are many problems that can give multiple contradictory answers, particularly in geometric probability and the application of probability to physical situations. In some famous problems only one method is usually cited, when in fact similar ambiguities could arise if various assumptions were made differently. In the problem of Buffon’s Needle, as well as question A6 on the 1992 paper of the notoriously difficult Putnam examination, we must be given extra information about how the random variable is selected in order for there to be a unique solution. For instance, in the verbatim statement of the Putnam problem, ‘Four points are chosen independently and at random on the surface of a sphere (using the uniform distribution). What is the probability that the center of the sphere lies inside the resulting tetrahedron?’, we are told where to apply the uniform distribution and are not given the opportunity to explore other methods for choosing random points which may affect the answer. As previously stated, there is a broad consensus to use the uniform distribution (U) in cases where no preference for any particular outcome exists, which is what we take randomness to mean. The problem arises when there are different options for which random variable it is to be applied to. In Bertrand’s Paradox, method 1 assigns a uniform distribution to the angle between the chord and the tangent, so we can say $\theta \sim U(0^\circ, 180^\circ)$. Method 2 assigns the uniform distribution to the chord midpoints along a radius, i.e. $L \sim U(0, r)$, and method 3 assigns the uniform distribution to midpoints chosen from an area. The fact that these three random variables and their associated sample spaces (angle, length and area) are of different types is the key to understanding the ambiguity. If one of these random variables is uniformly distributed, the others in general cannot be. This is made clearer by another problem.
We are asked to draw a random cube with side length L somewhere between 3 cm and 5 cm (so that the surface area is between 54 cm² and 150 cm² and the volume is between 27 cm³ and 125 cm³). Most people would assume a uniform distribution over the side length itself, $L \sim U(3, 5)$. For a uniformly distributed random variable $X \sim U(a, b)$ the average or expected value is $E(X) = \frac{a+b}{2}$, which in this case gives a side length of 4 cm; a cube of this side has surface area 96 cm² and volume 64 cm³. However, if we instead assume a uniform distribution over the surface area, $A \sim U(54, 150)$, the expected surface area is 102 cm², which corresponds to a cube of side length $\sqrt{17} \approx 4.123$ cm. The problem worsens if we instead assume a uniform distribution over the volume, $V \sim U(27, 125)$, which yields an expected volume of 76 cm³, corresponding to a side length of $\sqrt[3]{76} \approx 4.236$ cm. Even though length, surface area and volume are functions of each other, and so as random variables are wholly dependent, this does not mean that they have the same distribution. In fact, if one of them is uniformly distributed, the other two necessarily cannot be. This highlights how variations in solutions can arise from subtle differences in the application of the uniform distribution, and therefore in the interpretation of randomness. But how can we know which random variable the uniform distribution should be applied to?
The Issue
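The cube calculation can be reproduced in a few lines. The sketch below assumes nothing beyond the ranges given above: it computes the side of the cube whose area or volume equals the expected value under each assumption, and it also estimates the mean side length $E(L)$ itself by simulation. These are different quantities, because the side of the "average" cube is not in general the average side; the helper names are illustrative.

```python
import math
import random


def side_from_area(a):
    """Side length of a cube with surface area a (A = 6L^2)."""
    return math.sqrt(a / 6)


def side_from_volume(v):
    """Side length of a cube with volume v (V = L^3)."""
    return v ** (1 / 3)


def draw_side(rng, assumption):
    """Draw one side length, placing the uniform distribution on the named quantity."""
    if assumption == "length":
        return rng.uniform(3, 5)
    if assumption == "area":
        return side_from_area(rng.uniform(54, 150))
    if assumption == "volume":
        return side_from_volume(rng.uniform(27, 125))
    raise ValueError(f"unknown assumption: {assumption}")


def mean_side(assumption, trials=200_000, seed=0):
    """Monte Carlo estimate of the expected side length E(L) under one assumption."""
    rng = random.Random(seed)
    return sum(draw_side(rng, assumption) for _ in range(trials)) / trials


if __name__ == "__main__":
    # Side of the cube whose area / volume equals the expected value E(X) = (a + b) / 2.
    print(f"side from mean area:   {side_from_area((54 + 150) / 2):.3f} cm")
    print(f"side from mean volume: {side_from_volume((27 + 125) / 2):.3f} cm")
    for assumption in ("length", "area", "volume"):
        print(f"E(L) with {assumption:>6} uniform: {mean_side(assumption):.3f} cm")
```

The simulated means should come out close to 4, $\frac{49}{12} \approx 4.083$ and $\frac{204}{49} \approx 4.163$ cm respectively (integrating $\sqrt{A/6}$ and $\sqrt[3]{V}$ over their uniform ranges gives these exact values). Whichever summary we use, the answer depends on which quantity we declared uniform.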
How Do We Choose?
One proposed solution to this issue is to take the average of all the different numerical solutions obtained by the different methods. This is called a meta-average or universal average, which in the case of Bertrand’s paradox would give $\frac{1}{3}\left(\frac{1}{2} + \frac{1}{3} + \frac{1}{4}\right) = \frac{13}{36} \approx 0.361$. This seems simple enough but actually leads to even more questions. Is it problematic that the meta-average is different from all of the individual solutions? What if there are infinitely many solutions, so that we can’t find an average? Can we assume that our alternative methods contribute equally to the meta-average, or should we use a weighted average? Even in this case, where we have three distinct values, should we be so hasty to give them equal weighting? Casually assigning uniform distributions is the oversight that led us into this quandary, and assigning one here, at the level of alternative methods rather than alternative outcomes of random variables, may be particularly hard to justify. And we are no better off if specific solutions arise more frequently than others; should their weighting in the meta-average be increased? It would seem unnatural to count a solution equally if it is more
prevalent; however, increasing its weighting suggests that we think it is ‘more correct’, which is surely a problematic concept in mathematics in and of itself. In any case, the fatal flaw of the meta-average is that it is always susceptible to revision unless we can somehow prove that our methods form an exhaustive set. Can we ever be certain that there is no risk of a novel method being found that yields yet another solution, thus changing the meta-average?
Another solution was proposed by Edwin T. Jaynes in his 1973 paper “The Well-Posed Problem”. He says the issue arises because “we have not been reading out all that is implied by the statement of the problem; the things left unspecified must be taken into account just as carefully as the ones that are specified”. He demonstrates how this idea can help us choose between the three possible answers to Bertrand’s paradox, as well as place restrictions on which methods can be applied to similar problems. The key is that the three methods discussed by Bertrand give rise to different probability distributions of chord lengths (or of the positions of midpoints, since every chord other than a diameter is uniquely defined by its midpoint). It seems obvious that we need to know “which probability distribution describes our state of knowledge when the only information available is that given in the statement of the problem”, but Jaynes’ key insight is to realise that “if we start with the assumption that Bertrand’s problem has a definite solution in spite of the many things left unspecified, then the statement of the problem automatically implies certain invariance properties, which in no way depend on our intuitive judgments.”
Because the problem does not state the orientation, size or position of the circle, the correct method must be general enough to give the same answer if these parameters are altered: we need to arrive at the same probability regardless of rotations, enlargements or changes of position. We say the solution must be rotation invariant, scale invariant and translation invariant, as these properties are left unspecified in the statement of the problem. Jaynes formulates these ideas mathematically and demonstrates that the probability distribution of chords must be of a certain form in order to meet these three criteria, which is enough to eliminate two of the three possible solutions. His method does not necessarily show us which distribution is correct, but it allows us to eliminate distributions that are definitely wrong (because they violate the invariance requirements). We can also demonstrate this visually by graphing the distributions of chords and their midpoints when generated “randomly” according to each of the three methods. After adjusting for the fact that the centre of the circle is the only midpoint that does not uniquely define a chord (an infinite number of diameters share this midpoint), we see that method 2 is the only one that is both scale invariant and translation invariant: its distribution looks the same if we change the size or location of the circle. Method 3 is only scale invariant, and method 1 is neither. This confirms the result from Jaynes’ calculations: 2 out of the 3 methods lead to chord distributions that are not invariant in the desired ways and therefore must be rejected. Fortunately, in this case only one candidate remains: the random radius method, with probability $\frac{1}{2}$.
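The translation test can also be tried numerically. The sketch below is an illustration under stated assumptions, not Jaynes' own calculation: it generates chords of a unit circle by each of Bertrand's three methods, then looks only at a smaller circle of radius 0.3 centred at (0.4, 0), a position chosen arbitrarily so that it lies wholly inside the first. Every chord whose line crosses the small circle also cuts a chord of the small circle, and we count how often that piece is longer than the side of the small circle's inscribed triangle, which happens exactly when the line passes within half the small radius of the small centre. A translation-invariant method should give the same fraction for both circles. Only translation invariance is checked here, not scale invariance; the function names are hypothetical.

```python
import math
import random


def random_midpoint(method, rng, R=1.0):
    """Midpoint of a random chord of the circle of radius R centred at the origin."""
    if method == "endpoints":   # method 1: two uniform points on the circumference
        a, b = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
        return (R * (math.cos(a) + math.cos(b)) / 2, R * (math.sin(a) + math.sin(b)) / 2)
    if method == "radius":      # method 2: uniform distance along a uniformly oriented radius
        t, d = rng.uniform(0, 2 * math.pi), rng.uniform(0, R)
        return (d * math.cos(t), d * math.sin(t))
    if method == "midpoint":    # method 3: midpoint uniform over the area of the disc
        t, d = rng.uniform(0, 2 * math.pi), R * math.sqrt(rng.random())
        return (d * math.cos(t), d * math.sin(t))
    raise ValueError(f"unknown method: {method}")


def longer_fraction(method, centre, r, trials=100_000, seed=0):
    """Among chords whose line crosses the circle (centre, r), the fraction whose
    piece inside that circle is longer than the side of its inscribed triangle."""
    rng = random.Random(seed)
    hits = longer = 0
    for _ in range(trials):
        mx, my = random_midpoint(method, rng)
        p = math.hypot(mx, my)          # distance of the chord's line from the origin
        if p == 0:
            continue                    # a diameter: direction undefined (probability zero)
        ux, uy = mx / p, my / p         # unit normal of the line {x : x . u = p}
        d = abs(centre[0] * ux + centre[1] * uy - p)  # distance from `centre` to the line
        if d < r:
            hits += 1
            if d < r / 2:               # chord 2*sqrt(r^2 - d^2) exceeds the side r*sqrt(3)
                longer += 1
    return longer / hits if hits else float("nan")


if __name__ == "__main__":
    for method in ("endpoints", "radius", "midpoint"):
        whole = longer_fraction(method, centre=(0.0, 0.0), r=1.0)
        inner = longer_fraction(method, centre=(0.4, 0.0), r=0.3)
        print(f"{method:>9}: whole circle {whole:.3f}, offset small circle {inner:.3f}")
```

Only the random radius method should report roughly $\frac{1}{2}$ for both circles; the other two give $\frac{1}{3}$ and $\frac{1}{4}$ on the whole circle but noticeably different values on the offset one. This fits Jaynes' invariant distribution, which makes the distance of a chord from the centre uniform, just as method 2 does.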
The idea that equivalent problems must yield the same result is what Jaynes calls the principle of maximum ignorance. To Jaynes, problems are equivalent if they differ only in respect of factors not mentioned in the statement of the problem, and we cannot allow the variation of such unspecified factors to alter our result; our result needs to be general enough to cover all the different situations the problem could be describing. Jaynes’ method for ensuring this via invariance is called the principle of transformation groups (which could be a good topic for a future article). However, his critics, namely Darrell Rowbottom, Nicholas Shackel and Diederik Aerts, note that whilst this method tries to ensure that the principle of indifference, i.e. the uniform distribution, is used correctly and applied to the right random variable, the principle of maximum ignorance itself uses the principle of indifference: not on the level of random variables, but on the level of equivalent problems. He is effectively using a principle to help ensure the correct use of that same principle. The worry is that any problems associated with applying the principle of indifference on the level of random variables might persist when it is used on the higher level of equivalent problems. Jaynes acknowledges this, saying that he agrees ‘with most other writers on probability theory that it is dangerous to apply this principle at the level of indifference between events, because our intuition is a very unreliable guide in such matters, as Bertrand’s paradox illustrates. However, the principle of indifference may, in [his] view, be applied legitimately at the more abstract level of indifference between problems; because that is a matter that is definitely determined by the statement of a problem, independently of our intuition.’ It does therefore seem like a less risky application of the principle of indifference, but it is by no means foolproof: how can you be sure that you have included all the relevant invariances?
The statement of a problem will always leave a great many things unstated, and whilst some of them have no bearing on the solution, for example the time of day or the strength of the dollar, there is always a risk that we have not accounted for all of the relevant factors. This is similar to the problem faced by scientists trying to perform a “fair test”, when it is hard to guarantee that they are controlling (keeping constant) everything except the independent and dependent variables. A second reassurance offered by Jaynes is that ‘on the one hand, one cannot deny the force of arguments which, by pointing to such things as Bertrand’s paradox, demonstrate the ambiguities and dangers in the principle of indifference. But on the other hand, it is equally undeniable that use of this principle has, over and over again, led to correct, nontrivial, and useful predictions. Thus it appears that while we cannot wholly accept the principle of indifference, we cannot wholly reject it either; to do so would be to cast out some of the most important and successful applications of probability theory.’ As previously mentioned, Maxwell was responsible for one of the first great triumphs of kinetic theory, in which he was able to predict various macroscopic properties of gases, such as viscosity, thermal conductivity and diffusion rates, from information that seemed inadequate to determine them uniquely. He ‘was able to predict all these quantities correctly by a ‘pure thought’ probability analysis which amounted to recognizing the ‘equally possible’ cases.’ Because Maxwell’s theory leads to testable predictions, the question of whether he applied the uniform distribution correctly belongs not to the realm of philosophy but to the realm of verifiable fact. Whilst this is offered as a defence by Jaynes, it also demands that these successes be explained by Jaynes’ new principles: new theories must be able to explain the past successes of old theories.
To this end Jaynes suggests “that the cases in which the principle of indifference has been applied successfully in the past are just the ones in which the solution can be “reverbalized” so that the actual calculations used are seen as an application of indifference between problems, rather than events.”
This reverbalising is a most important “higher level problem”: that of how to form questions that are “well-posed”. If we had criteria for this, then we should be able to distinguish which types of problems are susceptible to Jaynes’ principles and methods. So Bertrand’s paradox has led us to a much deeper problem: how to “well-pose” problems, and how to determine which types of problems are well-posable. This is what I believe Jaynes means by the “Well-Posed Problem”. If this is solved, then Jaynes’ principles may give us a method that, whilst not guaranteed to produce the precise probability distribution, at least finds distributions that are not definitely wrong, which is a step towards resolving this century-old paradox.
Bibliography:
Bertrand, J. Calcul des Probabilités.
Jaynes, E. T. The Well-Posed Problem.
Stewart, I. Does God Play Dice?
Rowbottom, D. P. Bertrand’s Paradox Revisited: Why Bertrand’s ‘Solutions’ Are All Inapplicable.
Mosteller, F. Fifty Challenging Problems in Probability.
Aerts, D. Solving the Hard Problem of Bertrand’s Paradox.
https://en.wikipedia.org/wiki/Bertrand_paradox_(probability)#Further_reading
https://en.wikipedia.org/wiki/Principle_of_indifference
Hawking’s Tea by Mr Ottewill
Stephen Hawking, who died earlier this year, is rightly judged to have been one of the greatest theoretical physicists of the twentieth century, if not of all time. He came up with pioneering ideas in cosmology, not least on fundamental questions such as the start of the universe, as well as making great strides in combining general relativity with quantum mechanics.
A biography of Hawking, ‘Stephen Hawking – A Life in Science’ by Michael White and John Gribbin, relates the following anecdote from a fellow student at his school in St Albans:
One particular example of Stephen’s highly developed insight left a lasting impression on John McClenahan. During a sixth-form physics lesson, the teacher posed the question, “If you have a cup of tea, and you want it with milk and it’s far too hot, does it get to a drinkable temperature quicker if you put the milk in as you pour the tea, or should you allow the tea to cool down before adding the milk?” While his contemporaries were struggling with a muddle of concepts to argue the point, Stephen went straight to the heart of the matter and almost instantly announced the correct answer: “Ah! Milk in first, of course,” and then went on to give a thorough explanation of his reasoning: because a hot liquid cools more quickly than a cool one, it pays to put the milk in first, so that the liquid cools more slowly.
This article considers what the ‘muddle of concepts’ his contemporaries were wrestling with might have been, i.e. what might be supposed to be the ‘standard’ way to answer the question. In doing so, we uncover a potential counter-argument to Hawking’s insight.
The main physics concepts needed to tackle the problem are:
(1) The formula for the temperature of a mixture formed by combining two substances with different temperatures is: