
Means, Metrics and Matrices | Rajendra Bhatia | Distinguished Lecture Series

https://www.youtube.com/watch?v=L1iVQ86fl4c

[03:43] Good evening everybody.
[03:45] Good evening. This is Shamra Chudri.
[03:48] I'm very, very happy to see so many people assembled here for this talk, on a Monday afternoon, after a long day.
[04:00] Thank you very much for coming.
[04:03] I wanted to make an announcement about the general occasion.
[04:08] The occasion is that this is the first of, hopefully, a series of lectures that we are starting at Ashoka University, given by our seniormost professors.
[04:24] We have in our university four distinguished professors, and there will be, I hope, more to come. Of course, we also have a very large number of senior professors.
[04:43] So we thought we would start a series of talks, given not for the general public but for the general academic public of the university, and we will invite people from other universities to attend as well.
[04:51] These talks from our professors will be on topics addressed not just to people from that discipline but to a more general audience. This is the first of its kind; today we are starting off, and I'm very happy to see that you are here for the occasion.
[05:17] As we start this new venture at the university, we are going to record these talks, and they will later be put on our YouTube channel for people to hear as well.
[05:32] And currently, I'm told, this is also being livestreamed on YouTube.
[05:38] I'm very, very privileged that the first of these talks is being given by our distinguished professor, Professor Rajendra Bhatia, who has been a senior professor at our mathematics department for a long time.
[05:59] Before that, as all of you know, he was for most of his career at the Indian Statistical Institute. Having done his PhD at the ISI, he went off as a research fellow to the University of California at Berkeley, and also to the Tata Institute in Bombay, and then he worked at his institute.
[06:25] According to various metrics, he is supposed to be the most cited mathematician in India, and we're not surprised, because a lot of you are of course familiar with his books, his textbooks on linear algebra and matrices.
[06:45] Today, when I asked him to pick a topic that would appeal to a broad audience, he came up with a really catchy title, and so we're really looking forward to Professor Bhatia's talk.
[07:03] Professor Bhatia is a fellow of all three national academies, and also of The World Academy of Sciences.
[07:12] For a scientist working in India, any prize that you can get, I think he has: starting from the Bhatnagar award, then of course the fellowships of the academies, the junior award from INSA, as well as the J.C. Bose award from the government.
[07:34] So we are very privileged to start off our series with Professor Bhatia's talk.
[07:43] [applause]
[07:54] Thank you very much.
[07:56] It's a privilege for me to start this series.
[07:59] Uh a few weeks ago I got an email from the vice chancellor that I should give a public lecture.
[08:06] So I was a bit anxious.
[08:08] What does that mean?
[08:09] So I telephoned him and said, could you please explain to me what is "public"?
[08:18] He said, I can't do that, but I can tell you what is a lecture.
[08:25] So having been so instructed I have to live up to this.
[08:31] I've chosen a topic which, as we said, is catchy.
[08:35] So the first part will be comprehensible to everybody including the public outside the gate and then progressively it'll become a little more mathematical as it has to be.
[08:49] And I start with a puzzle which I've asked quite a few people here.
[08:53] Many of you would already know about this.
[08:57] This is a shopkeeper who is very honest, scrupulously honest, and god-fearing.
[09:06] You see here the traditional weighing balance.
[09:11] So what is the principle of this?
[09:14] You have a fulcrum at the center, two arms of equal length, then two pans at the two ends.
[09:24] And if you place equal weights in these pans, then the beam is horizontal, balanced.
[09:31] If one side is heavier, then the beam would tilt to that side.
[09:35] And the shopkeeper shows you that he is honestly selling you 1 kilogram by showing you that he has balanced the beam.
[09:46] If the two arms of the balance are not equal, then to achieve a balance, you have to put a heavier weight on the shorter side.
[09:54] Only then the balance beam would be horizontal.
[10:01] So here is a shopkeeper in Asavur who knows his balance is slightly faulty, and he does not want to make dishonest money.
[10:14] So what he decides to do is this: he will use the two pans alternately.
[10:20] First he'll put the weight on the left and the sugar on the right, and then he will interchange them.
[10:25] So he doesn't mind if one of the customers gets slightly less or slightly more, but he wants to be honest to God.
[10:33] He doesn't want to make extra money.
[10:37] So the question is: does he gain, lose, or come out even?
[10:45] Now it is interesting. I have asked this puzzle of several people: bank officers, police officers, mathematicians. Almost always, the first answer is that he'll be fine.
[10:59] He'll be balanced; he will not be losing or gaining anything.
[11:04] Of course, mathematicians or computer scientists or physicists correct themselves in a little while, in two minutes, five minutes.
[11:10] But the first reaction of almost everybody seems to be that he is doing fine.
[11:13] Easy.
[11:19] So this is how I first thought of the problem when I was asked this.
[11:24] So suppose one of the arms is twice as long as the other.
[11:29] Of course that's absurd.
[11:31] Everybody would notice that this is very bad balance.
[11:33] But for a mathematician that's okay to start thinking.
[11:37] So then one of the weighings would give him 2 kilograms and the other would give him half a kilogram.
[11:43] So he'll be dishing out two and a half kilograms of sugar.
[11:47] So he's losing.
[11:51] If one arm was more than twice as long, then already in one of the weighings he'll give you more than twice as much.
[11:55] So he will be losing.
[11:58] So the case left to consider is when one of the arms is longer than the other, but not as much as twice.
[12:07] It's just maybe 2% longer or 5% longer.
[12:11] So now you cannot go on verifying case by case.
[12:13] So there's a flash because you're familiar with some mathematics, some other things.
[12:17] You realize that what you are trying to do is to show that x + 1/x is bigger than 2.
[12:29] So in one of the weighings he gets x, and in the other, 1/x.
[12:32] So the total is more than 2, and it's equal to 2 if and only if x is equal to 1.
[12:39] So only with a perfect balance can you be honest to God; otherwise you cannot use this procedure.
[12:47] So this seems to be what we'll call, perhaps pretentiously, a law of nature or a law of mathematics.
[12:53] x + 1/x is always bigger than or equal to 2, and equal to 2 only when x is equal to 1.
[13:02] Of course, now you have to prove that.
[13:07] So here are three proofs.
[13:11] School algebra. Follow the steps you see there.
[13:17] x + 1/x ≥ 2 if and only if x² + 1 ≥ 2x, that is, if and only if (x - 1)² ≥ 0, which always holds.
[13:30] Okay. All right.
[13:35] So this is a school-algebra proof that x + 1/x is always bigger than 2 unless x is equal to 1.
[13:47] The next proof is calculus.
[13:50] This is precisely what calculus is used for.
[13:57] You look at the function x + 1/x; setting its derivative 1 - 1/x² to zero gives x = 1.
[14:01] That's the only minimum, and the minimum value is two.
[14:03] So that's my second proof.
[14:08] The third proof, which is closely related to what I'm going to talk about, is the arithmetic-geometric mean inequality.
[14:18] Given two numbers a and b, their arithmetic mean is (a + b)/2 and their geometric mean is the square root of ab.
[14:27] And we learn quite early that the arithmetic mean is bigger than the geometric mean unless, again, the numbers are equal.
[14:35] So you had x and 1/x. The geometric mean is one.
[14:40] The arithmetic mean is always bigger than one, so x + 1/x is bigger than two.
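A minimal numerical sketch of the shopkeeper's two weighings in Python, for a few arm-length ratios x:

```python
# With arm-length ratio x, one weighing dispenses x kg and the other 1/x kg;
# an honest balance (x = 1) dispenses exactly 2 kg over the two weighings.
for x in [1.0, 1.02, 1.05, 2.0]:
    total = x + 1 / x
    print(f"arm ratio {x:.2f}: sugar dispensed {total:.4f} kg, loss {total - 2:.4f} kg")
```

Even a 2% mismatch in the arms already costs the shopkeeper a little sugar on every pair of weighings.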
[14:49] So with this introduction, I want to start talking of means.
[14:56] So I have two positive numbers. Think of them as observations, measurements.
[15:01] Let's say three and seven. So the mean is the average, in this case five. You may think of this as an algebraic kind of thing, that I have done some algebra.
[15:13] Or, if you think of those as points on the line, then I'm looking at the midpoint, so I'm doing some kind of geometry.
[15:20] Or it could be the center of mass, as in the balance problem. So this is a mechanical way of looking at the average.
[15:26] And for this talk, it's the last one which is going to play an important role.
[15:33] So given a1 and a2, introduce this function of x: f(x) = (x - a1)² + (x - a2)².
[15:37] This function is called the dispersion, and it's minimized when x is equal to the mean of the two.
[15:46] So this is a statistics way of looking at the mean.
[15:51] Now I can go to more than two points.
[15:58] I have m observations.
[16:00] So I can take (a1 + a2 + ... + am)/m.
[16:03] I could be in higher-dimensional spaces, not just on the line.
[16:15] As long as I can add things, I can define this mean, and this is also the center of a cluster of observations.
[16:27] So the red dot you see amidst all those dots, that's the minimizer of the dispersion.
[16:34] Once again, look at this function: the sum over the observations of ||x - ai||².
[16:39] The minimizer of that is attained at the point x equal to the average.
[16:45] So this is called the arithmetic mean.
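A quick sketch of this minimization in Python, assuming NumPy; a brute-force scan over candidate points finds the minimizer at the average:

```python
import numpy as np

a = np.array([3.0, 7.0, 4.0, 6.0])            # observations
xs = np.linspace(0, 10, 100001)               # candidate points x
dispersion = ((xs[:, None] - a) ** 2).sum(axis=1)

print(xs[np.argmin(dispersion)])              # 5.0, the minimizer
print(a.mean())                               # 5.0, the arithmetic mean
```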
[16:50] There are other classical means.
[16:52] In fact, the Greeks studied 11 different means.
[16:55] Three of them are better known: the arithmetic mean, the geometric mean, and the harmonic mean.
[17:00] So the geometric mean of two positive numbers is the square root of a1 a2.
[17:03] For the Greeks, the motivation could have been: find the square whose area is equal to that of a rectangle with sides a1 and a2.
[17:18] In modern times, economists, for example, study rates of growth.
[17:21] If the economy grows 20% this year and 10% next year, then the mean rate of growth is not going to be 15%.
[17:28] It would be the geometric mean of the two.
[17:33] So for M variables, you have to take the product and take the mth root.
[17:38] Then there is the harmonic mean.
[17:44] Take the inverses of the m numbers add them up divide by m and then again take the inverse.
[17:49] So this kind of thing is called the harmonic mean.
[17:53] So, for example, when you study Kirchhoff's laws in electricity, this is the resistance of m parallel resistors.
[18:06] Again, an elementary puzzle, and I wonder what the answer would be from most people.
[18:11] I drove from Delhi to Chandigarh at a speed of 120 kilometers per hour and drove back at a speed of 80.
[18:21] What was my average speed?
[18:24] The likely answer is going to be 100.
[18:28] That's the mean of 120 and 80.
[18:32] But it's not.
[18:34] So why?
[18:36] The one-way distance from Delhi to Chandigarh is 240 km.
[18:40] At 120 I reached there in 2 hours.
[18:44] At 80 I returned in 3 hours.
[18:47] So the average speed is 480 over 5, which is 96 kilometers per hour.
[18:50] So not the average as you would expect.
[18:53] This is the harmonic mean of 120 and 80.
[18:56] The proof that that's the average speed is there in front of you.
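The same arithmetic, as a short Python check:

```python
d = 240.0                           # one-way Delhi-Chandigarh distance, km
t = d / 120 + d / 80                # 2 hours out + 3 hours back = 5 hours
print(2 * d / t)                    # 96.0 km/h, the true average speed
print(2 / (1 / 120 + 1 / 80))       # harmonic mean of 120 and 80: also 96.0
```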
[19:00] Then electrical engineers and physicists use what is called the root mean square.
[19:02] You square the m numbers, divide by m, and then take the square root.
[19:17] So this is also very popular.
[19:19] So there are different kinds of means which people consider for their needs.
[19:27] So a mathematician now wants to generalize.
[19:33] So you have seen harmonic mean, arithmetic mean, root mean square.
[19:39] So it becomes a special case of something like this.
[19:42] You take the t-th power of all your observations, divide by m, and then take the t-th root. Or, even more generally, you take any continuous monotone function, apply it to your observations a1 to am, take the average, and then apply the inverse function.
[20:01] So there's a generalization, and a further generalization.
[20:05] These are called Kolmogorov means.
[20:10] The geometric mean fits in here.
[20:13] If you take your function as log, and its inverse, the exponential, then the geometric mean is exp((log a1 + log a2 + ... + log am)/m).
[20:25] So that also fits in here.
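A minimal sketch of such a quasi-arithmetic mean in Python, assuming NumPy; the helper name f_mean is just illustrative, and the classical means drop out as special cases:

```python
import numpy as np

def f_mean(a, f, f_inv):
    """Kolmogorov (quasi-arithmetic) mean: f_inv of the average of f(a_i)."""
    return f_inv(np.mean(f(np.asarray(a, dtype=float))))

a = [3.0, 7.0]
print(f_mean(a, lambda x: x, lambda y: y))          # arithmetic mean: 5.0
print(f_mean(a, np.log, np.exp))                    # geometric mean: sqrt(21)
print(f_mean(a, lambda x: 1 / x, lambda y: 1 / y))  # harmonic mean: 4.2
print(f_mean(a, lambda x: x ** 2, np.sqrt))         # root mean square
```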
[20:30] So this was extending the arithmetic idea, by using different functions.
[20:41] Now I could also extend the geometric idea, or the statistical idea. I have a space with some distance.
[20:48] Mathematicians call it a metric space. That's a space with some distance function, but maybe with no algebraic operation like plus; for example, the surface of a sphere.
[21:03] So if my observations are points on the sphere, then I might like an average which is also a point on the sphere.
[21:11] So what do you do? You have a distance function d. So take the dispersion, the sum of d²(x, ai), as we had, and arg min stands for the point where this function is minimized.
[21:25] So that's the notation, arg min.
[21:28] So set up this dispersion function and look for its minimizer.
[21:32] You can call that the mean.
[21:35] This was introduced by Fréchet.
[21:39] So it's called the Fréchet mean.
[21:42] Now you have to be careful.
[21:44] This may not exist.
[21:48] For example, if I have a punctured disc or an annulus, then the would-be mean would be the center, which is not in the space.
[21:52] So mean may not exist.
[21:55] It may not be unique.
[21:57] For example, if you have two antipodal points, the north pole and the south pole, then every point on the equator is a mean.
[22:04] So you need to put conditions on the space as well as on the metric.
[22:13] So that's what I have just said: with the exception of two antipodal points, for every two points there is a unique geodesic from one to the other, and the midpoint of that would be the mean.
[22:30] Even for positive numbers, there are occasions when we consider different distances, rather than just |a - b|.
[22:41] So, for example, we are all familiar with the logarithmic scale: the distance between a and b is |log a - log b|.
[22:50] Now notice that this can also be written as the modulus of log(a/b), or of log(b/a).
[22:57] So the Richter scale in seismology, the pH scale in chemistry, and the decibel scale in acoustics are all on the logarithmic scale.
[23:11] So what is the mean size of living creatures between a microbe and a mammoth?
[23:16] If you were to think of that just as the arithmetic mean, it's hopeless.
[23:22] The microbe is completely negligible compared to the mammoth.
[23:24] And what's the mean size of objects between a molecule and a mountain?
[23:29] So once again you have to take the logarithmic scale and the geometric mean, which seem more appropriate for such problems.
[23:39] So I have summarized here some different distances on positive numbers and the corresponding Fréchet means they lead to. The usual distance leads to the arithmetic mean.
[23:52] If you take the distance between a and b to be |1/a - 1/b|, you get the harmonic mean.
[23:59] If you take |log a - log b|, you get the geometric mean, and so on.
[24:02] So the Kolmogorov means and the Fréchet means, for which the motivation was slightly different, in this case turn out to be the same.
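A brute-force sketch of this correspondence in Python: minimizing the dispersion under the logarithmic distance recovers the geometric mean.

```python
import numpy as np

a = np.array([3.0, 7.0])
xs = np.linspace(0.1, 10, 200001)         # candidate points x > 0

# dispersion under the logarithmic distance d(x, a_i) = |log x - log a_i|
disp = ((np.log(xs)[:, None] - np.log(a)) ** 2).sum(axis=1)

print(xs[np.argmin(disp)])                # ~4.5826
print(np.sqrt(3.0 * 7.0))                 # geometric mean sqrt(21) ~ 4.5826
```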
[24:15] This is again for what will follow a little later.
[24:17] If my distance is |sqrt(a) - sqrt(b)|, then the mean you get is (1/2)((a + b)/2 + sqrt(ab)).
[24:31] So this is another way of generating means. I have taken a mean of means, so to say: I have taken the average of the arithmetic mean and the geometric mean. And you can do different things like this.
[24:43] So, just to give you a little relief, here is a quote from Pascal:
[24:50] "What is man in nature? Nothing in comparison with the infinite, and all in comparison with nothing. A mean between nothing and everything."
[24:57] I don't know what he was thinking of, but if you just do (0 + infinity)/2, you get infinity. If you do the harmonic mean, you get zero. It's only if you take the geometric mean, which is the subject of my talk, that that statement makes sense.
[25:21] Now for matrices. So I think so far it was for everybody. For what follows, you need to know a little bit about matrices, which I believe everybody does in school, and maybe in the first year here. So a matrix is a square array of numbers, which you see there.
[25:41] So you can imagine that there are n objects. The objects could be people, particles, electrodes, and there are interactions: a_ij is the interaction between the i-th and the j-th object.
[26:00] So, just a little bit of history: matrices were introduced a little before 1850.
[26:08] People had been doing equations and determinants for a long time, but the laws of matrix algebra as we know them now were written down by Arthur Cayley in 1850.
[26:19] So this was a big change for mathematicians. For two millennia or more they had been operating on numbers or geometrical objects. Now here was a new kind of object which they got to play with, and an important fact is that in this system AB is not equal to BA. So matrix multiplication is not commutative.
[26:48] Now, the first major application of this idea came 75 years later, and that's a good way to convince people: leave the mathematicians alone, don't
[26:59] [laughter]
[27:00] don't ask for applications every day.
[27:04] So 75 years later, Heisenberg formulated quantum mechanics. Actually, he didn't quite know matrices, but he got this idea, and then he came to his teacher Max Born and his colleague Jordan. They understood that what he was doing was matrices.
[27:22] So this approach to quantum mechanics came to be known as matrix mechanics.
[27:30] I recommend three books related to this story. Heisenberg has a kind of autobiographical book, Physics and Beyond.
[27:42] Jungk has a famous book, Brighter than a Thousand Suns, which is a sort of biography of the people involved with the atom bomb, and he paints a bit too romantic a picture of Heisenberg.
[27:58] The issue is whether he knew, whether he participated in the Nazi project or not, whether he was involved in making the atom bomb for Germany or not. This book says that he wasn't, and that he tried to talk to Bohr and convey that he was away from it, but it's not quite the case.
[28:16] Copenhagen is a play, and then a movie based on that theme. Very interesting.
[28:30] >> Okay. Now I have to talk about some special kinds of matrices.
[28:34] A matrix is symmetric if a_ij is equal to a_ji. You see examples of such matrices there.
[28:40] So most matrices in physics, statistics, and graph theory, for various reasons, tend to be symmetric, and we have a special theory for dealing with such matrices.
[28:54] So I will be talking mainly of symmetric matrices.
[29:02] Then I want to tell you about correlation matrices.
[29:06] You have vector observations. There are two vectors u and v. So everybody would have studied the inner product, or the dot product, u · v. It's a measure of the angle between these two vectors, or the correlation between them.
[29:23] So if u and v are orthogonal, this would be zero. If u and v point in the same direction, then it would be maximized, and so on.
[29:32] So suppose I have n vectors x1, x2, ..., xn. Then their correlation matrix is the one whose (i, j) entry is the inner product between xi and xj. That's the correlation between xi and xj.
[29:50] So that's an important class, as you can see.
[29:56] Positive definite matrices.
[29:58] So I have a matrix which is symmetric to begin with.
[30:03] It's said to be positive definite, let's say, if it's a correlation matrix.
[30:08] But for a mathematician that's not revealing enough.
[30:12] So a second definition: the inner product <u, Au> is positive for all vectors u. Or: A is symmetric and all its eigenvalues are positive.
[30:22] So, roughly speaking, they are like positive numbers in this analysis.
[30:27] So I'll denote them by A ≥ 0. Those are positive semidefinite or positive definite matrices.
[30:34] So if you like, you can think of them as just correlation matrices.
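A short sketch, assuming NumPy, that builds a correlation (Gram) matrix from a few vectors and checks the other two characterizations on it:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))      # rows are 4 vectors x_1, ..., x_4
A = X @ X.T                          # correlation matrix: A_ij = <x_i, x_j>

print(np.allclose(A, A.T))           # symmetric
print(np.linalg.eigvalsh(A))         # all eigenvalues nonnegative
u = rng.standard_normal(4)
print(u @ A @ u >= 0)                # <u, Au> >= 0 for every vector u
```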
[30:40] So I have listed subjects where all such matrices are positive definite. Covariance matrices in statistics, density matrices in quantum mechanics and quantum information theory, Hessians in convex analysis: they're all positive definite.
[30:59] So if my observations lead to positive definite matrices, then I may need to average them in various contexts. Just as I said, if your observations are on the sphere, then you want an average which is also on the sphere.
[31:16] And I want different kinds of means, geometric, arithmetic, harmonic, for positive definite matrices.
[31:26] Now care is needed.
[31:29] If A and B are positive definite, then their product is rarely positive definite. Only when AB is equal to BA, which as I said is for matrices the so-called trivial case, do you get a positive definite matrix.
[31:44] And taking the square root, which we need to do if I'm going to define a geometric mean, can be a tricky business for matrices.
[31:54] So let us say that B is a square root of A if B² is equal to A.
[32:02] First question: does every matrix have a square root?
[32:05] The answer is no.
[32:08] You can check very easily that the first 2x2 matrix written there has no square root.
[32:14] Then the next disturbing thing: as simple a matrix as the identity matrix has infinitely many square roots.
[32:24] They're written in front of you there.
[32:27] But physically it's easy to think of this: put a mirror; the reflection, reflected back, gives you the identity. Those are in fact reflection matrices, which you see there. So the identity has infinitely many square roots.
[32:46] However, every positive definite matrix has a unique positive definite square root.
[32:51] So that fact I'm going to use.
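That unique square root is easy to compute from the spectral decomposition; a minimal sketch in Python:

```python
import numpy as np

def psd_sqrt(A):
    """The unique positive semidefinite square root of a PSD matrix A."""
    w, V = np.linalg.eigh(A)                       # A = V diag(w) V^T
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

A = np.array([[2.0, 1.0], [1.0, 1.0]])             # positive definite
B = psd_sqrt(A)
print(np.allclose(B @ B, A))                       # B^2 = A
print(np.linalg.eigvalsh(B))                       # both eigenvalues positive
```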
[32:58] Okay, suppose A and B are positive definite. Clearly their arithmetic mean is (A + B)/2. The harmonic mean is what you see written there. The geometric mean leads to problems: I can neither take the square root of A times the square root of B, because that is not a positive definite matrix, nor can I take the square root of AB either.
[33:21] So there's a problem.
[33:24] What is a good matrix version of the square root of AB? Of course, now "good" means various things.
[33:34] There would be some theoretical requirements which it must satisfy.
[33:38] It should be useful in proving theorems within mathematics first, and then it should have applications in other subjects like physics, medicine, engineering.
[33:48] So here is an example, in MRI scans.
[33:55] This A is a diffusivity matrix.
[33:59] You put a dye in the brain and then you chase where it is going, and you get a matrix, and the determinant of that matrix is the volume of water in a brain tissue.
[34:11] So let's accept that that's what the doctors or the scanners are telling us.
[34:17] Now, as you know, one scan will not give you complete information. So you may like to average the information you have got from two or twenty scans.
[34:29] So look at the simple example which I have given there. If A is the matrix with rows (2, 1) and (1, 1), and B is the matrix with rows (1, 1) and (1, 2), both have determinant one. But if you take the average in the sense of the arithmetic mean, (A + B)/2, then the determinant of that is 5/4, which is bigger than both determinants.
[34:51] So this is called the swelling phenomenon in the MRI literature.
[34:56] You have two normal scans, and when you average them you get an abnormal thing happening there. So the MRI people request: give us a mean which doesn't have this problem.
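The swelling is a two-line check in Python, with the matrices from the slide:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 1.0], [1.0, 2.0]])
print(np.linalg.det(A), np.linalg.det(B))   # 1.0 and 1.0
print(np.linalg.det((A + B) / 2))           # 1.25: bigger than both
```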
[35:12] So what are the reasonable conditions which a mean should satisfy? I have two numbers a and b which are positive. So the mean should also be a positive number. It should be symmetric.
[35:26] It should be between the minimum and the maximum.
[35:30] Condition four: if I multiply both the numbers by a positive number alpha, then the mean should also get multiplied by alpha.
[35:39] Condition five: if I increase one of the arguments, then the mean should also increase.
[35:44] These are obvious, natural requirements. You can have more: it should be a continuous function, it should be a concave function. I haven't written those.
[35:52] So these, everybody can see, are the requirements for a mean of numbers.
[35:58] For matrices, I have to have analogs of all of those, plus a few more to take into account the non-commutativity of the multiplication.
[36:11] Among these, if I want to call something a geometric mean, then what are the properties it should have?
[36:19] First one: if A and B commute, it should become the ordinary square root of AB. That's the way it behaves for numbers.
[36:25] It should be between the harmonic mean and the arithmetic mean.
[36:30] The geometric mean of the inverses of A and B should be the inverse of the geometric mean.
[36:37] The determinant of the geometric mean should be the square root of det(A) det(B), and so on. So you can write conditions depending on where you want to use it.
[36:53] >> Yes. Thanks.
[36:58] Okay.
[37:00] Okay. So here is something spectacular, puzzling, strange, whatever you want to call it. So people wrote down the requirements for the geometric mean, and then it turns out the only one, or the best one, they could think of is this strange-looking object here: A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}.
[37:20] So you can see: first multiply B on the left and the right by A^{-1/2}, then take the square root, and so on. The only thing you can see from here is that if A and B commute, then this is just the square root of AB.
[37:35] And, as is usual, most of these things are first thought of by electrical engineers or physicists; mathematicians enter the scene later. So this is in a paper on mathematical physics around 1975.
[37:49] Later, people learned that engineers had already done this. It is the solution of the equation which you see here, X A^{-1} X = B.
[37:57] Once again, the intuition is that if these were numbers, you could take this A^{-1} to the other side; it becomes X² = AB, and this is the square root. Of course, with non-commutativity things are not that simple.
[38:09] So that is the geometric mean of two matrices, studied for a long time. And there was a problem for nearly 30 years: how do you extend it to m matrices?
[38:25] It's already bad enough for two.
[38:28] So there is no obvious way that you could extend it to m.
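A sketch of this two-matrix geometric mean in Python, using an eigendecomposition-based square root; the last line checks that its determinant is the geometric mean of the two determinants, so there is no swelling:

```python
import numpy as np

def psd_sqrt(A):
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(w)) @ V.T

def geo_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    Ah = psd_sqrt(A)
    Ahi = np.linalg.inv(Ah)
    return Ah @ psd_sqrt(Ahi @ B @ Ahi) @ Ah

A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 1.0], [1.0, 2.0]])
G = geo_mean(A, B)
print(np.allclose(G, geo_mean(B, A)))        # symmetric in A and B
print(np.linalg.det(G))                      # 1.0 = sqrt(det A * det B)
```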
[38:38] So let me first demystify that formula.
[38:42] For many years I studied these and never understood. Nobody explained in their papers how they had got the idea. It would be as if the formula had been given to some person by God.
[38:55] So how would anybody think of this?
[39:00] So, as I told you, one of the axioms for a mean of positive numbers would be: if you multiply both the numbers by alpha, then the mean also gets multiplied by alpha.
[39:12] Now here is a matrix analog of this.
[39:16] I have to introduce not just scalars but also matrices. The analog is: you take X*AX and X*BX; then the mean of that should be X* times the mean of A and B, times X.
[39:32] So suppose you grant this as a natural requirement on matrices. On the next slide you'll see this symbolically: I'm denoting X*AX by Γ_X applied to A.
[39:45] So I'm given two positive matrices A and B.
[39:53] Apply Γ with X = A^{-1/2} to both; then this pair becomes the identity and A^{-1/2} B A^{-1/2}. Now one of the matrices is the identity, so there's no choice: if I have to take the geometric mean, it has to be the square root of this. And then I undo what I did above. There was Γ with A^{-1/2}; fine, I apply Γ with A^{1/2}, and then I get this formula.
[40:22] So if you want that condition, then this is the only possibility. Not just the best possibility; it's the only one.
[40:34] >> You could do...
[40:35] >> Yes, you can check. Again, miraculously, it's symmetric in A and B, though it doesn't look like that.
[40:43] So now, this idea again does not extend to three or more matrices. If I had A, B, and C, the first one would become the identity, but then B and C would be different.
[40:53] So there is no way of extending this to three.
[41:00] A clock here? I was looking for a clock.
[41:18] Yeah. So this was an open problem for 30 years, till again a surprising connection with Riemannian geometry was found.
[41:33] So I would recall for you the logarithmic distance for numbers.
[41:39] I looked at the distance log a - log b, which is also log(a/b) or log(b/a) with a modulus sign; you may pick up a minus, which is killed by the modulus.
[41:52] A possible matrix version of this would be to take the norm of log A - log B.
[41:57] What is the norm? It's written there. That's just the distance on matrices.
[42:04] This is useful, but somewhat prosaic. As I said, the crux is that matrix multiplication is not commutative.
[42:16] So instead of log a - log b, suppose I change the picture: I look at this as log(b/a), and then I generalize to positive matrices.
[42:31] Then the formula would be log(A^{-1/2} B A^{-1/2}), and that makes a world of difference: it opens up a connection with Riemannian geometry.
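A sketch of this matrix distance in Python, assuming SciPy's matrix logarithm; the last two lines check the invariance properties that come up again a little later in the talk:

```python
import numpy as np
from scipy.linalg import logm

def delta(A, B):
    """delta(A, B) = Frobenius norm of log(A^{-1/2} B A^{-1/2})."""
    w, V = np.linalg.eigh(A)
    Ahi = V @ np.diag(w ** -0.5) @ V.T               # A^{-1/2}
    return np.linalg.norm(logm(Ahi @ B @ Ahi), 'fro')

A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 1.0], [1.0, 2.0]])
X = np.array([[1.0, 2.0], [0.0, 1.0]])               # any invertible X
print(delta(A, B))
print(delta(X.T @ A @ X, X.T @ B @ X))               # same: congruence invariance
print(delta(np.linalg.inv(A), np.linalg.inv(B)))     # same: inversion invariance
```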
[42:45] What is Riemannian geometry, in three or four slides? I have a manifold; imagine the surface of the sphere.
[42:54] Then at a point m I have the tangent space to it, which is the best linear approximation. So take a globe and look at the plane tangent to the globe. In each of these tangent spaces I have an inner product.
[43:08] So geometers call that inner product the Riemannian metric. Analysts are a little uncomfortable with that, because they think of a metric as a distance. So that's the Riemannian metric.
[43:20] If I have such a metric, I know how to compute the length of curves.
[43:26] So you have a curve in the manifold. At any point on the curve, look at the tangent vector; that's in one of these tangent spaces. In the tangent space you have the notion of a length. So just take the length of the curve gamma to be the integral of the norm of the derivative, gamma prime of t, with the catch that the norm is varying from point to point.
[43:52] Okay, so that's the notion of the length of a curve.
[43:56] And in some manifolds, not in all, you can find curves of minimal length joining two points A and B; for example, on the sphere, great circles. And the length of such a geodesic then gives you a metric, in the sense of distance, on the manifold.
[44:15] So that's Riemannian geometry in three slides.
[44:18] In my case, this manifold is the positive definite matrices.
[44:23] The tangent space at any point is the space of Hermitian matrices, and the inner product on this space is given by what you see there: <X, Y> at the point A is trace(A⁻¹ X A⁻¹ Y).
[44:38] By the way, the famous Indian statistician C. R. Rao, when he was about 24 years old, introduced this in the case of vectors, or probability distributions. It's called the Fisher-Rao metric. This is the non-commutative, matrix version of that.
[45:07] Now it turns out that if you calculate the Riemannian length according to this prescription, you get exactly what I wrote as the logarithmic distance. So let me sum up: instead of log a minus log b, I looked at the log of a over b; then I made a matrix version, and it turns out to be exactly what you would get by this Riemannian route.
[45:34] And it has some useful invariance properties: the distance between X*AX and X*BX is the same as the distance between A and B, and it's invariant under inversions. That's very useful for applications.
[45:48] And then suddenly a miracle happens, and I have a personal story to tell about this, maybe not in this lecture but later, somewhere.
[45:59] Then a miracle happens.
[46:02] If you are an analyst, you don't understand very easily what geometers do every day, curvature and Riemannian metrics and so on. You want an equation, or an inequality, which for an analyst is even better than an equation.
[46:19] So I want to write down explicitly the equation of the geodesic joining two points A and B.
[46:27] >> That is, two operators?
[46:30] In your language, probably; it doesn't matter here, the points are a little different. So I try to write the equation of the geodesic joining two points, and it turns out to be what you see there.
[46:52] In the middle is (A^{-1/2} B A^{-1/2}) raised to the power t, with t varying between 0 and 1, and on the two sides A^{1/2} and A^{1/2}.
[47:08] So that mysterious formula, which was the geometric mean of A and B, is exactly the midpoint of this geodesic.
[47:17] Right. A miracle as you may realize.
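Continuing the sketch, the geodesic formula and the midpoint claim can be checked numerically; the function names are my own, and the checks below illustrate the statements rather than prove them.

```python
def geodesic(A, B, t):
    """gamma(t) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}, for t in [0, 1]."""
    Ah, Aih = powm(A, 0.5), powm(A, -0.5)
    return Ah @ powm(Aih @ B @ Aih, t) @ Ah

G = geodesic(A, B, 0.5)                       # the midpoint: the geometric mean
print(np.allclose(geodesic(A, B, 0.0), A), np.allclose(geodesic(A, B, 1.0), B))
print(np.isclose(delta(A, G), delta(G, B)))   # equidistant from both endpoints
print(np.isclose(delta(A, G) + delta(G, B), delta(A, B)))  # additive along it
```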
[47:26] So it is the geometric mean in a very profound sense, and it opens up what you should do for three or more matrices. You see, as I told you, for two the formula is bad enough; for three it is hopeless. [47:44] So if the geometric mean of two matrices is the midpoint of this geodesic, then you can imagine that for three matrices I would get some kind of triangle, and I should look for its center; or, for matrices, look for its barycenter.
[48:02] So, I had this metric delta. Look at the, what was I calling it, dispersion. So look at the dispersion with respect to that metric and minimize it. [48:16] Now of course there is technical work to be done: the minimizer must exist, it must be unique, and so on. But it so happens that all that is true, and the mean is the solution of the matrix equation which you see there; no explicit formula has been found. [48:35] So that's one part of the story: the longstanding problem of defining an appropriate geometric mean of several positive definite matrices is solved in this way.
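The matrix equation on the slide is not reproduced in the transcript; a standard characterization of this barycenter, often called the Karcher mean, is sum_i log(X^{-1/2} A_i X^{-1/2}) = 0. Here is a hedged sketch of the minimization, reusing the helpers above; the starting point and the iteration count are arbitrary choices.

```python
def logm_spd(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def expm_sym(S):
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def barycenter(mats, iters=100):
    """Minimize sum_i delta(X, A_i)^2 by Riemannian gradient descent."""
    X = sum(mats) / len(mats)                 # start at the arithmetic mean
    for _ in range(iters):
        Xh, Xih = powm(X, 0.5), powm(X, -0.5)
        S = sum(logm_spd(Xih @ M @ Xih) for M in mats) / len(mats)
        X = Xh @ expm_sym(S) @ Xh             # exponential map back to the manifold
    return X

# Sanity check: for two matrices the barycenter is the geodesic midpoint.
print(np.allclose(barycenter([A, B]), geodesic(A, B, 0.5)))
```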
[48:47] this way. There's a three-way connection between
[48:48] There's a three-way connection between remanion geometry matrix analysis and
[48:51] remanion geometry matrix analysis and applications
[48:53] applications applications in areas like diffusion,
[48:55] applications in areas like diffusion, tensor imaging, radar data, brain
[48:59] tensor imaging, radar data, brain computer interface, image processing,
[49:02] computer interface, image processing, machine learning. Now all those sound
[49:04] machine learning. Now all those sound big words. So what is it? I'll give you
[49:06] big words. So what is it? I'll give you an example. Brain computer interface.
[49:10] an example. Brain computer interface. You have a robotic limb
[49:12] You have a robotic limb and you want to control it by some
[49:14] and you want to control it by some message from your brain. So they put
[49:17] message from your brain. So they put let's say 16 electrodes on your head
[49:19] let's say 16 electrodes on your head skull and you do certain things and it
[49:23] skull and you do certain things and it gives a command via that computer to
[49:25] gives a command via that computer to move the limb this way or that. So your
[49:28] move the limb this way or that. So your data is now 16 by 16 positive definite
[49:32] data is now 16 by 16 positive definite matrices and you have to teach the
[49:34] matrices and you have to teach the computer how to average that data
[49:37] computer how to average that data whether this means left or this means
[49:38] whether this means left or this means right. So it turns out that the
[49:42] right. So it turns out that the appropriate distance or the most useful
[49:44] appropriate distance or the most useful distance for those is exactly what I
[49:47] distance for those is exactly what I showed you.
[49:49] showed you. So there are lots of applications
[49:57] Just for fun, here is a paper of which I am one of the co-authors, in a journal called Brain-Computer Interfaces. [50:08] Don't ask me any question on that. [laughter] I just did the mathematics, and there are other people who did the, uh, interface, so to say. [50:16] Here's another paper on the average; it caught my eye, so to say. And again, as you read the abstract, you will see exponential and logarithmic scalings being used. [50:32] So this is a very useful concept in practice, in applications, and in mathematics.
[50:42] applications and in mathematics okay I'll have to tell you a little bit
[50:44] okay I'll have to tell you a little bit more so here is a quote from Wankare
[50:50] more so here is a quote from Wankare one geometry cannot be more true than
[50:53] one geometry cannot be more true than another it can only be more convenient
[50:57] another it can only be more convenient of course the comment was made in some
[50:59] of course the comment was made in some other context but I'm going to talk of
[51:01] other context but I'm going to talk of another metric and hence another
[51:03] another metric and hence another geometry
[51:04] geometry on the space of positive definite mis
[51:07] on the space of positive definite mis and this is to do with some subject
[51:09] and this is to do with some subject called optimal transport which is very
[51:11] called optimal transport which is very important and very popular
[51:15] important and very popular but again looking at it as a child
[51:19] but again looking at it as a child I looked at the distance log a minus log
[51:22] I looked at the distance log a minus log b and tried to make a matrix version of
[51:25] b and tried to make a matrix version of that
[51:26] that now I look at square root of a minus
[51:28] now I look at square root of a minus square root of b again in classical
[51:31] square root of b again in classical statistics this is called helinger
[51:33] statistics this is called helinger distance or bhachara distance and I try
[51:36] distance or bhachara distance and I try to make a matrix version.
[51:39] to make a matrix version. So what would it be? I take the norm of
[51:42] So what would it be? I take the norm of square root of a minus square roo<unk>
[51:44] square root of a minus square roo<unk> of b expand that what you get here is
[51:48] of b expand that what you get here is trace of a + b minus 2 * trace of a to
[51:51] trace of a + b minus 2 * trace of a to the power b to the power 1/2 and then
[51:54] the power b to the power 1/2 and then square root.
[51:56] square root. So
[51:57] So there changing log of a minus log of b
[52:02] there changing log of a minus log of b to log of a divided by b made a lot of
[52:05] to log of a divided by b made a lot of difference and so does it here. Instead
[52:09] difference and so does it here. Instead of a to the power 1/2 b to the power 1/2
[52:12] of a to the power 1/2 b to the power 1/2 I change it to this square root of a to
[52:16] I change it to this square root of a to the power 12 b a to the power 12 and
[52:19] the power 12 b a to the power 12 and that makes a world of a difference. So a
[52:22] that makes a world of a difference. So a b not equal to ba is the heart of matrix
[52:25] b not equal to ba is the heart of matrix analysis, quantum mechanics and many
[52:28] analysis, quantum mechanics and many other subjects.
[52:31] So how do we work with it? [52:36] So first you have to prove it's a metric. I will not get into that. [52:41] But I want to emphasize that what we are looking at is the trace of (A + B)/2 minus the trace of this other term: this is the arithmetic mean of A and B, and that is a kind of geometric mean. So the arithmetic-geometric mean inequality for matrix products is a very crucial idea behind all this. [53:02] So first you have to prove that this is positive, then you have to prove it satisfies the triangle inequality, and then you have to use it.
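Continuing the Python sketch, here is that distance in code; the guard on the square root is my own precaution against rounding, and the printed checks are numerical illustrations of positivity and the triangle inequality, not proofs.

```python
def bures_wasserstein(A, B):
    """d(A, B) = sqrt( tr A + tr B - 2 tr (A^{1/2} B A^{1/2})^{1/2} )."""
    Ah = powm(A, 0.5)
    gap = np.trace(A + B) - 2.0 * np.trace(powm(Ah @ B @ Ah, 0.5))
    return np.sqrt(max(gap, 0.0))             # rounding can dip slightly below 0

A2, B2, C2 = rand_pd(5), rand_pd(5), rand_pd(5)
print(bures_wasserstein(A2, B2) >= 0.0)       # positivity
print(bures_wasserstein(A2, C2)               # triangle inequality
      <= bures_wasserstein(A2, B2) + bures_wasserstein(B2, C2))
```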
[53:12] Okay. So what is the optimal transport problem? [53:16] You see this arrangement of furniture here; I want to pick this up and put it somewhere else at minimal cost. [53:25] So the mathematical formulation is the following. You have two probability distributions, which are like mass distributions; those were on d-dimensional space R^d. [53:39] Look at a probability distribution on the product space whose marginals are mu and nu; marginal just means projection. [53:49] Then there is a cost function c(x, y), which is the cost of moving unit mass from x to y; typically that is ||x - y||^2. [53:58] Then the problem is to minimize this: the cost function c(x, y) integrated with respect to such a measure pi on the product space, and then minimized over all such measures. So that's the problem.
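For a feel of the general problem, here is a tiny discrete instance solved as a linear program, continuing the Python sketch; the two point sets, their masses, the squared-distance cost, and the use of scipy's linprog are all illustrative assumptions.

```python
from scipy.optimize import linprog

xs = np.array([[0.0], [1.0]]); mu = np.array([0.50, 0.50])   # source points, masses
ys = np.array([[0.0], [2.0]]); nu = np.array([0.25, 0.75])   # target points, masses
cost = ((xs[:, None, :] - ys[None, :, :]) ** 2).sum(-1).ravel()

m, n = len(mu), len(nu)
A_eq = np.zeros((m + n, m * n))
for i in range(m): A_eq[i, i * n:(i + 1) * n] = 1             # row sums equal mu
for j in range(n): A_eq[m + j, j::n] = 1                      # column sums equal nu
res = linprog(cost, A_eq=A_eq, b_eq=np.concatenate([mu, nu]), bounds=(0, None))
print(res.fun)      # minimal transport cost; its square root is the distance
```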
[54:20] And under some conditions it has a unique solution, which gives you a metric on the space of probability measures. [54:30] This is called the Wasserstein distance, or the Kantorovich-Rubinstein metric. Kantorovich, by the way, was a functional analyst who went on to get a Nobel prize in economics, in 1975. [54:47] His main work is in functional analysis; he is one of the founders of linear programming and also of optimal allocation of resources.
[54:56] >> This also, perhaps?
>> Perhaps; I don't know. I know Kantorovich was a functional analyst, yeah. [55:08] And there have been two Fields Medals in recent years for work on optimal transport: Villani and Figalli. [55:18] So it's an important subject.
[55:21] Now I want to look at an important special case of this, which is the so-called Gaussian case. You have two random vectors X and Y, normally distributed with mean zero; A and B are the covariance matrices of X and Y. Covariance matrices I explained earlier: a_ij is just the expectation of X_i X_j. [55:45] The problem is to minimize the expectation of ||X - Y||^2 subject to that condition. So that's the special, so-called Gaussian case of the optimal transport problem. [55:59] So for somebody doing matrix analysis, it's easier to look at this.
[56:09] >> The expectation of the norm ||X - Y||^2? That's the cost of shifting things from X to Y?
[56:17] >> Yeah. Yeah.
[56:22] So the answer is that the minimum value is that metric which you saw there: the trace of the arithmetic mean minus the trace of the geometric mean, and then take the square root of that whole thing. [56:34] Now, since I had done that first metric from the Riemannian geometry perspective, it's natural to think of this one also: somehow force a Riemannian structure on this to get more results. [56:47] So you can do that. There are differences: that was a manifold of negative curvature, this is a manifold of positive curvature. So there are different ways of doing these things.
[57:04] So there is a connection between the two problems: that earlier geometric mean, the Riemannian mean which I wrote about; take the mean of A inverse and B. The optimal transport plan is (x, Tx), which means the material at x should go to the point Tx, where T is given by that: T = A^{-1} # B. [57:21] So the two problems are connected again. [57:27] Now you can ask about geodesics, several-variable versions, and so on; this is called the multi-marginal optimal transport problem. So all those things can be done.
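The connection can be checked numerically, continuing the sketch: with T taken to be the geometric mean of A inverse and B, T maps the Gaussian with covariance A onto the one with covariance B, and the cost of the coupling Y = TX equals the squared distance. The identities in the comments are standard facts, not quotations from the slides.

```python
T = geodesic(powm(A, -1.0), B, 0.5)       # T = A^{-1} # B, the transport map
print(np.allclose(T @ A @ T, B))          # T pushes N(0, A) forward to N(0, B)

# Cost of the coupling Y = T X: E||X - TX||^2 = tr((I - T) A (I - T)),
# which matches the squared Bures-Wasserstein distance.
Id = np.eye(4)
print(np.isclose(np.trace((Id - T) @ A @ (Id - T)),
                 bures_wasserstein(A, B) ** 2))
```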
[57:45] In quantum information theory, A and B are density matrices, or states. A density matrix is just a positive definite matrix whose trace is equal to one. So that's the analog of a probability vector: positive numbers which sum up to one. [57:58] So then this object is called the fidelity between those two states, and one has to study its various properties; it is a very, very active area of research.
[58:09] very very active area of research. I'll stop here.
[58:11] stop here. Thank you.
[58:13] Thank you. [applause]
[58:21] >> That's absolutely wonderful. Thank you so much. Professor, will you take some questions from the audience?
>> Yeah, sure.
[58:25] >> Are there any questions?
[58:30] >> Yes. To begin with, can we talk about a rotation matrix?
[58:35] >> Yes. Yes. Right. See, there are problems like this. So let's look at the simplest case, the sphere. [58:44] So I have two points on this sphere. As long as they are not the north pole and the south pole, there is a unique midpoint; but for those two there are infinitely many. So you have to put some restriction on what part of the sphere you can take: as long as it has any equator, the same problem will arise. [59:08] So if you take the upper half of the sphere, then you have to prove, now, that given any n points there is a unique barycenter. [59:17] Now, for rotation matrices the same kind of problems will arise. So there are restrictions to be placed, but a barycenter can be defined, found, and used.
[59:28] >> In closed form, like...
>> I'll have to check. Yeah.
[59:36] >> Can I ask you one thing? I mean, I was very interested in the, um, NMR, the MRI example you gave.
>> Yeah.
[59:44] >> Where, by [clears throat] averaging the images, one can get a swelling.
>> Yes. It is not really there, but you interpret that there is a swelling.
[59:53] >> That's right. And that's because it's using geometric means there.
>> No, no, no: the geometric mean avoids that.
>> Ah.
[59:58] >> We saw that I had...
>> ...by taking a straight mean.
>> Yes. Yes. And is that because of the nature of the relation?
[01:00:04] >> The determinant of (A + B)/2 is not the determinant of A plus the determinant of B, over 2. The determinant is not linear; you can't do that. But if you take the average to mean this geometric mean, then it works.
[01:00:20] >> That's one of the reasons they like it.
>> But there are others. There could be others.
[01:00:26] >> Anybody else?
[01:00:30] >> Some of these things my students are learning. So I'm very happy that...
>> I'm glad they are here. [01:00:34] Now of course, in machine learning, you didn't actually pursue the machine learning part of it, because these distances are very, very important.
>> I don't know much about it, but yes.
[01:00:45] >> So [clears throat] kernels in machine learning would be these positive definite matrices.
>> But the metric distances are very important in almost...
>> ...all.
[01:00:57] Any other questions? [01:01:01] Again, thank you very much for coming, everybody. Thank you. [applause]
[01:01:06] >> I would also like to make the announcement that the next lecture in this series is on the last day of the semester, on the 28th of November. [01:01:15] It will be given by Professor Arindam Chakrabarti. I haven't got the topic from him yet, but we will announce it. Okay. Thank you very much.