Optimization For The People
Don't worry! I don't expect you to understand this video just yet. Just trying to arouse curiosity!
Optimization From Early High School To Multivariable Calculus
Intro (Feel Free to Skip if you want to jump right in)
I am setting out with the goal of introducing some basic concepts in optimization to math enthusiasts across a broad range of ages and knowledge levels. As such, my hope is to pose a problem and explore several ways to think about solving this problem, building up in complexity but hopefully at a natural pace. I might, somewhat arbitrarily and for lack of better nomenclature, break some of these approaches down as the early high school approach, the mid high school approach, the calculus approach, and the multi-variable calculus approach. If I have done my job at the end, I will not simply have given a bunch of approaches which can only be understood if you are at or above the associated level, but rather, my goal is the explanations evolve in such a way that if you are at the 9th grade level, you not only understand the 9th grade approach, but this helps you begin to appreciate the next level, maybe even the ones after that. Conversely, I hope that a student who has studied more advanced math might read this and appreciate that the intuition that drives some of the abstract ideas they have seen in a class like multi-variable calculus can be associated with some elegant and less intimidating concepts. I will also add that I do not wish to diminish the early high school approach in comparison to the others. The simplicity or name of an approach does not indicate how “smart” it is. In fact, the inspiration for this post came from the elegance of the “simplest” approaches.
This is still a work in progress and I am not an experienced teacher so I encourage teachers and math enthusiasts to take this and remake it as their own for the purposes of teaching some elegant math as well as building up intuition of complex techniques from simpler building blocks.
Before I dig into the problem, I want to motivate some thinking. When I was a child and I would discuss an interesting math problem with my dad (nicknamed Dobbins), he would always encourage me to think about some basic tools before just trying to solve a problem mechanically. Is there a symmetry that can give us insight? Are there special cases? Are there extreme cases? What would we expect at infinity… or zero? As a child, I attributed this type of thinking to my father. Of course, had I thought about it, I would have realized he was just trying to pass on the tools and passion he had learned from others. He had read works by some of the greats in math and science as well as in math and science education, people such as Feynmann, Polya, and Strogatz. He was appreciating the beauty of the intuition many great math educators were trying to pass forward, not inventing it. And nonetheless, for simplicity, and because I miss him dearly, I will call the amalgamation of these approaches the Dobbins approach.
The Problem.
The problem statement is relatively simple. It was introduced to me in calculus but I think it could have been introduced much sooner:
Suppose you are a farmer and you are trying to put a rectangular fence up around some land you will use to grow produce. You are constrained to work with the amount of fencing you have. How do you size the rectangular fence to have the maximum area inside of it?
The rephrasing.
We should look at the little details of the problem that help us formulate this as a math problem. Some constraints were given. The fence has to be rectangular. We have a fixed amount of fencing. In other words, the perimeter P is fixed. How can we choose the side lengths of this fence to get the largest area A possible.
The Dobbins Approach: What’s your favorite rectangle?
Immediately, we might want to ask some questions. Is there a limit on the maximum area? Or a minimum? When I first approached my father with this problem, he immediately suggested the Dobbins approach. He said, “it sounds like it must be a special type of rectangle. What could make one rectangle more special than another.”
I do not expect a reader to know the answer by now nor to believe that my father’s response can be characterized as a solution in and of itself even if they do. But my hope is that through reading the coming approaches to solving the problem, in some way the Dobbins approach given here actually does lend itself to a solution, or at least some really beautiful thinking.
The Early High School Approach: The Extremes
Though he did not explicitly suggest it for this problem, this next approach falls into the Dobbins class as well. We go back to the question and ask what are our degrees of freedom? Well we have a rectangle of fixed perimeter P. But we do have control over one side length. Play around with the interactive plot below. You can make P whatever you like. Then you can adjust the bottom side length a accordingly. Ask yourself what are the constraints on a? Why can we only adjust one side? How does the result change if I make a the right side rather than the bottom side?
Adjust a using the slider to explore its full range and watch what happens to the rectangle. As you are doing this think about the questions above. Maybe it will become clear as you play around that by fixing P, we have only one degree of freedom to adjust. When we set P and then a, the other side length, given by b in the interactive, is constrained by the relationships of a rectangle (Those relationships are made explicit on the left side of the interactive but before looking, see if you can derive them yourself).
In playing with a, you might notice that something interesting happens as you move towards the extremes. When a=0, making b=P/2, the rectangle collapses into a vertical line of zero area. When a=P/2, constraining b=0, the rectangle once again collapses into a line, this time horizontal, also of zero area. Note the symmetry of the constraint. We can flip the values of a and b to give the same rectangle, just rotated 90 degrees.
Another way of saying this is that if we start with a at the center of the slider (a=P/2), we have a=b, a square. As we slide away from the square in either direction, the rectangle gets thinner and longer and gets closer and closer to becoming a line of length P/2, (technically 2 overlapping lines of length P/2). And a line has no area. The more the dimensions depart from that of a square, the closer they get to a line. So the more you depart from a square, the more we approach the minimal area, zero.
By this logic, it would make sense that a square given by:
would maximize the area of a rectangle given a fixed perimeter. Let’s recall the Dobbins approach from the previous section. We were looking for a special type of rectangle. What would make any rectangle any more special than another? Think on it for a bit. From the approach in this section, I see two special rectangles, a square and a line, and a continuum of rectangles in between that can be achieved in two ways when we allow for flipping the values of a and b.
Diving Deeper Into the Geometry
Still not fully convinced? I think it’s good if you are not. I took a bit of an intuitive leap there when I said that by moving away from a square, we are moving towards zero area. It is true. As you adjust a away from the center, you will eventually approach a line at the limits. But we never actually showed that maximum area occurs in the center rather than somewhere between a=P/4 and a=0. We are just assuming that the area changes with side length a in a way similar to the image below:
That is, we know that when a=0 and when a=P/2 we have zero area. But we have not shown that the area monotonically decreases from the peak area at a=P/4 (a square) when a slides in either direction. For instance, what if the area relationship was to look like this:
In this hypothetical example, the peak area is achieved between the square and the line. It is important to note that if the choice of a which maximized the area occurred at some other place, a=x, for which the rectangle was not a square, then we would have two options which maximized the area. One at:
And another one at:
In a sense the square collapses these two options into one. When a=P/4, b=P/4 as well and switching them yields no change, not even in rotation, since they are identical. In any case, the above figure is correct in illustrating that no matter what the correct solution, the area magnitude should be symmetric about the square condition.
In a later section, we will focus on the mid high school algebra approach to derive the right curve for the area but for now we will try to take a more geometric approach.
Let’s move forward and try to show that the square does, in fact, give the maximum area and there is no other maximum between the square and the line. Let’s start by focusing on the constraint: the perimeter is a fixed value of P. Given that we are working with a rectangle, we can then write the following relationship between the two side lengths:
Note this is identical to writing the equation for the sum of all four sides:
Now instead of thinking of these variables simply as side lengths, we can think of this constraint as a the graph of a line:
If you are used to seeing the line in slope intercept form, we can rewrite this as:
You can see that line for a given perimeter P in the interactive below:
Note that we truncated the line to have some physical boundaries. Since x and y represents sides of a rectangle, neither one can be negative and neither one can be greater than P/2.
Something is really beautiful about this line. In a sense, it contains all of the information for any rectangle we can make. That will become clearer in the interactive below:
As we slide one side length, a, around its permissible range, we see we get all the possible rectangles forming under the constraint line with one vertex sliding along the line. Take some time to think about what the constraint line represents and what coordinates on a graph mean geometrically and see if you can convince yourself that the vertex of the rectangle should fall on this line.
Set the slider so that a=P/2. We then see we start with a horizontal green line that hits the bottom of the constraint line. What happens to the area of the rectangle the moment we slide a a little to the left? We know both by the interactive and by intuition that the area of this rectangle goes from zero to something non-zero. But the bigger question we are trying to answer is, does it keep increasing until we reach a square?
Another way of thinking about this question comes from looking at the purple shaded vertical rectangle and the blue shaded horizontal rectangle in the interactive below:
Note that as we slide a, we are actually taking small steps to change the size of a. Slide a from right to left (P/2 to 0) and see if you can convince yourself that the purple shaded rectangle contains the section (and associated area) we lost when moving from the last step to the current one. Similarly, see if you can convince yourself that the blue shaded rectangle is the section (and associated area) we gained. When we move a from left to right, the significance of the two shaded areas flip. The blue shaded area takes on the meaning of the area we will lose on the next step and the purple shaded area becomes the area we stand to gain. Also notice that each of those shaded rectangles has one side whose length is given by the length of the step in a. Let’s call that step ∆a. If we call the other side of either of those shaded rectangles
for the purple rectangle and
for the blue one, then the area of each rectangle will be:
So, determining which area is larger, the area we lost or the area we gained, comes down to determining which length is larger.
The height of the purple rectangle is what the height of the green rectangle will be on the next step in a to the right.
The width of the blue rectangle is the width of the green rectangle, a.
So, we can see that as we start moving a from right to left, starting at a=P/2, the area we gained is larger than the area we lost (the area is increasing) as long as a, the current width, is greater than the previous b (the previous height). Play around with a for a bit and, armed with this idea, you might convince yourself that this means that as a moves from the extremes, the rectangle is monotonically gaining area until we reach a square, and then it starts monotonically losing area… which means our special rectangle, the square, maximizes the area for a fixed perimeter!
That was fun! I think there are some very elegant ideas in that approach that I now want to make a little more formal.
Let's give finer control over the step size ∆a. In the previous interactive, it was fixed at ∆a=0.1. Now we will add a slider for ∆a, given by the variable d in the following interactive:
Play around with the step size d and see how the shaded areas change as d changes. Similar to our collapsing green rectangle, as d gets smaller and smaller, the shaded areas approach two lines that overlap. So we get closer and closer to a place where:
So if we extend this intuition, whenever a>b or b>a, there is area to be gained! That means the only time there is no area to be gained by changing a is when we have reached our special rectangle, the square!
While I called this the early high school approach, if you followed it then you are also catching onto many of the fundamental ideas of the coming approaches and the fundamentals of calculus and optimization as well.
A Conceptual Interlude
Looking back on what you have seen so far, it is worth taking some time to appreciate why a square might be the answer. We had a problem with a constraint on perimeter. Think about setting the fence post by post. As we are setting down posts, we are exhausting perimeter. And so we want to ask, are we making progress towards the goal of not only setting down fence posts, but enclosing an area? If we continue to set too far down a line, we are never really making progress toward the goal of enclosing the space. And if we set the first edge too short, then the other edge will progress as a line that is not making progress toward that goal! In that sense, a square contains that balance between laying down perimeter and making progress towards enclosing the area!
Challenge On The Side
A square makes sense as it intuitively satisfies a balance between exhausting our finite resource in perimeter and simultaneously making progress towards the goal of enclosing area. What if we were not restricted to a rectangle but we could use any 2D shape? What shape would we choose in order to enclose the most area with a fixed amount of fencing given by P?
A Graphical Approach (The Mid High School Approach)
Remember this curve for the relationship between area and side length?
You might have wondered where that curve came from and if it was accurate. We will now derive that relationship. Let’s review some of the relationships of a rectangle. If, like in previous sections, the side lengths are a and b then the relationship for the area and perimiter are given by:
or, dividing both sides by 2 as we saw in the previous section:
Now we can solve for b in terms of a:
And plug b into the relationship for area:
You may recognize the area equation as the equation for a parabola from one of your algebra classes. Play around with P in the graphic below and see what happens to the parabola:
Pay attention to where the vertex falls as you move P and think about what the vertex means. It is the peak of this graph which represents area for changing side length a. So, we can not only see what the peak area is for any given a, but what side length a is associated with that peak!
Now, keep P fixed at whatever value you choose, and slide a. As a slides, you will see the corresponding point on the parabola slide as well. Pay attention to the value of a in relation to P at the peak of this parabola.
Now I asked you to do all this because I thought the exercise might be insightful, but we can explicitly find a solution using the quadratic function tools we learned in algebra. If we bring back the image of the parabola from above, you might notice that this quadratic has roots (x-axis crossings) at 0 and P/2.
This falls right out of the quadratic above:
Or you could use the quadratic formula on this version of the equation:
And get the same results for the roots. Due to the symmetry of the parabola, we expect the peak to fall right between the two roots, 0 and P/2. So the value of the side length a at the peak is P/4. And if a=P/4 then so does b. This corresponds to a square!
Another way we could have solved for the peak is using the formula for the x-coordinate of the peak of a parabola:
Challenge On The Side
Where does this equation for the x-coordinate associated with the peak of the parabola come from? One solution comes from calculus. But for those who have not taken calculus, see if you can derive this from the quadratic formula for the roots. You might then ask yourself where the quadratic formula comes from. Try to see if you can figure that out as well (or google it)!
Once again giving:
So, we now have yet another way of showing that a square is the choice of rectangle that maximizes the area for a fixed perimeter!
Another Conceptual Interlude!
Let’s put everything we have learned so far into a single interactive graphic:
Set P to whatever value you like and then slide a and look at what is happening to the different elements in the graphic. Try to think about the relationships between them. Pay particular attention to what happens to the shaded areas as our point moves along the parabola. Remember how we showed that when a>b or b>a, there is area to be gained by changing a? Well we can also see how much area there is to be gained by sliding a.
If we take the approach of moving a from left to right, we interpret the purple shaded area as the area we stand to gain by taking a small step to the right (increasing a) and the blue shaded as the area we stand to lose. You might notice when we start at a=0, there is a lot of area to be gained and none to be lost. And then, as we slowly slide a to the right, the area to be gained decreases while the area to be lost increases until they balance each other out. And the increase in area is very steep at the beginning but becomes more gradual as we approach the square and then the area starts decreasing after the square. Note where the point on the parabola is as you pay attention to these steep and then less steep changes in area. The area parabola itself is steepest at a=0 and then its steepness gradually declines until we hit the peak (representing the square) which has no steepness. So, one way of looking for a maximum is looking for the place where the curve has no steepness (zero slope).
If you look at the interactive it might make sense to you that the area to be gained can be approximated for small step sizes ∆a as:
And the area to be lost is given by:
So the total area we stand to gain (or lose) can be approximated as:
And this approximation gets more and more accurate as ∆a gets smaller.
We said before that the area will reach its maximum when there is no area to be gained by moving in any direction. So for small ∆a, we expect this to happen at a place where ∆A=0:
This once again gives a square!
You may not know it yet, but if you followed until now, you have understood some of the most fundamental and elegant concepts in calculus! And if I rewrite the above equation for the change in area:
As a ratio of the change in area to the change in side length:
Then we have the local steepness (or slope) of the parabola at different values of a. In calculus, this is called the derivative! There are many elegant examples that connect calculus concepts like the derivative to simpler geometric concepts like the area to be gained above.
Resources For Further Learning
See this Essence of Calculus playlist by 3Blue1Brown for more elegant examples of intuitive calculus. Pay particular attention to Derivative formulas through geometry.
The Calculus Approach
You may not have taken calculus before but if you have made it this far, you have essentially seen and understood the calculus approach and we are just going to work through the mechanics of getting to a solution with some graphics for aid.
We return to the equation for area after substituting in the constraint,
Now we take the derivative of the area with respect to the side length a:
Then we set the derivative to zero:
So once again, each side length of the rectangle is given by:
Yielding our special rectangle, the square!
We can go a step further and plot the tangent line (line of local slope) on the parabola:
Note how the orange tangent line is longer for steeper parts of the curve and gradually gets less steep as we move toward the peak! That tangent line was calculated using the derivative:
We have once again examined the same concept from the last section of using steepness to find the maximum but this time, using the language of calculus.
Integrating Approaches We Have Seen So Far
Play around with the interactive below and see if you can make sense of all of the different patterns coming together.
The Multi-Variable Calculus Approach
Another approach we could have taken is by looking for all feasible areas and then choosing the highest one. What I mean by this is we start out with the relationship for area:
Since we are going to apply to this graphically, I will write it as:
Or:
Remember, the side lengths must be positive, so we will only plot positive coordinates. For now, think of A as a constant. We will then test different values of that constant. See the following interactive to see what this curve looks like for different values of A:
Note, whereas in previous sections we embedded the constraint,
into the area curve forming a parabola, this time we have not said anything yet about the constraint. Now we will add the constraint line as a separate function in the interactive below:
In the above interactive, fix P to a desired value and then start sliding area A to the right starting from zero. Note that if the area curve crosses the constraint line, then there exists at least one rectangle with that area that meets the constraint on P. Do you see any symmetries in the crossing locations?
Let’s keep sliding A as high as we can go while still touching the constraint line. In other words, we are looking for the highest area that satisfies the constraint, and therein must lie the solution! What do you notice as you push A higher and higher?
The last place we touch the constraint line is just barely kissing it. In other words, we have pushed up the area and therefore the area curve until it is tangent to the constraint! If it is tangent, then the two curves have the same derivative at this point:
Set the two derivatives equal and solve:
This is once again the equation for our special rectangle, the square!
What we just saw was a special 2-D case of a broader multi-dimensional problem. When we set A to any specific value, we can call that curve a "level set" or "contour curve" of the more general 3-D surface given by:
Resources For Further Learning
Here are some resources to learn more about contour curves:
Now A is a variable instead of a constant that we change a few times. Saying that the area curve is tangent to the constraint line generalizes to saying that the two have parallel gradients (when compressed into 2 dimensions) at the location of the solution. We can rewrite the constraint equation as:
Resources For Further Learning
Here is are some videos to learn more about the gradient and its graphical interpretation. Even if you have taken multi-variable calculus, it might still be useful to remind yourself why the gradient is the direction of steepest ascent and why the gradient is perpendicular to contour curves.
Saying that A and g have parallel gradients is equivalent to saying that one is a constant multiple of the other:
If you have taken multi-variable calculus, you might recognize this constant multiple as a Lagrange multiplier. This equation says that:
Now that we know that x=y, we already know we have a square but we can solve for it explicitly by plugging back into the constraint equation:
In the previous equations we solved for the gradient vector:
Let's add the plots of the gradient vectors at the locations of intersection with the constraint in the interactive below. Adjust A and see what happens to these gradient vectors.
When we worked in 2 dimensions, the derivative gave us the a line tangent to the curve. You might notice that when we generalize to multiple dimensions, the gradient gives a vector that is normal to our level curves. And when A reaches the maximum feasible area, the two gradient vectors overlap and are normal to the constraint curve.
If you have taken multi-variable calculus before, this is the method of Lagrange multipliers. Though, if you are like me, you may not have internalized the intuition that drives the mechanics of Lagrange multipliers… until now.
Resources For Further Learning And A Challenge
Here is a more detailed video on Lagrange Multipliers by Dr. Trefor Bazett. Take some time to think about the graphical intuition of Lagrange multipliers. Now take some time to think what meaning the value of the Lagrange multiplier could have in our problem and more generally. It may not be obvious so if you get stuck, check out this video.
Integration of all approaches
Play around with the a slider in the interactive below and see how all of the approaches come together:
Conclusion And Challenges For Future Work
My hope is that in going through this you were able to see some patterns and connections between the approaches and that you found some elegance or insights you might be able to bring to other problems. If you are a younger student, I hope the concepts were introduced in such a way that you were able to learn some of the thinking that might be beyond your grade level. And my hope for more seasoned math students is that they can work backwards and connect some of the complex and abstract approaches they learned in school to simpler and more intuitive ideas that govern them.
And now that you have seen all of the approaches, some more rigorous than others, think back to the Dobbins approach. It clearly was not a rigorous proof. It just asked the question “What makes one rectangle more special than the others.” Can you see what makes a square special? After seeing all of these approaches, is the Dobbins approach more or less convincing than when you first saw it?
See if you can use any of the above approaches and patterns of thinking to figure out:
1) What shape would we choose if instead of being restricted to a rectangle, we were restricted to a parallelogram? Or a quadrilateral? Build up your thinking to more and more general shapes.
2) If you have a fixed amount of material, given by surface area S, to make a rectangular box, how would you maximize the volume of the box? Try to answer the problem using multiple approaches.
3) We mentioned that one reason a square made sense is that it intuitively satisfies a balance between exhausting our finite resource in perimeter and simultaneously making progress towards the goal of enclosing area. What if we were not restricted to a rectangle but we could use any 2D shape? What shape would we choose in order to enclose the most area with a fixed amount of fencing given by P? In the process of finding that answer, think about why other shapes might be bad? Why would we not want a shape that squiggles (non-convex)? What 3D shape would be best for a fixed surface area S?
4) Look back at the multi-variable calculus approach and try to think about the meaning of that constant multiple we called a Lagrange multiplier. What meaning could it take on in this example? More generally, how can we interpret the values of Lagrange multipliers.