k-server, part 2: continuous time mirror descent

We continue our $k$-server series (see post 1 here). In this post we briefly discuss the concept of a fractional solution for $k$-server, which by analogy with MTS will in fact be a fractional “anti-solution”. Then we introduce the continuous time version of MTS and explain how it applies to $k$-server. Finally, the most important part of the post is the description of the basic potential based analysis of mirror descent and how to interpret it in the context of MTS.

 

State representation for MTS

Recall the MTS problem: one maintains a random state $\rho_t \in [n]$ (where $[n] = \{1, \ldots, n\}$ is equipped with a distance $d$), and given a new cost function $c_t : [n] \to \mathbb{R}_+$, one can update the state to $\rho_{t+1}$ with the corresponding price to pay: $c_t(\rho_{t+1}) + d(\rho_t, \rho_{t+1})$. Equivalently one can maintain the probability distribution $x_t \in \Delta_n := \{x \in \mathbb{R}_+^n : \sum_{i=1}^n x_i = 1\}$ of $\rho_t$: indeed given $(x_t)$ one can obtain $(\rho_t)$ by optimal coupling, that is the couple of random variables $(\rho_t, \rho_{t+1})$ minimizes the expected distance $\mathbb{E}\, d(\rho_t, \rho_{t+1})$ over all couples $(\rho, \rho')$ such that $\rho$ has marginal $x_t$ and $\rho'$ has marginal $x_{t+1}$ (the latter quantity is called the Wasserstein-$1$ distance between $x_t$ and $x_{t+1}$). In this view the (expected) service cost is now a linear function, that is $\langle x_{t+1}, c_t \rangle$ where $c_t$ is now viewed as a vector in $\mathbb{R}_+^n$, and the movement cost between $x_t$ and $x_{t+1}$ is the Wasserstein-$1$ distance.

We will further assume the existence of a convex body $K \subset \mathbb{R}^N$ and a norm $\|\cdot\|$ in $\mathbb{R}^N$ ($N \geq n$) such that the Wasserstein-$1$ distance between two distributions $x, x' \in \Delta_n$ is equal to $\|\tilde{x} - \tilde{x}'\|$, where $\tilde{x}, \tilde{x}' \in K$ are “expanded states” whose first $n$ coordinates are $x$ and $x'$ respectively. For a weighted star metric it suffices to take $K = \Delta_n$ (no expansion at all), but we will see in the fourth post that for trees one indeed needs to expand the state space.
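For the weighted star claim, here is the computation I have in mind (a sketch of my own, with $w_i$ denoting the distance from leaf $i$ to the center, so that $d(i,j) = w_i + w_j$ for $i \neq j$): the cheapest coupling routes all excess mass through the center, giving

$$W_1(x, x') = \sum_{i=1}^n w_i \, |x_i - x'_i|,$$

i.e., a weighted $\ell_1$ norm of $x - x'$, so the movement cost is already a norm on $\Delta_n$ itself and no expansion is needed.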

Fractional solution for $k$-server

Recall the $k$-server problem: one maintains a random configuration $\rho_t \subset [n]$ with $|\rho_t| = k$ (the locations of the $k$ servers), and given a new request $r_t \in [n]$ one must update the configuration to $\rho_{t+1}$ such that $r_t \in \rho_{t+1}$. One could equivalently maintain a distribution over configurations as before. In the fractional problem one in fact only maintains the first moment of this distribution, $x_t \in [0,1]^n$ with $\sum_{i=1}^n x_t(i) = k$, defined by $x_t(i) = \mathbb{P}(i \in \rho_t)$ (in particular servicing a request at location $r$ means that one must have $x_{t+1}(r) = 1$). Again the metric here on the variables $x_t$ is the Wasserstein-$1$ distance induced by $d$ (a.k.a., the optimal transport distance). Importantly we note that this view is not equivalent to maintaining a full distribution over configurations (indeed a lot of information is lost by recording only the first moment). This raises the subtle issue of whether one can round online a fractional solution to a proper integral solution whose total movement is of the same order of magnitude. We will not touch upon this question here, and we focus on the fractional $k$-server problem, see for example Section 5.2 here for more. We note however that the existence of a polynomial time algorithm for this rounding task is an open problem.
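As a quick illustration of this loss of information (a toy example of my own): take $n = 4$, $k = 2$, and compare the distribution that is uniform on the configurations $\{1,2\}$ and $\{3,4\}$ with the one that is uniform on $\{1,3\}$ and $\{2,4\}$. Both have the same first moment

$$x = \left( \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2} \right), \qquad \sum_{i=1}^4 x(i) = 2 = k,$$

even though they are quite different as distributions over configurations.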

To think of the request as a linear cost function (like in MTS) it is more appropriate to work with the anticonfiguration $y_t := \mathbf{1} - x_t$ (so that $y_t \in [0,1]^n$ with $\sum_{i=1}^n y_t(i) = n - k$, and $y_t(r) = 0$ exactly when there is a server at $r$). In this view a request at location $r$ is equivalent to a cost vector proportional to $e_r$. Finally, like in MTS, we will assume the existence of an expanded state space $K \subset \mathbb{R}^N$ and a norm $\|\cdot\|$ that measures movement in this expanded view.
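To fix ideas with a toy example (again my own): for $n = 3$, $k = 2$ and fractional configuration $x = (1, \tfrac12, \tfrac12)$, the anticonfiguration is

$$y = \mathbf{1} - x = \left( 0, \tfrac{1}{2}, \tfrac{1}{2} \right),$$

and a request at location $2$ demands that $y(2)$ be driven to $0$, which is exactly what a linear cost supported on coordinate $2$ penalizes.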

Continuous time decision making

We will now move to a continuous time setting, where the (discrete time) sequence of cost vectors $c_1, c_2, \ldots$ is revealed continuously as a path $c : \mathbb{R}_+ \to \mathbb{R}_+^n$ (say piecewise constant, with $c(t) = c_{\lceil t \rceil}$). The decision maker’s response is a path $y : \mathbb{R}_+ \to \mathbb{R}^N$ that lives in $K$. In this setting the service cost of the algorithm is $\int_0^{\infty} \langle y(t), c(t) \rangle \, dt$ and the movement cost is equal to $\int_0^{\infty} \|y'(t)\| \, dt$ where $y'$ is the time derivative of the path $y$. We note that there is a small subtlety here in translating the continuous time service cost into a meaningful discrete time service cost, but we will not worry about this here since it does not affect the argument for $k$-server (where there is only a movement cost). If you are curious see the appendix here.
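As a sanity check on these two integrals, here is a small discretization sketch (my own code, on made-up toy data, with the $\ell_1$ norm standing in for $\|\cdot\|$): given the paths sampled on a fine grid, the hypothetical helper `path_costs` approximates the service cost $\int \langle y(t), c(t) \rangle\, dt$ and the movement cost $\int \|y'(t)\|\, dt$.

```python
import numpy as np

def path_costs(ts, cs, ys, norm=lambda v: np.abs(v).sum()):
    """Approximate continuous-time service and movement costs of a path.

    ts: increasing time grid, shape (T,)
    cs: cost vectors c(t) on the grid, shape (T, n)
    ys: algorithm states y(t) on the grid, shape (T, n)
    norm: the norm measuring movement (l1 by default, as a stand-in).
    """
    dt = np.diff(ts)
    service = float(sum(np.dot(cs[i], ys[i]) * dt[i] for i in range(len(dt))))
    movement = float(sum(norm(ys[i + 1] - ys[i]) for i in range(len(dt))))
    return service, movement

# Toy usage: mass moves linearly from the first to the second coordinate
# while the cost keeps pointing at the first coordinate.
ts = np.linspace(0.0, 1.0, 101)
cs = np.tile(np.array([1.0, 0.0]), (101, 1))
ys = np.stack([1.0 - ts, ts], axis=1)
print(path_costs(ts, cs, ys))  # service ~ 0.5 (left Riemann sum), movement = 2.0
```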

For $k$-server we will use $c(t) = e_{r(t)}$ where $r(t)$ is the currently requested location, and we move to the next request at the first time $t$ such that $y_{r(t)}(t) = 0$ (which means that $x(t) = \mathbf{1} - y(t)$ satisfies $x_{r(t)}(t) = 1$, i.e., there is a server at the requested location).

Mirror descent

If you have never seen the mirror descent framework, now is a good time to take a quick look here.

Very succinctly, mirror descent with mirror map $\Phi : \mathbb{R}^N \to \mathbb{R}$ can be written as follows, with a step-size $\eta > 0$:

$$y_{t+1} = \mathrm{argmin}_{y \in K} \ D_{\Phi}\Big(y, \nabla \Phi^* \big( \nabla \Phi(y_t) - \eta c_t \big)\Big),$$

where we recall that $\Phi^*$ is the Legendre-Fenchel transform of $\Phi$ (i.e., the map whose gradient is the inverse map of the gradient of $\Phi$) and $D_{\Phi}$ is the Bregman divergence associated to $\Phi$.
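To make the update concrete, here is a minimal sketch of one discrete step (my own code, not an algorithm from the series), using the negative entropy mirror map $\Phi(y) = \sum_i y_i \log y_i$ on the probability simplex; for this choice $\nabla \Phi(y) = 1 + \log y$, $\nabla \Phi^*(\theta) = e^{\theta - 1}$, and the Bregman projection onto the simplex is a simple $\ell_1$ renormalization.

```python
import numpy as np

def md_step_entropy(y, c, eta):
    """One mirror descent step on the probability simplex with the negative
    entropy mirror map Phi(y) = sum_i y_i log y_i."""
    theta = 1.0 + np.log(y) - eta * c   # gradient step in the mirror (dual) space
    y_half = np.exp(theta - 1.0)        # map back with grad Phi*(theta)
    return y_half / y_half.sum()        # Bregman (KL) projection onto the simplex

# Toy usage: the cost repeatedly penalizes the first coordinate, so the
# iterate drifts away from it.
y = np.array([0.5, 0.3, 0.2])
c = np.array([1.0, 0.0, 0.0])
for _ in range(3):
    y = md_step_entropy(y, c, eta=0.1)
print(y)
```

Taking the step-size $\eta$ to $0$ in this update is precisely the limit considered next.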

We now want to take $\eta$ to $0$ in the above definition to find the continuous time mirror descent update. For that let us recall the first order optimality condition for constrained optimization. Denote $N_K(x)$ for the normal cone of $K$ at $x$, i.e., the set of directions which are negatively correlated with any direction going into the body: $N_K(x) = \{\theta \in \mathbb{R}^N : \langle \theta, y - x \rangle \leq 0 \ \text{for all } y \in K\}$. One then has for any convex function $f$,

$$x^* \in \mathrm{argmin}_{x \in K} f(x) \quad \Longleftrightarrow \quad -\nabla f(x^*) \in N_K(x^*).$$

In particular we see that (note that $\nabla_y D_{\Phi}(y, x) = \nabla \Phi(y) - \nabla \Phi(x)$)

$$\nabla \Phi(y_{t+1}) - \nabla \Phi(y_t) + \eta c_t \in -N_K(y_{t+1}),$$

and thus taking $\eta$ to $0$ we morally get

$$\nabla^2 \Phi(y(t))\, y'(t) \in -c(t) - N_K(y(t)).$$

This type of equation is known as a differential inclusion, and with the added constraint that the path must live in the constraint set $K$ we get a viability problem. In our paper we show that a solution indeed exists (and is unique) under mild assumptions on $\Phi$.
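For a concrete instance of this dynamics (my own sketch, assuming the trajectory stays in the relative interior of the simplex so that the normal cone is just the span of the all-ones vector): with the negative entropy mirror map one has $\nabla^2 \Phi(y) = \mathrm{diag}(1/y_i)$, and the inclusion reduces to the ODE $y_i'(t) = y_i(t)\,(\langle c(t), y(t) \rangle - c_i(t))$, which the hypothetical helper below integrates with a crude Euler scheme.

```python
import numpy as np

def entropic_md_flow(y0, c_of_t, T, dt=1e-3):
    """Euler discretization of continuous-time mirror descent on the interior
    of the simplex with the negative entropy mirror map, where the inclusion
    reduces to y_i'(t) = y_i(t) * (<c(t), y(t)> - c_i(t))."""
    y = np.array(y0, dtype=float)
    t = 0.0
    while t < T:
        c = c_of_t(t)
        y = y + dt * y * (np.dot(c, y) - c)  # Euler step; preserves sum(y) = 1
        t += dt
    return y

# Toy usage: a constant cost pointing at the first coordinate pushes the
# mass away from it.
print(entropic_md_flow([1/3, 1/3, 1/3], lambda t: np.array([1.0, 0.0, 0.0]), T=2.0))
```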

Potential based analysis

The mirror descent algorithm is fully described by:

$$\nabla^2 \Phi(y(t))\, y'(t) = -c(t) - \lambda(t), \quad \text{where } \lambda(t) \in N_K(y(t)).$$

Denoting $D_{\Phi}(z, y) = \Phi(z) - \Phi(y) - \langle \nabla \Phi(y), z - y \rangle$ for the Bregman divergence, we see that for any fixed $z \in K$,

$$\partial_t D_{\Phi}(z, y(t)) = - \langle \nabla^2 \Phi(y(t))\, y'(t), z - y(t) \rangle = \langle c(t) + \lambda(t), z - y(t) \rangle \leq \langle c(t), z \rangle - \langle c(t), y(t) \rangle,$$

where the last inequality uses that $\lambda(t) \in N_K(y(t))$ and $z \in K$.

The above calculation is the key to understanding mirror descent: it says that if the algorithm is currently paying more than $z$, i.e., $\langle c(t), y(t) \rangle > \langle c(t), z \rangle$, then it is actually getting closer to $z$ in the sense that $D_{\Phi}(z, y(t))$ is decreasing. Put differently: when the algorithm pays, it also learns. This key insight is sufficient for online learning, where one competes against a fixed point $z$. However in MTS and $k$-server we compete against a path $z(t)$, and thus we also need to evaluate by how much the Bregman divergence can go up when $z$ is moving. This is captured by the following calculation (where $\|\cdot\|_*$ denotes the dual norm and we assume that $\Phi$ is Lipschitz on $K$ with respect to $\|\cdot\|$):

$$\partial_t D_{\Phi}(z(t), y) = \langle \nabla \Phi(z(t)) - \nabla \Phi(y), z'(t) \rangle \leq \| \nabla \Phi(z(t)) - \nabla \Phi(y) \|_* \, \|z'(t)\| \leq 2\, \mathrm{Lip}_{\|\cdot\|}(\Phi)\, \|z'(t)\|.$$
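Before putting the two calculations together, here is a quick numerical sanity check of the first one in discrete time (my own experiment, reusing the entropic setup from the earlier sketch): for the proximal form of the entropic update one has $D_{\Phi}(z, y_{t+1}) - D_{\Phi}(z, y_t) \leq \eta \, \langle c_t, z - y_{t+1} \rangle$, a discrete counterpart of the statement that the potential decreases whenever the algorithm pays more than the comparator.

```python
import numpy as np

def md_step_entropy(y, c, eta):
    """Multiplicative-weights form of the entropic mirror descent step."""
    y_new = y * np.exp(-eta * c)
    return y_new / y_new.sum()

def kl(z, y):
    """Bregman divergence of the negative entropy on the simplex: KL(z, y)."""
    return float(np.sum(z * np.log(z / y)))

rng = np.random.default_rng(0)
eta, n = 0.05, 4
for _ in range(1000):
    y = rng.dirichlet(np.ones(n))   # current iterate
    z = rng.dirichlet(np.ones(n))   # fixed comparator
    c = rng.random(n)               # cost vector
    y_new = md_step_entropy(y, c, eta)
    lhs = kl(z, y_new) - kl(z, y)              # change in the potential
    rhs = eta * float(np.dot(c, z - y_new))    # (discrete) payment difference
    assert lhs <= rhs + 1e-12
print("inequality verified on 1000 random instances")
```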

Putting the two calculations above together, we obtain the following control on the service cost of mirror descent in terms of the service cost and movement cost of the optimal path:

Lemma: The mirror descent path satisfies, for any comparator path $z(\cdot)$ in $K$ and any time horizon $T$,

$$\int_0^T \langle c(t), y(t) \rangle \, dt \leq \int_0^T \langle c(t), z(t) \rangle \, dt + 2\, \mathrm{Lip}_{\|\cdot\|}(\Phi) \int_0^T \|z'(t)\| \, dt + D_{\Phi}(z(0), y(0)).$$
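In case it is helpful, here is how the bound follows from the two calculations above (a short derivation of my own, with the Lipschitz constant simply carried over from the movement bound): adding the two contributions controls the total derivative of $t \mapsto D_{\Phi}(z(t), y(t))$,

$$\frac{d}{dt} D_{\Phi}(z(t), y(t)) \leq \langle c(t), z(t) \rangle - \langle c(t), y(t) \rangle + 2\, \mathrm{Lip}_{\|\cdot\|}(\Phi)\, \|z'(t)\|,$$

and integrating from $0$ to $T$ and using $D_{\Phi}(z(T), y(T)) \geq 0$ rearranges into the Lemma.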

At this point the big question is: how do we control the movement of mirror descent? In the next post we will see how this plays out on a weighted star.