We continue our
-server series (see post 1 here). In this post we briefly discuss the concept of a fractional solution for
-server, which by analogy with MTS will in fact be a fractional “anti-solution”. Then we introduce the continuous time version of MTS and explain how it applies for
-server. Finally the most important part of the post is the description of the basic potential based analysis of mirror descent and how to interpret it in the context of MTS.
State representation for MTS
- Recall the MTS problem: one maintains a random state
- (where
- is equipped with a distance
- ), and given a new cost function
- , one can update the state to
- with the corresponding price to pay:
- . Equivalently one can maintain the probability distribution
- of
- indeed given
one can obtain
by optimal coupling, that is the couple of random variables
minimizes the expected distance
over all couples
such that
has marginal
and
has marginal
(the latter quantity is called the Wasserstein-
distance between
and
). In this view the (expected) service cost is now a linear function, that is
where
, and the movement cost between
and
is the Wasserstein-
distance.
We will further assume the existence of a convex body
and a norm
in
(
) such that the Wasserstein-
distance between two distributions
is equal to
where
are “expanded states” with
and
. For a weighted star metric it suffices to take
, but we will see in the fourth post that for trees one indeed need to expand the state space.
Fractional solution for
-server
Recall the
-server problem: one maintains a random configuration
, and given a new request
one must update the configuration to
such that
. One could equivalently maintain a distribution
as before. In the fractional problem one in fact only maintains the
moment of this distribution,
, defined by
(in particular servicing a request a location
means that one must have
). Again the metric here on the
variables is the Wasserstein distance induced by
(a.k.a., the optimal transport distance). Importantly we note that this view is not equivalent to maintaining a full distribution (indeed a lot of information is lost by recording only the first moment). This raises the subtle issue of whether one can round online a fractional solution to a proper integral solution whose total movement is of the same order of magnitude. We will not touch upon this question here, and we focus on the fractional
-server problem, see for example Section 5.2 here for more. We note however that the existence of a polynomial time algorithm for this rounding task is an open problem.
To think of the request as a linear cost function (like in MTS) it is more appropriate to work with the anticonfiguration
. In this view a request
is equivalent to a cost vector
. Finally like in MTS we will assume the existence of an expanded state space
and a norm
that measures movement in this expanded view.
Continuous time decision making
We will now move to a continuous time setting, where the (discrete time) sequence of cost vectors is revealed continuously as a path
, with
(and
). The decision maker’s reponse is a path
that lives in
. In this setting the service cost of the algorithm is
and the movement cost is equal to
where
is the time derivative of the path
. We note that there is small subtelty here to translate the continuous time service cost into a meaningful discrete time service cost, but we will not worry about this here since it does not affect the argument for
-server (where there is only a movement cost). If you are curious see the appendix here.
For
-server we will use
where
is the currently requested location, and we move to the next request at the first time such that
(which means that
satisfies
, i.e., there is a server at the requested location.
Mirror descent
If you have never seen the mirror descent framework now is a good time to take a quick look here.
Very succintly mirror descent with mirror map
can be written as follows, with a step-size
:
where we recall that
is the Legendre-Fenchel transform of
(i.e., the map whose gradient is the inverse map of the gradient of
) and
is the Bregman divergence associated to
.
We now want to take
to
in the above definition to find the continuous time mirror descent update. For that let us recall the first order optimality condition for constrained optimization. Denote
for the normal cone of
at
, i.e., the set of directions which are negatively correlated with any direction going into the body. One then has for any convex function
,
In particular we see that (note that
)
and thus taking
to
we morally get
This type of equation is known as a differential inclusion, and with the added constraint that the path must live in the constraint set
we get a viability problem. In our paper we show that a solution indeed exist (and is unique) under mild assumptions on
.
Potential based analysis
The mirror descent algorithm is fully described by:
Denoting
we see that for any fixed
,
The above calculation is the key to understand mirror descent: it says that if the algorithm is currently paying more than
, i.e.,
, then it is actually getting closer to
in the sense that
is decreasing. Put differently: when the algorithm pays, it also learns. This key insight is sufficient for online learning, where one competes against a fixed point
. However in MTS and
-server we compete against a path
, and thus we also need to evaluate by how much the Bregman divergence can go up when
is moving. This is captured by the following calculation:
Putting together the two above calculations we obtain the following control on the service cost of mirror descent in terms of the service cost and movement cost of the optimal path:
Lemma: The mirror descent path
satisfies for any comparator path
,
At this point the big question is: how do we control the movement of mirror descent? In the next post we will see how this plays out on a weighted star.