Optimize a Data Center is a challenging programming problem presented in the Qualification Round of the Hash Code 2015. Among 230 teams, What’s in a name? from École normale supérieure ranked first in the qualification, but third in the final round. By challenging, I mean it is not possible to come up with a deterministic polynomial-time optimal answer. I am not in a position to either provide a rigorous proof of its complexity or its reduction to a known NP-hard problem. But in this blog post I will investigate the following question: Can we provide an optimal solution using integer programming? In practice, that would allow us to come up with an optimal solution to small-sized problems. Without further ado, let’s start with the problem definition.
Here, I will present a brief summary of the actual problem. A data center is modeled as rows of slots in which servers can be placed. And some of the slots are known to be unavailable.
Each server is characterized by its size and capacity. Size is the number of consecutive slots occupied by the machine. Capacity is the total amount of CPU resources of the machine (an integer value).
Servers in a data center are also logically divided into pools. Each server belongs to exactly one pool. The capacity of a pool is the sum of the capacities of the available servers in that pool.
The guaranteed capacity of a pool is the minimum capacity it will have when at most one data center row goes down. Given a schema of a data center and a list of available servers, the goal is to assign servers to slots within the rows and to logical pools so that the lowest guaranteed capacity of all pools is maximized.
Consider the following data center schema and a list of available servers. For simplicity, it is assumed that server capacities are equal to server sizes.
Following layout is a solution to the above given problem. Here different pools are denoted in distinct colors.
Before modeling the IP (Integer Program), I will start with stating the problem input.
While modeling the formulation, I will need to provide constraints to avoid placing servers to unavailable slots. Rather than doing that, Atabey Kaygun hinted me to represent the data in blocks. That is, instead of , , and , I will transform this data into a single lookup table called that denotes the size of the available slots at th row and th block. For instance, consider the following layout:
Here blocks will look as follows:
For simplicity, I will adopt the following index notation:
Given that denotes the available blocks, the IP model can be defined as follows:
The presented IP model contains a minimax objective, which to the best of my knowledge is not tractable by popular linear programming optimizers, such as CPLEX or lpsolve. But I have a trick in my pocket to tackle that. Let’s assume that we know the optimal objective, say . Then we can model the entire IP as follows:
What this model states is this: I am not interested in the optimization objective, return me the first found feasible solution. That is, the optimizer will return us the first variable set the moment it finds a feasible solution satisfying constraints.
Now things are getting interesting. If we can find bounds to , than we can use these bounds to bisect the optimal ! For the lower bound, we know that . The upper bound is a little bit tricky, but we can come up with a quite loose bound: . (I will not go into details of how to come up with a stricter upper bound.) So by picking we can bisect the optimal guaranteed capacity.
For testing purposes, I wrote a simple Python script that reads an input problem file and calls CPLEX iteratively. A sample output of the script is as follows:
$ ./dc.py ./cplex.sh prob/dc-min.in soln/dc-min 0
2016-03-03 08:35:25 DEBUG reading setup: prob/dc-min.in
2016-03-03 08:35:25 DEBUG R=2, S=5, U=1, P=2, M=5
2016-03-03 08:35:25 INFO solving for 0 <= g=8 < 16
2016-03-03 08:35:25 DEBUG writing problem: soln/dc-min-8.lp
2016-03-03 08:35:25 DEBUG running solver
2016-03-03 08:35:25 DEBUG writing cplex output: soln/dc-min-8.out
2016-03-03 08:35:25 DEBUG no solution
2016-03-03 08:35:25 DEBUG stepping back
2016-03-03 08:35:25 INFO solving for 0 <= g=4 < 8
2016-03-03 08:35:25 DEBUG writing problem: soln/dc-min-4.lp
2016-03-03 08:35:25 DEBUG running solver
2016-03-03 08:35:25 DEBUG writing cplex output: soln/dc-min-4.out
2016-03-03 08:35:25 DEBUG solution score: 5
2016-03-03 08:35:25 DEBUG writing solution: soln/dc-min-4.soln
2016-03-03 08:35:25 DEBUG stepping forward
2016-03-03 08:35:25 INFO solving for 5 <= g=6 < 8
2016-03-03 08:35:25 DEBUG writing problem: soln/dc-min-6.lp
2016-03-03 08:35:25 DEBUG running solver
2016-03-03 08:35:25 DEBUG writing cplex output: soln/dc-min-6.out
2016-03-03 08:35:25 DEBUG no solution
2016-03-03 08:35:25 DEBUG stepping back
2016-03-03 08:35:25 INFO solving for 5 <= g=5 < 6
2016-03-03 08:35:25 DEBUG writing problem: soln/dc-min-5.lp
2016-03-03 08:35:25 DEBUG running solver
2016-03-03 08:35:25 DEBUG writing cplex output: soln/dc-min-5.out
2016-03-03 08:35:25 DEBUG solution score: 5
2016-03-03 08:35:25 DEBUG writing solution: soln/dc-min-5.soln
2016-03-03 08:35:25 DEBUG stepping forward
It also outputs intermediate IP formulation files, CPLEX output, and solution file. The format of the problem and solution files are detailed in the official problem description.