This page describes the selection methodology that was used for selecting the collection and benchmark sets of instances.
For the first time in its history, the instance sets of MIPLIB 2017 have been compiled using an automated, feature-based procedure. A wide representation of today's MIP instances without overrepresentation of a particular type are the declared goals of the process.
From over 5000 instances that originated from original submissions and the pool of available instances from previous MIPLIBs, instance features have been collected that describe the main properties of the involved variable and constraint types as well as statistics about the objective function, constraint matrix and right hand sides, and variable bound types. All features have been collected after applying a trivial presolving to the instances and were appropriately scaled. If available, instance features also comprised decomposition metrics of a decomposition computed with GCG.
Subsets of the submission that are known to represent a single optimization model for different data have been assigned to groups, consisting of up to 360 instances. For all such groups with at least 6 (11) instances, a diverse preselection has been performed by selecting a set of exactly 5 (10) representatives, such that the minimum pairwise feature space distance within the selection is maximized. Besides, a selection should incorporate at least 1 instance from the group that is suitable for benchmark (see below). This diversity optimization problem has been formulated as a MIP and solved to optimality individually for 238 instance groups. Unselected instances from the considered groups were discarded for the remaining selection process.
The surviving instance set has been partitioned into clusters using k-means clustering simultaneously for 10 different subspaces of the feature space, such as the size-related features or decomposition features. Another MIP, the Collection MIP, was formulated to maximize the number of instances in the Collection Set. As side constraint, the Collection MIP selects instances equally from every cluster within its respective family, to reach a good total coverage of available instance feature space. Additional constraints demand the Collection to feature at least 1 instance from every new submission, but limit the total number of instances within a group to 5 to avoid overrepresentation.
For creating the benchmark set, the benchmark suitability of an instance was determined. Among other requirements, an instance is considered suitable for benchmark if it
Finally, based on the collected performance results, the remaining instances were clustered again. Generally speaking, the approach clusters instances for which a solver has performed particularly well or bad, with special treatment for instances where it even timed out. The resulting performance clusterings for every solver were used to formulate selection constraints in the same spirit as for the coverage of the feature space. The Benchmark MIP finds an optimal selection of instances that are suitable for benchmark and that cover the feature and performance space well. The Benchmark Set is based on an optimal solution for the Benchmark MIP with slight modifications (2 instances had to be exchanged because they were later discovered to be numerically troublesome).