Analysis: Sabotage by Nathan Pinsker and Bruce Merry


This problem is slightly unusual. We're dealing with the average of sets of numbers here, which is more complicated to work with than other functions (such as their sum). Fortunately, an easier version of the problem will provide some insight into how to solve the original.

A similar problem is as follows: instead of minimizing the average milk production, we just try to minimize the sum of all remaining machines' productions. This is equivalent to finding the subinterval with maximum possible sum, which has a fairly well-known linear time solution. (For those unfamiliar with this problem, Wikipedia has an excellent summary here). However, this easier problem doesn't immediately offer a solution to the original one, as it is unclear how to translate the method used from sums to averages.

We can still use these insights to derive information about averages, though! In particular, we can easily check whether a subarray has a positive or negative average, because the subarray's average and its sum always have the same sign. By similar reasoning, we can choose a subarray and then figure out the sign of the average of all elements *not* in that subarray. Extending on this idea, we can subtract a number B from every machine's milk production, then find the subarray of machines with maximum possible sum. The remaining elements not in our subarray will have a positive sum if and only if they have an average of at least B in our unmodified array. This means we have the ability to check, for any number B, whether a subarray of milking machines exists that, when removed, leaves milking machines with average at least B. We can use binary search to find the minimum B for which such a subarray exists, which solves the problem.

Brian Dean's solution is below:

#include <stdio.h>
#define MAX_N 100000

int N, S[MAX_N];

int round3(double x) { return (int)(1000.0 * x + 0.5); }

int possible(double guess)
{
  int i, total=0;
  double best, current;
  for (i=0; i<N; i++) total += S[i];
  best = current = S[1] - guess;
  for (i=2; i<N-1; i++) {
    if (current < 0) current = 0;
    current += S[i] - guess;
    if (current > best) best = current;
  }
  return best >= total - guess * N;
}

double solve(void)
{
  double low = 1.0, high = MAX_N * 10000.0;
  while (round3(low) != round3(high)) 
    if (possible((low+high)/2)) high = (low+high)/2;
    else low = (low+high)/2;
  return low;
}

int main(void)
{
  int i;
  freopen ("sabotage.in", "r", stdin);
  freopen ("sabotage.out", "w", stdout);
  scanf ("%d", &N);
  for (i=0; i<N; i++) scanf ("%d", &S[i]);
  printf ("%.3lf\n", solve());
  return 0;
}

An alternative approach lets us compute an exact answer in O(N log N) time. Suppose we are left with p milking machines at the start and q machines at the end, with sums P and Q. The average over all these machines is (P + Q) / (p + q), which is also the slope of a line from (-p, -P) to (q, Q). We can now apply ideas from computational geometry to find the optimal q for each p. We will iterate p downwards, so that after each query we have one new value of q to consider.

Since we are searching for a minimum slope, it is easy to see that we do not need to retain all (q, Q) points: only those on the lower hull have any use. It is also easy to add a new (q, Q) point and update the lower hull, in amortized constant time (simply popping off points from the end that are no longer in the hull). We can then do a binary search to find the point on the lower hull that makes the smallest slope relative to (-p, -P).

Bruce Merry's solution is below:

#include <fstream>
#include <iomanip>
#include <algorithm>
#include <numeric>
#include <vector>
#include <complex>

using namespace std;

typedef long long ll;
typedef complex<ll> pnt;

static ll cross(const pnt &a, const pnt &b)
{
    return imag(conj(a) * b);
}

static ll cross(const pnt &a, const pnt &b, const pnt &c)
{
    return cross(b - a, c - a);
}

int main()
{
    ifstream in("sabotage.in");
    ofstream out("sabotage.out");

    int N;
    in >> N;
    vector<int> m(N);
    for (int i = 0; i < N; i++)
        in >> m[i];

    double ans = HUGE_VAL;
    pnt base(-(N - 2), -accumulate(m.begin(), m.begin() + (N - 2), 0));
    vector<pnt> hull;
    hull.push_back(pnt(1, m.back()));

    for (int i = N - 2; i > 0; i--)
    {
        int L = -1;
        int R = hull.size() - 1;
        while (R - L > 1)
        {
            int mid = (L + R) / 2;
            if (cross(base, hull[mid], hull[mid + 1]) > 0)
                R = mid;
            else
                L = mid;
        }
        pnt slope = hull[R] - base;
        ans = min(ans, (double) slope.imag() / slope.real());
        base.real()++;
        base.imag() += m[i - 1];
        pnt next = hull.back() + pnt(1, m[i]);
        while (hull.size() > 1 &&
            cross(hull[hull.size() - 2], hull[hull.size() - 1], next) <= 0)
            hull.pop_back();
        hull.push_back(next);
    }
    out << fixed << setprecision(3) << ans << '\n';
    return 0;
}