Note that the benefits from a marginal cow on a job is decreasing - in particular, the marginal benefit of going from c cows to c+1 cows is a(c(c+1)), which is decreasing. Therefore, we get an O(KlogN) solution just by operating greedily: start with a single cow on each job, and for the next K−N steps, add a cow to a job that decreases the answer by the most.
However, because K is large, this solution is too slow. Instead, we binary search on a parameter x, which represents the last marginal benefit we got from adding a cow to a job.
For a fixed x, we can compute how many cows we can add until our benefit becomes less than x. We find the value x such that we use K cows (which exists by monotonicity) and then compute the answer. More precisely, for some x we will be close to K, say with T workers used. Then the actual answer should be adjusted by (K−T)∗x because all the extra / less workers will be providing marginal value x. This is what the last few lines of the code below are doing.
#include <bits/stdc++.h> using namespace std; typedef long long ll; const int MAXN = 1e5+5; ll n, k, a[MAXN]; ll ct(double x) { ll tot = 0; for (int i = 0; i < n; ++i) // Using the quadratic formula to solve a[i]/(c(c+1)) >= x tot += ll((sqrt(1 + 4*a[i]/x)-1)/2); return tot; } int main() { cin >> n >> k; k -= n; for (int i = 0; i < n; ++i) cin >> a[i]; double lo = 0, hi = 1e18; for (int i = 0; i < 200; ++i) { double mid = (lo+hi)/2; if (ct(mid) >= k) lo = mid; else hi = mid; } double ans = 0; ll tot = 0; for (int i = 0; i < n; ++i) { ll x = ll((sqrt(1 + 4*a[i]/lo)-1)/2); ans += 1.0*a[i]/(x+1); tot += x; } cout << (ll)round(ans - (k-tot)*lo) << endl; }