(Analysis by Yang Liu)

Note that the benefits from a marginal cow on a job is decreasing - in particular, the marginal benefit of going from $c$ cows to $c+1$ cows is $\frac{a}{(c(c+1))}$, which is decreasing. Therefore, we get an $O(K \log N)$ solution just by operating greedily: start with a single cow on each job, and for the next $K-N$ steps, add a cow to a job that decreases the answer by the most.

However, because $K$ is large, this solution is too slow. Instead, we binary search on a parameter $x$, which represents the last marginal benefit we got from adding a cow to a job.

For a fixed $x$, we can compute how many cows we can add until our benefit becomes less than $x$. We find the value $x$ such that we use $K$ cows (which exists by monotonicity) and then compute the answer. More precisely, for some $x$ we will be close to $K$, say with $T$ workers used. Then the actual answer should be adjusted by $(K-T)*x$ because all the extra / less workers will be providing marginal value $x$. This is what the last few lines of the code below are doing.

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

const int MAXN = 1e5+5;

ll n, k, a[MAXN];

ll ct(double x) {
ll tot = 0;
for (int i = 0; i < n; ++i)
// Using the quadratic formula to solve a[i]/(c(c+1)) >= x
tot += ll((sqrt(1 + 4*a[i]/x)-1)/2);
}

int main() {
cin >> n >> k;
k -= n;

for (int i = 0; i < n; ++i)
cin >> a[i];

double lo = 0, hi = 1e18;
for (int i = 0; i < 200; ++i) {
double mid = (lo+hi)/2;
if (ct(mid) >= k)
lo = mid;
else
hi = mid;
}

double ans = 0;
ll tot = 0;

for (int i = 0; i < n; ++i) {
ll x = ll((sqrt(1 + 4*a[i]/lo)-1)/2);
ans += 1.0*a[i]/(x+1);
tot += x;
}

cout << (ll)round(ans - (k-tot)*lo) << endl;
}