(Analysis by Nathan Pinsker)

Before anything else, we should, for each location, sort the microcontrollers that can be installed at that location by their cost. This is motivated by the fact that the correct answer can be obtained by a greedy method of construction: we should build the cheapest possible robot that we have not yet built, and do this $K$ times. It follows, then, that we should always use the cheapest possible microcontroller that we can at each position without making a robot we have already built.

One initial approach to this problem might be to start by installing the robot with the cheapest possible microcontroller at every position. Then, after each robot we create, we imagine $N$ new candidate robots. Each of these robots is identical to the one we have just installed, except exactly one of their microcontrollers is minimally more expensive. We put each of these robots into a priority queue with key equal to its cost; then we repeatedly extract the lowest-cost robot from the priority queue and put $N$ new robots in until we have built $K$ robots. Unfortunately, this approach is untenable, since the robots themselves can be represented by $N$ integers. Thus, the act of actually inserting each robot into the priority queue is $O(N \log K)$, and the priority queue itself would take as much as $O(NK)$ space. The entire algorithm would take $O(NK \log K)$ time as well, which is far too slow.

Since many people may initially take this approach, it is natural to ask whether these $K$ cheapest robots can actually be constructed in a reasonable amount of time at all. After all, since there are $K$ different robots to build, each with $N$ microcontrollers, so it may be tempting to assume that there is a lower bound of $O(NK)$ on even representing these robots. However, this is not so, because of the construction that we have given above. Each additional robot that we construct is guaranteed to differ from an existing robot in exactly one location, so it is possible to represent the $K$ robots in $O(N + K)$ space.

That said, we still need to actually construct these $K$ robots somehow. The next step (and a useful technique for programming in general) is to simplify things for ourselves by first solving a simpler problem, and using that solution to then solve the original problem. A natural simpler problem to consider is: if we knew that the most expensive robot we needed to build would cost $K$, could we quickly construct all robots that cost less than that number $K$? It turns out that the answer to this question is yes, and we can do so with a pruned brute-force search.

Some care is needed when enumerating over the possible robots that can be built, or the enumeration may be exponential in runtime (when there are a large number of robots that cost too much to build). To prune the search, we can sort the locations: the order we use is the difference in cost between from the cheapest model of microcontroller and the second-cheapest model that we can install at that location. When we are iterating over locations and we want to spend $K$ money on the remaining locations, we can locate with a binary search exactly which locations where we have the budget to install more expensive microcontrollers, and which locations where we don't. This causes the enumeration to be $O((n + k) \log n)$, since this optimization allows us to completely avoid making recursive calls that do not lead to additional valid robots.

Now that we have a way to enumerate over all robots that cost less than $K$, we need to find the cost of the most expensive robot we need to build. Luckily, we can do this with the exact same procedure: instead of enumerating over the robots that cost less than $K$, we instead simply ask for the number of robots that cost less than $K$ instead. This function is non-decreasing with respect to $K$, so we can binary search on this number in order to find the minimal cost that allows us to produce at least $N$ robots. We then enumerate all $N$ robots given this cost, giving a total runtime of $O((n+k) \log n \log |P|)$.

Mark Gordon's code is below:

#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

int N, M;
vector<int> A[100010];
int F[100010];
int totalcount;
long long savings;

void search(int x, long long budget) {
  if (totalcount >= M) {
    return;
  }
  if (x != -1 && budget < A[x][0]) {
    x = upper_bound(F, F + x, (int)budget) - F - 1;
  }
  if (x == -1) {
    totalcount++;
    return;
  }
  search(x - 1, budget);
  for (int i = 0; i < A[x].size() && A[x][i] <= budget; i++) {
    search(x - 1, budget - A[x][i]);
  }
}

void enumerate(int x, long long budget) {
  if (x != -1 && budget < A[x][0]) {
    x = upper_bound(F, F + x, (int)budget) - F - 1;
  }
  if (x == -1) {
    savings += budget + 1;
    return;
  }
  enumerate(x - 1, budget);
  for (int i = 0; i < A[x].size() && A[x][i] <= budget; i++) {
    enumerate(x - 1, budget - A[x][i]);
  }
}

int main() {
  cin >> N >> M;
  
  long long base = 0;
  long long mx = 0;
  for (int i = 0; i < N; i++) {
    int sz; cin >> sz;
    
    vector<int> V(sz);
    for (int j = 0; j < sz; j++) {
      cin >> V[j];
    }
    sort(V.begin(), V.end());
    
    base += V[0];
    if (sz == 1) {
      i--;
      N--;
      continue;
    }
    for (int j = 1; j < sz; j++) {
      A[i].push_back(V[j] - V[0]);
    }
    mx += A[i].back();
  }
  
  sort(A, A + N);
  for (int i = 0; i < N; i++) {
    F[i] = A[i][0];
  }
  
  long long lo = 0;
  long long hi = mx;
  while (lo < hi) {
    long long md = (lo + hi) / 2;
    totalcount = 0;
    search(N - 1, md);
    if (totalcount < M) {
      lo = md + 1;
    } else {
      hi = md;
    }
  }
  
  savings = 0;
  if (lo > 0) {
    enumerate(N - 1, lo - 1);
  }
  cout << (base + lo) * M - savings << endl;
  
  return 0;
}