Let's define $g(x)$ to be the minimum number of operations FJ must perform for some integer $x$, $f(x,a_i)$ denote the closest integer $y$ to $a_i$ such that $y\equiv x \pmod M$ (in case of a tie, choose the smaller candidate), and $h(x,a_i)=|a_i-f(x,a_i)|$. Then $g(x)=\sum_i h(x, a_i)$, and our goal is to compute the minimum value of $g(x)$.
Approach 1:
Subtask 1: $N,M\le 1000$
Since $g(x)$ does not change if we add a multiple of $M$ to $x$, we only need to consider $x\in [0, M-1]$. For each such $x$, we can compute $g(x)$ in $O(N)$ time by iterating over each $i$. This solves a single test case in $O(NM)$ time.
Subtask 2: $N\le 1000$
Let's first consider how to solve the case $M=\infty$. In this case, $f(x,a_i)=x$ and $g(x)$ is a convex function of $x$. If we sort the $a_i$ in increasing order, then it achieves its minimum value of $\sum_{i=N-\lfloor N/2\rfloor +1}^{N}a_i-\sum_{i=1}^{\lfloor N/2\rfloor}a_i$ at $x\in [a_{\lfloor (N+1)/2\rfloor}, a_{\lceil (N+1)/2\rceil}]$.
When $M$ is finite, we can start by replacing each $a_i$ with its remainder modulo $M$ and sorting them in ascending order. Suppose that $g(x)$ attains its minimum value at $x=x_{\min}$. If we apply the $M=\infty$ logic above on the current $a_i$, we will get the correct answer if $f(x_{\min},a_i)$ is equal for all $a_i$. However, it is possible that $f(x_{\min},a_i)$ is equal to one value $v$ for a prefix of $a_i$ and $v+M$ for the remaining suffix, in which case the correct answer could be smaller. We can address this case by iterating over all possible lengths of this prefix, adding $M$ to all elements of this prefix, and then applying the $M=\infty$ logic to get a candidate answer. At the end, we output the minimum of all $N$ candidate answers.
This can be implemented in $O(N^2)$ time per test case. We note that after adding $M$ to the smallest element of $a$, we can maintain the property that $a$ is sorted by just moving the first element of the array to the end.
Ben's Python Implementation:
T = int(input()) for tc in range(T): N, M = map(int, input().split()) A = list(map(int, input().split())) A = sorted([x % M for x in A]) ans = float('inf') for _ in range(N): ans = min(ans, sum(A[N-N//2:]) - sum(A[:N//2])) A = A[1:] + [A[0] + M] # add M to first element of A and move it to the end print(ans)
Full Credit: All arrays in the subtask above that we run the $M=\infty$ solution for are fixed-length subarrays of the sorted array $[a_1,a_2,\dots,a_N,a_1+M,a_2+M, \dots, a_N+M]$. Thus, we can speed up the previous solution by storing prefix sums on this doubled array. The runtime is $O(N\log N)$ per test case due to sorting.
T = int(input()) for tc in range(T): N, M = map(int, input().split()) A = list(map(int, input().split())) A = sorted([x % M for x in A]) A = A + [x + M for x in A] ans = float('inf') cum = [0] for x in A: cum.append(cum[-1] + x) for i in range(N): ans = min(ans, cum[i + N] - cum[i + N - N // 2] - cum[i + N // 2] + cum[i]) print(ans)
Approach 2:
Similar to approach $1$, let's replace each $a_i$ with its remainder modulo $M$ and sort them in ascending order. Suppose we know $x$, we can determine the total number of operations using the following:
The essential observation for this approach is:
Proof: Let's first show the following fact: if $x$ is not equivalent to any $a_i \pmod M $, then $2g(x)\ge g(x-1)+g(x+1)$. By the decomposition of $g$ described above, it suffices to show that $2h(x, a_i) \ge h(x-1, a_i) + h(x+1, a_i)$ for all $i$. Observe that
Suppose $g$ attains its global minimum at $x=x_{\min}$, and that $x_{\min}\in [a_i, a_{i+1}]$ after shifting $x_{\min}$ by an appropriate multiple of $M$ for some $i\in [1,M]$ (let's define $a_{M+1}=a_1+M$). Then the fact we showed above implies that $g$ is concave on the interval $[a_i,a_{i+1}]$, which in turn implies $g(x_{\min})\ge \min(g(a_i), g(a_{i+1}))$. Thus, $g$ attains its global minimum at some $a_i$.
Now, we have limited the number of possible values of $x$ to at most $N$. Let's iterate over all $i$ and calculate the number of operations that must be performed when $a_i = x$. Since all elements in $b$ and $c$ must be in a continuous segment on the modulo circle, we can precompute the sum of all elements in $b$ and $c$ using prefix sums on $a$. To straighten out the modulo circle into a line, we can insert $a_i + M$ to the back of $a$ for each $1 \leq i \leq N$. This way, no minor adjustments with modulo in the expressions are needed. To determine the index that is the closest to the border of arrays $b$ and $c$ (i.e. $x - \lfloor \frac{M}{2} \rfloor)$, we can use binary search.
Chongtian's C++ Implementation:
#include <bits/stdc++.h> using namespace std; using ll = long long; int main(){ int T; cin >> T; while(T--){ int n, m; cin >> n >> m; vector<int> a(n); for(int& i: a){ cin >> i; i %= m; } sort(a.begin(), a.end()); vector<ll> b; for(int i = 0; i < n; i++) b.push_back(a[i]); for(int i = 0; i < n; i++) b.push_back(a[i] + m); int N = (int) b.size(); vector<ll> p(N); p[0] = b[0]; for(int i = 1; i < N; i++) p[i] = p[i-1] + b[i]; ll ans = 1e18; for(int i = 0; i < N; i++){ if(b[i] - m / 2 >= 0 && b[i] + (m - 1) / 2 < 2 * m){ auto lower = lower_bound(b.begin(), b.end(), b[i] - m / 2) - b.begin(); ll lower_sum = (i == 0 ? 0 : p[i - 1]) - (lower - 1 < 0 ? 0 : p[lower - 1]); ll lower_cnt = i - lower; auto upper = upper_bound(b.begin(), b.end(), b[i] + (m - 1) / 2) - b.begin(); ll upper_sum = p[upper - 1] - p[i]; ll upper_cnt = upper - 1 - i; ll res = (lower_cnt * b[i] - lower_sum) + (upper_sum - upper_cnt * b[i]); ans = min(ans, res); } } cout << ans << endl; } }