Processing math: 100%
(Analysis by Benjamin Qi, Chongtian Ma)

Let's define g(x) to be the minimum number of operations FJ must perform for some integer x, f(x,ai) denote the closest integer y to ai such that yx(modM) (in case of a tie, choose the smaller candidate), and h(x,ai)=|aif(x,ai)|. Then g(x)=ih(x,ai), and our goal is to compute the minimum value of g(x).

Approach 1:

Subtask 1: N,M1000

Since g(x) does not change if we add a multiple of M to x, we only need to consider x[0,M1]. For each such x, we can compute g(x) in O(N) time by iterating over each i. This solves a single test case in O(NM) time.

Subtask 2: N1000

Let's first consider how to solve the case M=. In this case, f(x,ai)=x and g(x) is a convex function of x. If we sort the ai in increasing order, then it achieves its minimum value of Ni=NN/2+1aiN/2i=1ai at x[a(N+1)/2,a(N+1)/2].

When M is finite, we can start by replacing each ai with its remainder modulo M and sorting them in ascending order. Suppose that g(x) attains its minimum value at x=xmin. If we apply the M= logic above on the current ai, we will get the correct answer if f(xmin,ai) is equal for all ai. However, it is possible that f(xmin,ai) is equal to one value v for a prefix of ai and v+M for the remaining suffix, in which case the correct answer could be smaller. We can address this case by iterating over all possible lengths of this prefix, adding M to all elements of this prefix, and then applying the M= logic to get a candidate answer. At the end, we output the minimum of all N candidate answers.

This can be implemented in O(N2) time per test case. We note that after adding M to the smallest element of a, we can maintain the property that a is sorted by just moving the first element of the array to the end.

Ben's Python Implementation:

T = int(input())
for tc in range(T):
    N, M = map(int, input().split())
    A = list(map(int, input().split()))
    A = sorted([x % M for x in A])
    ans = float('inf')
    for _ in range(N):
        ans = min(ans, sum(A[N-N//2:]) - sum(A[:N//2]))
        A = A[1:] + [A[0] + M]  # add M to first element of A and move it to the end
    print(ans)

Full Credit: All arrays in the subtask above that we run the M= solution for are fixed-length subarrays of the sorted array [a1,a2,,aN,a1+M,a2+M,,aN+M]. Thus, we can speed up the previous solution by storing prefix sums on this doubled array. The runtime is O(NlogN) per test case due to sorting.

T = int(input())
for tc in range(T):
    N, M = map(int, input().split())
    A = list(map(int, input().split()))
    A = sorted([x % M for x in A])
    A = A + [x + M for x in A]
    ans = float('inf')
    cum = [0]
    for x in A:
        cum.append(cum[-1] + x)
    for i in range(N):
        ans = min(ans, cum[i + N] - cum[i + N - N // 2] - cum[i + N // 2] + cum[i])
    print(ans)

Approach 2:

Similar to approach 1, let's replace each ai with its remainder modulo M and sort them in ascending order. Suppose we know x, we can determine the total number of operations using the following:

The essential observation for this approach is:

Proof: Let's first show the following fact: if x is not equivalent to any ai(modM), then 2g(x)g(x1)+g(x+1). By the decomposition of g described above, it suffices to show that 2h(x,ai)h(x1,ai)+h(x+1,ai) for all i. Observe that

max(h(x1,ai),h(x+1,ai))h(x,ai)+1
and
min(h(x1,ai),h(x+1,ai))=h(x,ai)1
since xai(modM). So
2h(x,ai)min(h(x1,ai),h(x+1,ai))+max(h(x1,ai),h(x+1,ai))=h(x1,ai)+h(x+1,ai),
as desired.

Suppose g attains its global minimum at x=xmin, and that xmin[ai,ai+1] after shifting xmin by an appropriate multiple of M for some i[1,M] (let's define aM+1=a1+M). Then the fact we showed above implies that g is concave on the interval [ai,ai+1], which in turn implies g(xmin)min(g(ai),g(ai+1)). Thus, g attains its global minimum at some ai.

Now, we have limited the number of possible values of x to at most N. Let's iterate over all i and calculate the number of operations that must be performed when ai=x. Since all elements in b and c must be in a continuous segment on the modulo circle, we can precompute the sum of all elements in b and c using prefix sums on a. To straighten out the modulo circle into a line, we can insert ai+M to the back of a for each 1iN. This way, no minor adjustments with modulo in the expressions are needed. To determine the index that is the closest to the border of arrays b and c (i.e. xM2), we can use binary search.

Chongtian's C++ Implementation:

#include <bits/stdc++.h>
using namespace std;
using ll = long long;

int main(){
	int T; cin >> T;
	while(T--){	
		int n, m; cin >> n >> m; 
		vector<int> a(n);
		for(int& i: a){
			cin >> i;
			i %= m;
		}
		sort(a.begin(), a.end());
		vector<ll> b;
		for(int i = 0; i < n; i++) b.push_back(a[i]);
		for(int i = 0; i < n; i++) b.push_back(a[i] + m);
		
		int N = (int) b.size();
		vector<ll> p(N);
		p[0] = b[0];
		for(int i = 1; i < N; i++) p[i] = p[i-1] + b[i];
		
		ll ans = 1e18;
		for(int i = 0; i < N; i++){
			if(b[i] - m / 2 >= 0 && b[i] + (m - 1) / 2 < 2 * m){
				auto lower = lower_bound(b.begin(), b.end(), b[i] - m / 2) - b.begin();
				ll lower_sum = (i == 0 ? 0 : p[i - 1]) - (lower - 1 < 0 ? 0 : p[lower - 1]);
				ll lower_cnt = i - lower;
				
				auto upper = upper_bound(b.begin(), b.end(), b[i] + (m - 1) / 2) - b.begin();
				ll upper_sum = p[upper - 1] - p[i];
				ll upper_cnt = upper - 1 - i;

				ll res = (lower_cnt * b[i] - lower_sum) + (upper_sum - upper_cnt * b[i]);
				ans = min(ans, res);
			}
		}
		cout << ans << endl;
	}
}