For N≤20, any reasonable polynomial-time solution should work. One possible approach is to calculate the result for all n≤N,k≤(n2) in O(N7).
For additional points, we should find a way to compute di(a) without explicitly constructing the tree. The key condition is that j is an ancestor of i if a[j]=min(a[i…j]), so it follows that
First, we start with a sequence consisting of a[i] only. Then a[i+1] can be either greater than a[i] or less than a[i], contributing 0 or 1 inversion. Then a[i+2] can take on any of three different values relative to a[i] and a[i+1], contributing anywhere from 0 to 2 inversions. Continuing in this fashion, the possible numbers of inversions in the sub-permutation a[i…j−1] can be represented by the polynomial product
Adding a[j] contributes j−i inversions regardless of how many inversions a[i…j−1] has. Then we should add the remaining elements of the permutation, each of which can go anywhere in the sorted order. Thus, the final result is given by the generating function
After dividing, all we need is the coefficient of xk−(j−i). Since the product depends only on j−i, we only need to do N different divisions. Alternatively, we can maintain prefix and suffix products without needing to do division. The process for i>j is almost exactly the same, except a[j] contributes 0 inversions rather than i−j.
The whole solution runs in O(N3) time and O(N2) memory. My code follows. It turns out that M being prime is irrelevant ...
#include <bits/stdc++.h> using namespace std; typedef long long ll; typedef vector<int> vi; #define FOR(i,a,b) for (int i = (a); i < (b); ++i) #define F0R(i,a) FOR(i,0,a) #define ROF(i,a,b) for (int i = (b)-1; i >= (a); --i) #define R0F(i,a) ROF(i,0,a) #define trav(a,x) for (auto& a: x) #define pb push_back #define rsz resize #define sz(x) int(x.size()) void setIO(string name) { ios_base::sync_with_stdio(0); cin.tie(0); freopen((name+".in").c_str(),"r",stdin); freopen((name+".out").c_str(),"w",stdout); } int MOD; int n,k; typedef int T; struct mi { T val; mi() { val = 0; } mi(const ll& v) { val = (-MOD <= v && v <= MOD) ? v : v % MOD; if (val < 0) val += MOD; } mi& operator+=(const mi& m) { if ((val += m.val) >= MOD) val -= MOD; return *this; } mi& operator-=(const mi& m) { if ((val -= m.val) < 0) val += MOD; return *this; } }; typedef vector<mi> vmi; void ad(vmi& a, int b) { // multiply by (x^0+x^1+...+x^{b-1}) a.rsz(sz(a)+b-1); R0F(i,sz(a)-b) a[i+b] -= a[i]; FOR(i,1,sz(a)) a[i] += a[i-1]; } void sub(vmi& a, int b) { ROF(i,1,sz(a)) a[i] -= a[i-1]; F0R(i,sz(a)-b) a[i+b] += a[i]; a.rsz(sz(a)-b+1); } mi get(vmi& a, int b) { if (b < 0 || b >= sz(a)) return 0; return a[b]; } int main() { setIO("treedepth"); cin >> n >> k >> MOD; vmi v = {1}; FOR(i,1,n+1) ad(v,i); vmi ans(n,v[k]); FOR(dif,1,n) { sub(v,dif+1); mi x = get(v,k-dif), y = get(v,k); ad(v,dif+1); F0R(a,n-dif) { ans[a] += x; ans[a+dif] += y; } } F0R(i,n) cout << ans[i].val << ' '; }