For convenience, we will use $PBT$ as an acronym for perfect binary tree.
Root the tree at node $1$. Consider counting just the number of ways to create a forest of $PBT$s such that the root of each $PBT$ is the closest node in the $PBT$ to node $1$ in the original tree. Since all $PBT$s are oriented the same way, to build up larger $PBT$s, we need to merge two smaller $PBT$s at their roots. This motivates the following DP.
At node $x$, we can either create a new $PBT$ of height $0$ or merge $PBT$s of height $h-1$ from two of $x$'s children. For all children $c$ not used in creating the $PBT$ at $x$, we multiply by the product of $ans[c]$ for each of those subtrees. This gives us the following recurrences.
Here, $C$ is the set of children of node $x$ and the bottom sum is over all unordered pairs of children $c_1$ and $c_2$ of node $x$. The answer at $x$ can be calculated as
Now, we need to extend this to cover $PBT$s where the root of the $PBT$ is not the closest to node $1$. Suppose that we have a $PBT$ of height $h$ rooted at node $r$, but node $s \neq r$ is the closest node to node $1$. Let $r = x_0, x_1, \ldots, x_{d-1}, x_d = s$ be the path from node $r$ to node $s$, where $d$ is the distance between nodes $r$ and $s$. We can observe the following structure relative to the original tree rooted at node $1$.
This motivates us to define the following DP.
The observations about the structure of the $PBT$ can now be translated into transitions:
Note that $c_2$ and $c_3$ in the last transition are indistinguishable and their order doesn't matter, but all other children are distinguishable. To compute these sums efficiently, for each node $x$, we can define the following table.
We can use DP to compute this table for $0 \le i \le 2$ and $0 \le j \le 1$ by processing one child at a time. All of the sums that we need for our transitions can be found somewhere in this table.
Since computing the DP at each node takes $O(|C| \log N)$, the total time complexity is $O(N \log N)$.
My code:
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
template<class T> using V = vector<T>;
const ll M = 1e9+7;
const int LGN = 17;
void solve() {
int n; cin >> n;
V<V<int>> adj(n);
for (int i = 0; i < n-1; i++) {
int u, v; cin >> u >> v; u--; v--;
adj[u].push_back(v);
adj[v].push_back(u);
}
V<ll> ans(n, 0);
V<V<ll>> dp(LGN, V<ll>(n, 0)), pd(LGN, V<ll>(n, 0));
auto dfs = [&](auto&& dfs, int x, int p) -> void {
ll sums[LGN][3][2];
memset(sums, 0, sizeof(sums));
for (int i = 0; i < LGN; i++) sums[i][0][0] = 1;
for (auto y : adj[x]) {
if (y == p) continue;
dfs(dfs, y, x);
for (int i = 0; i < LGN; i++) {
for (int j = 2; j >= 0; j--) {
for (int k = 1; k >= 0; k--) {
if (j < 2) {
sums[i][j+1][k] += (i > 0 ? dp[i-1][y] : 0LL) * sums[i][j][k] % M;
sums[i][j+1][k] %= M;
}
if (k < 1) {
sums[i][j][k+1] += pd[i][y] * sums[i][j][k] % M;
sums[i][j][k+1] %= M;
}
sums[i][j][k] *= ans[y];
sums[i][j][k] %= M;
}
}
}
}
dp[0][x] += sums[0][0][0];
for (int i = 0; i < LGN; i++) {
if (i < LGN-1) dp[i+1][x] += sums[i+1][2][0];
if (i < LGN-1) pd[i][x] += sums[i+1][1][0];
if (i > 0) pd[i-1][x] += sums[i][1][1];
ans[x] += dp[i][x];
if (i == 0) ans[x] += sums[i][0][1];
else ans[x] += sums[i][2][1];
}
for (int i = 0; i < LGN; i++) {
dp[i][x] %= M;
pd[i][x] %= M;
}
ans[x] %= M;
};
dfs(dfs, 0, -1);
cout << ans[0] << endl;
}
signed main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
int t; cin >> t;
while (t--) solve();
}