(Analysis by Avnith Vijayram)

For convenience, we will use $PBT$ as an acronym for perfect binary tree.

Root the tree at node $1$. Consider counting just the number of ways to create a forest of $PBT$s such that the root of each $PBT$ is the closest node in the $PBT$ to node $1$ in the original tree. Since all $PBT$s are oriented the same way, to build up larger $PBT$s, we need to merge two smaller $PBT$s at their roots. This motivates the following DP.

At node $x$, we can either create a new $PBT$ of height $0$ or merge $PBT$s of height $h-1$ from two of $x$'s children. For all children $c$ not used in creating the $PBT$ at $x$, we multiply by the product of $ans[c]$ for each of those subtrees. This gives us the following recurrences.

Here, $C$ is the set of children of node $x$ and the bottom sum is over all unordered pairs of children $c_1$ and $c_2$ of node $x$. The answer at $x$ can be calculated as

Now, we need to extend this to cover $PBT$s where the root of the $PBT$ is not the closest to node $1$. Suppose that we have a $PBT$ of height $h$ rooted at node $r$, but node $s \neq r$ is the closest node to node $1$. Let $r = x_0, x_1, \ldots, x_{d-1}, x_d = s$ be the path from node $r$ to node $s$, where $d$ is the distance between nodes $r$ and $s$. We can observe the following structure relative to the original tree rooted at node $1$.

This motivates us to define the following DP.

The observations about the structure of the $PBT$ can now be translated into transitions:

Note that $c_2$ and $c_3$ in the last transition are indistinguishable and their order doesn't matter, but all other children are distinguishable. To compute these sums efficiently, for each node $x$, we can define the following table.

We can use DP to compute this table for $0 \le i \le 2$ and $0 \le j \le 1$ by processing one child at a time. All of the sums that we need for our transitions can be found somewhere in this table.

Since computing the DP at each node takes $O(|C| \log N)$, the total time complexity is $O(N \log N)$.

My code:

#include <bits/stdc++.h>
using namespace std;
 
using ll = long long;
template<class T> using V = vector<T>;
const ll M = 1e9+7;
const int LGN = 17;
 
void solve() {
    int n; cin >> n;
    V<V<int>> adj(n);
    for (int i = 0; i < n-1; i++) {
        int u, v; cin >> u >> v; u--; v--;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
 
    V<ll> ans(n, 0);
    V<V<ll>> dp(LGN, V<ll>(n, 0)), pd(LGN, V<ll>(n, 0));
 
    auto dfs = [&](auto&& dfs, int x, int p) -> void {
        ll sums[LGN][3][2];
        memset(sums, 0, sizeof(sums));
        for (int i = 0; i < LGN; i++) sums[i][0][0] = 1;
 
        for (auto y : adj[x]) {
            if (y == p) continue;
            dfs(dfs, y, x);
            
            for (int i = 0; i < LGN; i++) {
                for (int j = 2; j >= 0; j--) {
                    for (int k = 1; k >= 0; k--) {
                        if (j < 2) {
                            sums[i][j+1][k] += (i > 0 ? dp[i-1][y] : 0LL) * sums[i][j][k] % M;
                            sums[i][j+1][k] %= M;
                        }                        
                        if (k < 1) {
                            sums[i][j][k+1] += pd[i][y] * sums[i][j][k] % M;
                            sums[i][j][k+1] %= M;
                        }
                        sums[i][j][k] *= ans[y];
                        sums[i][j][k] %= M;
                    }
                }
            }
        }
        
        dp[0][x] += sums[0][0][0];
        for (int i = 0; i < LGN; i++) {
            if (i < LGN-1) dp[i+1][x] += sums[i+1][2][0];
               
            if (i < LGN-1) pd[i][x] += sums[i+1][1][0];
            if (i > 0) pd[i-1][x] += sums[i][1][1];
 
            ans[x] += dp[i][x];
            if (i == 0) ans[x] += sums[i][0][1];
            else ans[x] += sums[i][2][1];
        }
        for (int i = 0; i < LGN; i++) {
            dp[i][x] %= M;
            pd[i][x] %= M; 
        }
        ans[x] %= M;
    };
 
    dfs(dfs, 0, -1);
    cout << ans[0] << endl;
}
 
signed main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
 
    int t; cin >> t;
    while (t--) solve();
}