Consider minimizing the number of transpositions to make a single array $a$ into a palindrome. We will assume for convenience that the number of 'G's in $a$ is an even number $2k$. The case where the number of 'G's is odd is essentially the same, except that you have to account for moving the middle 'G' to the center (and using $-1$ instead if the length of $a$ isn't odd).

Notice that we should never swap two adjacent 'G's as it doesn't change the array. It follows that the 'G's in $a$, relative to each other, stay in the same order as they were originally, meaning that to make $a$ into a palindrome, we want to match the first 'G' in $a$ with the last 'G', the second 'G' with the second to last 'G' and so on.

Therefore, for $1 \leq j \leq k$, let $a_j$ be the position of the $j$th 'G' in
$a$ and let $b_j$ be the position **counted from the end** of the $j$th to
last 'G' in $a$. The number of 'H's from the beginning to the $j$th 'G' is
$a_j - j$, and the number of 'H's from the $j$th to last 'G' to the end is
$b_j - j$. To make them match, we have to make the amount of 'H's between each
and their respective ends the same, which requires
$|(a_j - j) - (b_j - j)| = |a_j - b_j|$ transpositions.

Given this, to solve the problem in $\mathcal O(N^3)$ we can simply iterate over the $O(N^2)$ subarrays, and for each one calculate the values of $a_j, b_j$, then simply sum $|a_j - b_j|$ over all $j$ in $\mathcal O(N)$.

import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; public class Palindromes { public static void main(String[] args) throws IOException { BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); char[] lineup = in.readLine().toCharArray(); long answer = 0; for (int l = 0; l < lineup.length; l++) { for (int r = l; r < lineup.length; r++) { long here = 0; int a = l; int b = r; while (true) { while (a < lineup.length && lineup[a] == 'H') { a++; } while (b >= 0 && lineup[b] == 'H') { b--; } if (a > b) { break; } if (a == b) { if ((l - r) % 2 == 0) { here += Math.abs(a - ((l + r) / 2)); } else { here = -1; } break; } here += Math.abs((a - l) - (r - b)); a++; b--; } answer += here; } } System.out.println(answer); } }

To optimize the runtime to $O(N^2)$, we can consider fixing the middle two 'G's, then calculating the answer for all corresponding subarrays more quickly. Say that the middle two 'G's we fixed are at positions $x < y$ in the entire array. The idea is that when we fix the middle two 'G's, every single 'G' is matched to the same other 'G': the $j$th 'G' to the left of $a_x$ is always matched to the $j$th 'G' to the right of $a_y$.

Following this idea, let's define $u_j$ to be the position of the $j$th 'G' to the left of position $x$, and $v_j$ to be the position of the $j$th 'G' to the right of $y$. Given that the subarray we are currently considering has endpoints $l \leq r$, the positions of these 'G's in the subarray are $u_j - l + 1$ counted from the beginning and $r - v_j + 1$ counted from the end respectively. Therefore, the number of transpositions needed to match them is

$$|(u_j - l + 1) - (r - v_j + 1)| = |(u_j + v_j) - (l + r)|.$$

Consider this quantity as a function of $l + r$. Specifically, let's write
$$f_j(s) = |(u_j + v_j) - s|.$$

When you increase $s$ by $1$, $f_j(s)$ decreases by $1$ for $s < u_j + v_j$ and
increases by $1$ for $s \ge u_j + v_j$. This means that if we know $f_j(s)$, we
can calculate $f_j(s + 1)$ by simply adding $1$ or $-1$, which takes constant
time.
We can extend this idea to $F(s) = \sum_{j = 1}^k f_j(s)$, which is actually the number of transpositions needed for a subarray $a[l..r]$ with $l + r = s$ with $2k$ 'G's such that $a_x$ and $a_y$ are the middle 'G's. When we increase $s$ by $1$, $F(s)$ increases by $1$ for all $j$ such that $s \geq u_j + v_j$, and decreases by $1$ for other $j$. Let's quantify this difference as $d(s) = F(s + 1) - F(s)$. If we maintain $d(s)$, then we can update $F(s)$ to $F(s + 1)$ by simply adding $d(s)$. To then be able to calculate $F(s)$ for higher $s$, we need to be able to update $d(s)$ to $d(s + 1)$ as well, but to do that we simply need to add $2$ for each $j$ such that $u_j + v_j = s + 1$; we can do this easily by simply storing an array $e(s)$ that counts the amount of $j$ such that $u_j + v_j = s$, and calculating $d(s + 1)$ as $d(s) + 2e(s + 1)$.

Therefore, our algorithm will be as follows. For each adjacent pair of 'G's $a_x$ and $a_y$ (there can be 'H's between them but no 'G's), initialize an array $e$ to be all $0$s. We will then compute the answers for subarrays centered at $a_x, a_y$ in phases. For each $k$ starting from $1$, we update $e$ by adding $1$ to $e(u_k + v_k)$. Then, we compute the answers for all subarrays $a[l..r]$ such that $u_{k + 1} < l \leq u_k$ and $v_k \leq r < v_{k + 1}$. Starting with $l = u_k$ and $r = v_k$, we maintain the values of $F(l + r)$ and $d(l + r)$. We repeatedly increase $r$ by $1$, updating $F(l + r)$ and $d(l + r)$ in constant time as we explained above, and importantly adding $F(l + r)$ to our overall answer, until we reach $r = v_{k + 1}$, at which we decrease $r$ back down to $v_k$ in a similar manner. We then decrease $l$ by $1$, and repeat, until we reach $l = u_{k + 1}$. At that point, we've calculated the contributions of all the subarrays that we wanted to.

In terms of runtime, we aren't guaranteed that a single step of fixing the middle two 'G's takes $\mathcal O(N)$ -- even a single phase could take $O(N^2)$ -- but overall, we only take constant time to compute the answer for each subarray, meaning that the overall runtime is $O(N^2)$. We also need to initialize the array $e$ which needs $O(N)$ space, but since we fix the middle two 'G's less than $N$ times, this is also $O(N^2)$.

import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; public class Palindromes { public static void main(String[] args) throws IOException { BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); char[] lineup = in.readLine().toCharArray(); int[] gs = new int[lineup.length]; int amountG = 0; for (int j = 0; j < lineup.length; j++) { if (lineup[j] == 'G') { gs[amountG] = j; amountG++; } } long answer = 0; for (int leftCenter = 0; leftCenter < amountG; leftCenter++) { for (int rightCenter = leftCenter; rightCenter <= leftCenter + 1 && rightCenter < amountG; rightCenter++) { long[] e = new long[2 * lineup.length]; long d = 0; long F = 0; for (int k = 0; leftCenter - k >= 0 && rightCenter + k < amountG; k++) { int uk = gs[leftCenter - k]; int uk1 = leftCenter - k == 0 ? -1 : gs[leftCenter - (k + 1)]; int vk = gs[rightCenter + k]; int vk1 = rightCenter + (k + 1) == amountG ? lineup.length : gs[rightCenter + (k + 1)]; if (uk < vk) { e[uk + vk] += 2; d++; } for (int l = uk; l > uk1; l--) { for (int r = vk; r < vk1; r++) { if (leftCenter == rightCenter) { if ((r - l) % 2 == 0) { answer += F + ((long) Math.abs(((r + l) / 2) - gs[leftCenter])); } else { answer--; } } else { answer += F; } F += d; d += e[l + r + 1]; } for (int r = vk1; r > vk; r--) { d -= e[l + r]; F -= d; } d -= e[l + vk]; F -= d; } for (int r = vk; r < vk1; r++) { F += d; d += e[uk1 + r + 1]; } } } } System.out.println(answer); } }