### Minimum Common String Partition Problem: Hardness and Approximations

#### Abstract

String comparison is a fundamental problem in computer science, with applications in areas such as computational biology, text processing and compression. In this paper we address the minimum common string partition problem, a string comparison problem with tight connection to the problem of sorting by reversals with duplicates, a key problem in genome rearrangement.

A *partition* of a string $A$ is a sequence ${\cal P} = (P_1,P_2,\dots,P_m)$ of strings, called the *blocks*, whose concatenation is equal to $A$. Given a partition ${\cal P}$ of a string $A$ and a partition ${\cal Q}$ of a string $B$, we say that the pair $\langle{{\cal P},{\cal Q}}\rangle$ is a *common partition* of $A$ and $B$ if ${\cal Q}$ is a permutation of ${\cal P}$. The *minimum common string partition* problem (*MCSP*) is to find a common partition of two strings $A$ and $B$ with the minimum number of blocks. The restricted version of *MCSP* where each letter occurs at most $k$ times in each input string, is denoted by $k$-*MCSP*.

In this paper, we show that $2$-*MCSP* (and therefore *MCSP*) is NP-hard and, moreover, even APX-hard. We describe a $1.1037$-approximation for $2$-*MCSP* and a linear time $4$-approximation algorithm for $3$-*MCSP*. We are not aware of any better approximations.