Minimum Common String Partition Problem: Hardness and Approximations

  • Avraham Goldstein
  • Petr Kolman
  • Jie Zheng

Abstract

String comparison is a fundamental problem in computer science, with applications in areas such as computational biology, text processing and compression. In this paper we address the minimum common string partition problem, a string comparison problem with tight connection to the problem of sorting by reversals with duplicates, a key problem in genome rearrangement.

A partition of a string $A$ is a sequence ${\cal P} = (P_1,P_2,\dots,P_m)$ of strings, called the blocks, whose concatenation is equal to $A$. Given a partition ${\cal P}$ of a string $A$ and a partition ${\cal Q}$ of a string $B$, we say that the pair $\langle{{\cal P},{\cal Q}}\rangle$ is a common partition of $A$ and $B$ if ${\cal Q}$ is a permutation of ${\cal P}$. The minimum common string partition problem (MCSP) is to find a common partition of two strings $A$ and $B$ with the minimum number of blocks. The restricted version of MCSP where each letter occurs at most $k$ times in each input string, is denoted by $k$-MCSP.

In this paper, we show that $2$-MCSP (and therefore MCSP) is NP-hard and, moreover, even APX-hard. We describe a $1.1037$-approximation for $2$-MCSP and a linear time $4$-approximation algorithm for $3$-MCSP. We are not aware of any better approximations.

Published
2005-09-29
Article Number
R50