Overlaps between strings are crucial in many areas of computer science, such as bioinformatics, code design, and stringology. A self overlapping string is characterized by its periods and borders. A period of a string u is the starting position of a suffix of u that is also a prefix u, and such a suffix is called a border. Each word of length, say n>0, has a set of periods, but not all combinations of integers are sets of periods. The question we address is how to compute the set, denoted Gamma(n), of all period sets of strings of length n. Computing the period set for all possible words of length n is clearly prohibitive. The cardinality of Gamma(n) is exponential in n. One dynamic programming algorithm exists for enumerating Gamma(n), but it suffers from an expensive space complexity. After stating some combinatorial properties of period sets, we will present a novel algorithm that computes Gamma(n) from Gamma(n-1), for any length n>1. The period set of a string u is a key information for computing the absence probability of u in random texts. Moreover, computing Gamma(n) is useful for assessing the significance of word statistics, like the number of k-mers shared between two texts, or the number of missing k-mers in one text. Besides applications, investigating Gamma(n) is interesting per se as it unveils combinatorial properties of string overlaps.
Incremental algorithms for computing the set of period sets. The dynamics of period set
