Pattern Search Problem

There are several algorithm to solve pattern search from a provided string problem.

To solve such problem using Naive method(for all the char of the provided string check whether the pattern  contains or not ). It takes O(m*n),here m = string length of pattern and n= string length of the text.

There are several algorithm to solve such problem ,like KMP (Knuth-Morris-Pratt) agorithm,Boyer-Moore String Search algorithm,Rabin-Karp String Search algorithm,Aho-corasick String Match algorithm .

KMP (Knuth-Morris-Pratt) agorithm:

The two string,

txt[] = “AAAAAAAAAAAAAAAAAB”

pat[] = “AAAAB”

If we want to check ,is Contain pat[] in txt[] using KMP algorithm?

To avoid single sliding procedure like naive algorithm we first determine a prefix array of pattern,

What is Prefix array?

An array which indicates the longest proper prefix length which is also the longest proper suffix length.

Proper prefix/suffix is the prefix/suffix that is not the string itself.

So our first step is to find the longest common (proper)prefix and suffix of the sub patterns starting from 0 to m-1.

So if the pattern[]=”ABABAB”; length = 6;

Our target Prefix array will be of size 6,PreArr[6]; Each of the entry of the array indicates the longest common proper prefix and suffix of the sub pattern length [0 to i].

Now, PreArr[0]=0 ,as it is the empty string,

PreArr[1] = 0, cause: in “AB” prefix is “A” but suffix is “B”, so no common prefix/suffix;

PreArr[2]=1;cause: in “ABA” prefix : A,AB ; suffix : BA,A ; so common prefix/suffix = A;

PreArr[3]=2;cause: in “ABAB” prefix : A,AB,ABA ; suffix : BAB,A,AB ; so common prefix/suffix = AB;

PreArr[4]=3;cause: in “ABABA” prefix : A,AB,ABA,ABAB ; suffix : ABA,BABA,BA,A ; so common prefix/suffix = ABA;

PreArr[5]=0,cause: in “ABABAB” prefix : A,AB,ABA,ABAB,ABABA ; suffix : BABAB,ABAB,BAB,BA,B,; so no common prefix/suffix ;

To determine the common longest prefix and suffix array length require O(m*m) time complexity.

There is an O(m) time complexity way to determine prefix array.

 

After computing the prefix array we use that array to find the pattern.

How we use the prefix array?

when we find a match we increment both txt[] string index and pattern string index.

If we find a mismatch we lookup the prefix array to find the longest common prefix and suffix for the matched index that is j-1(as previously matched) and we start to match checking from that lps[j-1] index of the pattern.

At worst case: when all index values of prefix array is 0 time complexity is as naive method that is

O(m*m).

Related Problems are:

uva 10679 , uva 11512 , uva 719 ,uva 11107 , uva 11507

Leave a Reply

Your email address will not be published. Required fields are marked *