Find First Occurrence in a String: Efficient Rabin-Karp Algorithm in C++

Introduction:

In programming, finding the index of the first occurrence of a substring within a string is a common problem. One of the efficient algorithms for solving this problem is the Rabin-Karp algorithm.

Problem Statement:

Given two strings needle and haystack, find the index of the first occurrence of the needle in the haystack string. If the needle is not part of the haystack, return -1.
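To make the expected behavior concrete, here is a minimal reference sketch built on the standard library's std::string::find. The helper name firstOccurrence is hypothetical and not part of the original article; it serves only as a baseline against which any custom search routine can be compared.

```cpp
#include <string>

// Hypothetical reference helper (an assumption for illustration):
// returns the index of the first occurrence of needle in haystack,
// or -1 when the needle does not occur.
int firstOccurrence(const std::string& needle, const std::string& haystack)
{
    std::size_t pos = haystack.find(needle);
    // std::string::find reports "not found" as std::string::npos
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

For instance, firstOccurrence("sad", "sadbutsad") returns 0, while firstOccurrence("leeto", "leetcode") returns -1.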

Algorithmic approach:

The Rabin-Karp algorithm is a pattern-matching algorithm that compares the hash value of the pattern with that of the substrings in the text. This algorithm is based on a rolling hash function that calculates the hash value of a substring in constant time by updating the hash value of the previous substring.
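The constant-time update can be sketched as follows. This is one common way to build a rolling hash (a polynomial hash modulo a prime), not necessarily the only one. The constants BASE and MOD and the function names hashFromScratch, highPower, and roll are assumptions chosen for illustration.

```cpp
#include <string>

// Illustrative constants (assumptions, not taken from the article).
const long long BASE = 256;         // one "digit" per possible byte value
const long long MOD = 1000000007;   // a large prime modulus

// Polynomial hash of s[start, start + len), computed from scratch in O(len).
long long hashFromScratch(const std::string& s, int start, int len)
{
    long long h = 0;
    for (int i = 0; i < len; i++)
    {
        h = (h * BASE + static_cast<unsigned char>(s[start + i])) % MOD;
    }
    return h;
}

// BASE^(len - 1) % MOD: the weight of the leading character of a window.
long long highPower(int len)
{
    long long high = 1;
    for (int i = 0; i < len - 1; i++)
    {
        high = (high * BASE) % MOD;
    }
    return high;
}

// Given the hash h of the current window, return the hash of the window
// shifted one position to the right, in O(1).
long long roll(long long h, char outgoing, char incoming, long long high)
{
    // Remove the contribution of the outgoing (leading) character
    h = (h + MOD - static_cast<unsigned char>(outgoing) * high % MOD) % MOD;
    // Shift the remaining characters up one position and append the new one
    return (h * BASE + static_cast<unsigned char>(incoming)) % MOD;
}
```

Rolling the hash of one window forward gives the same value as hashing the next window from scratch, but it costs a constant number of operations instead of one per character.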

The algorithm first computes the hash value of the needle, then the hash value of each substring of length m in the haystack, where m is the length of the needle. When the two hash values match, the algorithm compares the needle and the substring character by character, because two different strings can share a hash value. If the hash values differ, or the comparison fails, the algorithm moves to the next substring and updates the hash value. This process continues until a match is found or all substrings have been searched.

Implementation:

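#include <string>
using namespace std;

int rabinKarp(string needle, string haystack)
{
    const long long BASE = 256;          // one "digit" per possible byte value
    const long long MOD = 1000000007;    // large prime modulus
    int m = needle.length();
    int n = haystack.length();
    if (m == 0) return 0;                // an empty needle matches at index 0
    if (m > n) return -1;

    // high = BASE^(m-1) % MOD, the weight of the window's leading character
    long long high = 1;
    for (int i = 0; i < m - 1; i++)
        high = (high * BASE) % MOD;

    // Hash the needle and the first window of the haystack
    long long needleHash = 0, windowHash = 0;
    for (int i = 0; i < m; i++)
    {
        needleHash = (needleHash * BASE + (unsigned char)needle[i]) % MOD;
        windowHash = (windowHash * BASE + (unsigned char)haystack[i]) % MOD;
    }

    for (int window_start = 0; window_start <= n - m; window_start++)
    {
        // Compare characters only when the hash values agree
        if (needleHash == windowHash && haystack.compare(window_start, m, needle) == 0)
            return window_start;

        // Roll the hash: drop the leading character, append the next one
        if (window_start < n - m)
        {
            windowHash = (windowHash + MOD - (unsigned char)haystack[window_start] * high % MOD) % MOD;
            windowHash = (windowHash * BASE + (unsigned char)haystack[window_start + m]) % MOD;
        }
    }
    return -1;
}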

Time complexity:

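The worst-case time complexity of the Rabin-Karp algorithm is O(nm), where n and m are the lengths of the haystack and needle strings, respectively. The rolling hash updates each window's hash value in constant time, so hashing costs only O(n + m) in total. The worst case arises when many windows share the needle's hash value, because each such window requires a character-by-character comparison of up to m characters. With a good hash function such collisions are rare and the expected running time is O(n + m), which makes the Rabin-Karp algorithm an efficient solution for large strings.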
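One way to see where the O(nm) bound comes from is to count how many windows trigger a character-by-character check. The sketch below is a hypothetical instrumented variant written for illustration: the function name countVerifications and the caller-supplied modulus are assumptions, and a modulus of 1 is deliberately degenerate because it forces every window to collide.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration): runs the hash
// comparison loop with a caller-chosen modulus and returns how many windows
// needed a character-by-character check. Assumes 1 <= mod <= about 1e9.
int countVerifications(const std::string& needle, const std::string& haystack, long long mod)
{
    const long long BASE = 256;
    int m = static_cast<int>(needle.length());
    int n = static_cast<int>(haystack.length());
    if (m == 0 || m > n) return 0;

    // BASE^(m-1) % mod, the weight of the window's leading character
    long long high = 1 % mod;
    for (int i = 0; i < m - 1; i++)
        high = (high * BASE) % mod;

    long long needleHash = 0, windowHash = 0;
    for (int i = 0; i < m; i++)
    {
        needleHash = (needleHash * BASE + static_cast<unsigned char>(needle[i])) % mod;
        windowHash = (windowHash * BASE + static_cast<unsigned char>(haystack[i])) % mod;
    }

    int verifications = 0;
    for (int start = 0; start <= n - m; start++)
    {
        if (needleHash == windowHash)
        {
            verifications++;  // a hash hit forces a comparison of up to m characters
            if (haystack.compare(start, m, needle) == 0) break;
        }
        if (start < n - m)
        {
            windowHash = (windowHash + mod - static_cast<unsigned char>(haystack[start]) * high % mod) % mod;
            windowHash = (windowHash * BASE + static_cast<unsigned char>(haystack[start + m])) % mod;
        }
    }
    return verifications;
}
```

Searching for "aab" in "aaaaaaaa" with mod = 1 verifies all 6 windows, which is the O(nm) behavior, while mod = 1000000007 verifies none of them because no window shares the needle's hash value.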

Space complexity:

The space complexity of the Rabin-Karp algorithm is O(1): beyond the two input strings, it stores only a few integer values, such as the two hash values.

Conclusion:

The Rabin-Karp algorithm provides an efficient solution to the problem of finding the index of the first occurrence of a substring in a string. Its expected running time is O(n + m), with an O(nm) worst case when hash collisions are frequent, which makes it a practical solution for large strings. The algorithm's rolling hash function is the key to its efficiency, as it allows for constant-time updates of the hash value of the substring.