Bioinformatics Bioinformatics Algorithm Sequence Alignment

Hirschberg’s Algorithm

Pinterest LinkedIn Tumblr

In computer science, Hirschberg’s algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the optimal sequence alignment between two strings.

Hirschberg’s algorithm is a generally applicable algorithm for optimal sequence alignment. BLAST and FASTA are suboptimal heuristics. If x and y are strings, where length(x) = n and length(y) = m, the Needleman-Wunsch algorithm finds an optimal alignment in O (nm) time, using O (nm) space. Hirschberg’s algorithm is a clever modification of the Needleman-Wunsch Algorithm which still takes O(nm) time, but needs only O(min{n,m}) space and is much faster in practice. One application of the algorithm is finding sequence alignments of DNA or protein sequences. It is also a space-efficient way to calculate the longest common subsequence between two sets of data such as with the common diff tool.

Hirschberg’s dynamic programming algorithm has the capability of finding the best sequence alignment of two sequences. The algorithm utilizes Levenshtein edit distance which is distance between two strings is the number of single character deletions, insertions, or substitutions required to transform one string into the other. It can also be defined as a modified version of Needleman-Wunsch algorithm, it is more space efficient and uses the divide and conquer strategy.

Finding optimal sequence alignment requires 

computational resources in terms of time and space. For example, Needleman-Wunsch algorithm will take O(nm) time and O(nm) space for finding an optimal sequence alignment of two strings of x and y, where length(x) = n and length(y) = m. On the other hand Hirschberg’s algorithm still takes O(nm) time, but needs only O(min{n,m}) space. Hirschberg’s algorithm is also a space-efficient way to calculate the longest common subsequence between two sets of data such as with the common diff tool.

For calculating the least score (Edit(x,y)) required for changing x into y by using insertions, substitutions and deletions, we give a score to each of the events. For example, we give a score for inserting a character “A” into substring, a score for substituting character “B” with “A” in the string and similarly a score for deleting a character “A” from the string. The standard score can be Ins(a) = Del(a) = 1 for each character a, Sub(a,a) = 0, and Sub(a,b) = 1 if a is not equal to b.

Levenshtein edit distances can be computed using linear space. What we call the “forward subprogram” computes the values of Edit(Prefix[x,i],Prefix[y,j]) for all i and j, just as the Needleman-Wunsch and returns the array {Edit(x,Prefix[y,j])}0 = j = m. The “backward subprogram” is similar, except that the dynamic program is done in the opposite direction, such as starting from the right ends of the strings. It returns the array {Edit(x,Suffix[y,j])}0 = j = m, where Suffix[y,j] is the suffix of y of length j.

Write A Comment