The Basic Local Alignment Search Tool (BLAST) is an algorithm that finds regions of local similarity between amino-acid or nucleotide sequences. A BLAST search lets us compare a query sequence with a library or database of sequences and identify the database sequences that resemble the query above a certain threshold. For example, we can use BLAST to search for a gene sequence in a whole genome. BLAST uses heuristics (rules of thumb) to decrease the search time: it searches for matching words of a specific length and then extends them until it hits a mismatch.
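The word-matching heuristic described above can be illustrated with a minimal sketch. It is not the BLAST implementation: the function names and the word size of 4 are assumptions made for illustration, and the extension stops at the first mismatch exactly as the description says, whereas real BLAST scores the extension and tolerates some mismatches.

```python
def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    # Index every word of the subject by its start positions.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    # Look up every word of the query in that index.
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size):
    """Extend a seed in both directions until the first mismatch (no gaps)."""
    start_q, start_s = i, j
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = i + word_size, j + word_size
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return start_q, start_s, end_q - start_q


def longest_exact_hit(query, subject, word_size=4):
    """Return (query_start, subject_start, length) of the longest extended seed."""
    best = None
    for i, j in find_seeds(query, subject, word_size):
        hit = extend_seed(query, subject, i, j, word_size)
        if best is None or hit[2] > best[2]:
            best = hit
    return best


# The shared stretch GCTCCCACT starts at query position 1 and subject position 3.
print(longest_exact_hit("GGCTCCCACTCC", "AAAGCTCCCACTTT"))  # (1, 3, 9)
```

Indexing the subject words first means each query word is a dictionary lookup rather than a scan of the whole subject, which is where the time saving of the heuristic comes from.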
The EBI provides the only BLAST libraries to use the IPD-IMGT/HLA Database. The BLAST tool searches against the nucleotide and protein sequences of HLA alleles and related sequences included in the database. The EBI BLAST engine is now automatically configured for IPD-IMGT/HLA searches when we access it directly from the page. If we are accessing the page from another location, or wish to alter the parameters provided, please read the guidelines below.
1) Selecting a BLAST engine
The first step is to select the type of search required. On the right-hand side of the table, under “OTHER TYPES”, select Nucleotide or Protein Databases; this defines the data banks available for searching.
2) Selecting a database
The IPD-IMGT/HLA sequences are only available from IPD-IMGT/HLA Databases in BLAST. The component entries are available from the EMBL-Bank Databases. The cDNA nucleotide sequences can be found under the IMGT tab, and then under the “IMGT/HLA (cds)” option. The gDNA nucleotide sequences can be found under the IMGT tab, and then under the “IMGT/HLA (genomic)” option. The protein sequences can be found under “Other Databases” and the “IMGT/HLA” option. These databases contain all the sequences in the current release of the IPD-IMGT/HLA Database.
If the menus do not automatically update when different BLAST programs are chosen, this is most likely a local browser problem.
3) Pasting a sequence
The sequence can be entered using either a “cut & paste” method or the file upload option. The sequence should ideally be in FASTA format. This means a single sequence with the first line starting with a greater-than sign (>). The rest of this line is used to name the sequence and can contain spaces, but should not contain numbers or any form of formatting, e.g. asterisks (*) or periods (.), as BLAST considers these invalid nucleotides. The IUB (International Union of Biochemistry) codes are acceptable in a BLAST search. However, a sequence containing more than 50% ambiguity codes will most likely be rejected, and if the search does succeed, the results may contain false positive hits of limited use.
The BLAST engine is designed for searching for sequence similarities over large sequences. Searching the databases with short sequences may result in an error. The minimum length is 11 bases; however, the recommended minimum sequence length is 22 bases (nucleotides/blastn) or 6 amino acids (proteins/blastp). Some searches under this recommended size may run, but even single mismatches can cause the search to fail.
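The input rules above can be checked before submitting a nucleotide query. The following is a minimal sketch rather than the checks the EBI server actually performs: the function name, the messages, and the decision to ignore whitespace inside the sequence are assumptions, while the thresholds (11 bases minimum, 22 recommended, 50% ambiguity codes) are taken from the text.

```python
# Standard bases plus the IUB/IUPAC nucleotide ambiguity codes.
IUPAC_CODES = set("ACGTURYSWKMBDHVN")
AMBIGUITY_CODES = IUPAC_CODES - set("ACGTU")


def check_nucleotide_query(fasta_text, min_len=11, recommended_len=22, max_ambiguity=0.5):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]

    # FASTA format: the first line is a header starting with '>'.
    if lines and lines[0].startswith(">"):
        body = lines[1:]
    else:
        problems.append("missing header: first line should start with '>'")
        body = lines

    # Only a single sequence is expected.
    if any(line.startswith(">") for line in body):
        problems.append("multiple sequences: only a single sequence is expected")
        body = [line for line in body if not line.startswith(">")]

    # Join the sequence lines; whitespace is dropped (an assumption of this sketch).
    sequence = "".join("".join(body).split()).upper()

    # Digits, asterisks, periods and similar are not valid nucleotide codes.
    invalid = sorted(set(sequence) - IUPAC_CODES)
    if invalid:
        problems.append("invalid characters: " + " ".join(invalid))

    if len(sequence) < min_len:
        problems.append(f"too short: {len(sequence)} bases, minimum is {min_len}")
    elif len(sequence) < recommended_len:
        problems.append(f"short: {len(sequence)} bases, recommended minimum is {recommended_len}")

    ambiguous = sum(1 for base in sequence if base in AMBIGUITY_CODES)
    if sequence and ambiguous / len(sequence) > max_ambiguity:
        problems.append(f"ambiguity codes exceed {max_ambiguity:.0%} of the sequence")
    return problems


print(check_nucleotide_query(">example query\n" + "ACGT" * 6))  # []
```

Returning a list of messages instead of raising on the first problem lets all issues with a pasted sequence be reported in one pass.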
4) Search Errors
Searching for any matches to intron sequences will result in an error. The IPD-IMGT/HLA Database does not currently contain any intron sequences, so these are not available through the BLAST engine. The BLAST engine can currently only search coding sequences. If we need to search for an intron sequence, the EMBL database should be used until the IPD-IMGT/HLA Database incorporates intronic sequences.
The BLAST scoring system can sometimes distort the results. For example, a 546 base pair sequence of ~95% identity may score higher than a shorter sequence (270 bp, for example) of 100% identity. Therefore the top results may not always be the best match. This is due to the high degree of similarity between HLA sequences.
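A short calculation shows why the longer, less identical alignment can come out on top. This is a sketch under stated assumptions: the match reward of +1 and mismatch penalty of -3 are illustrative values, not parameters given in this guide, and gaps are ignored.

```python
def ungapped_raw_score(length, identity, match=1, mismatch=-3):
    """Raw score of an ungapped alignment with the given length and fractional identity."""
    matches = round(length * identity)
    mismatches = length - matches
    return matches * match + mismatches * mismatch


long_hit = ungapped_raw_score(546, 0.95)   # 519 matches, 27 mismatches
short_hit = ungapped_raw_score(270, 1.00)  # 270 matches, no mismatches

print(long_hit, short_hit)  # 438 270
```

Under these assumed values the 546 bp alignment still scores 438 against 270 for the perfect 270 bp match, so sorting by identity as well as by score is worthwhile when checking results.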
The BLAST results page shows an overview of the results, colour-coded by sequence identity and listed in descending order of score. We can re-order the results by E-Value, Score or Identity.
Filters and views
We can use the filters on the left-hand side to narrow down our search results, e.g. to limit the results to a particular species. We can also map the results to the UniProt protein databases (UniProtKB, UniRef and UniParc). We can view results by taxonomy or in plain text format.
More detail about each result can be seen in the ‘Alignments’ table under the ‘Overview’ section, showing the query sequence aligned to each subject sequence.
We can view each alignment in more detail by clicking on the graphic or on the ‘view alignment’ link. We can also see information about E-Value, Score and Identity. To add more information, click on the ‘edit columns’ button.