Subcellular localization is integral to protein function: many proteins become functional only after being transported to specific compartments of the cell. The study of the mechanisms of protein trafficking and subcellular localization is the field of protein sorting (also known as protein targeting), which has become one of the central themes in modern cell biology. Identifying the subcellular localization of a protein is an important aspect of functional annotation, because knowing where a protein resides in the cell often helps to narrow down its putative functions.

For many eukaryotic proteins, newly synthesized protein precursors have to be transported to specific membrane-bound compartments and be proteolytically processed to become functional. These compartments include chloroplasts, mitochondria, the nucleus, and peroxisomes. To carry out protein translocation, unique peptide signals have to be present in the nascent proteins; these function as "zip codes" that direct the proteins to each of these compartments. Once the proteins have been translocated into the organelles, protease cleavage takes place to remove the signal sequences and generate the mature proteins.

Even in prokaryotes, proteins can be targeted to the inner or outer membranes, the periplasmic space between these membranes, or the extracellular space. The sorting of these proteins is similar to that in eukaryotes and relies on the presence of signal peptides.

The signal sequences have only a weak consensus but share some specific features. They all have a hydrophobic core region preceded by one or more positively charged residues. However, the length and sequence of the signal sequences vary tremendously. Peptides targeting mitochondria, for example, are located in the N-terminal region.
The sequences are typically twenty to eighty residues long and rich in positively charged residues such as arginine as well as hydroxylated residues such as serine and threonine. They are devoid of negatively charged residues and tend to form amphiphilic α-helices. These targeting sequences are cleaved once the precursor proteins are inside the mitochondria.

Chloroplast localization signals (also called transit peptides) are also located at the N-terminus and are about 25 to 100 residues in length, containing very few negatively charged residues but many hydroxylated residues such as serine. An interesting feature of the proteins targeted to the chloroplasts is that the transit signals are bipartite. That is, they consist of two adjacent signal peptides: one that targets the protein to the stroma of the chloroplast and is then cleaved, and another that targets the remaining portion of the protein to the thylakoids.

Localization signals targeting the nucleus are variable in length (seven to forty-one residues) and are found in the internal region of the proteins. They typically consist of one or two stretches of basic residues with the consensus motif K(K/R)X(K/R), where X is any residue. Nuclear signal sequences are not cleaved after protein transport.

Considerable variation in length and sequence makes accurate prediction of signal peptides using computational approaches difficult. Nonetheless, various computational methods have been developed to predict subcellular localization signals. In general, they fall into three categories. Some algorithms are signal based, depending on knowledge of charge, hydrophobicity, or consensus motifs. Some are content based, depending on sequence statistics such as amino acid composition. The third group of algorithms combines the virtues of both the signal-based and the content-based approaches and appears to be more successful in prediction. Neural network and HMM-based algorithms are examples of the combined approach.
These programs predict both the signal peptides and the protease cleavage sites of the query sequence.
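
To make the distinction between signal-based and content-based features concrete, the following is a minimal Python sketch rather than the method of any actual prediction program. It scans a sequence for the nuclear consensus motif K(K/R)X(K/R) given above (a signal-based feature) and computes the fractions of positively charged, negatively charged, and hydroxylated residues in the N-terminal region (a content-based feature). The function names, the 30-residue default window, and the toy sequences are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Consensus nuclear motif from the text: K(K/R)X(K/R), X = any residue.
# The lookahead wrapper lets overlapping occurrences be reported.
NLS_PATTERN = re.compile(r"(?=(K[KR][A-Z][KR]))")


def find_nls_motifs(seq):
    """Return (start_index, motif) for each, possibly overlapping, match.

    Hypothetical helper: a motif hit only flags a candidate signal and
    is not by itself evidence of nuclear localization.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in NLS_PATTERN.finditer(seq)]


def n_terminal_features(seq, window=30):
    """Residue-class fractions over the first `window` residues.

    The window size is an illustrative assumption, not a published cutoff.
    """
    region = seq.upper()[:window]
    counts = Counter(region)
    n = len(region) or 1  # guard against an empty sequence
    return {
        "positive": (counts["R"] + counts["K"]) / n,  # Arg, Lys
        "negative": (counts["D"] + counts["E"]) / n,  # Asp, Glu
        "hydroxyl": (counts["S"] + counts["T"]) / n,  # Ser, Thr
    }


if __name__ == "__main__":
    # Toy sequences invented for this example.
    print(find_nls_motifs("MAPKKKRKVED"))
    print(n_terminal_features("MLRSTRSSKA", window=10))
```

A region with a high positive fraction, a high hydroxylated fraction, and no negative residues would be consistent with the mitochondrial and chloroplast signals described above, but real predictors weigh many such features together, for example with neural networks or HMMs, rather than relying on any single count.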