β-turns are often accessible and generally hydrophilic, two characteristics of antigenic regions. β-turns are the most common type of non-repetitive structure and constitute on average 25% of the residues in all protein chains. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. A high correlation has been reported between the tendency of a sequence to form a β-turn and the protein's reactivity to anti-peptide antibodies, and it has furthermore been shown that β-turns are overrepresented in B-cell epitopes. In the area of protein-protein interactions, the formation of β-turn types I and II has been shown to be essential for high-affinity binding to SH2 domains.
NetTurnP predicts whether or not an amino acid is located in a β-turn. NetTurnP is also able to predict the nine β-turn subtypes.
All the input sequences must be in one-letter amino acid code.
The sequences can be input in the following two ways:
- Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.
- Select a FASTA file on the local disk, either by typing the file name into the lower window or by browsing the disk.
All pipes ‘|’ in the name of a FASTA entry will be replaced with underscores ‘_’.
All lowercase letters in a sequence will be changed to uppercase letters.
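The two input rules above (pipes in entry names become underscores, lowercase residues become uppercase) can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the server's actual code; the function name `normalize_fasta` and the fallback name `"Sequence"` for a bare sequence without a header are assumptions.

```python
def normalize_fasta(text):
    """Parse FASTA text (or a bare sequence) into (name, sequence) pairs.

    Hypothetical helper: mirrors the two rules described above, nothing more.
    """
    entries = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if name is not None or chunks:
                entries.append((name or "Sequence", "".join(chunks)))
            # Rule 1: pipes in the entry name become underscores.
            name = line[1:].replace("|", "_")
            chunks = []
        else:
            # Rule 2: lowercase residues become uppercase.
            chunks.append(line.upper())
    if name is not None or chunks:
        entries.append((name or "Sequence", "".join(chunks)))
    return entries


if __name__ == "__main__":
    print(normalize_fasta(">sp|P12345|TEST\nacdEF\nGhik\n"))
```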
Submit the job
Click on the “Submit” button. The status of our job (either ‘queued’ or ‘running’) will be displayed and constantly updated until it terminates and the server output appears in the browser window.
At any time during the wait we may enter our e-mail address and simply leave the window. Our job will continue; we will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for us to collect them.
The method consists of two artificial neural network layers. Several second-layer network setups were tested in order to find the architecture with the highest cross-validated Matthews correlation coefficient (MCC) on the training set sequences.
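For reference, the MCC used as the selection criterion can be computed from the four confusion-matrix counts. The sketch below uses the standard definition; the function name and the convention of returning 0.0 when the denominator is zero are assumptions made for illustration.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn: true positives, true negatives, false positives,
    false negatives (here: residues in a turn vs. not in a turn).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(mcc(tp=40, tn=120, fp=20, fn=20))
```

An MCC of 1 corresponds to perfect prediction, 0 to a prediction no better than random, and -1 to complete disagreement.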
First layer networks
Classification artificial neural networks, β-turn-G, were trained to predict whether or not an amino acid was located in a β-turn. The input to the networks consisted of sequence profiles in the form of position-specific scoring matrices (PSSMs), predicted secondary structure and surface accessibility. Using 10-fold cross-validation spanning a series of different network architectures, an ensemble was constructed from the best 100 network architectures, as determined by cross-validation leave-out tests. Furthermore, position-specific networks, β-turn-P, were trained in order to increase the predictive performance of the second-layer networks.
Second layer networks
The output from the first-layer networks was used as input to the second-layer networks. The final method uses predictions from the β-turn-P and β-turn-G networks, together with secondary structure and relative surface accessibility predictions from NetSurfP. An ensemble of 10 network architectures was selected, corresponding to the top-ranking network architecture within each of the subsets, based on the leave-out performance. Further increasing the number of architectures in the ensemble did not increase the performance. All performance measures increased from the first- to the second-layer networks, except for the sensitivity, which decreased by 1.6 percentage points.
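The selection rule described above (one top-ranking architecture per subset, giving 10 ensemble members for 10 subsets) might look roughly like the following. This is a sketch under assumptions: the function name, the score dictionary layout and the example scores are all hypothetical, and the text does not say how the selected networks' outputs are combined.

```python
def select_ensemble(scores):
    """Pick the best-scoring architecture within each subset.

    scores: dict mapping (subset, architecture) -> leave-out performance
            (hypothetical layout, e.g. MCC values).
    Returns a dict mapping subset -> chosen architecture.
    """
    best = {}
    for (subset, arch), score in scores.items():
        if subset not in best or score > best[subset][1]:
            best[subset] = (arch, score)
    return {subset: arch for subset, (arch, _) in best.items()}


if __name__ == "__main__":
    # Made-up scores for two subsets and two architectures.
    example = {
        (0, "arch_a"): 0.41, (0, "arch_b"): 0.45,
        (1, "arch_a"): 0.47, (1, "arch_b"): 0.39,
    }
    print(select_ensemble(example))
```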
Secondary structure and surface accessibility predictions were generated for all protein chains, using the NetSurfP program.
The neural networks were trained with a standard feed-forward procedure: errors were back-propagated using a gradient descent method, after which the weights were updated. A sliding window of amino acids was presented to the neural network, and predictions were made for the central position. Altogether, 20 different neural network architectures were used. Combined with the 10-fold cross-validation procedure, this gave a total of 200 neural networks (20 architectures × 10 folds).
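The sliding-window presentation and the network count can be illustrated as follows. The window size is not stated in the text, so `half_width` is a hypothetical parameter, and padding positions beyond the chain ends with a fixed vector is an assumption rather than the documented behaviour.

```python
def sliding_windows(features, half_width, pad):
    """Build one flattened input window per residue.

    features:   list of per-residue feature lists (all the same length)
    half_width: residues taken on each side of the central position
                (hypothetical; the actual window size is not given)
    pad:        feature list used for positions outside the chain
                (assumed padding scheme)
    """
    n = len(features)
    windows = []
    for center in range(n):
        window = []
        for pos in range(center - half_width, center + half_width + 1):
            window.extend(features[pos] if 0 <= pos < n else pad)
        windows.append(window)
    return windows


# 20 architectures, each trained once per cross-validation fold.
N_ARCHITECTURES = 20
N_FOLDS = 10
networks = [(arch, fold) for arch in range(N_ARCHITECTURES) for fold in range(N_FOLDS)]

if __name__ == "__main__":
    print(sliding_windows([[1], [2], [3]], half_width=1, pad=[0]))
    print(len(networks))
```

Each window yields a prediction for its central residue only, so a chain of n residues produces n windows.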
Each amino acid was encoded using PSSM values, three neurons for predicted helix, strand and coil, and one extra neuron for the relative surface accessibility; in total, 25 neurons were used to describe an amino acid.
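A per-residue input vector of this shape could be assembled as shown below. The text does not say how the 25 inputs break down; the sketch assumes 21 profile values per position (20 amino acids plus one extra column), so that 21 + 3 + 1 matches the stated total. The names `PSSM_WIDTH` and `encode_residue` are hypothetical.

```python
# Assumption: 21 PSSM-derived values per residue, chosen so that
# 21 + 3 + 1 equals the 25 inputs stated in the text.
PSSM_WIDTH = 21


def encode_residue(pssm_row, helix, strand, coil, rsa):
    """Concatenate the inputs for one residue into a 25-value list.

    pssm_row:            PSSM-derived values for this position
    helix, strand, coil: predicted secondary structure values
    rsa:                 predicted relative surface accessibility
    """
    if len(pssm_row) != PSSM_WIDTH:
        raise ValueError(f"expected {PSSM_WIDTH} PSSM values, got {len(pssm_row)}")
    return list(pssm_row) + [helix, strand, coil, rsa]


if __name__ == "__main__":
    vector = encode_residue([0.0] * PSSM_WIDTH, 0.1, 0.2, 0.7, 0.35)
    print(len(vector))
```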