Almost all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. The SCOP database, created by manual inspection and assisted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.
One motivation for this classification is to determine the evolutionary relationships between proteins. Proteins that share a shape but show little sequence or functional similarity are placed in different superfamilies, and are considered to have only a very distant common ancestor. Proteins that share a shape and show some similarity of sequence and/or function are placed in families, and are assumed to have a closer common ancestor.
The SCOP database is freely available on the internet. SCOP was created in 1994 at the Centre for Protein Engineering and the Laboratory of Molecular Biology.
The new Structural Classification of Proteins version 2 (SCOP2) database was released at the beginning of 2020.
The levels of SCOP are as follows:
- Class: Types of folds, e.g., beta sheets.
- Fold: The different shapes of domains within a class.
- Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant common ancestor.
- Family: The domains in a superfamily are grouped into families, which have a more recent common ancestor.
- Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein.
- Species: The domains in “protein domains” are grouped according to species.
- Domain: part of a protein. For simple proteins, it can be the entire protein.
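The levels listed above can be modelled as a simple tree, where each node sits one level below its parent. The following is a minimal sketch, not the format SCOP itself uses: the `Node` class, the `LEVELS` list, and all entry names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Levels from broadest to most specific, in the order listed above.
LEVELS = ["class", "fold", "superfamily", "family",
          "protein domain", "species", "domain"]


@dataclass(eq=False)
class Node:
    """One entry in the hierarchy (hypothetical structure, for illustration)."""
    level: str
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add_child(self, name: str) -> "Node":
        """Attach a child at the next level down and return it."""
        idx = LEVELS.index(self.level)
        if idx + 1 >= len(LEVELS):
            raise ValueError("a domain is a leaf and cannot have children")
        child = Node(LEVELS[idx + 1], name, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return the path from the class down to this node."""
        path = []
        node: Optional[Node] = self
        while node is not None:
            path.append(f"{node.level}: {node.name}")
            node = node.parent
        return list(reversed(path))


# Placeholder names only; these are not real SCOP entries.
root = Node("class", "example class")
leaf = root
for placeholder in ["example fold", "example superfamily", "example family",
                    "example protein", "example species", "example domain"]:
    leaf = leaf.add_child(placeholder)

for line in leaf.lineage():
    print(line)
```

Walking from any domain up through its parents yields its full classification, which is why a strict tree is convenient whenever each entry has exactly one parent.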
The primary purpose of SCOP was to help experimental structural biologists analyse and explore protein structures similar to the proteins they study.
In recent years, SCOP has been used to address many questions in structural biology and has also served as a training and gold-standard dataset, making it an invaluable resource in structural bioinformatics. It has been used to study the interplay of protein structure and protein sequence evolution, and to explore the connection between alternative splicing and protein structure evolution.
Most of the protein relationships in the current SCOP classification are trivial and can be described as a hierarchical tree, in which protein family and superfamily domains are grouped according to their structural similarity and evolutionary divergence, and their boundaries correlate with each other. Two scenarios, however, do not fit such a tree. The first scenario is when a family domain spans two or more structural domains, each of which belongs to a distinct superfamily; that is, the combination and arrangement of these domains evolved within, and is typical of, this family of proteins. The second scenario is when a family contains domains that are topologically more similar to another distinct fold than to the fold of the other superfamily domains.
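The first scenario can be illustrated with a small check: if a family's structural domains are assigned to more than one superfamily, that family cannot have a single parent in a strict tree. This is only a sketch under assumed inputs; the function name, the triple format, and all identifiers below are hypothetical and do not come from SCOP.

```python
from collections import defaultdict


def families_spanning_superfamilies(assignments):
    """Find families whose structural domains fall in more than one superfamily.

    `assignments` is an iterable of (family, structural_domain, superfamily)
    triples; this input format is an assumption made for the example.
    """
    seen = defaultdict(set)
    for family, _domain, superfamily in assignments:
        seen[family].add(superfamily)
    # A family linked to two or more superfamilies has no single tree parent.
    return {fam: sorted(sfs) for fam, sfs in seen.items() if len(sfs) > 1}


# Placeholder identifiers only.
example = [
    ("family_A", "dom_1", "superfamily_X"),
    ("family_A", "dom_2", "superfamily_Y"),
    ("family_B", "dom_3", "superfamily_X"),
]

print(families_spanning_superfamilies(example))
# {'family_A': ['superfamily_X', 'superfamily_Y']}
```

Representing such cases needs a structure that allows more than one parent per node, such as a directed graph, rather than a tree.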