SMART (a Simple Modular Architecture Research Tool) is a biological database used to identify and annotate genetically mobile protein domains and to analyze domain architectures within protein sequences. SMART detects domains using profile hidden Markov models built from multiple sequence alignments. More than 500 domain families found in signaling, extracellular and chromatin-associated proteins are detectable, and the most recent release of SMART contains 1,204 domain models. These domains are extensively annotated with respect to phyletic distribution, functional class, tertiary structure and functionally important residues.
Each domain found in a non-redundant protein database, together with the search parameters and taxonomic information, is stored in a relational database system. User interfaces to this database allow searches for proteins that contain specific combinations of domains in defined taxa. Data from SMART was used in creating the Conserved Domain Database collection and is also distributed as part of the InterPro database. The database is hosted by the European Molecular Biology Laboratory (EMBL) in Heidelberg.
Most signaling proteins are multi-domain in character, and a considerable variety of domain combinations is known. Comparison with established databases showed that 25% of the SMART domain set could not be deduced from SwissProt and 41% could not be annotated by Pfam. SMART can determine the modular architectures of single sequences or of whole genomes; applied to the entire yeast genome, it showed that at least 6.7% of yeast genes contain one or more signaling domains, approximately 350 more than previously annotated.
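The relational storage and domain-combination search described above can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the schema (tables `protein` and `domain_hit`), the helpers `proteins_with_domains` and `architecture`, and the sample records are hypothetical and are not SMART's actual schema or interface. It finds proteins in one taxon that contain every requested domain, and reports a protein's architecture as its domain names in N- to C-terminal order.

```python
import sqlite3

# Hypothetical, simplified schema (NOT SMART's real one): one row per protein,
# one row per domain hit with its start and end position in the sequence.
SCHEMA = """
CREATE TABLE protein (id TEXT PRIMARY KEY, taxon TEXT NOT NULL);
CREATE TABLE domain_hit (
    protein_id TEXT NOT NULL REFERENCES protein(id),
    domain     TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL
);
"""


def proteins_with_domains(conn, domains, taxon):
    """Return ids of proteins in `taxon` that contain every domain in `domains`."""
    placeholders = ",".join("?" for _ in domains)
    sql = f"""
        SELECT p.id
        FROM protein AS p
        JOIN domain_hit AS d ON d.protein_id = p.id
        WHERE p.taxon = ? AND d.domain IN ({placeholders})
        GROUP BY p.id
        HAVING COUNT(DISTINCT d.domain) = ?
        ORDER BY p.id
    """
    rows = conn.execute(sql, [taxon, *domains, len(set(domains))])
    return [row[0] for row in rows]


def architecture(conn, protein_id):
    """Domain names of one protein, ordered from N- to C-terminus."""
    rows = conn.execute(
        "SELECT domain FROM domain_hit WHERE protein_id = ? ORDER BY start_pos",
        (protein_id,),
    )
    return "-".join(row[0] for row in rows)


# Illustrative data only; domain names and coordinates are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [("P1", "Metazoa"), ("P2", "Metazoa"), ("P3", "Fungi")])
conn.executemany("INSERT INTO domain_hit VALUES (?, ?, ?, ?)", [
    ("P1", "SH3", 10, 70), ("P1", "SH2", 80, 170), ("P1", "S_TKc", 200, 450),
    ("P2", "SH2", 5, 95),
    ("P3", "SH3", 1, 60), ("P3", "SH2", 70, 160),
])

print(proteins_with_domains(conn, ["SH2", "SH3"], "Metazoa"))  # ['P1']
print(architecture(conn, "P1"))                                # SH3-SH2-S_TKc
```

The `HAVING COUNT(DISTINCT ...)` clause is one common way to express "contains all of these domains" in SQL; a production system would also have to deal with repeated domains, overlapping hits and score thresholds.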
Constructing SMART led to the prediction of:
- novel domain homologues in unexpected locations, such as band 4.1-homologous domains in focal adhesion kinases
- previously unknown domain families, including a citron-homology domain
- putative functions of domain families after additional family members were identified, for example a ubiquitin-binding role for ubiquitin-associated (UBA) domains
- cellular roles for proteins, such as predicted DEATH domains in netrin receptors that further implicate these molecules in axonal guidance
- signaling domains in known disease genes, such as SPRY domains
- domains in unexpected phylogenetic contexts, such as diacylglycerol kinase homologues in yeast and bacteria
SMART is designed to be a user-friendly, information-rich and relatively error-free tool. Nevertheless, its developers encourage feedback from anyone who:
- has trouble understanding the SMART output pages,
- wishes to add to the annotation of domain families,
- has suggestions for domains not presently included in the SMART set, or
- wishes to donate alignments to SMART.
An alignment is a representation of a prediction of which amino acids in the tertiary structures of homologues overlay in three dimensions. The alignments held by SMART are mostly based on published observations but are updated and edited manually.
Blocks of alignments are ungapped alignments that usually represent a single secondary structure element.
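To make "ungapped" concrete, the following sketch finds maximal runs of alignment columns in which no sequence contains a gap character. It is an illustrative assumption about how such blocks could be extracted, not SMART's own procedure; the function name `ungapped_blocks`, the gap symbols `-` and `.`, and the `min_length` cut-off are hypothetical.

```python
def ungapped_blocks(alignment, min_length=3):
    """Return (start, end) column ranges (0-based, end exclusive) of maximal
    runs of columns in which no aligned sequence has a gap character."""
    if not alignment:
        return []
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # True for each column where every sequence has a residue (no '-' or '.').
    gap_free = [all(row[c] not in "-." for row in alignment) for c in range(ncols)]

    blocks, start = [], None
    # A trailing False sentinel closes a run that reaches the last column.
    for col, ok in enumerate(gap_free + [False]):
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks


# Toy alignment of three sequences (made-up data).
aln = [
    "MKV-LLAEGH",
    "MRVQLLSEG-",
    "MKIELLAD-H",
]
print(ungapped_blocks(aln))  # [(0, 3), (4, 8)]
```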
HMMER and BLAST report alignment scores as bit scores. A bit score expresses, on a base-2 logarithmic scale, how the likelihood that the query sequence is a bona fide homologue of the database sequence compares with the likelihood that the sequence was instead generated by a “random” model.
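For a profile model, HMMER's bit score is the base-2 logarithm of the ratio between the likelihood of the sequence under the homology model and under a null (random) model. NCBI BLAST instead converts a raw alignment score S into a bit score with the Karlin-Altschul parameters λ and K, as S' = (λS - ln K) / ln 2. The toy sketch below illustrates both ideas. The ungapped position-specific "profile", the uniform background of 0.05 per residue, and the function names are assumptions for illustration; real HMMER scoring sums over all paths through the model (including insert and delete states) and applies further corrections.

```python
import math


def log_odds_bits(p_model, p_null):
    """Bit score of a single event: log2 of the model/null likelihood ratio."""
    return math.log2(p_model / p_null)


def profile_bit_score(seq, profile, background):
    """Score an ungapped sequence against a toy position-specific profile.

    Positions are treated as independent, so the sequence likelihood is a
    product over positions and the log-odds score is a sum of per-position
    terms. A residue missing from a profile column raises KeyError (real
    HMMs give every residue a non-zero emission probability)."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(log_odds_bits(col[aa], background[aa]) for aa, col in zip(seq, profile))


def blast_bit_score(raw_score, lam, k):
    """Karlin-Altschul normalized bit score: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Hypothetical null model: all 20 amino acids equally likely.
background = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}

# Hypothetical three-column profile; each column sums to 1.
profile = [
    {"C": 0.8, "S": 0.2},
    {"G": 0.4, "A": 0.4, "S": 0.2},
    {"W": 0.2, "F": 0.4, "Y": 0.4},
]

# 16-fold, 8-fold and 4-fold enrichment give 4 + 3 + 2 bits.
print(round(profile_bit_score("CGW", profile, background), 3))  # 9.0
```

Because the score is additive in log space, each position contributes independently, which is why a strongly conserved residue (here C, 16 times more likely under the model) adds more bits than a weakly preferred one.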
SMART uses NCBI-BLAST to detect outlier homologues and homologues of known structure. WU-BLAST is used for searches of the non-redundant database (nrdb) with user-supplied sequences.
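NCBI BLAST+ can write tabular output (`-outfmt 6`) whose twelve default columns are the query and subject identifiers, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below parses such output and keeps hits under an e-value cut-off, ranked by bit score, which is roughly the kind of filtering needed to flag homologues of known structure. The column names used in the code, the subject identifiers, the sample rows and the 1e-3 threshold are hypothetical, and this is not SMART's own pipeline.

```python
from dataclasses import dataclass

# Twelve default tabular columns, in order (names chosen for this sketch).
FIELDS = ("query", "subject", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_outfmt6(text):
    """Parse tab-separated BLAST output; skip blank and '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        row = dict(zip(FIELDS, cols))
        hits.append(Hit(row["query"], row["subject"], float(row["pident"]),
                        int(row["qstart"]), int(row["qend"]),
                        float(row["evalue"]), float(row["bitscore"])))
    return hits


def significant(hits, max_evalue=1e-3):
    """Hits at or below the e-value cut-off, best bit score first."""
    return sorted((h for h in hits if h.evalue <= max_evalue),
                  key=lambda h: h.bitscore, reverse=True)


# Made-up sample rows; subject ids mimic structure chain identifiers.
sample = "\n".join([
    "\t".join(["query1", "1abc_A", "45.2", "120", "60", "3",
               "10", "129", "5", "124", "1e-30", "110.0"]),
    "\t".join(["query1", "2xyz_B", "28.0", "80", "55", "4",
               "30", "109", "12", "91", "0.5", "30.1"]),
])

for hit in significant(parse_outfmt6(sample)):
    print(hit.subject, hit.evalue)  # 1abc_A 1e-30
```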
SMART detects coiled coils; coiled-coil predictions are shown on the second line of SMART’s graphical output.
Domains are conserved structural entities with distinctive secondary structure content and a hydrophobic core. In small disulphide-rich domains and in Zn2+- or Ca2+-binding domains, the hydrophobic core may be provided by cystines and metal ions, respectively. Homologous domains with common functions usually show sequence similarity.
SMART’s domain annotation pages contain links to the Entrez system, giving access to extensive literature, structure and sequence information.
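Entrez can also be queried programmatically through NCBI's E-utilities. As a small illustration, the sketch below only builds an ESearch URL for a free-text term in a chosen Entrez database (for example `pubmed`, `protein` or `structure`); it sends no request. The helper name `esearch_url` and the example term are hypothetical, and this is not necessarily how SMART constructs its links.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """Build an E-utilities ESearch URL for `term` in Entrez database `db`.

    No network request is made; urlencode escapes spaces as '+'."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "SH2 domain"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=SH2+domain&retmax=20
```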
SMART can be used in two modes, Normal and Genomic. In Normal SMART, the database contains Swiss-Prot, SP-TrEMBL and stable Ensembl proteomes. In Genomic SMART, only the proteomes of completely sequenced genomes are used: Ensembl for metazoans and Swiss-Prot for all other organisms.
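The difference between the two modes amounts to a rule for selecting source proteomes. The sketch below encodes that rule as stated above. The `Proteome` record, its fields, the `in_mode` helper and the sample entries are hypothetical, and real SMART releases involve additional curation (for example, whether an Ensembl proteome counts as "stable" is not modelled here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proteome:
    name: str
    source: str            # "Swiss-Prot", "SP-TrEMBL" or "Ensembl"
    complete_genome: bool  # proteome of a completely sequenced genome?
    metazoan: bool


def in_mode(p, mode):
    """Return True if proteome `p` would be used in the given SMART mode."""
    if mode == "normal":
        # Normal mode: Swiss-Prot, SP-TrEMBL and Ensembl proteomes.
        return p.source in {"Swiss-Prot", "SP-TrEMBL", "Ensembl"}
    if mode == "genomic":
        # Genomic mode: complete genomes only; Ensembl for metazoans,
        # Swiss-Prot for all other organisms.
        if not p.complete_genome:
            return False
        expected = "Ensembl" if p.metazoan else "Swiss-Prot"
        return p.source == expected
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical proteome records.
proteomes = [
    Proteome("metazoan_A", "Ensembl", True, True),
    Proteome("fungus_B", "Swiss-Prot", True, False),
    Proteome("fungus_B_ensembl", "Ensembl", True, False),
    Proteome("partial_C", "SP-TrEMBL", False, False),
]
print([p.name for p in proteomes if in_mode(p, "genomic")])
# ['metazoan_A', 'fungus_B']
```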