The mmCIF format resembles the layout of a relational database, in which a set of tables is used to organize records. Each table or field of information is explicitly identified by a tag and linked to other fields through a special syntax. A description that occupies a single line in the header section of a PDB file is divided in mmCIF into many lines, or fields, each carrying an explicit item name and item value. Each field starts with an underscore character followed by a category name and a keyword, separated by a period. A tag that begins with “_struct.” or “_database.”, for example, shows that the data item belongs to the “struct” or “database” category. Following the tag, a short text string enclosed in quotation marks assigns the value for the keyword. Using multiple tagged fields for the same information provides an explicit reference to each item in a data file and ensures a one-to-one relationship between item names and item values. Because the data are presented item by item, the format offers much more flexibility for information storage and retrieval.
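The tag syntax described above can be illustrated with a minimal Python sketch. It assumes the simplest case only: one data item per line, with the value enclosed in quotation marks. The sample values and the function name `parse_tag_values` are made up for illustration, and the sketch does not handle the looped tables or multi-line values found in real mmCIF files.

```python
import shlex

# Illustrative mmCIF-style fragment; the values are invented for this example.
SAMPLE = """\
_struct.entry_id   'XXXX'
_struct.title      'Example protein structure'
_database.code     'EXAMPLE-0001'
"""

def parse_tag_values(text):
    """Return {category: {keyword: value}} for simple one-line tag-value pairs."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip blank lines and anything that is not a data item
        tag, _, raw_value = line.partition(" ")
        # Drop the leading underscore, then split category from keyword at the period.
        category, _, keyword = tag[1:].partition(".")
        # shlex.split strips the surrounding quotation marks from the value.
        parts = shlex.split(raw_value)
        value = parts[0] if parts else ""
        records.setdefault(category, {})[keyword] = value
    return records

records = parse_tag_values(SAMPLE)
```

Grouping the items by category mirrors the table-like organization of the format: `records["struct"]` behaves like one row of a "struct" table, with each keyword acting as a column name.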
The PDBx/mmCIF file format and data dictionary are the basis of wwPDB data deposition, annotation, and archiving of PDB data from all supported experimental methods.
The original CIF (Crystallographic Information File) format and dictionary were developed for archiving small-molecule crystallographic experiments. In 1997, the dictionary was expanded to include data items relevant to macromolecular crystallographic experiments (mmCIF, now PDBx/mmCIF). This format overcomes limitations of the legacy PDB file format and supports data representing large structures, complex chemistry, and new and hybrid experimental methods. The legacy PDB file format is no longer modified or extended to support new content. As the PDBx/mmCIF format continues to evolve, legacy PDB format files will become increasingly outdated.
PDBx/mmCIF explicitly documents all relationships between common data items (e.g. atom and residue identifiers), which allows software applications to evaluate and validate referential integrity within any PDB entry. It also maps information between the residue sequences of the experimental sample and the model coordinates. The PDBx/mmCIF Exchange Dictionary provides the associated metadata (e.g. data types, allowed ranges, and controlled vocabularies).
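As a rough illustration of how software might use such metadata, the Python sketch below checks values against a data type, an allowed range, and a controlled vocabulary, and then checks one identifier relationship. The `ITEM_METADATA` table is a hypothetical, hand-written stand-in for dictionary content, not an excerpt from the actual dictionary, and the function names are assumptions made for this example.

```python
# Hypothetical stand-in for dictionary metadata: data type, allowed range,
# and controlled vocabulary for two items. Not taken from the real dictionary.
ITEM_METADATA = {
    "_atom_site.occupancy": {"type": float, "range": (0.0, 1.0)},
    "_atom_site.group_PDB": {"type": str, "enum": {"ATOM", "HETATM"}},
}

def validate_item(tag, raw_value, metadata=ITEM_METADATA):
    """Return a list of problems for one tag-value pair (empty if none)."""
    rule = metadata.get(tag)
    if rule is None:
        return [f"{tag}: not defined in the dictionary"]
    try:
        value = rule["type"](raw_value)  # data type check
    except ValueError:
        return [f"{tag}: {raw_value!r} is not a valid {rule['type'].__name__}"]
    problems = []
    if "range" in rule:
        low, high = rule["range"]
        if not low <= value <= high:  # allowed range check
            problems.append(f"{tag}: {value} is outside [{low}, {high}]")
    if "enum" in rule and value not in rule["enum"]:  # controlled vocabulary check
        problems.append(f"{tag}: {value!r} is not an allowed value")
    return problems

def dangling_references(child_ids, parent_ids):
    """Referential integrity: identifiers used by child records but absent from the parent table."""
    return sorted(set(child_ids) - set(parent_ids))
```

For example, `validate_item("_atom_site.occupancy", "1.7")` reports one range problem, and `dangling_references(["A", "B", "C"], ["A", "B"])` returns the identifiers that point at no parent record.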
mmCIF stands for ‘macromolecular Crystallographic Information File’. This format was developed by the PDB consortium and the International Union of Crystallography (IUCr), based on the Crystallographic Information File (CIF), a format used for describing the structures of small molecules.
mmCIF is a flexible and extensible tag-value format for representing macromolecular structural data. The set of mmCIF tags, which determines the classes of information present in a given mmCIF file, is defined in an mmCIF dictionary. The structure of the dictionary is in turn defined by a dictionary definition language (DDL). The standard mmCIF dictionary, whose development was pioneered by Paula M. Fitzgerald, Helen Berman, Phil Bourne, Brian McMahon, Keith Watenpaugh, and John Westbrook, has been stable since it was ratified by the IUCr in 1997.