
MEGA (Alignment Format)


Many different sequence alignment formats are currently in use, which causes file-interconversion difficulties when several software packages are used together. If our alignment is not in a recognized standard format, we will first need to convert it into a suitable one.


Sequence data or distance data can be entered into MEGA as ASCII text files. These data must be organized in a format specific to MEGA. This input format is consistent and flexible, and it allows extensive comments to be written in the data file.

The data file formats currently supported by MEGA include CLUSTAL [16], NEXUS (PAUP, MacClade) [19], PHYLIP (interleaved and non-interleaved) [20], GCG [21], FASTA, PIR, NBRF, MSF, IG, and XML (NCBI). The format conversion facility is available in the Text File Editor. The Text File Editor is useful for creating and editing ASCII text files, and MEGA invokes it automatically whenever its input-processing modules detect errors in the format of a data file. MEGA also supports sequence alignment with both the ClustalW and MUSCLE programs.
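To illustrate what converting into MEGA format involves, here is a minimal Python sketch that turns FASTA text into a bare-bones MEGA file. It is not MEGA's own converter. The function name `fasta_to_mega` is hypothetical, and the `!Title ...;` and `!Format DataType=...;` statement syntax and the `#name` taxon lines reflect MEGA's conventions as we understand them, so the output should be checked against the MEGA manual before use.

```python
def fasta_to_mega(fasta_text: str, title: str, datatype: str = "DNA") -> str:
    """Convert FASTA text into a minimal MEGA-format string (illustrative sketch)."""
    records = []  # (taxon name, sequence) pairs, in file order
    name, chunks = None, []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            words = line[1:].split()
            # Keep only the first word of the FASTA header so the taxon name has
            # no spaces; use a placeholder name if the header is empty.
            name = words[0] if words else f"taxon{len(records) + 1}"
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines before any header are skipped
    if name is not None:
        records.append((name, "".join(chunks)))

    # ASSUMED header syntax: #MEGA keyword line, then !Title and !Format statements.
    out = ["#MEGA", f"!Title {title};", f"!Format DataType={datatype};", ""]
    for taxon, seq in records:
        out.append(f"#{taxon}")  # each taxon name is introduced by '#'
        out.append(seq)
    return "\n".join(out) + "\n"
```

For example, `fasta_to_mega(">seq1 human\nACGT\nAC\n", "Demo")` joins the wrapped sequence lines into `ACGTAC` under a `#seq1` line. Taking only the first header word is a deliberate simplification; real headers may need a more careful renaming scheme.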

The MEGA Text File Editor is similar to the Microsoft Windows Notepad and WordPad accessories. An important feature of MEGA is its input Sequence Data Explorer (SDE), which allows investigators to browse attributes of sequence data and export those data to other formats. The SDE displays sequences in a two-dimensional grid, in which faint grey boxes outlining each codon mark the protein-coding regions of DNA sequences.

MEGA Format

For MEGA to read and interpret our data correctly, the data must be formatted according to a set of rules. All input data files are plain ASCII text files, which may contain DNA sequences, protein sequences, evolutionary distances, or phylogenetic trees. Most word processors and text editors (e.g., Microsoft Word, WordPerfect, Notepad, and WordPad) can edit and save ASCII text files, which usually carry a .TXT extension; in a word processor, the file must be saved explicitly as plain text. After creating the file, we should change this extension to .MEG so that our data files can be distinguished from other text files. Whatever type of data they hold, all MEGA data files share a number of common features.
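Because MEGA expects plain ASCII text, a stray curly quote or accented letter pasted from a word processor can cause read errors. The sketch below uses two hypothetical helpers of our own, `ascii_problems` and `save_as_meg`, built on Python's standard `pathlib`: the first reports any non-ASCII characters, and the second writes the file with a .meg extension only if none are found.

```python
from pathlib import Path


def ascii_problems(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, char) for every non-ASCII character, 1-based."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                problems.append((line_no, col, ch))
    return problems


def save_as_meg(text: str, path: str) -> Path:
    """Write text as plain ASCII to a file whose extension is forced to .meg."""
    problems = ascii_problems(text)
    if problems:
        line_no, col, ch = problems[0]
        raise ValueError(f"non-ASCII character {ch!r} at line {line_no}, column {col}")
    target = Path(path).with_suffix(".meg")  # e.g. data.txt -> data.meg
    target.write_text(text, encoding="ascii")
    return target
```

Reporting the line and column up front makes it easy to find the offending character in the editor, rather than discovering it only when MEGA rejects the file.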

General Conventions 

The first line must contain the keyword #MEGA, which indicates that the data file is in MEGA format. The second line may contain a succinct description of the data, called the Title. The Title statement is written according to a set of rules, and MEGA copies it into every output file. If the title exceeds 128 characters, the additional characters are ignored.

Keywords can be written in any combination of lower- and upper-case letters.
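The header rules above can be checked mechanically. This minimal sketch uses a hypothetical `read_title` function and assumes the Title is written as `!Title ...;` on the second line, which is MEGA's statement syntax as we understand it. It matches keywords case-insensitively and applies the 128-character limit.

```python
import re

MAX_TITLE_LEN = 128  # characters beyond this limit are ignored

# ASSUMED Title syntax: "!Title <text>;" with the keyword in any letter case.
TITLE_RE = re.compile(r"\s*!title\s+(.*?)\s*;\s*$", re.IGNORECASE)


def read_title(text: str) -> str:
    """Check the #MEGA keyword line and return the (truncated) title, if any."""
    lines = text.splitlines()
    # Keywords are case-insensitive, so "#mega" and "#Mega" are accepted too.
    if not lines or lines[0].strip().lower() != "#mega":
        raise ValueError("the first line must contain the keyword #MEGA")
    if len(lines) < 2:
        return ""  # the Title statement is optional
    match = TITLE_RE.match(lines[1])
    if match is None:
        return ""  # the second line holds something other than a Title
    return match.group(1)[:MAX_TITLE_LEN]
```

For instance, `read_title("#mega\n!TITLE Primate mtDNA;")` returns `"Primate mtDNA"`, while a file whose first line is a sequence raises `ValueError`.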

In the long run, an informative title will allow us to recognize our past work easily. The data file may also contain a more descriptive, multi-line account of the data in the Description statement, which is written after the Title statement. The Description statement also follows a set of rules, but unlike the Title statement, it is not copied into every output file.

In addition, the data file may contain a Format statement, which specifies the type of data present in the file and some of its attributes. The Format statement should generally be written after the Title or Description statement, and writing one requires knowledge of the keywords that identify different types of data and data attributes. All taxa names must also be written according to a set of rules.

Comments can be written anywhere in the data file and can span multiple lines. They must always be enclosed in square brackets ([ and ]), and they can be nested, [[like] this].
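Because comments can be nested, removing them requires tracking the nesting depth rather than simply matching the nearest closing bracket. A minimal sketch (the function name `strip_comments` is our own):

```python
def strip_comments(text: str) -> str:
    """Remove square-bracket comments, which may be nested, from MEGA text."""
    out = []
    depth = 0  # how many '[' are currently open
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                raise ValueError("unmatched ']'")
            depth -= 1
        elif depth == 0:
            out.append(ch)  # keep only characters outside every comment
    if depth != 0:
        raise ValueError("unclosed '[' comment")
    return "".join(out)
```

A simple regular expression such as `\[[^\]]*\]` would stop at the first `]` in `[[like] this]` and leave ` this]` behind, which is why a depth counter is used instead.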
