File Formats for the Atomic Coordinates of the Molecule
MOL and SD (Symyx MDL)
MOL = MDL molfile = MOL v2000
SD = SDF = Structure Data Format
- SD files share the MOL format but may contain several structures (separated by lines with $$$$), which will be read by Jmol as multiple models or frames.
MOLv3000 = extended molfile or extended connection table
- This newer format applies to both MOL and SDF, does not have the 1000-atom limit, and is also supported by Jmol.
Jmol reads MOL v2000 and v3000 files and SD files (and can write MOL v2000 files under some circumstances). The format originated at Molecular Design Limited (later Elsevier MDL, now Symyx Technologies) and has been widely adopted by many other programs. It contains atom coordinates and bonds. V2000 (the most common version) is limited to 1000 atoms.
These formats support formal charges and isotopes; both are read by Jmol.
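As an illustration of the multi-structure layout described above, the following minimal Python sketch splits the text of an SD file into individual records at the `$$$$` delimiter lines. The function name `split_sd_records` is hypothetical, and the sketch does not reflect how Jmol itself reads these files.

```python
def split_sd_records(text):
    """Split SD file text into one record per structure.

    Records are separated by lines containing only '$$$$'.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # Delimiter line: close the current record
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that is not terminated by '$$$$'
    if any(l.strip() for l in current):
        records.append("\n".join(current))
    return records
```

Each returned record could then be handled like a single MOL file, which mirrors the way the structures appear as separate models or frames.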
MOL and SD files often contain two-dimensional data (i.e., all atoms have Z=0); Jmol will read them too, but the resulting flat model will not be realistic. The defining tag (2D or 3D) must be located in line 2, columns 21-22, but Jmol ignores it and just uses the Z coordinates provided, whether they are zero or not.
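To check in advance whether a V2000 file will produce a flat model, something like the sketch below may help. It reads the tag from line 2, columns 21-22, and independently tests whether every Z coordinate is zero. The function name `mol_dimension_info` is hypothetical. The fixed-column layout used here (atom count in columns 1-3 of the fourth line, Z in columns 21-30 of each atom line) is taken from the V2000 connection-table layout, not from the text above.

```python
def mol_dimension_info(molfile_text):
    """Return (tag, is_flat) for a V2000 molfile.

    tag     -- the dimensional code from header line 2, columns 21-22
    is_flat -- True if every atom has a Z coordinate of exactly zero
    """
    lines = molfile_text.splitlines()
    # Columns 21-22 (1-based) correspond to the slice [20:22]
    tag = lines[1][20:22].strip() if len(lines) > 1 else ""
    # Counts line is the fourth line; the atom count occupies columns 1-3
    n_atoms = int(lines[3][0:3])
    # Atom block starts on the fifth line; Z occupies columns 21-30
    z_values = [float(lines[4 + i][20:30]) for i in range(n_atoms)]
    is_flat = all(z == 0.0 for z in z_values)
    return tag, is_flat
```

Because the tag and the coordinates are checked separately, a file whose tag says 3D but whose Z values are all zero is still reported as flat.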
MOL header lines:
- The first line is reserved for the molecule name and will be so used by Jmol in the popup menu.
- The second line is in principle reserved for information on the originating program, date, user, etc. (Jmol will ignore this line).
- The third line is for comments, and may contain an inline script starting with
Official document (PDF): http://www.mdl.com/downloads/public/ctfile/ctfile.pdf, copied here.
Some extra information on SD files at US EPA DSSTox.
MOL2 (Sybyl, Tripos)
Jmol reads MOL2 files. Originally from Tripos. Contains atom coordinates, bonds, substructure information.
This format supports formal charges, partial charges and isotopes, but only partial charges are currently supported by Jmol.
A single MOL2 file may contain several structures, which will be read by Jmol as multiple models or frames.
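The sketch below shows one way to separate the structures in a multi-structure MOL2 file. It assumes that each structure begins with a `@<TRIPOS>MOLECULE` record line, as defined in the Tripos format. The function name `split_mol2` is hypothetical, and this is not Jmol's own reader.

```python
def split_mol2(text):
    """Split MOL2 text into one chunk per @<TRIPOS>MOLECULE record."""
    chunks, current = [], None
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>MOLECULE"):
            # A new structure starts here; store the previous one, if any
            if current is not None:
                chunks.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines before the first MOLECULE record (e.g. comments) are skipped
    if current is not None:
        chunks.append("\n".join(current))
    return chunks
```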
Official document: http://www.tripos.com/data/support/mol2.pdf
Jmol reads PDB files (and can write PDB files under some circumstances). Contains atom coordinates and information on biomolecular residues, sequence, chains, hydrogen and disulfide bonds, secondary structure, biologically relevant sites, cofactors. Can also contain temperature factor, formal charge, element symbol, alternate locations.
Files may contain an inline script starting with
(Official Protein Data Bank document) Atomic Coordinate Entry Format. Description: http://www.wwpdb.org/documentation/format23/v2.3.html
Jmol reads XYZ files (and can write XYZ files under some circumstances). The format originated in the XMol package but has been widely adopted by many other programs. It contains only atom coordinates (no bonds) and, optionally, charges and vectors (e.g. for atom vibration). It supports multi-model data (multi-frame, animations).
An extension of this format supports isotopes, and they are read by Jmol.
XYZ header lines:
- The first line is reserved for the number of atoms.
- The second line is for comments, and may contain an inline script starting with
Example by Paul Bourke.
- XYZ datafiles specify molecular geometries using a Cartesian coordinate system. This simple, stripped-down, ASCII-readable format is intended to serve as a "transition" format for the XMol series of applications. For example, suppose a molecular datafile was in a format not supported by XMol. In order to read the data into XMol, it would be possible to modify the datafile, perhaps by creating a shell script, so that it fit the relatively lenient requirements of the XYZ format specification. Once data is in XYZ format, it may be examined by XMol, or converted to yet another format.
- The XYZ format supports multi-step datasets. Each step is represented by a two-line "header," followed by one line for each atom. The first line of a step's header is the number of atoms in that step. This integer may be preceded by whitespace; anything on the line after the integer is ignored. The second line of the header leaves room for a descriptive string. This line may be blank, or it may contain some information pertinent to that particular step, but it must exist, and it must be just one line long. Each line of text describing a single atom must contain at least four fields of information, separated by whitespace: the atom's type (a short string of alphanumeric characters), and its x-, y-, and z-positions. Optionally, extra fields may be used to specify a charge for the atom, and/or a vector associated with the atom. If an input line contains five or eight fields, the fifth field is interpreted as the atom's charge; otherwise, a charge of zero is assumed. If an input line contains seven or eight fields, the last three fields are interpreted as the components of a vector. These components should be specified in angstroms.
- Note that the XYZ format doesn't contain connectivity information. This intentional omission allows for greater flexibility: to create an XYZ file, you don't need to know where a molecule's bonds are; you just need to know where its atoms are. Connectivity information is generated automatically for XYZ files as they are read into XMol-related applications. Briefly, if the distance between two atoms is less than the sum of their covalent radii, they are considered bonded.
- Source: man page for XYZ (part of XMol), quoted at http://www.ccl.net/chemistry/resources/messages/1996/10/21.005-dir/index.html
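The rules quoted above can be sketched in Python as follows: a parser for multi-step XYZ text that applies the 4/5/7/8-field convention, plus the simple distance criterion for bonding. The names `parse_xyz`, `infer_bonds` and `COVALENT_RADII` are hypothetical, and the radii are illustrative values assumed for this example, not taken from the man page. Skipping blank lines between steps is a further leniency assumed here.

```python
import math

# Illustrative covalent radii in angstroms (assumed values for this sketch)
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def parse_xyz(text):
    """Parse multi-step XYZ text into a list of (comment, atoms) steps."""
    lines = text.splitlines()
    steps, i = [], 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between steps (assumption)
            continue
        # First header line: atom count; anything after the integer is ignored
        n_atoms = int(lines[i].split()[0])
        # Second header line: free-form comment, possibly blank
        comment = lines[i + 1]
        atoms = []
        for line in lines[i + 2 : i + 2 + n_atoms]:
            f = line.split()
            atom = {"type": f[0], "xyz": tuple(float(v) for v in f[1:4]),
                    "charge": 0.0, "vector": None}
            if len(f) in (5, 8):   # fifth field is the charge
                atom["charge"] = float(f[4])
            if len(f) in (7, 8):   # last three fields are a vector
                atom["vector"] = tuple(float(v) for v in f[-3:])
            atoms.append(atom)
        steps.append((comment, atoms))
        i += 2 + n_atoms
    return steps

def infer_bonds(atoms):
    """Bond two atoms if their distance is below the sum of covalent radii."""
    bonds = []
    for a in range(len(atoms)):
        for b in range(a + 1, len(atoms)):
            limit = (COVALENT_RADII[atoms[a]["type"]]
                     + COVALENT_RADII[atoms[b]["type"]])
            if math.dist(atoms[a]["xyz"], atoms[b]["xyz"]) < limit:
                bonds.append((a, b))
    return bonds
```

With a strict "less than the sum" test, the result is sensitive to the radii chosen, so results from other radius tables may differ.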
The XYZ reader in Jmol reads any of the following (updated for Jmol v. 11.4.5 and 11.5.41):
- Sym x y z
- Sym x y z vibX vibY vibZ
- Sym x y z FormalCharge(integer)
- Sym x y z FormalCharge(integer) vibX vibY vibZ
- Sym x y z PartialCharge(decimal)
- Sym x y z PartialCharge(decimal) vibX vibY vibZ
Sym is either an element symbol (C, Fe, Si) or an element symbol preceded by a supported isotope number (2H, 13C, etc.).
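The line variants accepted by the Jmol XYZ reader can be illustrated with the Python sketch below. It is not Jmol's code. The function name `parse_jmol_xyz_atom` is hypothetical, and the sketch assumes that a charge token made only of digits with an optional sign is a formal charge and that any other numeric token is a partial charge.

```python
import re

def parse_jmol_xyz_atom(line):
    """Parse one atom line: Sym x y z [charge] [vibX vibY vibZ]."""
    f = line.split()
    # Sym: optional isotope number followed by a one- or two-letter symbol
    m = re.fullmatch(r"(\d*)([A-Za-z]{1,2})", f[0])
    if m is None:
        raise ValueError(f"unrecognized atom symbol: {f[0]!r}")
    atom = {
        "isotope": int(m.group(1)) if m.group(1) else None,
        "symbol": m.group(2),
        "xyz": tuple(float(v) for v in f[1:4]),
        "formal_charge": None,
        "partial_charge": None,
        "vib": None,
    }
    rest = f[4:]
    if len(rest) in (1, 4):
        token = rest[0]
        # Assumption: integer token -> formal charge, otherwise partial charge
        if re.fullmatch(r"[+-]?\d+", token):
            atom["formal_charge"] = int(token)
        else:
            atom["partial_charge"] = float(token)
        rest = rest[1:]
    if len(rest) == 3:
        atom["vib"] = tuple(float(v) for v in rest)
    return atom
```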
Jmol reads CIF files. Crystallographic Information File, the official format from the International Union of Crystallography:
- v. 1.0 Original documentation, Acta Crystallographica A47: 655-685 (1991)
- v. 1.1 the 2003 update.
CIF files may contain an inline script starting with
Jmol reads mmCIF files. Macromolecular Crystallographic Information File, an expanded format to cope with macromolecules. Official documentation.
A complete specification of these formats would be needed to fully implement the reader. If you have those details, please contact the development team.
Example files supported by Jmol.
Jmol reads GAMESS files (General Atomic and Molecular Electronic Structure System, by the Gordon research group at Iowa State University).
Jmol reads only the output format. Recent versions of the Jmol application can also export to files in Gaussian input format.
There are example files of Gaussian input, output and log.
Jmol reads Cube files, originally from the Gaussian software (Gaussian website).
Description of Cube Input and Cube Output formats: http://www.nersc.gov/nusers/resources/software/apps/chemistry/gaussian/g98/00000430.htm
Description by Paul Bourke.
This is not read by Jmol, but might be supported in the future.
File format is called gro or Gromos87. Usual extension is
Description of the format.
You can convert from gro to pdb using the "editconf" program, which is a part of the GROMACS package that can be run from the command line:
editconf -f whatever.gro -o whatever.pdb
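For converting several files, the command above could be wrapped in a small Python helper such as the sketch below. The function names `editconf_command` and `gro_to_pdb` are hypothetical. The sketch assumes that `editconf` is installed and on the PATH, and it uses only the `-f` and `-o` options shown above.

```python
import subprocess
from pathlib import Path

def editconf_command(gro_path):
    """Build the editconf argument list for converting a .gro file to .pdb."""
    gro = Path(gro_path)
    # The output file has the same name with the extension changed to .pdb
    return ["editconf", "-f", str(gro), "-o", str(gro.with_suffix(".pdb"))]

def gro_to_pdb(gro_path):
    """Run editconf and return the name of the PDB file it was asked to write."""
    cmd = editconf_command(gro_path)
    # check=True raises CalledProcessError if editconf exits with an error
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

The resulting PDB file can then be loaded into Jmol in the usual way.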
HIV / HIN (Hyperchem)
Jmol reads HIV (or HIN) files, the native format of Hyperchem, software sold by Hypercube Inc.
Jmol reads mopout output files from MOPAC and the new graphf output from MOPAC2007 (.mgf files), which contains coordinates, charges, and molecular orbitals.
openMOPAC, Molecular Orbital PACkage, public domain.
Jmol (11.1.30 or later) reads PQR files.
PQR is a format based on pdb, where the occupancy is replaced with the atomic charge and the temperature (or B factor) is replaced with the atomic radius (however, the column positions in many pqr files do not match those of pdb files). This gives the acronym: P for pdb, Q for charge, R for radius. Jmol interprets the charge values (property partialcharge) and the radii (property vanderwaals), and can hence use them e.g. in
color atoms partialCharge and
The PQR format has somewhat uncertain origins, but is used by several computational biology packages, including MEAD, AutoDock and APBS, for which it is the primary input format.
PQR format description within APBS documentation. Note that APBS reads PQR loosely, based only on white space delimiters, but Jmol may be more strict about column positions.
PDB files can be converted to PQR by the PDB2PQR software, which adds missing hydrogen atoms and calculates the charge and radius parameters from a variety of force fields.
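The loose, whitespace-delimited reading of PQR mentioned above can be sketched in Python as follows. The function name `parse_pqr_atom` is hypothetical. The sketch assumes that the last five whitespace-separated fields of each ATOM or HETATM line are x, y, z, charge and radius, and that the third field is the atom name. It does not reproduce the stricter column handling that Jmol may apply.

```python
def parse_pqr_atom(line):
    """Parse one ATOM/HETATM line of a whitespace-delimited PQR file.

    Returns None for lines that are not atom records.
    """
    f = line.split()
    if not f or f[0] not in ("ATOM", "HETATM"):
        return None
    # Assumption: the final five fields are x, y, z, charge (Q), radius (R)
    x, y, z, charge, radius = (float(v) for v in f[-5:])
    return {"name": f[2], "xyz": (x, y, z), "charge": charge, "radius": radius}
```

Reading from the end of the line keeps the sketch indifferent to whether a chain identifier is present.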
Jmol (11.7 or later) reads molecular dynamics output files from Amber. The fileset must have a structure like:
1 (topology file) + n (coordinate files)
The filter option of the load command can be used, as well as a new option that allows selective "first,last,step" loading of coordinate trajectories.
(This is preliminary and needs testing). You can see an example.
V3000 (Symyx MDL)
Jmol reads files output from the computational chemistry package Q-Chem. See the Q-Chem specific section.
The JME Molecular Editor (by Peter Ertl) is an applet for drawing, editing and displaying molecules and reactions within a web page. Among its output formats is a proprietary JME format, consisting of a single line of text with atom and bond data. Jmol has basic support for reading this format from a file (only single and multipart structures are supported, not reactions). Because JME is a drawing program, the structures it produces are flat.
This support is rather experimental and not much needed, since JME-ME can also export to 2D MOL format, much better supported by Jmol.