Schema for RGD Genes - RGD Genes
  Database: rn4    Primary Table: rgdGene2    Row Count: 17,487   Data last updated: 2011-12-07
Format description: A gene prediction.
On download server: MariaDB table dump directory
field       example                         SQL type          info    description
name        RGD:1565877                     varchar(255)      values  Name of gene
chrom       chr1                            varchar(255)      values  Reference sequence chromosome or scaffold
strand      -                               char(1)           values  + or - for strand
txStart     98271                           int(10) unsigned  range   Transcription start position (or end position for minus strand item)
txEnd       142335                          int(10) unsigned  range   Transcription end position (or start position for minus strand item)
cdsStart    98949                           int(10) unsigned  range   Coding region start (or end position for minus strand item)
cdsEnd      142335                          int(10) unsigned  range   Coding region end (or start position for minus strand item)
exonCount   6                               int(10) unsigned  range   Number of exons
exonStarts  98271,122723,127935,129576,...  longblob                  Exon start positions (or end positions for minus strand item)
exonEnds    99351,122826,128045,129679,...  longblob                  Exon end positions (or start positions for minus strand item)
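As a minimal sketch of how a row with these fields might be read, the Python below parses one record into a small data class and derives the exon lengths. It assumes the row arrives as tab-separated text in the column order of the schema above; the `GenePred` class and the helper names are illustrative and are not part of any UCSC tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenePred:
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]


def parse_int_list(blob: str) -> List[int]:
    """Parse a comma-separated list that may carry a trailing comma."""
    return [int(x) for x in blob.split(",") if x]


def parse_gene_pred(line: str) -> GenePred:
    """Parse one tab-separated row (assumed layout: schema column order)."""
    f = line.rstrip("\n").split("\t")
    return GenePred(
        name=f[0], chrom=f[1], strand=f[2],
        tx_start=int(f[3]), tx_end=int(f[4]),
        cds_start=int(f[5]), cds_end=int(f[6]),
        exon_count=int(f[7]),
        exon_starts=parse_int_list(f[8]),
        exon_ends=parse_int_list(f[9]),
    )


# First sample row of rgdGene2, joined with tabs for this sketch.
row = "\t".join([
    "RGD:1565877", "chr1", "-", "98271", "142335", "98949", "142335", "6",
    "98271,122723,127935,129576,139688,142173,",
    "99351,122826,128045,129679,139737,142335,",
])
gene = parse_gene_pred(row)

# With 0-based starts, each exon length is simply end minus start.
exon_lengths = [e - s for s, e in zip(gene.exon_starts, gene.exon_ends)]
print(gene.name, gene.exon_count, exon_lengths)
```

The trailing comma in exonStarts and exonEnds is why the list parser skips empty items rather than converting every split field.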

Connected Tables and Joining Fields
      rn4.affyExonTissuesGsMedianDistRgdGene2.query (via rgdGene2.name)
      rn4.affyExonTissuesGsMedianDistRgdGene2.target (via rgdGene2.name)
      rn4.affyExonTissuesGsMedianRgdGene2.name (via rgdGene2.name)
      rn4.affyExonTissuesGsRgdGene2.name (via rgdGene2.name)
      rn4.ceBlastTab.query (via rgdGene2.name)
      rn4.dmBlastTab.query (via rgdGene2.name)
      rn4.drBlastTab.query (via rgdGene2.name)
      rn4.gnfAtlas2RgdGene2Distance.query (via rgdGene2.name)
      rn4.gnfAtlas2RgdGene2Distance.target (via rgdGene2.name)
      rn4.hgBlastTab.query (via rgdGene2.name)
      rn4.mmBlastTab.query (via rgdGene2.name)
      rn4.rgdGene2BlastTab.query (via rgdGene2.name)
      rn4.rgdGene2BlastTab.target (via rgdGene2.name)
      rn4.rgdGene2Canonical.transcript (via rgdGene2.name)
      rn4.rgdGene2Isoforms.transcript (via rgdGene2.name)
      rn4.rgdGene2KeggPathway.rgdId (via rgdGene2.name)
      rn4.rgdGene2Pep.name (via rgdGene2.name)
      rn4.rgdGene2ToDescription.name (via rgdGene2.name)
      rn4.rgdGene2ToDisplayId.name (via rgdGene2.name)
      rn4.rgdGene2ToEnsembl.name (via rgdGene2.name)
      rn4.rgdGene2ToGenbank.name (via rgdGene2.name)
      rn4.rgdGene2ToGenbankAll.name (via rgdGene2.name)
      rn4.rgdGene2ToGnfAtlas2.name (via rgdGene2.name)
      rn4.rgdGene2ToKeggEntrez.name (via rgdGene2.name)
      rn4.rgdGene2ToLocusLink.name (via rgdGene2.name)
      rn4.rgdGene2ToPDB.name (via rgdGene2.name)
      rn4.rgdGene2ToPfam.name (via rgdGene2.name)
      rn4.rgdGene2ToRAE230.name (via rgdGene2.name)
      rn4.rgdGene2ToRefSeq.name (via rgdGene2.name)
      rn4.rgdGene2ToSymbol.rgdId (via rgdGene2.name)
      rn4.rgdGene2ToU34A.name (via rgdGene2.name)
      rn4.rgdGene2ToUniProt.name (via rgdGene2.name)
      rn4.rgdGene2Xref.rgdGeneId (via rgdGene2.name)
      rn4.rgdGenePathway.geneId (via rgdGene2.name)
      rn4.scBlastTab.query (via rgdGene2.name)
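
Each entry in the list above names a table and the field in it that matches rgdGene2.name. The following sketch illustrates one such join, rgdGene2ToSymbol.rgdId against rgdGene2.name, using an in-memory SQLite database as a stand-in for the real MariaDB server. Only the table names and the joining fields come from the list; the `geneSymbol` column name and the placeholder symbol value are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rgdGene2 (name TEXT, chrom TEXT, strand TEXT,
                           txStart INTEGER, txEnd INTEGER);
    -- 'geneSymbol' is a hypothetical column name for this sketch.
    CREATE TABLE rgdGene2ToSymbol (rgdId TEXT, geneSymbol TEXT);
""")

# One real sample row from rgdGene2 and one placeholder symbol row.
conn.execute("INSERT INTO rgdGene2 VALUES (?, ?, ?, ?, ?)",
             ("RGD:1565877", "chr1", "-", 98271, 142335))
conn.execute("INSERT INTO rgdGene2ToSymbol VALUES (?, ?)",
             ("RGD:1565877", "PLACEHOLDER_SYMBOL"))

# Join on the fields named in the list: rgdGene2ToSymbol.rgdId = rgdGene2.name
rows = conn.execute("""
    SELECT g.name, g.chrom, g.txStart, g.txEnd, s.geneSymbol
    FROM rgdGene2 AS g
    JOIN rgdGene2ToSymbol AS s ON s.rgdId = g.name
""").fetchall()
print(rows)
```

The same pattern should apply to the other tables in the list by swapping in the table and joining field shown for each entry.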

Sample Rows
 
name         chrom  strand  txStart  txEnd    cdsStart  cdsEnd   exonCount  exonStarts  exonEnds
RGD:1565877  chr1   -       98271    142335   98949     142335   6          98271,122723,127935,129576,139688,142173,  99351,122826,128045,129679,139737,142335,
RGD:1596944  chr1   -       245887   267011   245887    267011   7          245887,256156,257489,258566,259804,260546,266805,  246774,256280,257714,259370,260077,260562,267011,
RGD:1561757  chr1   +       760657   765914   760753    765914   4          760657,762313,763741,765030,  761461,762538,763865,765914,
RGD:1565892  chr1   +       1009411  1021923  1009411   1021923  6          1009411,1015935,1016643,1018297,1019750,1021039,  1009617,1016218,1017447,1018522,1019874,1021923,
RGD:1564110  chr1   -       1134797  1149865  1134797   1149865  6          1134797,1136833,1144442,1145623,1146814,1149659,  1135681,1136957,1144667,1146427,1147097,1149865,
RGD:1585898  chr1   +       1234907  1243561  1234907   1243561  6          1234907,1237683,1238354,1240008,1241408,1242674,  1235113,1237966,1239158,1240233,1241532,1243561,
RGD:1562697  chr1   +       1287085  1295756  1287085   1295756  6          1287085,1289852,1290535,1292186,1293585,1294872,  1287291,1290135,1291339,1292411,1293709,1295756,
RGD:2321865  chr1   -       1340449  1347152  1340449   1347152  5          1340449,1344407,1345112,1345642,1347061,  1340586,1344556,1345380,1345903,1347152,
RGD:1584760  chr1   -       1384861  1403043  1384861   1403043  8          1384861,1386239,1390853,1400500,1400823,1401076,1401608,1402952,  1385032,1386377,1391070,1400592,1400933,1401343,1401842,1403043,
RGD:1583663  chr1   -       1414176  1419334  1414176   1419334  6          1414176,1414894,1415621,1416333,1416859,1419018,  1414243,1415036,1415774,1416597,1417120,1419334,
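
The sample rows above appear to share a few structural properties: exonCount matches the number of entries in both exon lists, the first exon start equals txStart, the last exon end equals txEnd, and the coding region lies within the transcript. The sketch below checks those properties for one row. The particular set of checks and the `check_row` helper are assumptions chosen for illustration, not a description of any validation UCSC performs.

```python
def check_row(tx_start, tx_end, cds_start, cds_end,
              exon_count, exon_starts, exon_ends):
    """Return a list of human-readable problems found in one row."""
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    problems = []
    if len(starts) != exon_count or len(ends) != exon_count:
        problems.append("exonCount does not match the exon lists")
    if starts and starts[0] != tx_start:
        problems.append("first exon start differs from txStart")
    if ends and ends[-1] != tx_end:
        problems.append("last exon end differs from txEnd")
    if not (tx_start <= cds_start <= cds_end <= tx_end):
        problems.append("coding region is not inside the transcript")
    if any(s >= e for s, e in zip(starts, ends)):
        problems.append("an exon has zero or negative length")
    # Each exon should end at or before the start of the next one.
    if any(e > s for e, s in zip(ends, starts[1:])):
        problems.append("exons overlap or are out of order")
    return problems


# Sample row RGD:1584760 (8 exons).
starts = "1384861,1386239,1390853,1400500,1400823,1401076,1401608,1402952,"
ends = "1385032,1386377,1391070,1400592,1400933,1401343,1401842,1403043,"
ok = check_row(1384861, 1403043, 1384861, 1403043, 8, starts, ends)

# Same row with a deliberately wrong exonCount.
bad = check_row(1384861, 1403043, 1384861, 1403043, 7, starts, ends)
print(ok, bad)
```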

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
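
To illustrate the note above, the sketch below converts a stored interval to a 1-based position string. It assumes the usual half-open convention in which only the start needs adjusting and the stored end is used as is; the function names are illustrative.

```python
def to_one_based(chrom: str, start0: int, end: int) -> str:
    """Format a 0-based-start interval as a 1-based, inclusive position."""
    return f"{chrom}:{start0 + 1}-{end}"


def interval_length(start0: int, end: int) -> int:
    """With a 0-based start, the length is end minus start (no +1)."""
    return end - start0


# txStart and txEnd from the first sample row (RGD:1565877).
position = to_one_based("chr1", 98271, 142335)
length = interval_length(98271, 142335)
print(position, length)  # chr1:98272-142335 44064
```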

RGD Genes (rgdGene2) Track Description
 

Description

This track shows protein-coding genes based on annotation provided by RGD.

Method

Annotations on RGD Genes were downloaded from:

  • ftp://rgd.mcw.edu/pub/data_release/GFF/Corrected_GFF3/
  • ftp://rgd.mcw.edu/pub/data_release/GENES_RAT

The GFF files were combined into a temporary MySQL database table, and the "gene", "exon", and "CDS" records were selected and loaded into separate MySQL tables. Records with inconsistent strand information were removed.
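
The selection and cleaning step described above can be sketched roughly as follows. This is one plausible reading, not the actual pipeline: it assumes "inconsistent strand information" means that the gene, exon, and CDS records of one gene disagree on strand, and it uses toy records with hypothetical gene IDs in place of real GFF lines.

```python
from collections import defaultdict

# Toy records as (gene_id, feature_type, strand); IDs are hypothetical.
records = [
    ("geneA", "gene", "+"), ("geneA", "exon", "+"), ("geneA", "CDS", "+"),
    ("geneA", "mRNA", "+"),                          # not a selected type
    ("geneB", "gene", "-"), ("geneB", "exon", "+"),  # strand mismatch
]

WANTED = {"gene", "exon", "CDS"}


def consistent_gene_ids(recs):
    """Return IDs of genes whose selected records all share one strand."""
    strands = defaultdict(set)
    for gene_id, ftype, strand in recs:
        if ftype in WANTED:
            strands[gene_id].add(strand)
    return {g for g, s in strands.items() if len(s) == 1}


def split_by_type(recs):
    """Group records by feature type, mirroring the separate tables."""
    tables = defaultdict(list)
    for rec in recs:
        tables[rec[1]].append(rec)
    return tables


keep = consistent_gene_ids(records)
cleaned = [r for r in records if r[1] in WANTED and r[0] in keep]
tables = split_by_type(cleaned)
print(sorted(tables), len(cleaned))
```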

The resulting collection was loaded into a gene prediction format table using the ldHgGene utility program. The data were further processed by two programs, getRgdGeneCds and doRgdGene2, to create the genePred format table rgdGene2, which serves as the base table for RGD Genes.

A program, doRgdGene2Xref, was used to create the rgdGene2Xref table from the Dbxref field. The rgdGene2ToDescription table was built from the gene_desc field of the GENES_RAT file. The rgdGene2ToUniProt and rgdGene2Pep tables were built using data from GENES_RAT and the UniProt database.
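
For readers unfamiliar with the Dbxref field, the sketch below shows how cross-references could be pulled out of a GFF3 attributes column, where Dbxref holds comma-separated DBTAG:ID pairs. It is not the doRgdGene2Xref program; the `parse_dbxref` helper, the database tags, and the IDs in the example string are all hypothetical.

```python
from urllib.parse import unquote


def parse_dbxref(attributes: str):
    """Return (database, accession) pairs from a GFF3 attributes string."""
    for field in attributes.strip().split(";"):
        key, _, value = field.partition("=")
        if key == "Dbxref":
            pairs = []
            # Split on commas first; GFF3 percent-encodes literal commas.
            for item in value.split(","):
                db, _, acc = unquote(item).partition(":")
                pairs.append((db, acc))
            return pairs
    return []


# Hypothetical attributes column, for illustration only.
attrs = "ID=EXAMPLE_GENE;Dbxref=EXAMPLE_DB:12345,OTHER_DB:ABC001"
xrefs = parse_dbxref(attrs)
print(xrefs)
```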

A total of 753 genes were found to have inconsistent annotations, which caused display problems in the Genome Browser. These 753 entries were removed.

All of the programs in this build pipeline can be found in the source code package, which is available for download.

Credits

Thanks to RGD for providing the base annotation of RGD Genes.

The RGD Genes track was produced at UCSC by Fan Hsu, Mary Goldman and Hiram Clawson. It is based on data from RGD, NCBI RefSeq, UniProt, and GenBank. Our thanks to the people running these databases and to the scientists worldwide who have made contributions to them.

References

Twigger SN, Shimoyama M, Bromberg S, Kwitek AE, Jacob HJ, RGD Team. The Rat Genome Database, update 2007--easing the path from disease to data and back again. Nucleic Acids Res. 2007 Jan;35(Database issue):D658-62. PMID: 17151068; PMC: PMC1761441