Schema for SGP Genes - SGP Gene Predictions Using Mouse/Human Homology
Database: hg38
Primary Table: sgpGene
Row Count: 36,030
Data last updated: 2015-07-30
Format description: A gene prediction with some additional info.
On download server: MariaDB table dump directory
field | example | SQL type | info | description |
bin | 585 | smallint(5) unsigned | range | Indexing field to speed chromosome range queries. |
name | chr1_1.1 | varchar(255) | values | Name of gene (usually transcript_id from GTF) |
chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold |
strand | - | char(1) | values | + or - for strand |
txStart | 14969 | int(10) unsigned | range | Transcription start position (or end position for minus strand item) |
txEnd | 15009 | int(10) unsigned | range | Transcription end position (or start position for minus strand item) |
cdsStart | 14969 | int(10) unsigned | range | Coding region start (or end position for minus strand item) |
cdsEnd | 15009 | int(10) unsigned | range | Coding region end (or start position for minus strand item) |
exonCount | 1 | int(10) unsigned | range | Number of exons |
exonStarts | 14969, | longblob | | Exon start positions (or end positions for minus strand item) |
exonEnds | 15009, | longblob | | Exon end positions (or start positions for minus strand item) |
score | 0 | int(11) | range | score |
name2 | chr1_1 | varchar(255) | values | Alternate name (e.g. gene_id from GTF) |
cdsStartStat | incmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS start annotation (none, unknown, incomplete, or complete) |
cdsEndStat | cmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS end annotation (none, unknown, incomplete, or complete) |
exonFrames | 0, | longblob | | Exon frame {0,1,2}, or -1 if no frame for exon |
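To make the field layout above concrete, here is a minimal sketch of reading one row into a typed record. It assumes the row arrives as a single tab-separated line with the 16 fields in the order listed in the schema; the `SgpGene` class and `parse_sgp_gene` function are hypothetical names chosen for illustration and are not part of any UCSC tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SgpGene:
    """One sgpGene row; attribute order follows the schema table."""
    bin: int
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]
    score: int
    name2: str
    cds_start_stat: str
    cds_end_stat: str
    exon_frames: List[int]

    def exons(self) -> List[Tuple[int, int]]:
        """Pair each exon start with its end."""
        return list(zip(self.exon_starts, self.exon_ends))


def _int_list(text: str) -> List[int]:
    # The list columns end with a comma ("52029,62915,"), so skip empty pieces.
    return [int(piece) for piece in text.split(",") if piece.strip()]


def parse_sgp_gene(line: str, sep: str = "\t") -> SgpGene:
    """Parse one row. Assumption: 16 columns separated by `sep`."""
    cols = [c.strip() for c in line.rstrip("\n").split(sep)]
    if len(cols) != 16:
        raise ValueError(f"expected 16 columns, got {len(cols)}")
    gene = SgpGene(
        bin=int(cols[0]), name=cols[1], chrom=cols[2], strand=cols[3],
        tx_start=int(cols[4]), tx_end=int(cols[5]),
        cds_start=int(cols[6]), cds_end=int(cols[7]),
        exon_count=int(cols[8]),
        exon_starts=_int_list(cols[9]), exon_ends=_int_list(cols[10]),
        score=int(cols[11]), name2=cols[12],
        cds_start_stat=cols[13], cds_end_stat=cols[14],
        exon_frames=_int_list(cols[15]),
    )
    # The three list columns should each hold exonCount entries.
    for values in (gene.exon_starts, gene.exon_ends, gene.exon_frames):
        if len(values) != gene.exon_count:
            raise ValueError("exon list length does not match exonCount")
    return gene
```

Keeping the comma-separated columns as integer lists, rather than raw strings, lets later code pair starts with ends directly through `exons()`.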
Sample Rows
bin | name | chrom | strand | txStart | txEnd | cdsStart | cdsEnd | exonCount | exonStarts | exonEnds | score | name2 | cdsStartStat | cdsEndStat | exonFrames |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
585 | chr1_1.1 | chr1 | - | 14969 | 15009 | 14969 | 15009 | 1 | 14969, | 15009, | 0 | chr1_1 | incmpl | cmpl | 0, |
585 | chr1_2.1 | chr1 | - | 16744 | 35736 | 16744 | 35736 | 8 | 16744,16857,17232,17605,17914,18267,24737,35720, | 16765,17055,17257,17742,18061,18379,24891,35736, | 0 | chr1_2 | cmpl | cmpl | 0,0,2,0,0,2,1,0, |
585 | chr1_3.1 | chr1 | + | 52029 | 63887 | 52029 | 63887 | 2 | 52029,62915, | 52038,63887, | 0 | chr1_3 | cmpl | cmpl | 0,0, |
585 | chr1_4.1 | chr1 | + | 65564 | 70008 | 65564 | 70008 | 2 | 65564,69036, | 65573,70008, | 0 | chr1_4 | cmpl | cmpl | 0,0, |
586 | chr1_5.1 | chr1 | + | 131779 | 155667 | 131779 | 155667 | 2 | 131779,155643, | 131845,155667, | 0 | chr1_5 | cmpl | cmpl | 0,0, |
586 | chr1_6.1 | chr1 | - | 163124 | 185530 | 163124 | 185530 | 6 | 163124,163744,164765,165883,170508,185490, | 163141,163805,164791,165942,170527,185530, | 0 | chr1_6 | cmpl | cmpl | 1,0,1,2,1,0, |
586 | chr1_7.1 | chr1 | - | 187266 | 195438 | 187266 | 195438 | 6 | 187266,187379,187754,188129,188790,195262, | 187287,187577,187779,188266,188902,195438, | 0 | chr1_7 | cmpl | cmpl | 0,0,2,0,2,0, |
586 | chr1_8.1 | chr1 | - | 258540 | 258903 | 258540 | 258903 | 1 | 258540, | 258903, | 0 | chr1_8 | cmpl | cmpl | 0, |
587 | chr1_9.1 | chr1 | - | 348161 | 348308 | 348161 | 348308 | 1 | 348161, | 348308, | 0 | chr1_9 | cmpl | cmpl | 0, |
588 | chr1_10.1 | chr1 | - | 450739 | 487203 | 450739 | 487203 | 2 | 450739,487174, | 451763,487203, | 0 | chr1_10 | cmpl | cmpl | 2,0, |
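The `bin` values in the sample rows can be reproduced from `txStart` and `txEnd`. The sketch below assumes the table uses the standard (non-extended) UCSC binning scheme, with a smallest bin size of 128 kb and a factor of 8 between levels; those constants come from that scheme and are not stated on this page, and `bin_from_range` is a hypothetical helper name.

```python
# Offsets of the first bin at each level, from the finest level (128 kb bins)
# up to the single top-level bin. Assumed from the standard UCSC scheme.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the smallest bins
BIN_NEXT_SHIFT = 3    # each coarser level has bins 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains [start, end).

    `start` is 0-based and `end` is exclusive, as in txStart/txEnd.
    """
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles a bin boundary: try the next coarser level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for this scheme")
```

Under these assumptions the function agrees with the sample rows: for example, `chr1_1.1` (14969 to 15009) falls in bin 585 and `chr1_10.1` (450739 to 487203) in bin 588.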
Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are 1-based, so each interval is half-open: [start, end).
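As an illustration of the coordinate note, the following sketch converts a stored interval to the 1-based, inclusive form used in position strings such as `chr1:14970-15009`. It assumes the stored intervals are 0-based and half-open (the end position is not included); the function names are hypothetical.

```python
from typing import Tuple


def to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open interval to 1-based, inclusive.

    Only the start shifts by one; the end value stays the same.
    """
    return start + 1, end


def interval_length(start: int, end: int) -> int:
    """Length in bases of a 0-based, half-open interval."""
    return end - start


def format_position(chrom: str, start: int, end: int) -> str:
    """Build a 1-based position string from stored table coordinates."""
    one_start, one_end = to_one_based(start, end)
    return f"{chrom}:{one_start}-{one_end}"
```

With the first sample row (`txStart` 14969, `txEnd` 15009), the transcript is 40 bases long and corresponds to the 1-based position `chr1:14970-15009`.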
SGP Genes (sgpGene) Track Description
Description
This track shows gene predictions from the SGP2 homology-based gene prediction program developed by Roderic Guigó's "Computational Biology of RNA Processing" group, which is part of the Centre de Regulació Genòmica (CRG) in Barcelona, Catalunya, Spain. To predict genes in a genomic query, SGP2 combines geneid predictions with tblastx comparisons of the genome of the target species against genomic sequences of other species (reference genomes) deemed to be at an appropriate evolutionary distance from the target.
Credits
Thanks to the "Computational Biology of RNA Processing" group for providing these data.