Schema for TransMap Ensembl - TransMap Ensembl and GENCODE Mappings Version 5
Database: mm10
Primary Table: transMapEnsemblV5
Data last updated: 2019-06-10
Big Bed File: /gbdb/mm10/transMap/V5/mm10.ensembl.transMapV5.bigPsl
Item Count: 1,216,136
Format description: bigPsl derived pairwise alignment with additional information
field | example | description
chrom | chr1 | Reference sequence chromosome or scaffold
chromStart | 130389792 | Start position in chromosome
chromEnd | 130422300 | End position in chromosome
name | micMur3:ENSMICT00000066406.1-1.1 | Alignment ID
score | 675 | Score (0-1000), fraction identity * 1000
strand | - | + or - indicates whether the query aligns to the + or - strand on the reference
thickStart | 130422300 | Start of where display should be thick (start codon)
thickEnd | 130422300 | End of where display should be thick (stop codon)
reserved | 0 | RGB value (use R,G,B string in input file)
blockCount | 58 | Number of blocks
blockSizes | 24,21,18,12,34,59,7,6,31,15,30,35,161,24,18,13,15,29,20,40,15,5,10,21,12,20,17,13,34,24,31,18,2,9,29,8,38,21,4,14,54,7,116,69,15,21,18,18,24,8,40,5,100,44,86,100,192,186, | Comma-separated list of block sizes
chromStarts | 0,24,45,63,75,109,168,175,182,214,229,259,294,456,480,498,520,551,597,620,675,886,973,991,1012,1026,1046,1063,1076,1116,1140,1179,1199,1201,1210,1239,1247,1285,1309,1318,1332,1423,1566,1682,1754,20075,20836,20857,20875,21739,24256,24297,24304,24404,27046,28315,29808,32322, | Start positions relative to chromStart
oChromStart | 85 | Start position in other chromosome
oChromEnd | 2311 | End position in other chromosome
oStrand | + | + or -; - means that the PSL was reversed into BED-compatible coordinates
oChromSize | 4672 | Size of other chromosome
oChromStarts | 2361,2387,2409,2437,2460,2498,2558,2568,2574,2605,2625,2661,2701,2862,2887,2908,2921,2936,2965,2985,3025,3040,3045,3055,3086,3098,3119,3139,3154,3188,3219,3250,3268,3273,3283,3313,3322,3361,3382,3386,3401,3455,3462,3582,3651,3666,3687,3705,3744,3823,3831,3871,3876,3979,4023,4109,4209,4401, | Start positions relative to oChromStart or from oChromStart+oChromSize depending on strand
oSequenceatgagtctcgcgcgcccgagcgcccccacagtgctgcgtctcggctgtctgtacctgctgctactgtggccaccagccgcacggggtgactgcagctctcccccagaagtacctaatgcccacctggacttgagaggtctttcaagttttccggtgaatagcacagtaacatacagatgtaatgaaggctttgtaaaagttcctggccaaacagactcagtaatctgtcttgggaatgatcaatggtcagagatttcagaattctgtaaccgtagctgtaacgtttcaccccagctacagtttgcaatgctcaaaaagatttatagcagccagaattattttccagttgggtccactgtggaatatgagtgccgtttggggtacagaagggtacctggtcgatcaggaacactaacttgccttcagaatttagaatggtccaaaccagctgagttttgtacaaaaaagtcatgtactcatcctggagaattatcaaatggtcagatcaatgtaaaaactgacttcttatttggtgcaaccattaacttctcatgtgataaaggctacaaactaattggtgcaaagcatagtatctgtgttattataggtgataatgttgggtggactgaaccacttccaacttgcagagtaattcattgtcaagaaccaccaaaaattgagaatggaagaatagttcgaggagacggcgactactatgtatataatcaggctgtaatgtacgaatgtaatagaggctttaccctgattggagaggacactattcattgtactgcaaaaggcgatgaaggagagtggagtggcccaccacctcagtgccgaggaaagtctccacctgccacgttccggccatcagttcctaaacctaccactacaaatcttccagcaacccagggtgaacttcttcccaggacaaccatgttttctcctgcaacaagtatatctaaaggaggacagatcccttcagattcttccacatttacagctgggcacacatatttaactttgaccattttgctcatgatgctagtaatcactggccagctgacctagccaaagaagagttaagaagaaagtgcacacatgtacacagattattcctcattacttagacctatctgcaggaatggatacaataaattctatagtgctcttcattttgaacgttaccattgtctttaacatgtgttaggaaactcaacaaagcaaggagaaaggagttcaccctggaatcacacacttaacatacttaacgcctcttgaaaatagaacaacaactcacagaattgagagtgattattttcctaaaagtatagcaaggtgtagagatttgttcatacttagaatgggaacatgaagaaaagaaaataagtgattttttttcagtagatatgtaatattcttacttaaaaaagaaattttaaaatgtaaaacattacttggatatcaaaaacaaacaaaaaactgaatctggtctcttctaagcaaagttgccgaagaacgatgaatcacgctatagaataatccctggcctgtaaagcattttcttccttattccattaaaatgatcctgaatgggttggcaaaatattttgcaggaaaaacttgctaatcaacaaagagtgttggtggtggcaagggaagagcatagaatgaaagactaattcttcttttgttgtataaacacagcccattatttgtaagatgaatatgctacaatcttttcttttttcaaaatgagttcagtgtaaggtatcttatttgacttaatctcttttaaaaaagtatcagggatggtacaatattaacataagaaaaccttctatatttctgaagtgagatgttcacagccaaattgtacattttattcctttgtaacattaaaattaatttatgtatatttatttcagtaagtaaacctattggttttacatataaaacaagaaaacttttatgtaaagaaaaatttattt
tcccaagtagaaataaataatttatcaaagtttttggtctcatgtagtatttgaaatttattttcttaaatatgaatatttggtttctaaataagatattcccttcctgcctcccataagtagttcagaatgctgccccttggttatcctagtgggaatcattttcagttatttttaaattatattcactttctcaaacatgtcatatcctctcctattagaacgttgatgttattgatttacctaaaggcaatcacattaatttctaggttcttcatatattcttctcatattgccaattacattttaagtattagcttggactagaatacaggcagtccccaggttacagatgacccgagttacagtgttccatacttatgaaccagttcccaaggctgccatgaggataatttagatattaggttattaagtatgaaatgtctgatttcctcaagtactaagtttgacagttgatatcactgacatttcactttgacacttcctgttttattttattgtaaaggtacctcttggaaatggatgttttgccttgtgatttcatctgtcaaatttatttttaaagcaatttttttggagccatgcataagtcaaaaatgtatgctattttcaaatatgatttccctcatggatccagtgcagcacagacagctcaaaataccaacgaagtgttggagaagagtgtggataatgaacgcacagtactcggatggtttgagaagttccattctggtgactttaatcttgaaaatgagccatgtgagcgacttgagaccaaggtggacaacatgtgctgaaagctgtagtggaagggaatccacctcagtctgtgcatcaatcagcagcaaggtctgacgttactaatccaacaatattggaccatttgaaacaaatcatcaaggtaaagaagttggacagatgggtactggatgaataaaaggagcatcagaagagaaatcgtctcgaaacttaccattaagatggcaaaactgtgtcgacagtttaggcacatactttgattaattgtactgctgcttgtttgagatacagtaaactacactttagatttgtaatcggacatttcatatttatgacttaataaactaataattcaggacctcttttcctccgtgtatgtcaatgaagccgcatggcatcggcgtttcattggcaaagcatctttaatcagtttagttgagtattgatatttacatgaattcaaatagtagtataaataataatttatcaaagatctatatctcatttaatttagcctttttttatttaaaaaaatactataatatattaaaatcacatacgaaataatatataagcatatactaaaatcaagaccagctatataatttacaaaacccaattcaatatggaaatgcaagaatttcaagatacggacagtagagtattaaaccaaggtcacttctcagcatgcaagcctgtctaggtttctatctagcttgcatgcccaggaagggaatcctgactaacctggcatgtttgtagtagtattctgtgaaacgtcggagagaggacagcttttcctttttcatattctgaactcacaaaccaatttaatttatgaaatgatttccttatttcaaatatgggagcaaaagatgtttaatatcaaatatgaggaccatacttgatggtttaaaaatctcccgcctcccttccaaatataaactctaaaatgtttgaatactgttaggtctcagtgaaagcctataggaatccctttcctggaatccttgtaggattggttccagcactagacaacgcagggtgtggttgagtatgatgagcatgagtcagagaaaaccctattttgggatactgtggtttttcctctttaggaaagtcacactagccaggaagagctaccaagaagcattcaggaaagaataacatgataggatataacaaacactccaacagctggtccgtagtccaaggcacaa
ctgcaattgagggattttacctaagcaggtaagccatttaaggggaaaggaatctagcctaggactctgttaaccagacgtacgtaaattattaaagggaaacctgttctcgagctattctccaggaagatgagcaacccatcttgccatcagtatcctttagtatctttgacaagggcagtgctgggctgaacagacctttactctgaccagtactatatagatctgtggttctcaaacctagctgcacagtggagcactgtaaatactcctgattccaagagattttgatttaatttgggatgcagcttggatatctggatttttaaaaagatccccatgtgatgttaatgtgtggccagccttgcaaaccgcaggctgttctcagggtcttctaacaaaccttcctcttaaaatccaccacagcctttaactttctggggctagtttgtttttgttgtcatcttctccctcctgtcacactcaaacactcccggggtggtgaatttctctcggcagtcagactagaacaatcctaaggtgggattcaagaggtgtccccgctaaataggggaacccaggtgaagagcgtgattcccgcccaaacatggtaggaacaatgatccctccccgccaggcagctggaccagttttcttaggtcgtttcacacttgctgttttSequence on other chrom (or edit list, or empty)
oCDS | 1..1071 | CDS in NCBI format
chromSize | 195471971 | Size of target chromosome
match | 1380 | Number of bases matched
misMatch | 669 | Number of bases that don't match
repMatch | 11 | Number of bases that match but are part of repeats
nCount | 0 | Number of 'N' bases
seqType | 1 | 0=empty, 1=nucleotide, 2=amino_acid
srcDb | micMur3 | Source database
srcTransId | ENSMICT00000066406.1 | Source transcript ID
srcChrom | chr27 | Source chromosome
srcChromStart | 31864957 | Start position in source chromosome
srcChromEnd | 31899389 | End position in source chromosome
srcIdent | 1000 | Source score (fraction identity * 1000)
srcAligned | 1000 | Fraction of source transcript aligned (fraction aligned * 1000)
geneName |  | Gene name
geneId | ENSMICG00000046137.1 | Gene ID
geneType | protein_coding | Gene type
transcriptType | protein_coding | Transcript type
chainType | syn | Type of chains used for mapping
commonName | Mouse lemur | Common name
scientificName | Microcebus murinus | Scientific name
orgAbbrev | Microc | Organism abbreviation
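The fields above fit together as in an ordinary PSL record. The following Python sketch uses the example values from the table to expand the target blocks into absolute genome intervals and to convert the query blocks back to forward-strand positions on the source transcript. Two points are assumptions rather than documented facts: that oChromStarts are absolute positions (not offsets from oChromStart), and that when strand is "-" they are measured on the reverse-complemented query, as in standard PSL. The example values are consistent with both readings (4672 - 2311 = 2361, the first oChromStart). The helper names are hypothetical.

```python
# Sketch: expand the example bigPsl record from the schema table into
# target (genome) and query (source transcript) intervals.


def parse_int_list(text):
    """Parse a comma-separated integer list; a trailing comma is allowed."""
    return [int(x) for x in text.split(",") if x]


# Example values copied from the schema table (only the fields used here).
rec = {
    "chromStart": 130389792,
    "chromEnd": 130422300,
    "strand": "-",
    "blockCount": 58,
    "blockSizes": "24,21,18,12,34,59,7,6,31,15,30,35,161,24,18,13,15,29,20,40,15,5,10,21,12,20,17,13,34,24,31,18,2,9,29,8,38,21,4,14,54,7,116,69,15,21,18,18,24,8,40,5,100,44,86,100,192,186,",
    "chromStarts": "0,24,45,63,75,109,168,175,182,214,229,259,294,456,480,498,520,551,597,620,675,886,973,991,1012,1026,1046,1063,1076,1116,1140,1179,1199,1201,1210,1239,1247,1285,1309,1318,1332,1423,1566,1682,1754,20075,20836,20857,20875,21739,24256,24297,24304,24404,27046,28315,29808,32322,",
    "oChromStart": 85,
    "oChromEnd": 2311,
    "oChromSize": 4672,
    "oChromStarts": "2361,2387,2409,2437,2460,2498,2558,2568,2574,2605,2625,2661,2701,2862,2887,2908,2921,2936,2965,2985,3025,3040,3045,3055,3086,3098,3119,3139,3154,3188,3219,3250,3268,3273,3283,3313,3322,3361,3382,3386,3401,3455,3462,3582,3651,3666,3687,3705,3744,3823,3831,3871,3876,3979,4023,4109,4209,4401,",
    "match": 1380,
    "misMatch": 669,
    "repMatch": 11,
    "nCount": 0,
}


def target_blocks(r):
    """Absolute genome intervals; chromStarts are offsets from chromStart."""
    sizes = parse_int_list(r["blockSizes"])
    starts = parse_int_list(r["chromStarts"])
    assert len(sizes) == len(starts) == r["blockCount"]
    base = r["chromStart"]
    return [(base + s, base + s + n) for s, n in zip(starts, sizes)]


def query_blocks_forward(r):
    """Query intervals on the forward strand of the source transcript.

    Assumption: oChromStarts are absolute positions and, when strand is '-',
    they are measured on the reverse-complemented query (PSL convention), so
    a block [s, s + n) maps back to [size - (s + n), size - s).
    """
    sizes = parse_int_list(r["blockSizes"])
    starts = parse_int_list(r["oChromStarts"])
    size = r["oChromSize"]
    if r["strand"] == "-":
        blocks = [(size - (s + n), size - s) for s, n in zip(starts, sizes)]
    else:
        blocks = [(s, s + n) for s, n in zip(starts, sizes)]
    return sorted(blocks)


t_blocks = target_blocks(rec)
q_blocks = query_blocks_forward(rec)
aligned = sum(end - start for start, end in t_blocks)

print(t_blocks[0][0], t_blocks[-1][1])  # 130389792 130422300
print(q_blocks[0][0], q_blocks[-1][1])  # 85 2311
# In PSL, aligned bases = match + misMatch + repMatch + nCount.
print(aligned, rec["match"] + rec["misMatch"] + rec["repMatch"] + rec["nCount"])  # 2060 2060
```

The outermost target blocks land exactly on chromStart and chromEnd, and the converted query blocks land on oChromStart and oChromEnd, which is a quick sanity check when reading records from this track.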

Sample Rows
 
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | reserved | blockCount | blockSizes | chromStarts | oChromStart | oChromEnd | oStrand | oChromSize | oChromStarts | oSequence | oCDS | chromSize | match | misMatch | repMatch | nCount | seqType | srcDb | srcTransId | srcChrom | srcChromStart | srcChromEnd | srcIdent | srcAligned | geneName | geneId | geneType | transcriptType | chainType | commonName | scientificName | orgAbbrev
chr1 | 130389792 | 130422300 | micMur3:ENSMICT00000066406.1-1.1 | 675 | - | 130422300 | 130422300 | 0 | 58 | 24,21,18,12,34,59,7,6,31,15,30,35,161,24,18,13,15,29,20,40,15,5,10,21,12,20,17,13,34,24,31,18,2,9,29,8,38,21,4,14,54,7,116,69,15 ... | 0,24,45,63,75,109,168,175,182,214,229,259,294,456,480,498,520,551,597,620,675,886,973,991,1012,1026,1046,1063,1076,1116,1140,117 ... | 85 | 2311 | + | 4672 | 2361,2387,2409,2437,2460,2498,2558,2568,2574,2605,2625,2661,2701,2862,2887,2908,2921,2936,2965,2985,3025,3040,3045,3055,3086,309 ... | atgagtctcgcgcgcccgagcgcccccacagtgctgcgtctcggctgtctgtacctgctgctactgtggccaccagccgcacggggtgactgcagctctcccccagaagtacctaatgcccacctgga ... | 1..1071 | 195471971 | 1380 | 669 | 11 | 0 | 1 | micMur3 | ENSMICT00000066406.1 | chr27 | 31864957 | 31899389 | 1000 | 1000 |  | ENSMICG00000046137.1 | protein_coding | protein_coding | syn | Mouse lemur | Microcebus murinus | Microc
chr1 | 130389847 | 130422300 | monDom5:ENSMODT00000002841.3-1.1 | 612 | - | 130422300 | 130422300 | 0 | 43 | 30,17,33,23,8,21,16,30,31,17,27,21,75,46,17,52,4,9,23,33,8,19,14,68,29,36,9,60,11,1,16,17,11,17,49,62,56,47,142,86,100,192,186, | 0,30,58,95,118,126,147,163,198,229,246,275,303,1235,1283,1302,1362,1379,1395,1418,1454,1462,1481,1495,1566,1595,1631,1640,1703,4 ... | 502 | 3416 | + | 3565 | 149,180,197,230,257,266,294,311,341,374,406,433,454,1080,1126,1143,1195,1199,1208,1235,1268,1280,1305,1320,1388,1418,1455,1466,1 ... | ccaatacatttaagccctcgtatttacatttgccaaaaacaacaagtcatttgtcggtcccacccctcagcctgacagtgggtttcaaactacaccctgccgcccagtccctgcgctctctcagacgc ... | 409..2043 | 195471971 | 1026 | 685 | 58 | 0 | 1 | monDom5 | ENSMODT00000002841.3 | chr2 | 113871637 | 113899037 | 1000 | 1000 |  | ENSMODG00000002282.3 | protein_coding | protein_coding | syn | Opossum | Monodelphis domestica | Monode
chr1 | 130390038 | 130422300 | panPan2:ENSPPAT00000055182.1-1.1 | 691 | - | 130422300 | 130422300 | 0 | 49 | 3,36,10,58,27,28,51,39,10,15,21,26,20,24,9,30,65,26,12,20,6,38,24,40,18,6,80,44,67,17,21,13,20,63,102,51,46,54,66,119,29,85,8,41 ... | 0,3,39,49,117,144,172,225,264,275,723,744,780,800,824,833,870,935,963,975,995,1001,1039,1074,1137,1192,1311,1392,1436,1506,19833 ... | 600 | 3048 | + | 3048 | 0,5,42,64,122,150,181,232,278,288,341,366,392,413,439,450,480,546,572,590,611,618,657,681,721,739,745,825,873,940,957,978,994,10 ... | cttggcagcactcaagcgcggggatgctccgcttagacgaactcacgttcgggcagcaaggcctgcgatacttgagcacccctccccctctcccgtttacaccccgtttgtgtttacgtagcgaggag ... | 501..2156 | 195471971 | 1584 | 710 | 6 | 0 | 1 | panPan2 | ENSPPAT00000055182.1 | chr1 | 187201080 | 187240350 | 1000 | 1000 |  | ENSPPAG00000039018.1 | protein_coding | protein_coding | syn | Bonobo | Pan paniscus | Pan
chr1 | 130390055 | 130463462 | rn6:ENSRNOT00000005284.6-1.1 | 853 | - | 130463462 | 130463462 | 0 | 50 | 23,58,65,37,49,27,47,55,69,53,7,54,33,22,28,26,42,38,10,53,26,98,7,9,125,87,21,81,35,26,32,41,148,86,100,192,186,222,31,42,51,12 ... | 0,23,81,156,194,244,271,318,374,472,564,642,696,729,761,789,817,862,900,910,980,1006,1112,1193,1294,1419,57563,58272,59412,59450 ... | 0 | 3093 | + | 3093 | 0,26,85,150,187,236,268,316,371,440,493,500,555,592,614,643,669,711,750,761,814,843,941,948,957,1086,1173,1194,1275,1310,1378,14 ... | gctccggtgtgatttccaaggtgtggatcactttgcttgtcattggctactcttcacaccacgaaataaatctcttaagctctaccacttattcctgaattttattttatagcaggactaaaatcatc ... | 831..1979 | 195471971 | 2392 | 436 | 150 | 0 | 1 | rn6 | ENSRNOT00000005284.6 | chr13 | 47126740 | 47154292 | 1000 | 1000 | Cd55 | ENSRNOG00000003927.7 | protein_coding | protein_coding | syn | Rat | Rattus norvegicus | Rattus
chr1 | 130391438 | 130422300 | colAng1:ENSCANT00000050514.1-1.1 | 689 | - | 130422300 | 130422300 | 0 | 20 | 36,69,15,21,13,20,30,10,55,11,12,3,41,148,86,100,192,120,25,40, | 0,36,108,18433,19190,19203,19223,20005,20015,20073,20084,20096,22610,22654,25400,26669,28162,30676,30796,30822, | 82 | 1308 | + | 1308 | 0,40,109,242,263,279,314,344,369,430,447,470,473,514,662,748,848,1040,1161,1186, | atgactgtcgcgcggccgagcgtgcccgcggcgctgccccggctgctgctgctgctgctgctgtgcctgccggccgtgtgggctgactgtggccctcccccagctgtacctaatgcccagccatcttt ... | 1..1308 | 195471971 | 722 | 325 | 0 | 0 | 1 | colAng1 | ENSCANT00000050514.1 | NW_012110920v1 | 4312729 | 4348996 | 1000 | 1000 | CD55 | ENSCANG00000037159.1 | protein_coding | protein_coding | syn | Angolan colobus | Colobus angolensis palliatus | Colobu
chr1 | 130391438 | 130422300 | macNem1:ENSMNET00000032489.1-1.1 | 682 | - | 130422300 | 130422300 | 0 | 18 | 36,69,15,13,21,29,65,11,12,3,41,148,86,100,192,120,16,49, | 0,36,108,19190,19203,19224,20005,20073,20084,20096,22610,22654,25400,26669,28162,30676,30796,30813, | 103 | 1293 | + | 1293 | 0,40,109,242,258,294,323,394,411,434,437,478,626,712,812,1004,1125,1141, | atgactgtcgcgcggccgagcgtgcccgcggcgctgcccctccttggggagctgccccggctgctgctgctgctgctgctgctgtgcctgccggccgtgtgggctgactgtggccctcccccagctgt ... | 1..1293 | 195471971 | 700 | 326 | 0 | 0 | 1 | macNem1 | ENSMNET00000032489.1 | NW_012010800v1 | 37855069 | 37891372 | 1000 | 1000 | CD55 | ENSMNEG00000028645.1 | protein_coding | protein_coding | syn | Pig-tailed macaque | Macaca nemestrina | Macaca
chr1 | 130391438 | 130422300 | manLeu1:ENSMLET00000020963.1-1.1 | 681 | - | 130422300 | 130422300 | 0 | 19 | 36,69,15,21,13,21,29,65,11,12,3,41,148,86,100,192,120,16,49, | 0,36,108,18433,19190,19203,19224,20005,20073,20084,20096,22610,22654,25400,26669,28162,30676,30796,30813, | 103 | 1314 | + | 1314 | 0,40,109,242,263,279,315,344,415,432,455,458,499,647,733,833,1025,1146,1162, | atgactgttgcgcggccgagcgtgcccgcggcgctgcccctccttggggagctgccccggctgctgctgctgctgctgctgctgtgcctgccggctgtgtgggctgactgtggccctcccccagctgt ... | 1..1314 | 195471971 | 714 | 333 | 0 | 0 | 1 | manLeu1 | ENSMLET00000020963.1 | NW_012100254v1 | 190988 | 227365 | 1000 | 1000 | CD55 | ENSMLEG00000019163.1 | protein_coding | protein_coding | syn | Drill | Mandrillus leucophaeus | Mandri
chr1 | 130391438 | 130422300 | manLeu1:ENSMLET00000020966.1-1.1 | 684 | - | 130422300 | 130422300 | 0 | 18 | 36,69,15,13,21,29,65,11,12,3,41,148,86,100,192,120,16,49, | 0,36,108,19190,19203,19224,20005,20073,20084,20096,22610,22654,25400,26669,28162,30676,30796,30813, | 103 | 1293 | + | 1293 | 0,40,109,242,258,294,323,394,411,434,437,478,626,712,812,1004,1125,1141, | atgactgttgcgcggccgagcgtgcccgcggcgctgcccctccttggggagctgccccggctgctgctgctgctgctgctgctgtgcctgccggctgtgtgggctgactgtggccctcccccagctgt ... | 1..1293 | 195471971 | 702 | 324 | 0 | 0 | 1 | manLeu1 | ENSMLET00000020966.1 | NW_012100254v1 | 190988 | 227365 | 1000 | 1000 | CD55 | ENSMLEG00000019163.1 | protein_coding | protein_coding | syn | Drill | Mandrillus leucophaeus | Mandri
chr1 | 130391438 | 130422300 | panPan2:ENSPPAT00000055194.1-1.1 | 677 | - | 130422300 | 130422300 | 0 | 16 | 36,67,17,21,13,20,30,12,85,8,41,148,86,100,192,186, | 0,36,106,18433,19190,19203,19223,19750,19762,20093,22610,22654,25400,26669,28162,30676, | 100 | 1323 | + | 1323 | 0,40,107,242,263,279,314,344,368,462,470,511,659,745,845,1037, | atgaccgtcgcgcggccgagggtgcccgcggcgctgccccttctcggggaactgccccggctgctgctgctggtgctgttgtgcctgccggccgtgtggggtgactgtggccttcccccagaggtacc ... | 1..1323 | 195471971 | 719 | 343 | 0 | 0 | 1 | panPan2 | ENSPPAT00000055194.1 | chr1 | 187201580 | 187239517 | 1000 | 1000 |  | ENSPPAG00000039018.1 | protein_coding | protein_coding | syn | Bonobo | Pan paniscus | Pan
chr1 | 130391438 | 130422300 | papAnu4:ENSPANT00000004502.2-1.1 | 691 | - | 130422300 | 130422300 | 0 | 18 | 36,69,15,13,21,61,10,88,12,3,41,148,86,100,192,120,16,49, | 0,36,108,19190,19203,19224,19354,19954,20084,20096,22610,22654,25400,26669,28162,30676,30796,30813, | 100 | 1332 | + | 1332 | 0,40,109,242,258,294,355,365,453,476,479,520,668,754,854,1046,1167,1183, | atgactgtcgcgcggccgagcgtgcccgcggcgctgcccctccttggggagctgccccggctgctgctgctgctgctgctgtgcctgccggccgtgtgggctgactgtggccctcccccggctgtacc ... | 1..1332 | 195471971 | 747 | 333 | 0 | 0 | 1 | papAnu4 | ENSPANT00000004502.2 | chr1 | 153276780 | 153313046 | 1000 | 1000 | CD55 | ENSPANG00000001375.2 | protein_coding | protein_coding | syn | Baboon | Papio anubis | Papio
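The score column is described as fraction identity * 1000, but the exact identity formula is not spelled out on this page. One formula that reproduces the score of every sample row above is (match + repMatch) / (match + misMatch + repMatch), multiplied by 1000 and truncated to an integer. The Python check below is an inference from those ten rows, not the pipeline's documented definition; `identity_score` is a hypothetical helper.

```python
# (name, score, match, misMatch, repMatch) copied from the sample rows above.
SAMPLE_ROWS = [
    ("micMur3:ENSMICT00000066406.1-1.1", 675, 1380, 669, 11),
    ("monDom5:ENSMODT00000002841.3-1.1", 612, 1026, 685, 58),
    ("panPan2:ENSPPAT00000055182.1-1.1", 691, 1584, 710, 6),
    ("rn6:ENSRNOT00000005284.6-1.1", 853, 2392, 436, 150),
    ("colAng1:ENSCANT00000050514.1-1.1", 689, 722, 325, 0),
    ("macNem1:ENSMNET00000032489.1-1.1", 682, 700, 326, 0),
    ("manLeu1:ENSMLET00000020963.1-1.1", 681, 714, 333, 0),
    ("manLeu1:ENSMLET00000020966.1-1.1", 684, 702, 324, 0),
    ("panPan2:ENSPPAT00000055194.1-1.1", 677, 719, 343, 0),
    ("papAnu4:ENSPANT00000004502.2-1.1", 691, 747, 333, 0),
]


def identity_score(match, mis_match, rep_match):
    """Inferred score: fraction identity * 1000, truncated to an integer.

    Repeat-masked matches (repMatch) count as matches; nCount is ignored
    because it is 0 in every sample row.
    """
    aligned = match + mis_match + rep_match
    return (1000 * (match + rep_match)) // aligned


for name, score, match, mis_match, rep_match in SAMPLE_ROWS:
    computed = identity_score(match, mis_match, rep_match)
    status = "ok" if computed == score else "DIFF"
    print(f"{name}\t{score}\t{computed}\t{status}")
```

Truncation rather than rounding matters here: rounding would give 613 for the opossum (monDom5) row, whose stored score is 612.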

TransMap Ensembl (transMapEnsemblV5) Track Description
 

Description

This track contains GENCODE or Ensembl alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. For human and mouse, GENCODE and Ensembl are the same annotation set; for other Ensembl sources, only those with full gene builds are used. Ensembl gene annotations produced by projection are not used as sources. For closer evolutionary distances, the alignments are created using syntenically filtered BLASTZ alignment chains, resulting in a prediction of the orthologous genes in mouse.

Display Conventions and Configuration

This track follows the display conventions for PSL alignment tracks.

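This track may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. For more information about this option, see the Genome Browser documentation on codon and base coloring for alignment tracks. Several types of alignment gaps may also be colored; for more information, see the Genome Browser documentation on alignment insertion/deletion display options.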

Methods

  1. Source transcript alignments were obtained from vertebrate organisms in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes, were used as available.
  2. For all vertebrate assemblies that had BLASTZ alignment chains and nets to the mouse (mm10) genome, a subset of the alignment chains was selected as follows:
    • For organisms whose branch distance was no more than 0.5 (as computed by phyloFit; see the Conservation track description for details), syntenic filtering was used. Reciprocal best nets were used if available; otherwise, nets were selected with the netFilter -syn command. The chains corresponding to the selected nets were used for mapping.
    • For more distant species, where the determination of synteny is difficult, the full set of chains was used for mapping. This allows more genes to be mapped, at the expense of some genes mapping to paralogous regions. The post-alignment filtering step removes some of the duplications.
  3. The pslMap program was used to do a base-level projection of the source transcript alignments via the selected chains to the mouse genome, resulting in pairwise alignments of the source transcripts to the genome.
  4. The resulting alignments were filtered with pslCDnaFilter using a global near-best criterion of 0.5% in finished genomes (human and mouse) and 1.0% in other genomes. Alignments in which less than 20% of the transcript mapped were discarded.
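The chain-selection rule in step 2 and the filters in step 4 can be restated as a few small functions. This is a simplified Python sketch, not the pipeline itself: the real work is done by netFilter and pslCDnaFilter, whose scoring is not reproduced here. The `score` attribute of the hypothetical `MappedAlignment` class stands in for pslCDnaFilter's own alignment score, and applying the coverage cut before the near-best cut is a choice made for this sketch. All function names are hypothetical.

```python
from dataclasses import dataclass

# Branch-distance cutoff from step 2 (computed by phyloFit in the pipeline).
SYNTENY_MAX_BRANCH_DISTANCE = 0.5


def chain_selection(branch_distance, has_reciprocal_best_nets):
    """Which chains step 2 would use for a given source assembly."""
    if branch_distance <= SYNTENY_MAX_BRANCH_DISTANCE:
        if has_reciprocal_best_nets:
            return "reciprocal best nets"
        return "netFilter -syn nets"
    return "all chains"


def near_best_fraction(target_is_finished):
    """0.5% for finished genomes (human, mouse), 1.0% otherwise (step 4)."""
    return 0.005 if target_is_finished else 0.010


@dataclass
class MappedAlignment:
    query: str       # source transcript ID
    score: float     # hypothetical ranking score standing in for pslCDnaFilter's
    coverage: float  # fraction of the source transcript that mapped, 0..1


def filter_mappings(alns, near_best, min_coverage=0.20):
    """Drop low-coverage mappings, then keep those near the best per query."""
    kept = [a for a in alns if a.coverage >= min_coverage]
    best = {}
    for a in kept:
        best[a.query] = max(best.get(a.query, a.score), a.score)
    return [a for a in kept if a.score >= best[a.query] * (1.0 - near_best)]


alns = [
    MappedAlignment("tx1", 1000.0, 0.9),
    MappedAlignment("tx1", 996.0, 0.9),   # within 0.5% of the best: kept
    MappedAlignment("tx1", 990.0, 0.9),   # more than 0.5% below: dropped
    MappedAlignment("tx2", 800.0, 0.1),   # under 20% coverage: dropped
]
for a in filter_mappings(alns, near_best_fraction(True)):
    print(a.query, a.score)
```

For mm10, a finished genome, the 0.5% fraction applies, so only alignments scoring within 0.5% of each transcript's best mapping survive.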

To ensure unique identifiers for each alignment, cDNA and gene accessions were made unique by appending a suffix for each location in the source genome and again for each mapped location in the destination genome. The format is:

   accession.version-srcUniq.destUniq
where srcUniq is a number added to make each source alignment unique, and destUniq is a number added to make each mapped (destination) alignment unique.

For example, in the cow genome, there are two alignments of mRNA BC149621.1. These are assigned the identifiers BC149621.1-1 and BC149621.1-2. When these are mapped to the human genome, BC149621.1-1 maps to a single location and is given the identifier BC149621.1-1.1. However, BC149621.1-2 maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note that multiple TransMap mappings are usually the result of tandem duplications, where both chains are identified as syntenic.
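The identifier scheme amounts to two simple operations: generating the suffixes and parsing them back. The following is an illustrative Python sketch; the function names are hypothetical, and the optional `db:` prefix handled by the parser reflects what the name column of this track shows (for example micMur3:ENSMICT00000066406.1-1.1) rather than the format given above.

```python
import re

# Optional "db:" prefix, then accession.version-srcUniq.destUniq.
TRANSMAP_ID = re.compile(
    r"^(?:(?P<db>[^:]+):)?(?P<acc>.+)-(?P<src>\d+)\.(?P<dest>\d+)$"
)


def parse_transmap_id(ident):
    """Split a TransMap identifier into (db, accession, srcUniq, destUniq)."""
    m = TRANSMAP_ID.match(ident)
    if m is None:
        raise ValueError(f"not a TransMap identifier: {ident!r}")
    return m.group("db"), m.group("acc"), int(m.group("src")), int(m.group("dest"))


def assign_ids(accession, dest_counts):
    """Build identifiers from the number of destination loci per source alignment.

    dest_counts[i] is how many places source alignment i + 1 mapped to.
    """
    ids = []
    for src_uniq, n_dest in enumerate(dest_counts, start=1):
        for dest_uniq in range(1, n_dest + 1):
            ids.append(f"{accession}-{src_uniq}.{dest_uniq}")
    return ids


print(assign_ids("BC149621.1", [1, 2]))
# ['BC149621.1-1.1', 'BC149621.1-2.1', 'BC149621.1-2.2']
print(parse_transmap_id("micMur3:ENSMICT00000066406.1-1.1"))
# ('micMur3', 'ENSMICT00000066406.1', 1, 1)
```

The call `assign_ids("BC149621.1", [1, 2])` reproduces the cow example: the first source alignment maps once and the second maps twice.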

Data Access

The raw data for these tracks can be accessed interactively through the Table Browser or the Data Integrator. For automated analysis, the annotations are stored in bigPsl files (containing a number of extra columns) and can be downloaded from our download server, or queried using our API. For more information on accessing track data see our Track Data Access FAQ. The files are associated with these tracks in the following way:

  • TransMap Ensembl - mm10.ensembl.transMapV5.bigPsl
  • TransMap RefGene - mm10.refseq.transMapV4.bigPsl
  • TransMap RNA - mm10.rna.transMapV4.bigPsl
  • TransMap ESTs - mm10.est.transMapV4.bigPsl
Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading the source code and binaries are available from the UCSC download server. The tool can also be used to obtain only features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/mm10/transMap/V4/mm10.refseq.transMapV4.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout
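bigBedToBed writes one line per alignment with tab-separated columns. A minimal Python reader for that text output might look like the following. It assumes the columns appear in the order of the 40-field bigPsl schema at the top of this page; `parse_bigpsl_line` and `read_bigpsl` are hypothetical helper names, not part of any UCSC tool.

```python
# Column order of the transMap bigPsl schema shown at the top of this page.
FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "oChromStart", "oChromEnd", "oStrand", "oChromSize",
    "oChromStarts", "oSequence", "oCDS", "chromSize", "match", "misMatch",
    "repMatch", "nCount", "seqType", "srcDb", "srcTransId", "srcChrom",
    "srcChromStart", "srcChromEnd", "srcIdent", "srcAligned", "geneName",
    "geneId", "geneType", "transcriptType", "chainType", "commonName",
    "scientificName", "orgAbbrev",
]

INT_FIELDS = {
    "chromStart", "chromEnd", "score", "thickStart", "thickEnd", "reserved",
    "blockCount", "oChromStart", "oChromEnd", "oChromSize", "chromSize",
    "match", "misMatch", "repMatch", "nCount", "seqType", "srcChromStart",
    "srcChromEnd", "srcIdent", "srcAligned",
}
LIST_FIELDS = {"blockSizes", "chromStarts", "oChromStarts"}


def parse_bigpsl_line(line):
    """Turn one tab-separated output line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    rec = {}
    for name, value in zip(FIELDS, values):
        if name in INT_FIELDS:
            rec[name] = int(value)
        elif name in LIST_FIELDS:
            # Comma-separated lists may end with a trailing comma.
            rec[name] = [int(x) for x in value.split(",") if x]
        else:
            rec[name] = value  # strings; geneName and oSequence may be empty
    return rec


def read_bigpsl(lines):
    """Yield parsed records from any iterable of lines, skipping blanks."""
    for line in lines:
        if line.strip():
            yield parse_bigpsl_line(line)
```

For example, if the command above is run with a file name (hypothetically `chr6.transMap.txt`) in place of `stdout`, then `with open("chr6.transMap.txt") as fh: records = list(read_bigpsl(fh))` gives one dict per alignment. Splitting only on tabs keeps empty columns such as geneName intact.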

Credits

This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data submitted to the international public sequence databases by scientists worldwide and annotations produced by the RefSeq, Ensembl, and GENCODE annotation projects.

References

Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S, Lau C et al. Targeted discovery of novel human exons by comparative genomics. Genome Res. 2007 Dec;17(12):1763-73. PMID: 17989246; PMC: PMC2099585

Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656

Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007 Dec;3(12):e247. PMID: 18085818; PMC: PMC2134963