Human Gene SEMG1 (ENST00000372781.4) Description and Page Index
  Description: Homo sapiens semenogelin 1 (SEMG1), mRNA. (from RefSeq NM_003007)
RefSeq Summary (NM_003007): The protein encoded by this gene is the predominant protein in semen. The encoded secreted protein is involved in the formation of a gel matrix that encases ejaculated spermatozoa. This preproprotein is proteolytically processed by the prostate-specific antigen (PSA) protease to generate multiple peptide products that exhibit distinct functions. One of these peptides, SgI-29, is an antimicrobial peptide with antibacterial activity. This proteolysis process also breaks down the gel matrix and allows the spermatozoa to move more freely. This gene and another similar semenogelin gene are present in a gene cluster on chromosome 20. [provided by RefSeq, Feb 2016].
Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
##Evidence-Data-START##
Transcript exon combination :: J04440.1, ERR279869.434.1 [ECO:0000332]
RNAseq introns :: single sample supports all introns SAMEA2156266, SAMEA2159912 [ECO:0000348]
##Evidence-Data-END##
##RefSeq-Attributes-START##
MANE Ensembl match :: ENST00000372781.4/ ENSP00000361867.3
Protein has antimicrobial activity :: PMID: 18314226
RefSeq Select criteria :: based on single protein-coding transcript
##RefSeq-Attributes-END##
Gencode Transcript: ENST00000372781.4
Gencode Gene: ENSG00000124233.12
Transcript (Including UTRs)
   Position: hg38 chr20:45,207,033-45,209,768 Size: 2,736 Total Exon Count: 3 Strand: +
Coding Region
   Position: hg38 chr20:45,207,054-45,208,686 Size: 1,633 Coding Exon Count: 2 
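
The sizes reported above are consistent with treating both coordinates as inclusive, so size = end - start + 1. The following is a minimal sketch that checks this arithmetic; the helper name `span_size` is hypothetical and is not part of any Genome Browser tool. These are genomic spans, so they include intronic bases.

```python
def span_size(position: str) -> int:
    """Length of a 'chrom:start-end' span, counting both end points."""
    _chrom, coords = position.split(":")
    # Strip the thousands separators used on this page before parsing.
    start_text, end_text = coords.replace(",", "").split("-")
    return int(end_text) - int(start_text) + 1


transcript_size = span_size("chr20:45,207,033-45,209,768")
coding_size = span_size("chr20:45,207,054-45,208,686")
print(transcript_size, coding_size)  # 2736 1633
```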

Data last updated: 2019-09-04

-  Sequence and Links to Tools and Databases
Genomic Sequence (chr20:45,207,033-45,209,768) | mRNA (may differ from genome) | Protein (462 aa)
Gene Sorter | Genome Browser | Other Species FASTA | Gene interactions | Table Schema | BioGPS
CGAP | Ensembl | Entrez Gene | ExonPrimer | GeneCards | HGNC
Reactome | Stanford SOURCE | UniProtKB | Wikipedia

-  Comments and Description Text from UniProtKB
DESCRIPTION: RecName: Full=Semenogelin-1; AltName: Full=Semenogelin I; Short=SGI; Contains: RecName: Full=Alpha-inhibin-92; Contains: RecName: Full=Alpha-inhibin-31; Contains: RecName: Full=Seminal basic protein; Flags: Precursor;
FUNCTION: Predominant protein in semen. It participates in the formation of a gel matrix entrapping the accessory gland secretions and ejaculated spermatozoa. Fragments of semenogelin and/or fragments of the related proteins may contribute to the activation of progressive sperm movements as the gel-forming proteins are fragmented by KLK3/PSA.
FUNCTION: Alpha-inhibin-92 and alpha-inhibin-31, derived from the proteolytic degradation of semenogelin, inhibit the secretion of pituitary follicle-stimulating hormone.
SUBUNIT: Occurs in disulfide-linked complexes which may also contain two less abundant 71- and 76-kDa semenogelin-related polypeptides. Interacts with EPPIN (via C-terminus); Cys-239 is a critical amino acid for binding to EPPIN.
TISSUE SPECIFICITY: Seminal vesicle.
PTM: Transglutaminase substrate.
PTM: Rapidly cleaved after ejaculation by KLK3/PSA, resulting in liquefaction of the semen coagulum and the progressive release of motile spermatozoa.
SIMILARITY: Belongs to the semenogelin family.
WEB RESOURCE: Name=Protein Spotlight; Note=Shackled sperm - Issue 62 of September 2005; URL="";

-  MalaCards Disease Associations
  MalaCards Gene Search: SEMG1
Diseases sorted by gene-association score: motion sickness (5)

-  Comparative Toxicogenomics Database (CTD)
  The following chemicals interact with this gene

-  RNA-Seq Expression Data from GTEx (53 Tissues, 570 Donors)
  Highest median expression: 1.69 RPKM in Prostate
Total median expression: 1.92 RPKM



-  mRNA Secondary Structure of 3' and 5' UTRs
Region    Fold Energy (kcal/mol)    Bases    Energy/Base
5' UTR    -0.80                     21       -0.038
3' UTR    -46.60                    212      -0.220

The RNAfold program from the Vienna RNA Package is used to perform the secondary structure predictions and folding calculations. The estimated folding energy is in kcal/mol. The more negative the energy, the more secondary structure the RNA is likely to have.
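
The Energy/Base figures are the fold energy divided by the number of bases in the region. The following is a minimal sketch that reproduces both figures from the fold energies and base counts reported for the two UTRs above. The function name is hypothetical, and the rounding to three decimals is an assumption based on the precision shown on this page. It does not run RNAfold itself.

```python
def energy_per_base(fold_energy_kcal_mol: float, bases: int) -> float:
    """Normalize a predicted folding energy by region length."""
    # Rounding to 3 decimals is assumed from the precision shown on the page.
    return round(fold_energy_kcal_mol / bases, 3)


utr5 = energy_per_base(-0.80, 21)     # 5' UTR: -0.80 kcal/mol over 21 bases
utr3 = energy_per_base(-46.60, 212)   # 3' UTR: -46.60 kcal/mol over 212 bases
print(utr5, utr3)
```

On this normalized scale the 3' UTR is the more strongly structured of the two regions, which matches the reading given above that more negative energy means more secondary structure.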

-  Protein Domain and Structure Information
  InterPro Domains: Graphical view of domain structure
IPR008836 - Semenogelin

Pfam Domains:
PF05474 - Semenogelin

ModBase Predicted Comparative 3D Structure on P04279

-  Orthologous Genes in Other Species
  Orthologies between human, mouse, and rat are computed by taking the best BLASTP hit, and filtering out non-syntenic hits. For more distant species reciprocal-best BLASTP hits are used. Note that the absence of an ortholog in the table below may reflect incomplete annotations in the other species rather than a true absence of the orthologous gene.
Species            Ortholog
Mouse              Genome Browser | Gene Details | Gene Sorter | Protein Sequence
Rat                No ortholog
Zebrafish          No ortholog
D. melanogaster    No ortholog
C. elegans         No ortholog
S. cerevisiae      No ortholog
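
The reciprocal-best-hit rule described above for the more distant species can be sketched as follows. This is a minimal illustration under stated assumptions, not the browser's actual pipeline: the input format (tuples of query, subject, bit score), the function names, and the toy identifiers are all hypothetical, and the synteny filter applied to mouse and rat is not shown.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score); assumed format


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Dict[str, str]:
    """Keep only pairs where each protein is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


# Toy hits with made-up identifiers and scores.
human_vs_other = [("humanA", "otherX", 250.0), ("humanA", "otherY", 90.0),
                  ("humanB", "otherY", 120.0)]
other_vs_human = [("otherX", "humanA", 245.0), ("otherY", "humanA", 95.0),
                  ("otherY", "humanB", 80.0)]
pairs = reciprocal_best_hits(human_vs_other, other_vs_human)
print(pairs)  # {'humanA': 'otherX'}
```

In the toy data, humanB's best hit is otherY, but otherY's best hit is humanA, so that pair is dropped. A gene with no reciprocal partner would be reported as having no ortholog under this rule.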

-  Gene Ontology (GO) Annotations with Structured Vocabulary
  Molecular Function:
GO:0005515 protein binding
GO:0046872 metal ion binding

Biological Process:
GO:0007320 insemination
GO:0019730 antimicrobial humoral response
GO:0019731 antibacterial humoral response
GO:0031640 killing of cells of other organism
GO:0044267 cellular protein metabolic process
GO:0050817 coagulation
GO:0051291 protein heterooligomerization
GO:0090281 negative regulation of calcium ion import
GO:1900005 positive regulation of serine-type endopeptidase activity
GO:1901318 negative regulation of flagellated sperm motility

Cellular Component:
GO:0005576 extracellular region
GO:0005615 extracellular space
GO:0005634 nucleus
GO:0032991 macromolecular complex
GO:0070062 extracellular exosome

-  Descriptions from all associated GenBank mRNAs
  AK291811 - Homo sapiens cDNA FLJ78262 complete cds, highly similar to Homo sapiens semenogelin II (SEMG2), mRNA.
BC005229 - Homo sapiens semenogelin I, mRNA (cDNA clone IMAGE:3950939), partial cds.
BC007096 - Homo sapiens semenogelin I, mRNA (cDNA clone MGC:14719 IMAGE:4251523), complete cds.
BC011442 - Homo sapiens semenogelin I, mRNA (cDNA clone IMAGE:4296701), with apparent retained intron.
BC055416 - Homo sapiens semenogelin I, mRNA (cDNA clone MGC:61979 IMAGE:6668871), complete cds.
J04440 - Homo sapiens semenogelin protein (SEMG) mRNA, complete cds.
CU691118 - Synthetic construct Homo sapiens gateway clone IMAGE:100022486 5' read SEMG1 mRNA.
AB591040 - Synthetic construct DNA, clone: pFN21AB7040, Homo sapiens SEMG1 gene for semenogelin I, without stop codon, in Flexi system.
BT007177 - Homo sapiens semenogelin I mRNA, complete cds.
KJ901731 - Synthetic construct Homo sapiens clone ccsbBroadEn_11125 SEMG1 gene, encodes complete protein.
KJ901732 - Synthetic construct Homo sapiens clone ccsbBroadEn_11126 SEMG1 gene, encodes complete protein.
JD437671 - Sequence 418695 from Patent EP1572962.

-  Biochemical and Signaling Pathways
  Reactome (by CSHL, EBI, and GO)

Protein P04279 (Reactome details) participates in the following event(s):

R-HSA-6810643 EPPIN protein complex binds bacteria
R-HSA-6803157 Antimicrobial peptides
R-HSA-977225 Amyloid fiber formation
R-HSA-168249 Innate Immune System
R-HSA-392499 Metabolism of proteins
R-HSA-168256 Immune System

-  Other Names for This Gene
  Alternate Gene Symbols: NM_003007, P04279, Q53ZV0, Q53ZV1, Q53ZV2, Q6X4I9, Q6Y809, Q6Y822, Q6Y823, Q86U64, Q96QM3, SEMG, SEMG1_HUMAN, uc002xni.1, uc002xni.2, uc002xni.3, uc002xni.4
UCSC ID: uc002xni.4
RefSeq Accession: NM_003007
Protein: P04279 (aka SEMG1_HUMAN or SEM1_HUMAN)
CCDS: CCDS13345.1
