Human mucin gene MUC5B, the 10.7-kb large central exon encodes various alternate subdomains resulting in a super-repeat. Structural evidence for an 11p15.5 gene family.
Abstract
The human mucin gene MUC5B maps to chromosome 11p15.5, where it is clustered with MUC6, MUC2, and MUC5AC. We report here the isolation of three overlapping genomic clones of human MUC5B spanning approximately 40 kilobases. We have determined their partial restriction maps and the intron-exon boundaries of the central region encoding a single open reading frame. This coding region has been completely sequenced: it is 10,713 base pairs long and encodes a 3570-amino acid peptide. Nineteen subdomains have been identified. Some subdomains show similarity to each other, creating larger composite repeat units that we have called super-repeats. Four super-repeats of 528 amino acid residues are thus observed within the central exon. Each comprises (i) a subdomain composed of 11 copies of an irregular repeat of 29 amino acid residues, (ii) a unique conserved subdomain with no typical repeat, and (iii) a cysteine-rich subdomain. This last subdomain has high sequence similarity to the cysteine-rich domains described in MUC2 and MUC5AC. The sequence data for these three genes, together with their clustered organization, lead us to suggest that they may be part of a multigene family. The super-repeat present in MUC5B is the largest ever determined in mucin genes, and the central exon of this gene is, by far, the largest reported for a vertebrate gene.