37 CFR 1.839: Incorporation by reference

Taken from the Ninth Edition of the MPEP, Revision 07.2022, Published February 2023

1.839    Incorporation by reference.

  • (a) Certain material is incorporated by reference into this subpart with the approval of the Director of the Federal Register under 5 U.S.C. 552(a) and 1 CFR part 51. All approved incorporation by reference (IBR) material is available for inspection at the USPTO and at the National Archives and Records Administration (NARA). Contact the USPTO’s Office of Patent Legal Administration at 571–272–7701. For information on the availability of this material at NARA, email fr.inspection@nara.gov or go to www.archives.gov/federal-register/cfr/ibr-locations.html. The material may be obtained from the source(s) in paragraph (b) of this section.
  • (b) World Intellectual Property Organization (WIPO), 34 chemin des Colombettes, 1211 Geneva 20 Switzerland, www.wipo.int.
    • (1) WIPO Standard ST.26. WIPO Handbook on Industrial Property Information and Documentation, Standard ST.26: Recommended Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings Using XML (eXtensible Markup Language) including Annexes I–VII, version 1.5, approved November 5, 2021; IBR approved for §§ 1.831 through 1.834.
    • (2) [Reserved]
[Added 87 FR 30806, May 20, 2022, effective July 1, 2022]

Appendix to Subpart G of Part 1

Appendix A to Subpart G of Part 1 - List of Nucleotides

Source: World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (2009).

Symbol  Meaning                                 Origin of designation
a       a                                       adenine.
g       g                                       guanine.
c       c                                       cytosine.
t       t                                       thymine.
u       u                                       uracil.
r       g or a                                  purine.
y       t/u or c                                pyrimidine.
m       a or c                                  amino.
k       g or t/u                                keto.
s       g or c                                  strong interactions 3H-bonds.
w       a or t/u                                weak interactions 2H-bonds.
b       g or c or t/u                           not a.
d       a or g or t/u                           not c.
h       a or c or t/u                           not g.
v       a or g or c                             not t, not u.
n       a or g or c or t/u, unknown, or other   any.
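
As an illustration only, the ambiguity symbols in Appendix A can be held in a lookup table and used to test whether a concrete base is covered by a symbol. The names `NUCLEOTIDE_SYMBOLS` and `symbol_matches` are hypothetical, and the sketch assumes that t and u are treated as interchangeable, as the "t/u" entries suggest. It does not model the "unknown, or other" sense of n.

```python
# Appendix A symbols mapped to the bases each one can stand for.
# Assumption: "t/u" is stored as "t", and u is folded into t when matching.
NUCLEOTIDE_SYMBOLS = {
    "a": "a", "g": "g", "c": "c", "t": "t", "u": "u",
    "r": "ga", "y": "tc", "m": "ac", "k": "gt",
    "s": "gc", "w": "at",
    "b": "gct", "d": "agt", "h": "act", "v": "agc",
    "n": "agct",
}


def symbol_matches(symbol: str, base: str) -> bool:
    """Return True if the concrete base is one the symbol can denote."""
    base = base.lower().replace("u", "t")
    allowed = NUCLEOTIDE_SYMBOLS[symbol.lower()].replace("u", "t")
    return base in allowed
```

For example, `symbol_matches("r", "a")` is true because r means "g or a", while `symbol_matches("b", "a")` is false because b means "not a".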

Appendix B to Subpart G of Part 1 - List of Modified Nucleotides

Source: World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (2009).

Symbol Meaning
ac4c  4-acetylcytidine.
chm5u  5-(carboxyhydroxymethyl)uridine.
cm  2'-O-methylcytidine.
cmnm5s2u  5-carboxymethylaminomethyl-2-thiouridine.
cmnm5u  5-carboxymethylaminomethyluridine.
d  dihydrouridine.
fm  2'-O-methylpseudouridine.
gal q  beta, D-galactosylqueuosine.
gm  2'-O-methylguanosine.
i  inosine.
i6a  N6-isopentenyladenosine.
m1a  1-methyladenosine.
m1f  1-methylpseudouridine.
m1g  1-methylguanosine.
m1i  1-methylinosine.
m22g  2,2-dimethylguanosine.
m2a  2-methyladenosine.
m2g  2-methylguanosine.
m3c  3-methylcytidine.
m5c  5-methylcytidine.
m6a  N6-methyladenosine.
m7g  7-methylguanosine.
mam5u  5-methylaminomethyluridine.
mam5s2u  5-methoxyaminomethyl-2-thiouridine.
man q  beta, D-mannosylqueuosine.
mcm5s2u  5-methoxycarbonylmethyl-2-thiouridine.
mcm5u  5-methoxycarbonylmethyluridine.
mo5u  5-methoxyuridine.
ms2i6a  2-methylthio-N6-isopentenyladenosine.
ms2t6a  N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine.
mt6a  N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine.
mv  uridine-5-oxyacetic acid-methylester.
o5u  uridine-5-oxyacetic acid.
osyw  wybutoxosine.
p  pseudouridine.
q  queuosine.
s2c  2-thiocytidine.
s2t  5-methyl-2-thiouridine.
s2u  2-thiouridine.
s4u  4-thiouridine.
t  5-methyluridine.
t6a  N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine.
tm  2'-O-methyl-5-methyluridine.
um  2'-O-methyluridine.
yw  wybutosine.
x  3-(3-amino-3-carboxy-propyl)uridine, (acp3)u.

Appendix C to Subpart G of Part 1 - List of Amino Acids

Source: World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (2009).

Symbol Meaning
Ala  Alanine.
Cys  Cysteine.
Asp  Aspartic Acid.
Glu  Glutamic Acid.
Phe  Phenylalanine.
Gly  Glycine.
His  Histidine.
Ile  Isoleucine.
Lys  Lysine.
Leu  Leucine.
Met  Methionine.
Asn  Asparagine.
Pro  Proline.
Gln  Glutamine.
Arg  Arginine.
Ser  Serine.
Thr  Threonine.
Val  Valine.
Trp  Tryptophan.
Tyr  Tyrosine.
Asx  Asp or Asn.
Glx  Glu or Gln.
Xaa  unknown or other.
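
A minimal sketch of how the Appendix C symbols might be used to screen a protein sequence written in three-letter codes. The names `APPENDIX_C` and `unlisted_symbols` are hypothetical, and the sketch assumes residues are separated by whitespace. Symbols from Appendix D (modified and unusual amino acids) are deliberately reported as unlisted here.

```python
# Three-letter symbols and meanings copied from Appendix C.
APPENDIX_C = {
    "Ala": "Alanine", "Cys": "Cysteine", "Asp": "Aspartic Acid",
    "Glu": "Glutamic Acid", "Phe": "Phenylalanine", "Gly": "Glycine",
    "His": "Histidine", "Ile": "Isoleucine", "Lys": "Lysine",
    "Leu": "Leucine", "Met": "Methionine", "Asn": "Asparagine",
    "Pro": "Proline", "Gln": "Glutamine", "Arg": "Arginine",
    "Ser": "Serine", "Thr": "Threonine", "Val": "Valine",
    "Trp": "Tryptophan", "Tyr": "Tyrosine",
    "Asx": "Asp or Asn", "Glx": "Glu or Gln", "Xaa": "unknown or other",
}


def unlisted_symbols(residues: str) -> list[str]:
    """Return the whitespace-separated tokens that are not Appendix C symbols."""
    return [token for token in residues.split() if token not in APPENDIX_C]
```

Under these assumptions, `unlisted_symbols("Met Ala Xaa Gly")` returns an empty list, and `unlisted_symbols("Met Foo Orn")` returns `["Foo", "Orn"]`.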

Appendix D to Subpart G of Part 1 - List of Modified and Unusual Amino Acids

Source: World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (2009).

Symbol Meaning
Aad  2-Aminoadipic acid.
bAad  3-Aminoadipic acid.
bAla  beta-Alanine, beta-Aminopropionic acid.
Abu  2-Aminobutyric acid.
4Abu  4-Aminobutyric acid, piperidinic acid.
Acp  6-Aminocaproic acid.
Ahe  2-Aminoheptanoic acid.
Aib  2-Aminoisobutyric acid.
bAib  3-Aminoisobutyric acid.
Apm  2-Aminopimelic acid.
Dbu  2,4 Diaminobutyric acid.
Des  Desmosine.
Dpm  2,2'-Diaminopimelic acid.
Dpr  2,3-Diaminopropionic acid.
EtGly  N-Ethylglycine.
EtAsn  N-Ethylasparagine.
Hyl  Hydroxylysine.
aHyl  allo-Hydroxylysine.
3Hyp  3-Hydroxyproline.
4Hyp  4-Hydroxyproline.
Ide  Isodesmosine.
aIle  allo-Isoleucine.
MeGly  N-Methylglycine, sarcosine.
MeIle  N-Methylisoleucine.
MeLys  6-N-Methyllysine.
MeVal  N-Methylvaline.
Nva  Norvaline.
Nle  Norleucine.
Orn  Ornithine.

Appendix E to Subpart G of Part 1 - List of Feature Keys Related to Nucleotide Sequences

Source: World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (2009).

Key Description
allele  a related individual or strain contains stable, alternative forms of the same gene, which differs from the presented sequence at this location (and perhaps others).
attenuator  (1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; (2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription.
C_region  constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain.
CAAT_signal  CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG (C or T) CAATCT.
CDS  coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation.
conflict  independent determinations of the "same" sequence differ at this site or region.
D-loop  displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein.
D-segment  diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain.
enhancer  a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter.
exon  region of genome that codes for portion of spliced mRNA; may contain 5'UTR, all CDSs, and 3'UTR.
GC_signal  GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG.
gene  region of biological interest identified as a gene and for which a name has been assigned.
iDNA  intervening DNA; DNA which is eliminated through any of several kinds of recombination.
intron  a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it.
J_segment  joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains.
LTR  long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses.
mat_peptide  mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS).
misc_binding  site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind).
misc_difference  feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base).
misc_feature  region of biological interest which cannot be described by any other feature key; a new or rare feature.
misc_recomb  site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral).
misc_RNA  any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA).
misc_signal  any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, –35_signal, –10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin).
misc_structure  any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop).
modified_base  the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value).
mRNA  messenger RNA; includes 5' untranslated region (5'UTR), coding sequences (CDS, exon) and 3' untranslated region (3'UTR).
mutation  a related strain has an abrupt, inheritable change in the sequence at this location.
N_region  extra nucleotides inserted between rearranged immunoglobulin segments.
old_sequence  the presented sequence revises a previous version of the sequence at this location.
polyA_signal  recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA.
polyA_site  site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation.
precursor_RNA  any RNA species that is not yet the mature RNA product; may include 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip).
prim_transcript  primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip).
primer_bind  non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements.
promoter  region on a DNA molecule involved in RNA polymerase binding to initiate transcription.
protein_bind  non-covalent protein binding site on nucleic acid.
RBS  ribosome binding site.
repeat_region  region of genome containing repeating units.
repeat_unit  single repeat element.
rep_origin  origin of replication; starting site for duplication of nucleic acid to give two identical copies.
rRNA  mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins.
S_region  switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell.
satellite  many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA.
scRNA  small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote.
sig_peptide  signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence.
snRNA  small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions.
source  identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible.
stem_loop  hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA.
STS  Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs.
TATA_signal  TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T).
terminator  sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein.
transit_peptide  transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle.
tRNA  mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence.
unsure  author is unsure of exact sequence in this region.
V_region  variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments.
V_segment  variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide.
variation  a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others).
3'clip  3'-most region of a precursor transcript that is clipped off during processing.
3'UTR  region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein.
5'clip  5'-most region of a precursor transcript that is clipped off during processing.
5'UTR  region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein.
–10_signal  pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT.
–35_signal  a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa [ ] or TGTTGACA [ ].

Appendix F to Subpart G of Part 1 - List of Feature Keys Related to Protein Sequences

Source: World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (2009).

Key Description
CONFLICT  different papers report differing sequences.
VARIANT  authors report that sequence variants exist.
VARSPLIC  description of sequence variants produced by alternative splicing.
MUTAGEN  site which has been experimentally altered.
MOD_RES  post-translational modification of a residue.
ACETYLATION  N-terminal or other.
AMIDATION  generally at the C-terminal of a mature active peptide.
BLOCKED  undetermined N- or C-terminal blocking group.
FORMYLATION  of the N-terminal methionine.
GAMMA-CARBOXYGLUTAMIC ACID HYDROXYLATION  of asparagine, aspartic acid, proline, or lysine.
METHYLATION  generally of lysine or arginine.
PHOSPHORYLATION  of serine, threonine, tyrosine, aspartic acid or histidine.
PYRROLIDONE CARBOXYLIC ACID  N-terminal glutamate which has formed an internal cyclic lactam.
SULFATATION  generally of tyrosine.
LIPID  covalent binding of a lipidic moiety.
MYRISTATE  myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue.
PALMITATE  palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue.
FARNESYL  farnesyl group attached through a thioether bond to a cysteine residue.
GERANYL-GERANYL  geranyl-geranyl group attached through a thioether bond to a cysteine residue.
GPI-ANCHOR  glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein.
N-ACYL DIGLYCERIDE  N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages.
DISULFID  disulfide bond; the 'FROM' and 'TO' endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the 'FROM' and 'TO' endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link.
THIOLEST  thiolester bond; the 'FROM' and 'TO' endpoints represent the two residues which are linked by the thiolester bond.
THIOETH  thioether bond; the 'FROM' and 'TO' endpoints represent the two residues which are linked by the thioether bond.
CARBOHYD  glycosylation site; the nature of the carbohydrate (if known) is given in the description field.
METAL  binding site for a metal ion; the description field indicates the nature of the metal.
BINDING  binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field.
SIGNAL  extent of a signal sequence (prepeptide).
TRANSIT  extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody).
PROPEP  extent of a propeptide.
CHAIN  extent of a polypeptide chain in the mature protein.
PEPTIDE  extent of a released active peptide.
DOMAIN  extent of a domain of interest on the sequence; the nature of that domain is given in the description field.
CA_BIND  extent of a calcium-binding region.
DNA_BIND  extent of a DNA-binding region.
NP_BIND  extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field.
TRANSMEM  extent of a transmembrane region.
ZN_FING  extent of a zinc finger region.
SIMILAR  extent of a similarity with another protein sequence; precise information, relative to that sequence, is given in the description field.
REPEAT  extent of an internal sequence repetition.
HELIX  secondary structure: Helices, for example, Alpha-helix, 3(10) helix, or Pi-helix.
STRAND  secondary structure: Beta-strand, for example, Hydrogen bonded beta-strand, or Residue in an isolated beta-bridge.
TURN  secondary structure: Turns, for example, H-bonded turn (3-turn, 4-turn, or 5-turn).
ACT_SITE  amino acid(s) involved in the activity of an enzyme.
SITE  any other interesting site on the sequence.
INIT_MET  the sequence is known to start with an initiator methionine.
NON_TER  the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key.
NON_CONS  non consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them.
UNSURE  uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment.

Appendix G to Subpart G of Part 1 - Numeric Identifiers

Numeric Identifier  Definition  Comments and format  Mandatory (M) or optional (O)
<110>  Applicant  If Applicant is inventor, then preferably max. of 10 names; one name per line; preferable format: Surname, Other Names and/or Initials.  M.
<120>  Title of Invention  M.
<130>  File Reference  Personal file reference  M when filed prior to assignment of appl. number.
<140>  Current Application Number  Specify as: US 09/999,999 or PCT/US09/99999  M, if available.
<141>  Current Filing Date  Specify as: yyyy-mm-dd  M, if available.
<150>  Prior Application Number  Specify as: US 09/999,999 or PCT/US09/99999  M, if applicable; include priority documents under 35 U.S.C. 119 and 120.
<151>  Prior Application Filing Date  Specify as: yyyy-mm-dd  M, if applicable.
<160>  Number of SEQ ID NOs  Count includes total number of SEQ ID NOs  M.
<170>  Software  Name of software used to create the "Sequence Listing"  O.
<210>  SEQ ID NO:#:  Response shall be an integer representing the SEQ ID NO shown  M.
<211>  Length  Respond with an integer expressing the number of bases or amino acid residues  M.
<212>  Type  Whether presented sequence molecule is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be "DNA." In addition, the combined DNA/RNA molecule shall be further described in the <220> to <223> feature section  M.
<213>  Organism  Scientific name, i.e., Genus/species, Unknown or Artificial Sequence. In addition, the "Unknown" or "Artificial Sequence" organisms shall be further described in the <220> to <223> feature section  M.
<220>  Feature  Leave blank after <220>. <221-223> provide for a description of points of biological significance in the sequence  M, under the following conditions: If "n," "Xaa," or a modified or unusual L-amino acid or modified base was used in a sequence; if ORGANISM is "Artificial Sequence" or "Unknown"; if molecule is combined DNA/RNA.
<221>  Name/Key  Provide appropriate identifier for feature, from WIPO Standard ST.25 (2009), Appendices E and F to this subpart  M, under the following conditions: If "n," "Xaa," or a modified or unusual L-amino acid or modified base was used in a sequence.
<222>  Location  Specify location within sequence; where appropriate, state number of first and last bases/amino acids in feature  M, under the following conditions: If "n," "Xaa," or a modified or unusual L-amino acid or modified base was used in a sequence.
<223>  Other Information  Other relevant information; four lines maximum.  M, under the following conditions: If "n," "Xaa," or a modified or unusual L-amino acid or modified base was used in a sequence; if ORGANISM is "Artificial Sequence" or "Unknown"; if molecule is combined DNA/RNA.
<300>  Publication Information  Leave blank after <300>.  O.
<301>  Authors  Preferably max. of 10 named authors of publication; specify one name per line; preferable format: Surname, Other Names and/or Initials.  O.
<302>  Title  O.
<303>  Journal  O.
<304>  Volume  O.
<305>  Issue  O.
<306>  Pages  O.
<307>  Date  Journal date on which data published; specify as yyyy-mm-dd, MMM-yyyy or Season-yyyy  O.
<308>  Database Accession Number  Accession number assigned by database, including database name  O.
<309>  Database Entry Date  Date of entry in database; specify as yyyy-mm-dd or MMM-yyyy  O.
<310>  Patent Document Number  Document number; for patent-type citations only. Specify as, for example, US 09/999,999  O.
<311>  Patent Filing Date  Document filing date, for patent-type citations only; specify as yyyy-mm-dd  O.
<312>  Publication Date  Document publication date, for patent-type citations only; specify as yyyy-mm-dd  O.
<313>  Relevant Residues  FROM (position) TO (position)  O.
<400>  Sequence  SEQ ID NO should follow the numeric identifier and should appear on the line preceding the actual sequence  M.
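
The following is a rough sketch, not an official validator, of how a few of the Appendix G rules could be checked in software. It assumes a hypothetical flat dictionary that maps identifier numbers to values for a listing containing a single sequence, and it covers only the unconditionally mandatory identifiers, the yyyy-mm-dd format of <141> and <151>, and the permitted <212> types. Conditional requirements (for example <130>, or <220> to <223>) are out of scope. The names `check_identifiers` and `is_iso_date` are illustrative.

```python
import re
from datetime import datetime

# Identifiers marked "M." with no condition in Appendix G.
ALWAYS_MANDATORY = ("110", "120", "160", "210", "211", "212", "213", "400")
# Filing-date identifiers that must be given as yyyy-mm-dd.
DATE_FIELDS = ("141", "151")
# Molecule types allowed under <212>.
SEQUENCE_TYPES = {"DNA", "RNA", "PRT"}


def is_iso_date(value: str) -> bool:
    """Return True if value is a real calendar date written as yyyy-mm-dd."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_identifiers(fields: dict[str, str]) -> list[str]:
    """Return a list of problems found in a flat identifier-to-value mapping."""
    problems = []
    for key in ALWAYS_MANDATORY:
        if key not in fields:
            problems.append(f"<{key}> is missing")
    for key in DATE_FIELDS:
        if key in fields and not is_iso_date(fields[key]):
            problems.append(f"<{key}> is not yyyy-mm-dd")
    if "212" in fields and fields["212"] not in SEQUENCE_TYPES:
        problems.append("<212> must be DNA, RNA, or PRT")
    return problems
```

A real listing repeats <210> through <400> for every sequence, so a fuller model would hold one record per SEQ ID NO rather than a single dictionary.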