MPEP 2422.01
Nucleotide and/or Amino Acids Disclosures Requiring a Sequence Listing

Ninth Edition of the MPEP, Revision 10.2019, Last Revised in June 2020


2422.01    Nucleotide and/or Amino Acids Disclosures Requiring a Sequence Listing [R-10.2019]


37 CFR 1.821(a) presents a definition for "nucleotide and/or amino acid sequences." This definition sets forth limits, in terms of numbers of amino acids and/or numbers of nucleotides, at or above which compliance with the sequence rules is required. Nucleotide and/or amino acid sequences as used in 37 CFR 1.821 through 37 CFR 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than ten specifically defined nucleotides or four specifically defined amino acids are specifically excluded from this section. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "n" defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2 (see MPEP § 2422).

The limit of four or more amino acids was established for consistency with limits in place for industry database collections, whereas the limit of ten or more nucleotides, while lower than certain industry database limits, was established to encompass those nucleotide sequences to which the smallest probe will bind in a stable manner.
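
By way of illustration only, the length test described above can be sketched in Python. The function and constant names below are hypothetical and appear in no rule or standard; the thresholds are those stated in the preceding paragraphs, and amino acids are assumed to be given as three-letter symbols.

```python
# Illustrative sketch only; all names here are hypothetical.
MIN_NUCLEOTIDES = 10  # threshold stated in the text above
MIN_AMINO_ACIDS = 4   # threshold stated in the text above


def count_specifically_defined(residues, placeholder):
    """Count residues other than the placeholder symbol ("n" or "Xaa")."""
    return sum(1 for r in residues if r.lower() != placeholder.lower())


def meets_length_threshold(residues, kind, branched=False):
    """Return True if an unbranched sequence reaches the stated threshold.

    residues: list of symbols, e.g. ["a", "c", "g", "t"] or ["Ala", "Xaa"].
    kind: "nucleotide" or "amino_acid".
    """
    if branched:
        return False  # branched sequences fall outside the definition
    if kind == "nucleotide":
        return count_specifically_defined(residues, "n") >= MIN_NUCLEOTIDES
    if kind == "amino_acid":
        return count_specifically_defined(residues, "Xaa") >= MIN_AMINO_ACIDS
    raise ValueError(f"unknown kind: {kind!r}")
```

Under these assumptions, a ten-base sequence containing four "n" symbols has only six specifically defined bases and so would not reach the threshold. Whether the rules apply to a particular disclosure remains a case-by-case determination that a simple count cannot settle.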


37 CFR 1.821(a)(1) and 37 CFR 1.821(a)(2) present further definitions for those nucleotide and amino acid sequences that are intended to be embraced by the sequence rules. Situations in which the applicability of the rules is in issue will be resolved on a case-by-case basis.

Nucleotide sequences are further limited to those that can be represented by the symbols set forth in 37 CFR 1.822(b), which incorporates by reference WIPO Standard ST.25 (1998), Appendix 2, Table 1 (see MPEP § 2422). The presence of other than typical 5' to 3' phosphodiester linkages in a nucleotide sequence does not render the rules inapplicable. The Office does not want to exclude linkages of the type commonly found in naturally occurring nucleotides, e.g., eukaryotic end capped sequences.

Amino acid sequences are further limited to those listed in 37 CFR 1.822(b), which incorporates by reference WIPO Standard ST.25 (1998), Appendix 2, Table 3 (see MPEP § 2422), and those L-amino acids that are commonly found in naturally occurring proteins. The presence of one or more D-amino acids in a sequence will exclude that sequence from the scope of the rules. Voluntary compliance is, however, encouraged in these situations; the symbol "Xaa" can be used to represent D-amino acids. The sequence rules embrace "[a]ny peptide or protein that can be expressed as a sequence using the symbols in WIPO Standard ST.25 (1998), Appendix 2, Table 3 in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc." 37 CFR 1.821(a)(2).

With regard to amino acid sequences, the use of the terms "peptide or protein" implies, however, that the amino acids in a given sequence are linked by at least three consecutive peptide bonds. Accordingly, an amino acid sequence is not excluded from the scope of the rules merely due to the presence of a single non-peptidyl bond. If an amino acid sequence can be represented by a string of amino acid abbreviations, with reference, where necessary, to a features table to explain modifications in the sequence, the sequence comes within the scope of the rules. However, the rules are not intended to encompass the subject matter that is generally referred to as synthetic resins.


The requirement for compliance in 37 CFR 1.821(c) is directed to "disclosures of nucleotide and/or amino acid sequences." (Emphasis added.) All sequence information, whether claimed or not, that meets the length thresholds in 37 CFR 1.821(a) is subject to the rules. The goal of the Office is to build a comprehensive database that can be used for, inter alia, assessing the prior art. It is therefore essential that all sequence information, whether only disclosed or also claimed, be included in the database. In those instances in which prior art sequences are only referred to in a given application by name and a publication or accession reference, they need not be included as part of the sequence listing, unless the referred-to sequence is "essential material" per MPEP § 608.01(p). However, if the applicant presents the sequence as a string of particular nucleotide bases or amino acids, it is necessary to include the sequence in the sequence listing regardless of whether the applicant considers the sequence to be prior art. In general, any sequence that is disclosed and/or claimed as a sequence, i.e., as a string of particular nucleotide bases or amino acids, and that otherwise meets the criteria of 37 CFR 1.821(a), must be set forth in the sequence listing.


It is generally acceptable to present a single, primary sequence in the specification and sequence listing by enumeration of its residues in accordance with the sequence rules ("primary sequence") and to discuss and/or claim variants of that primary sequence without presenting each variant as a separate sequence in the sequence listing. However, the primary sequence should be annotated in the sequence listing to reflect such variants. By way of example only, the following types of sequence disclosures would be treated as noted herein by the Office. With respect to a primary sequence and "conservatively modified variants thereof," the sequences may be described as SEQ ID NO:X (the primary sequence) and "conservatively modified variants thereof," if desired. With respect to a sequence that "may be deleted at the C-terminus by 1, 2, 3, 4, or 5 residues," the implied variations need not be included in the sequence listing. In this latter example, only the sequence without deletions needs to be included in the sequence listing; however, the applicant is encouraged to annotate the sequence to indicate that deletions may be made at the C-terminus by 1, 2, 3, 4, or 5 residues.

The Office's database will only contain the unmodified sequence. It is strongly recommended that any sequences appearing in the claims, or sequences that are considered essential to understanding the invention, each be included in the sequence listing as a separate sequence.
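
The C-terminal deletion example can be made concrete with a short Python sketch. The peptide below is invented for illustration and is written in one-letter code for brevity; the helper name is hypothetical. The sketch shows that the five implied variants can be derived from the primary sequence alone, which is why only the primary sequence needs to appear in the listing.

```python
def c_terminal_deletions(primary, max_deleted):
    """Return the variants implied by deleting 1..max_deleted C-terminal residues."""
    if not 0 <= max_deleted < len(primary):
        raise ValueError("max_deleted must be smaller than the sequence length")
    return [primary[: len(primary) - k] for k in range(1, max_deleted + 1)]


# Hypothetical ten-residue peptide, for illustration only.
primary = "MKTAYIAKQR"
variants = c_terminal_deletions(primary, 5)

# Only the primary sequence is listed; the variants are covered by an annotation.
listing_entries = [primary]
annotation = "may be deleted at the C-terminus by 1, 2, 3, 4, or 5 residues"
```

Here `variants` holds five truncated strings, none of which is added to `listing_entries`.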


37 CFR 1.821(c) requires that each disclosed nucleic acid or amino acid sequence in the application appear separately in the sequence listing, with each sequence further being assigned a sequence identifier, referred to as "SEQ ID NO." The sequence identifiers must begin with 1 and increase sequentially by integers. The requirement for sequence identifiers, at a minimum, requires that each sequence be assigned a different number for purposes of identification. However, where practical and for ease of reference, sequences should be presented in the sequence listing in numerical order and in the order in which they are discussed in the application.
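
As a rough illustration, the numbering requirements just described could be checked as follows. This is a sketch under stated assumptions, not an Office tool: the function name and the message wording are hypothetical, and the identifiers are assumed to have already been extracted from the listing as integers.

```python
def check_sequence_identifiers(identifiers):
    """Return a list of problems found in the SEQ ID NO numbering.

    identifiers: integers in the order the sequences appear in the listing.
    """
    problems = []
    # Each sequence must carry a different number.
    if len(set(identifiers)) != len(identifiers):
        problems.append("duplicate identifier")
    # Identifiers must begin with 1 and increase sequentially by integers.
    if sorted(identifiers) != list(range(1, len(identifiers) + 1)):
        problems.append("identifiers do not run 1..N without gaps")
    # Numerical order in the listing is recommended where practical.
    if identifiers != sorted(identifiers):
        problems.append("listing is not in numerical order")
    return problems
```

For example, `check_sequence_identifiers([1, 3])` reports a gap, while `check_sequence_identifiers([2, 1, 3])` reports only the ordering issue.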

37 CFR 1.821(d) requires that where the description or claims of a patent application discuss a sequence that is set forth in the sequence listing, a reference to the sequence identifier of that sequence is required at all occurrences, even if in the text of the description or claims that sequence is set forth by enumeration of its residues. This requirement is also intended to permit references elsewhere in the application (e.g., specification, claims, or drawings) to sequences set forth in the sequence listing by the use of assigned sequence identifiers without repeating the sequence. Sequence identifiers can also be used to discuss and/or claim parts or fragments of a properly presented sequence. For example, language such as "residues 14 to 243 of SEQ ID NO:23" is permissible and the fragment need not be separately presented in the sequence listing. Where a sequence that meets the length thresholds of 37 CFR 1.821(a) is disclosed by enumeration of its residues anywhere in an application, it must be presented in a sequence listing in a manner that complies with the requirements of the sequence rules.
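
A minimal Python sketch of how sequence identifier references might be located in application text is given below. The regular expressions cover only a few spellings ("SEQ ID NO:23", "SEQ ID NO: 23", "SEQ ID NO 23") and the fragment wording used in the example above; that coverage is an assumption, and all function names are hypothetical.

```python
import re

# Assumed spellings only; real applications may phrase references differently.
SEQ_ID_PATTERN = re.compile(r"SEQ ID NO[:.]?\s*(\d+)")
FRAGMENT_PATTERN = re.compile(
    r"residues\s+(\d+)\s+to\s+(\d+)\s+of\s+SEQ ID NO[:.]?\s*(\d+)"
)


def referenced_identifiers(text):
    """Return the sorted, de-duplicated SEQ ID NOs mentioned in the text."""
    return sorted({int(n) for n in SEQ_ID_PATTERN.findall(text)})


def fragment_references(text):
    """Return (start, end, seq_id) for each 'residues A to B of SEQ ID NO:N'."""
    return [(int(s), int(e), int(i)) for s, e, i in FRAGMENT_PATTERN.findall(text)]


def dangling_references(text, listed_ids):
    """Return identifiers cited in the text but absent from the listing."""
    return [n for n in referenced_identifiers(text) if n not in listed_ids]


def fragments_out_of_range(text, lengths):
    """Return fragments whose residue range exceeds the listed sequence length.

    lengths: dict mapping SEQ ID NO to the length of the listed sequence.
    """
    return [
        (s, e, i)
        for s, e, i in fragment_references(text)
        if i in lengths and not (1 <= s <= e <= lengths[i])
    ]
```

A fragment reference such as "residues 14 to 243 of SEQ ID NO:23" is resolved against the listed sequence rather than stored as a separate entry, which mirrors the point that the fragment need not be separately presented.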

The rules do not alter, in any way, the requirements of 35 U.S.C. 112. The implementation of the rules has had no effect on disclosure and/or claiming requirements. The rules, in general, or the use of sequence identifiers throughout the specification and claims, specifically, should not raise any issues under 35 U.S.C. 112(a) or 35 U.S.C. 112(b). The use of sequence identifiers (SEQ ID NO:X) only provides a shorthand way for applicants to discuss and claim their inventions. These identification numbers do not in any way restrict the manner in which an invention can be claimed.