2422.01 Nucleotide and/or Amino Acids Disclosures Requiring a "Sequence Listing" [R-07.2022]
[Editor Note: This section is not applicable to applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). See MPEP §§ 2412-2419 for guidance on WIPO ST.26 requirements for applications filed on or after July 1, 2022.]
I. LENGTH THRESHOLDS
37 CFR 1.821(a) presents a definition for "nucleotide and/or amino acid sequences." This definition sets forth limits, in terms of numbers of amino acids and/or numbers of nucleotides, at or above which compliance with the sequence rules is required. Nucleotide and/or amino acid sequences as used in 37 CFR 1.821 through 37 CFR 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from 37 CFR 1.821. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "n" defined in Appendices A-F to 37 CFR part 1, Subpart G (see MPEP § 2422(I)).
The limit of four or more amino acids was established for consistency with limits in place for industry database collections, whereas the limit of ten or more nucleotides, while lower than certain industry database limits, was established to encompass those nucleotide sequences to which the smallest probe will bind in a stable manner.
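Purely as an illustration, and not as an Office tool, the length thresholds above can be expressed as a short Python check. The sketch assumes that nucleotides are written as one-letter codes with "n" as the only non-specific base, that amino acids are written as three-letter codes with "Xaa" as the only non-specific residue, and that every input is unbranched; the names `check_thresholds` and `ThresholdCheck` are hypothetical. It reports the count of specifically defined residues separately rather than applying the 37 CFR 1.821(a) exclusion itself, so that count can be compared against the rule text.

```python
from dataclasses import dataclass

# Length thresholds from 37 CFR 1.821(a), as summarized in this section.
MIN_LENGTH = {"nucleotide": 10, "amino_acid": 4}

# Assumed non-specific symbols: "n" for nucleotides, "Xaa" for amino acids.
# Modified bases and amino acids (Appendices B and D) are not modeled.
WILDCARD = {"nucleotide": "n", "amino_acid": "Xaa"}


@dataclass(frozen=True)
class ThresholdCheck:
    length: int                  # total residues in the unbranched sequence
    specifically_defined: int    # residues other than "n" / "Xaa"
    meets_length_threshold: bool


def check_thresholds(residues: list[str], kind: str) -> ThresholdCheck:
    """Hypothetical helper: compare an unbranched sequence with the 1.821(a) length limits."""
    if kind not in MIN_LENGTH:
        raise ValueError(f"unknown sequence kind: {kind!r}")
    wildcard = WILDCARD[kind]
    defined = sum(1 for r in residues if r != wildcard)
    return ThresholdCheck(
        length=len(residues),
        specifically_defined=defined,
        meets_length_threshold=len(residues) >= MIN_LENGTH[kind],
    )


# A 9-base oligonucleotide falls below the ten-nucleotide threshold.
print(check_thresholds(list("acgtacgta"), "nucleotide"))
# A four-residue peptide meets the four-amino-acid threshold.
print(check_thresholds(["Ala", "Gly", "Xaa", "Leu"], "amino_acid"))
```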
II. REPRESENTATION OF NUCLEIC ACIDS AND AMINO ACIDS
37 CFR 1.821(a)(1) and 37 CFR 1.821(a)(2) present further definitions for those nucleotide and amino acid sequences that are intended to be embraced by the sequence rules. Situations in which the applicability of the rules is in issue will be resolved on a case-by-case basis.
Nucleotide sequences are further limited to those that can be represented by the symbols set forth in 37 CFR 1.822(b) and Appendices A and B to 37 CFR part 1, Subpart G (see MPEP § 2422(I)). The presence of linkages other than typical 5' to 3' phosphodiester linkages in a nucleotide sequence does not render the rules inapplicable. In particular, the Office does not intend to exclude linkages of the type commonly found in naturally occurring nucleic acids, e.g., eukaryotic end-capped sequences.
Amino acid sequences are further limited to those in 37 CFR 1.822(b) and Appendices C and D to 37 CFR part 1, Subpart G (see MPEP § 2422(I)) and those L-amino acids that are commonly found in naturally occurring proteins. The presence of one or more D-amino acids in a sequence will exclude that sequence from the scope of the rules. Voluntary compliance is, however, encouraged in these situations; the symbol "Xaa" can be used to represent D-amino acids. The sequence rules embrace "[a]ny peptide or protein that can be expressed as a sequence using the symbols in Appendix C to 37 CFR part 1, Subpart G (see MPEP § 2422(I)) in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc." 37 CFR 1.821(a)(2).
With regard to amino acid sequences, the use of the terms "peptide or protein" implies that the amino acids in a given sequence are linked by at least three consecutive peptide bonds. Accordingly, an amino acid sequence is not excluded from the scope of the rules merely due to the presence of a single non-peptidyl bond. If an amino acid sequence can be represented by a string of amino acid abbreviations, with any modifications in the sequence set forth in the Feature section, the sequence comes within the scope of the rules. However, the rules are not intended to encompass subject matter generally referred to as synthetic resins.
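The representation described above, a string of amino acid abbreviations with any modifications described in the Feature section, can be sketched as a simple data structure. This is a hypothetical Python model, not the ST.25 file format: the `Feature` and `ListedPeptide` classes, the `encode_peptide` helper, and the free-text notes are invented for illustration. It writes D-amino acids as "Xaa" with a note identifying the residue, which reflects only the voluntary practice mentioned above, since a sequence containing a D-amino acid falls outside the rules.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    start: int   # 1-based first residue covered by the feature
    end: int     # 1-based last residue covered (inclusive)
    note: str    # free-text description; the wording here is illustrative only


@dataclass
class ListedPeptide:
    residues: list[str]                      # three-letter codes, e.g. "Ala"
    features: list[Feature] = field(default_factory=list)


def encode_peptide(residues, d_positions=(), non_peptidyl_after=()):
    """Hypothetical encoder: a string of abbreviations plus feature notes.

    d_positions: 1-based positions holding D-amino acids; each is written
        as "Xaa" and identified in a feature note (voluntary compliance).
    non_peptidyl_after: 1-based positions i whose bond to residue i + 1 is
        not a peptide bond; each is described in a feature note.
    """
    n = len(residues)
    listed, features = [], []
    for pos, code in enumerate(residues, start=1):
        if pos in d_positions:
            listed.append("Xaa")
            features.append(Feature(pos, pos, f"D-{code}"))
        else:
            listed.append(code)
    for i in non_peptidyl_after:
        if not 1 <= i < n:
            raise ValueError(f"no bond after residue {i} in a {n}-residue chain")
        features.append(Feature(i, i + 1, "non-peptidyl bond"))
    return ListedPeptide(listed, features)


peptide = encode_peptide(["Ala", "Gly", "Phe", "Leu"], d_positions={3}, non_peptidyl_after=[1])
print(peptide.residues)   # ['Ala', 'Gly', 'Xaa', 'Leu']
for f in peptide.features:
    print(f)
```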
III. SEQUENCES DISCLOSED IN APPLICATION TEXT
The requirement for compliance in 37 CFR 1.821(c) is directed to "disclosures of nucleotide and/or amino acid sequences." (Emphasis added.) All sequences, whether claimed or not, that meet the length thresholds in 37 CFR 1.821(a) are subject to the "Sequence Listing" rules. The goal of the Office is to build a comprehensive database that can be used for, inter alia, assessing the prior art. It is therefore essential that all sequences, whether only disclosed or also claimed, be included in the database. In those instances in which prior art sequences are only referred to in a given application by name and a publication or accession reference, they need not be included as part of the "Sequence Listing", unless the referred-to sequence is "essential material" per MPEP § 608.01(p). However, if the applicant presents the sequence as a string of particular nucleotide bases or amino acids, whether by way of symbols, words or chemical structure, it is necessary to include the sequence in the "Sequence Listing" regardless of whether the applicant considers the sequence to be prior art, so long as the sequence meets the criteria of 37 CFR 1.821(a). In general, any sequence that is disclosed and/or claimed as a sequence, i.e., as a string of particular nucleotide bases or amino acids, and that otherwise meets the criteria of 37 CFR 1.821(a), must be set forth in the "Sequence Listing".
IV. VARIANTS OF A PRESENTED SEQUENCE
It is generally acceptable to present a single, primary sequence in the specification and "Sequence Listing" by enumeration of its residues in accordance with the sequence rules ("primary sequence") and to discuss and/or claim variants of that primary sequence without presenting each variant as a separate sequence in the "Sequence Listing". The primary sequence should, however, be annotated in the "Sequence Listing" to reflect such variants. Where a variant sequence meets the length thresholds of 37 CFR 1.821(a) and is disclosed by enumeration of its residues anywhere in an application, it must be presented in a "Sequence Listing" in a manner that complies with the requirements of the sequence rules. By way of example only, the Office would treat the following types of sequence disclosures as noted herein. With respect to a primary sequence and "conservatively modified variants thereof," the sequences may be described as SEQ ID NO:X (the primary sequence) and "conservatively modified variants thereof," if desired. With respect to a sequence that "may be deleted at the C-terminus by 1, 2, 3, 4, or 5 residues," the implied variants need not be included in the "Sequence Listing"; only the sequence without deletions needs to be included, though applicant is encouraged to annotate that sequence to indicate that it may be deleted at the C-terminus by 1, 2, 3, 4, or 5 residues.
The Office's database will contain only the unmodified sequence. It is therefore strongly recommended that any sequences appearing in the claims, or sequences considered essential to understanding the invention, be included in the "Sequence Listing" as separate sequences.
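The C-terminal deletion example above can be made concrete with a short sketch. The Python below is hypothetical: `c_terminal_deletions` enumerates the variants implied by the phrase "may be deleted at the C-terminus by 1, 2, 3, 4, or 5 residues," and `listing_entry` builds a single listing record for the undeleted primary sequence, with an annotation in place of separate entries. The record layout and annotation wording are assumptions, not a prescribed format.

```python
def c_terminal_deletions(primary: list[str], max_deleted: int) -> list[list[str]]:
    """Return the variants implied by deleting 1..max_deleted residues from the C-terminus."""
    if not 0 <= max_deleted < len(primary):
        raise ValueError("max_deleted must leave at least one residue")
    # The C-terminus is the right-hand end of the residue list.
    return [primary[: len(primary) - k] for k in range(1, max_deleted + 1)]


def listing_entry(primary: list[str], max_deleted: int, seq_id: int = 1) -> dict:
    """Hypothetical listing record: only the undeleted primary sequence gets a SEQ ID NO."""
    return {
        "seq_id": seq_id,
        "residues": primary,
        # Annotation describing the implied variants instead of listing each one.
        "annotation": f"C-terminus may be deleted by 1 to {max_deleted} residues",
    }


primary = ["Met", "Ala", "Gly", "Ser", "Thr", "Lys", "Leu", "Val"]
variants = c_terminal_deletions(primary, 5)
print([len(v) for v in variants])   # [7, 6, 5, 4, 3]
print(listing_entry(primary, 5))
```

In this eight-residue example the five implied variants have lengths 7, 6, 5, 4, and 3, and only the primary sequence receives a sequence identifier. If any variant were instead written out residue by residue in the application and met the 37 CFR 1.821(a) thresholds, it would need its own entry.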
V. SEQUENCE IDENTIFIER
37 CFR 1.821(d) and 37 CFR 1.823(a)(5) require that each disclosed nucleic acid and/or amino acid sequence in the application appear separately in the "Sequence Listing", with each sequence further being assigned a sequence identifier, referred to as "SEQ ID NO." or the like. The use of "SEQ ID NO:" is preferred, but the phrase "or the like" is intended to ensure that a formalities notice is not sent when an application uses, for example, "SEQ NO." or "Seq. Id. No." or any similar identification for an amino acid or nucleotide sequence in the specification or claims, where it is clear that a sequence from the "Sequence Listing" is being referenced. The sequence identifiers must begin with 1 and increase sequentially by integers. At a minimum, the requirement for sequence identifiers means that each sequence must be assigned a different number for purposes of identification. However, where practical and for ease of reference, sequences should be presented in the "Sequence Listing" in numerical order and in the order in which they are discussed in the application.
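The numbering requirements above lend themselves to a mechanical check. The following hypothetical Python function, `check_seq_id_numbering`, takes the sequence identifiers in the order their entries appear in a "Sequence Listing" and reports problems, distinguishing the requirements (distinct numbers beginning with 1 and increasing by integers) from the recommendation that entries appear in numerical order. Whether sequences follow the order of discussion in the application cannot be determined from the numbers alone and is not modeled.

```python
def check_seq_id_numbering(seq_ids: list[int]) -> list[str]:
    """Hypothetical check of SEQ ID NO numbering, in listing order."""
    problems = []
    unique = sorted(set(seq_ids))
    # Required: every sequence has its own identifier.
    if len(unique) != len(seq_ids):
        problems.append("required: each sequence must have a different identifier")
    # Required: identifiers start at 1 and increase by integers with no gaps.
    if unique != list(range(1, len(unique) + 1)):
        problems.append("required: identifiers must start at 1 and increase by 1 with no gaps")
    # Recommended: entries appear in numerical order.
    if seq_ids != sorted(seq_ids):
        problems.append("recommended: present entries in numerical order")
    return problems


print(check_seq_id_numbering([1, 2, 3]))   # no problems
print(check_seq_id_numbering([1, 3, 2]))   # out of order only
print(check_seq_id_numbering([2, 3]))      # does not start at 1
```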
37 CFR 1.821(d) further requires that, where the description or claims of a patent application discuss a sequence that is set forth in the "Sequence Listing", reference be made to the sequence identifier of that sequence at all occurrences, even where the sequence is also set forth by enumeration of its residues in the text of the description or claims. This requirement is also intended to permit references elsewhere in the application (e.g., specification, claims, or drawings) to sequences set forth in the "Sequence Listing" by the use of assigned sequence identifiers without repeating the sequence. Sequence identifiers can also be used to discuss and/or claim parts or fragments of a properly presented sequence. For example, language such as "residues 14 to 243 of SEQ ID NO:23" is permissible, and the fragment need not be separately presented in the "Sequence Listing". However, where a sequence, including such a fragment, meets the length thresholds of 37 CFR 1.821(a) and is disclosed by enumeration of its residues anywhere in an application, it must be presented in a "Sequence Listing" in a manner that complies with the requirements of the sequence rules.
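Fragment references such as "residues 14 to 243 of SEQ ID NO:23" are read here as identifying residues by 1-based, inclusive positions within a listed sequence. The Python sketch below resolves such a reference against a listing held in memory. The helper name `resolve_fragment`, the dictionary representation of the listing, and the regular expression (which also accepts looser identifier forms such as "SEQ NO." or "Seq. Id. No.") are assumptions for illustration only.

```python
import re

# Hypothetical pattern: accepts "SEQ ID NO:23" and looser forms such as
# "SEQ NO. 23" or "Seq. Id. No. 23", case-insensitively.
FRAGMENT = re.compile(
    r"residues\s+(\d+)\s+to\s+(\d+)\s+of\s+SEQ\.?\s*(?:ID\.?\s*)?NO[.:]?\s*(\d+)",
    re.IGNORECASE,
)


def resolve_fragment(reference: str, listing: dict[int, str]) -> str:
    """Return the residues named by a fragment reference.

    Positions are 1-based and the range is inclusive at both ends.
    """
    match = FRAGMENT.search(reference)
    if match is None:
        raise ValueError(f"not a fragment reference: {reference!r}")
    start, end, seq_id = (int(g) for g in match.groups())
    sequence = listing[seq_id]
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"residues {start}-{end} fall outside SEQ ID NO:{seq_id}")
    # Convert 1-based inclusive positions to a Python slice.
    return sequence[start - 1 : end]


listing = {1: "acgtacgtacgt"}
print(resolve_fragment("residues 3 to 6 of SEQ ID NO:1", listing))  # gtac
```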
The rules do not alter, in any way, the requirements of 35 U.S.C. 112. The implementation of the rules has had no effect on disclosure and/or claiming requirements. The rules, in general, or the use of sequence identifiers throughout the specification and claims, specifically, should not raise any issues under 35 U.S.C. 112(a) or 35 U.S.C. 112(b). The use of sequence identifiers (SEQ ID NO:X or the like) only provides a shorthand way for applicants to discuss and claim their inventions. These identifiers do not in any way restrict the manner in which an invention can be claimed.