packages: samba/samba-rfc3454.txt (NEW) - new; from git

adamg adamg at pld-linux.org
Mon Nov 9 01:14:55 CET 2009


Author: adamg                        Date: Mon Nov  9 00:14:55 2009 GMT
Module: packages                      Tag: HEAD
---- Log message:
- new; from git

---- Files affected:
packages/samba:
   samba-rfc3454.txt (NONE -> 1.1)  (NEW)

---- Diffs:

================================================================
Index: packages/samba/samba-rfc3454.txt
diff -u /dev/null packages/samba/samba-rfc3454.txt:1.1
--- /dev/null	Mon Nov  9 01:14:55 2009
+++ packages/samba/samba-rfc3454.txt	Mon Nov  9 01:14:49 2009
@@ -0,0 +1,5099 @@
+
+
+
+
+
+
+Network Working Group                                         P. Hoffman
+Request for Comments: 3454                                    IMC & VPNC
+Category: Standards Track                                    M. Blanchet
+                                                                Viagenie
+                                                           December 2002
+
+
+        Preparation of Internationalized Strings ("stringprep")
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2002).  All Rights Reserved.
+
+Abstract
+
+   This document describes a framework for preparing Unicode text
+   strings in order to increase the likelihood that string input and
+   string comparison work in ways that make sense for typical users
+   throughout the world.  The stringprep protocol is useful for protocol
+   identifier values, company and personal names, internationalized
+   domain names, and other text strings.
+
+   This document does not specify how protocols should prepare text
+   strings.  Protocols must create profiles of stringprep in order to
+   fully specify the processing options.
+
+Table of Contents
+
+   1. Introduction....................................................3
+     1.1 Terminology..................................................4
+     1.2 Using stringprep in protocols................................4
+   2. Preparation Overview............................................6
+   3. Mapping.........................................................7
+     3.1 Commonly mapped to nothing...................................7
+     3.2 Case folding.................................................8
+   4. Normalization...................................................9
+   5. Prohibited Output..............................................10
+     5.1 Space characters............................................11
+     5.2 Control characters..........................................11
+     5.3 Private use.................................................12
+
+
+
+Hoffman & Blanchet          Standards Track                     [Page 1]
+
+RFC 3454        Preparation of Internationalized Strings   December 2002
+
+
+     5.4 Non-character code points...................................12
+     5.5 Surrogate codes.............................................13
+     5.6 Inappropriate for plain text................................13
+     5.7 Inappropriate for canonical representation..................13
+     5.8 Change display properties or deprecated.....................13
+     5.9 Tagging characters..........................................14
+   6. Bidirectional Characters.......................................14
+   7. Unassigned Code Points in Stringprep Profiles..................15
+     7.1 Categories of code points...................................16
+     7.2 Reasons for difference between stored strings and queries...17
+     7.3 Versions of applications and stored strings.................18
+   8. References.....................................................19
+     8.1 Normative references........................................19
+     8.2 Informative references......................................19
+   9. Security Considerations........................................19
+     9.1 Stringprep-specific security considerations.................19
+     9.2 Generic Unicode security considerations.....................20
+   10. IANA Considerations...........................................21
+   11. Acknowledgements..............................................22
+   A. Unicode repertoires............................................23
+     A.1 Unassigned code points in Unicode 3.2.......................23
+   B. Mapping Tables.................................................31
+     B.1 Commonly mapped to nothing..................................31
+     B.2 Mapping for case-folding used with NFKC.....................32
+     B.3 Mapping for case-folding used with no normalization.........61
+   C. Prohibition tables.............................................78
+     C.1 Space characters............................................78
+       C.1.1 ASCII space characters..................................78
+       C.1.2 Non-ASCII space characters..............................79
+     C.2 Control characters..........................................79
+       C.2.1 ASCII control characters................................79
+       C.2.2 Non-ASCII control characters............................79
+     C.3 Private use.................................................80
+     C.4 Non-character code points...................................80
+     C.5 Surrogate codes.............................................80
+     C.6 Inappropriate for plain text................................80
+     C.7 Inappropriate for canonical representation..................81
+     C.8 Change display properties or are deprecated.................81
+     C.9 Tagging characters..........................................81
+   D. Bidirectional tables...........................................81
+     D.1 Characters with bidirectional property "R" or "AL"..........81
+     D.2 Characters with bidirectional property "L"..................82
+   Authors' Addresses................................................90
+   Full Copyright Statement..........................................91
+
+1. Introduction
+
+   Application programs can display text in many different ways.
+   Similarly, a user can enter text into an application program in a
+   myriad of fashions.  Internationalized text (that is, text that is
+   not restricted to the narrow set of US-ASCII characters) has many
+   input and display behaviors that make it difficult to compare text in
+   a consistent fashion.
+
+   This document specifies a framework of processing rules for Unicode
+   text.  Other protocols can create profiles of these rules; these
+   profiles will allow users to enter internationalized text strings in
+   applications and have the highest chance of getting the content of
+   the strings correct.  In this case, "correct" means that if two
+   different people enter what they think is the same string into two
+   different input mechanisms, the strings should match on a character-
+   by-character basis.
+
+   This framework does not describe how data is transcoded from other
+   character sets into Unicode.  In systems that use non-Unicode
+   character sets, the transcoding algorithm is a critical part of
+   enabling secure and "correct" operation of internationalized text
+   strings.
+
+   In addition to helping string matching, profiles of stringprep can
+   also exclude characters that should not normally appear in text that
+   is used in the protocol.  The profile can prevent such characters by
+   changing the characters to be excluded to other characters, by
+   removing those characters, or by causing an error if the characters
+   would appear in the output.  For example, because the backspace
+   character can cause unpredictable display results, a profile can
+   specify that a string containing a backspace character would cause an
+   error.
+
+   A profile of stringprep converts a single string of input characters
+   to a string of output characters, or returns an error if the output
+   string would contain a prohibited character.  Stringprep profiles
+   cannot both emit a string and return an error.
+
+   Stringprep profiles cannot account for all of the variations that
+   might occur or that a user might expect.  In particular, a profile
+   will not be able to account for choice of spellings in all languages
+   for all scripts because the number of alternative spellings of words
+   and phrases is immense.  Users would probably expect all spelling
+   equivalents to be made equivalent, or none of them to be.  Examples
+   of spelling equivalents include "theater" vs. "theatre", and
+   "hemoglobin" vs. "h<U+00E6>moglobin" in American vs. British English.
+   Other examples are simplified Chinese spellings of names (for
+   example, "<U+7EDF><U+4E00><U+7801>") vs. the equivalent traditional
+   Chinese spelling (for example, "<U+7D71><U+4E00><U+78BC>").
+   Language-specific equivalences such as "Aepfel" vs. "<U+00C4>pfel",
+   which are sometimes considered equivalent in German, may not be
+   considered equivalent in other languages.
+
+1.1 Terminology
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in BCP 14, RFC 2119
+   [RFC2119].
+
+   Note: A glossary of terms used in Unicode and ISO/IEC 10646 can be
+   found in [Glossary].  Information on the 10646/Unicode character
+   encoding model can be found in [CharModel].
+
+   Character names in this document use the notation for code points and
+   names from the Unicode Standard [Unicode3.2] and ISO/IEC 10646
+   [ISO10646].  For example, the letter "a" may be represented as either
+   "U+0061" or "LATIN SMALL LETTER A".  In the lists of mappings and the
+   prohibited characters, the "U+" is left off to make the lists easier
+   to read.  The comments for character ranges are shown in square
+   brackets (such as "[CONTROL CHARACTERS]") and do not come from the
+   standards.
+
+1.2 Using stringprep in protocols
+
+   The stringprep protocol does not stand on its own; it has to be used
+   by other protocols at precisely-defined places in those other
+   protocols.  For example, a protocol that has strings that come from
+   the entire ISO/IEC 10646 [ISO10646] character repertoire might
+   specify that only strings that have been processed with a particular
+   profile of stringprep are legal.  Another example would be a protocol
+   that does string comparison as a step in the protocol; that protocol
+   might specify that such comparison is done only after processing the
+   strings with a specific profile of stringprep.
+
+   When two protocols that use different profiles of stringprep
+   interoperate, there may be conflict about what characters are and are
+   not allowed in the final string.  Thus, protocol developers should
+   strongly consider re-using existing profiles of stringprep.
+
+   When developers wish to allow users as wide a range of characters
+   as possible in input text strings, they should, where possible, cause
+   stringprep to convert characters from the input string to a canonical
+   form instead of prohibiting them.
+
+   Although it would be easy to use the stringprep process to "correct"
+   perceived mis-features or bugs in the current character standards,
+   stringprep profiles SHOULD NOT do so.
+
+   A profile of stringprep can create tables different from those in the
+   appendixes of this document, but doing so will be the exception.  The
+   intention of stringprep is to define the tables and have the
+   profiles of stringprep select among those defined tables.
+
+   A profile of stringprep MUST include all of the following:
+
+   - The intended applicability of the profile
+
+   - The character repertoire that is the input and output to stringprep
+     (which is Unicode 3.2 for this version of stringprep)
+
+   - The mapping tables from this document used (as described in section
+     3)
+
+   - Any additional mapping tables specific to the profile
+
+   - The Unicode normalization used, if any (as described in section 4)
+
+   - The tables from this document of characters that are prohibited as
+     output (as described in section 5)
+
+   - The bidirectional string testing used, if any (as described in
+     section 6)
+
+   - Any additional characters that are prohibited as output specific to
+     the profile
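The required items above can be pictured as one record per profile. The following is a minimal Python sketch, not part of RFC 3454: the class name, the field names, and the example values are all invented for illustration, and the example is not a profile registered with IANA.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class StringprepProfile:
    """Hypothetical container for the items a profile MUST state."""
    applicability: str                         # intended applicability
    repertoire: str = "Unicode 3.2"            # input and output repertoire
    mapping_tables: Tuple[str, ...] = ()       # appendix B tables used (section 3)
    extra_mappings: Dict[str, str] = field(default_factory=dict)
    normalization: Optional[str] = None        # None or "NFKC" (section 4)
    prohibited_tables: Tuple[str, ...] = ()    # appendix C tables used (section 5)
    check_bidi: bool = False                   # bidirectional testing (section 6)
    extra_prohibited: frozenset = frozenset()  # profile-specific prohibitions


# An invented example profile, used only to show how the fields fit together.
example = StringprepProfile(
    applicability="identifiers in an imaginary protocol",
    mapping_tables=("B.1", "B.2"),
    normalization="NFKC",
    prohibited_tables=("C.1.2", "C.2.1", "C.2.2"),
    check_bidi=True,
)
```

Keeping the record immutable reflects the idea that a profile, once specified, is a fixed selection among the defined tables.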
+
+   Each profile MUST state the character repertoire on which the profile
+   will operate.  Appendix A lists the Unicode repertoires that can be
+   selected.  No repertoire is ever complete, and it is expected that
+   characters will be added to the Unicode repertoire for the
+   foreseeable future.  Section 7 of this document describes how to
+   handle characters that are assigned in later versions of the Unicode
+   repertoire.  Subsections of appendix A also list unassigned code
+   points for each repertoire.
+
+   This document is for Unicode version 3.2, and should not be
+   considered to automatically apply to later Unicode versions.  The
+   IETF, through an explicit standards action, may update this document
+   as appropriate to handle later Unicode versions.
+
+   This document lists the unassigned code points in the range 0 to
+   10FFFF for Unicode 3.2 in appendix A.  The list in appendix A MUST be
+   used by implementations of this specification.  If there are any
+   discrepancies between the list in appendix A and the Unicode 3.2
+   specification, the list in appendix A always takes precedence.
+
+   Each profile of stringprep MUST be registered with IANA.  The
+   registration procedure is described in the IANA Considerations
+   section (section 10); basically, the IESG must review each profile
+   of stringprep.
+   Protocol developers are strongly encouraged to look through the IANA
+   profile registry when creating new profiles for stringprep, and to
+   re-use logic from earlier profiles where possible in new profiles.
+   In some cases, an existing profile can be reused by a different
+   protocol.
+
+2. Preparation Overview
+
+   The steps for preparing strings are:
+
+   1) Map -- For each character in the input, check if it has a mapping
+      and, if so, replace it with its mapping.  This is described in
+      section 3.
+
+   2) Normalize -- Possibly normalize the result of step 1 using Unicode
+      normalization.  This is described in section 4.
+
+   3) Prohibit -- Check for any characters that are not allowed in the
+      output.  If any are found, return an error.  This is described in
+      section 5.
+
+   4) Check bidi -- Possibly check for right-to-left characters, and if
+      any are found, make sure that the whole string satisfies the
+      requirements for bidirectional strings.  If the string does not
+      satisfy the requirements for bidirectional strings, return an
+      error.  This is described in section 6.
+
+   The above steps MUST be performed in the order given to comply with
+   this specification.
+
+   The mappings described in section 3, and the optional Unicode
+   normalization described in section 4, can be one-to-none, one-to-one,
+   one-to-many, many-to-one, or many-to-many.  That is, some characters
+   might be eliminated or replaced by more than one character, and the
+   output of this step might be shorter or longer than the input.
+   Because of this, the system using stringprep MUST be prepared to
+   receive a longer or shorter string than the one input in the
+   stringprep algorithm.
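The first three steps above can be sketched in Python with the standard library's `stringprep` module and the Unicode 3.2 database exposed as `unicodedata.ucd_3_2_0`. This is a simplified illustration under stated assumptions, not a registered profile: the function name `prepare` is invented, the choice of tables (B.1, B.2, C.1.2, C.2.1, C.2.2) is arbitrary, and the bidi check of step 4 is left out.

```python
import stringprep
from unicodedata import ucd_3_2_0  # Unicode 3.2 database shipped with CPython


def prepare(text: str) -> str:
    """Return the prepared string, or raise ValueError (never both)."""
    # Step 1: map. Drop Table B.1 characters, fold the rest with Table B.2.
    mapped = "".join(
        "" if stringprep.in_table_b1(ch) else stringprep.map_table_b2(ch)
        for ch in text
    )
    # Step 2: normalize with form KC.
    normalized = ucd_3_2_0.normalize("NFKC", mapped)
    # Step 3: prohibit. Only a few appendix C tables are checked here.
    for ch in normalized:
        if stringprep.in_table_c12(ch) or stringprep.in_table_c21_c22(ch):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    # Step 4 (the bidi check) is intentionally omitted from this sketch.
    return normalized
```

The output length can differ from the input length in both directions: a soft hyphen disappears, while a ligature such as U+FB01 expands to two characters.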
+
+3. Mapping
+
+   Each character in the input stream MUST be checked against a mapping
+   table.  The mapping table SHOULD come from this document, although
+   the mapping table MAY be added to or altered by the profile.  The
+   mapping tables are subsections of appendix B.
+
+   The lists in appendix B MUST be used by implementations of this
+   specification.  If there are any discrepancies between the lists in
+   appendix B and the subsections below, the lists in appendix B always
+   take precedence.
+
+   For any individual character, the mapping table MAY specify that a
+   character be mapped to nothing, or mapped to one other character, or
+   mapped to a string of other characters.
+
+   Mapped characters are not re-scanned during the mapping step.  That
+   is, if character A at position X is mapped to character B, character
+   B which is now at position X is not checked against the mapping
+   table.
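The no-re-scan rule can be shown with a small Python sketch. The helper name `map_once` and the toy table are invented for illustration and do not come from appendix B.

```python
def map_once(text: str, table: dict) -> str:
    """Apply `table` to each input character exactly once."""
    # Each output piece comes from one lookup on an *input* character;
    # the replacement text is never looked up again.
    return "".join(table.get(ch, ch) for ch in text)


# Toy table (invented for illustration): A maps to B, and B maps to C.
toy_table = {"A": "B", "B": "C"}
print(map_once("AB", toy_table))  # prints "BC", not "CC"
```

Because the B produced from A is not checked against the table again, the result is "BC"; a re-scanning implementation would wrongly produce "CC".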
+
+3.1 Commonly mapped to nothing
+
+   The following characters are simply deleted from the input (that is,
+   they are mapped to nothing) because their presence or absence in
+   protocol identifiers should not make two strings different.  They are
+   listed in Table B.1.
+
+   Some characters are only useful in line-based text, and are otherwise
+   invisible and ignored.
+
+   00AD; SOFT HYPHEN
+   1806; MONGOLIAN TODO SOFT HYPHEN
+   200B; ZERO WIDTH SPACE
+   2060; WORD JOINER
+   FEFF; ZERO WIDTH NO-BREAK SPACE
+
+   Some characters affect glyph choice and glyph placement, but do not
+   bear semantics.
+
+   034F; COMBINING GRAPHEME JOINER
+   180B; MONGOLIAN FREE VARIATION SELECTOR ONE
+   180C; MONGOLIAN FREE VARIATION SELECTOR TWO
+   180D; MONGOLIAN FREE VARIATION SELECTOR THREE
+   200C; ZERO WIDTH NON-JOINER
+   200D; ZERO WIDTH JOINER
+   FE00; VARIATION SELECTOR-1
+   FE01; VARIATION SELECTOR-2
+
+   FE02; VARIATION SELECTOR-3
+   FE03; VARIATION SELECTOR-4
+   FE04; VARIATION SELECTOR-5
+   FE05; VARIATION SELECTOR-6
+   FE06; VARIATION SELECTOR-7
+   FE07; VARIATION SELECTOR-8
+   FE08; VARIATION SELECTOR-9
+   FE09; VARIATION SELECTOR-10
+   FE0A; VARIATION SELECTOR-11
+   FE0B; VARIATION SELECTOR-12
+   FE0C; VARIATION SELECTOR-13
+   FE0D; VARIATION SELECTOR-14
+   FE0E; VARIATION SELECTOR-15
+   FE0F; VARIATION SELECTOR-16
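As a hedged illustration of "mapped to nothing", Python's standard `stringprep` module provides `in_table_b1`, which tests membership in Table B.1. The helper name `strip_b1` below is invented for this sketch.

```python
import stringprep


def strip_b1(text: str) -> str:
    """Delete every character that Table B.1 maps to nothing."""
    return "".join(ch for ch in text if not stringprep.in_table_b1(ch))


# U+00AD SOFT HYPHEN is removed, so both spellings compare equal.
print(strip_b1("pass\u00adword") == "password")  # True
```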
+
+3.2 Case folding
+
+   If a profile is going to map characters for case-insensitive
+   comparison, that profile SHOULD map using either appendix B.2 or
+   appendix B.3.  Appendix B.2 is for profiles that also use Unicode
+   normalization form KC, while appendix B.3 is for profiles that do
+   not use Unicode normalization.  These tables map from uppercase to
+   lowercase characters.  Note that this could have been "change all
+   lowercase characters into uppercase characters".  However, the
+   upper-to-lower folding was chosen because there is a tradition of
+   using lowercase in current Internet applications and protocols.
+
+   If a profile creates its own mapping tables for case folding, they
+   SHOULD be based on [UTR21], and SHOULD map from uppercase characters
+   to lowercase.  The "CaseFolding.txt" file from the Unicode database
+   SHOULD be used to prepare the mapping table. The profile SHOULD do
+   full case mapping (that is, using statuses C, F, and I).
+
+   If the profile is using Unicode normalization form KC (as described
+   in section 4 of this document), it is important to note that there
+   are some characters that do not have mappings in [UTR21] but still
+   need processing.  These characters include a few Greek characters and
+   many symbols that contain Latin characters.  The list of characters
+   to add to the mapping table can be determined by the following
+   algorithm:
+
+   b = NormalizeWithKC(Fold(a));
+   c = NormalizeWithKC(Fold(b));
+   if c is not the same as b, add a mapping for "a to c".
+
+   Because NormalizeWithKC(Fold(c)) always equals c, the table is stable
+   from that point on.
+
+   Appendix B.3 is derived from the CaseFolding-3.txt file associated
+   with Unicode 3.2; appendix B.2 is based on appendix B.3 with the
+   additional characters added from the algorithm above.
+
+   Authors of profiles of this document need to consider the effects of
+   changing the mapping of any currently-assigned character when
+   updating their profiles.  Adding a new mapping for a currently-
+   assigned character, or changing an existing mapping, could cause a
+   variance between the behavior of systems that have been updated and
+   systems that have not been updated.
+
+4. Normalization
+
+   The output of the mapping step is optionally normalized using one of
+   the Unicode normalization forms, as described in [UAX15].  A profile
+   can specify one of two options for Unicode normalization:
+
+   - no normalization
+
+   - Unicode normalization with form KC
+
+   A profile MAY choose to do no normalization.  However, such a profile
+   can easily yield results that will be surprising to typical users,
+   depending on the input mechanism they use.  For example, some input
+   mechanisms enter compatibility characters that look exactly like the
+   underlying characters, but have different code points.  Another
+   example of where Unicode normalization helps create predictable
+   results is with characters that have multiple combining diacritics:
+   normalization orders those diacritics in a predictable fashion.
+
+   On the other hand, Unicode normalization requires fairly large tables
+   and somewhat complicated character reordering logic.  The size and
+   complexity should not be considered daunting except in the most
+   restricted of environments, and needs to be weighed against the
+   problems of user surprise from comparing unnormalized strings.  Note
+   that the tables used for normalization are not given in this
+   document, but instead must be derived from the Unicode database, as
+   described in [UAX15].
+
+   There is a third form of normalization, Unicode normalization with
+   form C.  If a profile is going to use a Unicode normalization, it
+   MUST use Unicode normalization form KC.  Form KC maps many
+   "compatibility characters" to their equivalents.  Some user interface
+   systems make it possible to enter compatibility characters instead of
+   the base equivalents.  Thus, using form KC instead of form C will
+   cause more strings that users would expect to match to actually
+   match.
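The difference between form C and form KC, and the reordering of combining diacritics, can be observed with Python's `unicodedata.ucd_3_2_0` object, which exposes the Unicode 3.2 database. The sample strings are chosen for illustration only.

```python
from unicodedata import ucd_3_2_0  # Unicode 3.2 database shipped with CPython

# U+FB01 LATIN SMALL LIGATURE FI is a compatibility character.
fi_ligature = "\ufb01"
print(ucd_3_2_0.normalize("NFC", fi_ligature) == "fi")   # False
print(ucd_3_2_0.normalize("NFKC", fi_ligature) == "fi")  # True

# The same two diacritics typed in different orders:
# U+0307 COMBINING DOT ABOVE and U+0323 COMBINING DOT BELOW.
first = "a\u0307\u0323"
second = "a\u0323\u0307"
print(first == second)  # False
print(ucd_3_2_0.normalize("NFKC", first)
      == ucd_3_2_0.normalize("NFKC", second))  # True
```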
+
+   A profile that specifies Unicode normalization MUST use the
+   normalization in [UAX15] that is associated with the version of the
+   Unicode character set specified for the profile.
+
+   The composition process described in [UAX15] requires a fixed
+   composition version of Unicode to ensure that strings normalized
+   under one version of Unicode remain normalized under all future
+   versions of Unicode.
+
+   The IETF is relying on Unicode not to change the normalization of
+   currently-assigned characters in future versions of normalization.
+   If a future version of the normalization tables changes the
+   normalized value of an existing character, authors of profiles of
+   this document have to look at the changes very carefully before they
+   update their normalization tables.  Such a change could cause a
+   variance between the behavior of systems that have been updated and
+   systems that have not been updated.
+
+5. Prohibited Output
+
+   Before the text can be emitted, it MUST be checked for prohibited
+   code points.  There are a variety of prohibited code points, as
+   described in this section.  A profile of this document MAY use all or
+   some of the tables in appendix C.
+
+   The stringprep process never emits both an error and a string.  If an
+   error is detected during the checking for prohibited code points,
+   only an error is returned.
+
+   Note that the subsections below describe how the tables in appendix C
+   were formed.  They are here for people who want to understand more,
+   but they should be ignored by implementors.  Implementations that use
+   tables MUST map based on the tables themselves, not based on the
+   descriptions in this section of how the tables were created.
+
+   The lists in appendix C MUST be used by implementations of this
+   specification.  If there are any discrepancies between the lists in
+   appendix C and subsections below, the lists in appendix C always take
+   precedence.
+
+   Some code points listed in one section may also appear in other
+   sections.
+
+   It is important to note that a profile of this document MAY prohibit
+   additional characters.
+
+   Each subsection of this section has a matching subsection in appendix
+   C.  For example, the characters listed in section 5.1 are listed in
+   appendix C.1.
+
+5.1 Space characters
+
+   Space characters can make accurate visual transcription of strings
+   nearly impossible and could lead to user entry errors in many ways.
+   Note that the list below is split into two tables in appendix C:
+   Table C.1.1 contains the ASCII code points, while Table C.1.2
+   contains the non-ASCII code points.  Most profiles of this document
+   that want to prohibit space characters will want to include both
+   tables.
+
+   0020; SPACE
+   00A0; NO-BREAK SPACE
+   1680; OGHAM SPACE MARK
+   2000; EN QUAD
+   2001; EM QUAD
+   2002; EN SPACE
+   2003; EM SPACE
+   2004; THREE-PER-EM SPACE
+   2005; FOUR-PER-EM SPACE
+   2006; SIX-PER-EM SPACE
+   2007; FIGURE SPACE
<<Diff was trimmed, longer than 597 lines>>

