packages: samba/samba-rfc3454.txt (NEW) - new; from git
adamg
adamg at pld-linux.org
Mon Nov 9 01:14:55 CET 2009
Author: adamg Date: Mon Nov 9 00:14:55 2009 GMT
Module: packages Tag: HEAD
---- Log message:
- new; from git
---- Files affected:
packages/samba:
samba-rfc3454.txt (NONE -> 1.1) (NEW)
---- Diffs:
================================================================
Index: packages/samba/samba-rfc3454.txt
diff -u /dev/null packages/samba/samba-rfc3454.txt:1.1
--- /dev/null Mon Nov 9 01:14:55 2009
+++ packages/samba/samba-rfc3454.txt Mon Nov 9 01:14:49 2009
@@ -0,0 +1,5099 @@
+
+
+
+
+
+
+Network Working Group P. Hoffman
+Request for Comments: 3454 IMC & VPNC
+Category: Standards Track M. Blanchet
+ Viagenie
+ December 2002
+
+
+ Preparation of Internationalized Strings ("stringprep")
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2002). All Rights Reserved.
+
+Abstract
+
+ This document describes a framework for preparing Unicode text
+ strings in order to increase the likelihood that string input and
+ string comparison work in ways that make sense for typical users
+ throughout the world. The stringprep protocol is useful for protocol
+ identifier values, company and personal names, internationalized
+ domain names, and other text strings.
+
+ This document does not specify how protocols should prepare text
+ strings. Protocols must create profiles of stringprep in order to
+ fully specify the processing options.
+
+Table of Contents
+
+ 1. Introduction....................................................3
+ 1.1 Terminology..................................................4
+ 1.2 Using stringprep in protocols................................4
+ 2. Preparation Overview............................................6
+ 3. Mapping.........................................................7
+ 3.1 Commonly mapped to nothing...................................7
+ 3.2 Case folding.................................................8
+ 4. Normalization...................................................9
+ 5. Prohibited Output..............................................10
+ 5.1 Space characters............................................11
+ 5.2 Control characters..........................................11
+ 5.3 Private use.................................................12
+
+
+
+Hoffman & Blanchet Standards Track [Page 1]
+
+RFC 3454 Preparation of Internationalized Strings December 2002
+
+
+ 5.4 Non-character code points...................................12
+ 5.5 Surrogate codes.............................................13
+ 5.6 Inappropriate for plain text................................13
+ 5.7 Inappropriate for canonical representation..................13
+ 5.8 Change display properties or deprecated.....................13
+ 5.9 Tagging characters..........................................14
+ 6. Bidirectional Characters.......................................14
+ 7. Unassigned Code Points in Stringprep Profiles..................15
+ 7.1 Categories of code points...................................16
+ 7.2 Reasons for difference between stored strings and queries...17
+ 7.3 Versions of applications and stored strings.................18
+ 8. References.....................................................19
+ 8.1 Normative references........................................19
+ 8.2 Informative references......................................19
+ 9. Security Considerations........................................19
+ 9.1 Stringprep-specific security considerations.................19
+ 9.2 Generic Unicode security considerations.....................20
+ 10. IANA Considerations...........................................21
+ 11. Acknowledgements..............................................22
+ A. Unicode repertoires............................................23
+ A.1 Unassigned code points in Unicode 3.2.......................23
+ B. Mapping Tables.................................................31
+ B.1 Commonly mapped to nothing..................................31
+ B.2 Mapping for case-folding used with NFKC.....................32
+ B.3 Mapping for case-folding used with no normalization.........61
+ C. Prohibition tables.............................................78
+ C.1 Space characters............................................78
+ C.1.1 ASCII space characters..................................78
+ C.1.2 Non-ASCII space characters..............................79
+ C.2 Control characters..........................................79
+ C.2.1 ASCII control characters................................79
+ C.2.2 Non-ASCII control characters............................79
+ C.3 Private use.................................................80
+ C.4 Non-character code points...................................80
+ C.5 Surrogate codes.............................................80
+ C.6 Inappropriate for plain text................................80
+ C.7 Inappropriate for canonical representation..................81
+ C.8 Change display properties or are deprecated.................81
+ C.9 Tagging characters..........................................81
+ D. Bidirectional tables...........................................81
+ D.1 Characters with bidirectional property "R" or "AL"..........81
+ D.2 Characters with bidirectional property "L"..................82
+ Authors' Addresses................................................90
+ Full Copyright Statement..........................................91
+
+1. Introduction
+
+ Application programs can display text in many different ways.
+ Similarly, a user can enter text into an application program in a
+ myriad of fashions. Internationalized text (that is, text that is
+ not restricted to the narrow set of US-ASCII characters) has many
+ input and display behaviors that make it difficult to compare text in
+ a consistent fashion.
+
+ This document specifies a framework of processing rules for Unicode
+ text. Other protocols can create profiles of these rules; these
+ profiles will allow users to enter internationalized text strings in
+ applications and have the highest chance of getting the content of
+ the strings correct. In this case, "correct" means that if two
+ different people enter what they think is the same string into two
+ different input mechanisms, the strings should match on a character-
+ by-character basis.
+
+ This framework does not describe how data is transcoded from other
+ character sets into Unicode. In systems that use non-Unicode
+ character sets, the transcoding algorithm is a critical part of
+ enabling secure and "correct" operation of internationalized text
+ strings.
+
+ In addition to helping string matching, profiles of stringprep can
+ also exclude characters that should not normally appear in text that
+ is used in the protocol. The profile can prevent such characters by
+ changing the characters to be excluded to other characters, by
+ removing those characters, or by causing an error if the characters
+ would appear in the output. For example, because the backspace
+ character can cause unpredictable display results, a profile can
+ specify that a string containing a backspace character would cause an
+ error.
+
+ A profile of stringprep converts a single string of input characters
+ to a string of output characters, or returns an error if the output
+ string would contain a prohibited character. Stringprep profiles
+ cannot both emit a string and return an error.
+
+ Stringprep profiles cannot account for all of the variations that
+ might occur or that a user might expect. In particular, a profile
+ will not be able to account for choice of spellings in all languages
+ for all scripts because the number of alternative spellings of words
+ and phrases is immense. Users would probably expect all spelling
+ equivalents to be made equivalent, or none of them to be. Examples
+ of spelling equivalents include "theater" vs. "theatre", and
+ "hemoglobin" vs. "h<U+00E6>moglobin" in American vs. British English.
+ Other examples are simplified Chinese spellings of names (for
+ example, "<U+7EDF><U+4E00><U+7801>") vs. the equivalent traditional
+ Chinese spelling (for example, "<U+7D71><U+4E00><U+78BC>").
+ Language-specific equivalences such as "Aepfel" vs. "<U+00C4>pfel",
+ which are sometimes considered equivalent in German, may not be
+ considered equivalent in other languages.
+
+1.1 Terminology
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in BCP 14, RFC 2119
+ [RFC2119].
+
+ Note: A glossary of terms used in Unicode and ISO/IEC 10646 can be
+ found in [Glossary]. Information on the 10646/Unicode character
+ encoding model can be found in [CharModel].
+
+ Character names in this document use the notation for code points and
+ names from the Unicode Standard [Unicode3.2] and ISO/IEC 10646
+ [ISO10646]. For example, the letter "a" may be represented as either
+ "U+0061" or "LATIN SMALL LETTER A". In the lists of mappings and the
+ prohibited characters, the "U+" is left off to make the lists easier
+ to read. The comments for character ranges are shown in square
+ brackets (such as "[CONTROL CHARACTERS]") and do not come from the
+ standards.
+
+1.2 Using stringprep in protocols
+
+ The stringprep protocol does not stand on its own; it has to be used
+ by other protocols at precisely-defined places in those other
+ protocols. For example, a protocol that has strings that come from
+ the entire ISO/IEC 10646 [ISO10646] character repertoire might
+ specify that only strings that have been processed with a particular
+ profile of stringprep are legal. Another example would be a protocol
+ that does string comparison as a step in the protocol; that protocol
+ might specify that such comparison is done only after processing the
+ strings with a specific profile of stringprep.
+
+ When two protocols that use different profiles of stringprep
+ interoperate, there may be conflict about what characters are and are
+ not allowed in the final string. Thus, protocol developers should
+ strongly consider re-using existing profiles of stringprep.
+
+ When developers wish to allow users as wide a range of characters
+ as possible in input text strings, they should, where possible, cause
+ stringprep to convert characters from the input string to a canonical
+ form instead of prohibiting them.
+
+ Although it would be easy to use the stringprep process to "correct"
+ perceived mis-features or bugs in the current character standards,
+ stringprep profiles SHOULD NOT do so.
+
+ A profile of stringprep can create tables different from those in the
+ appendixes of this document, but doing so is expected to be the
+ exception. The intention of stringprep is to define the tables and have the
+ profiles of stringprep select among those defined tables.
+
+ A profile of stringprep MUST include all of the following:
+
+ - The intended applicability of the profile
+
+ - The character repertoire that is the input and output to stringprep
+ (which is Unicode 3.2 for this version of stringprep)
+
+ - The mapping tables from this document used (as described in section
+ 3)
+
+ - Any additional mapping tables specific to the profile
+
+ - The Unicode normalization used, if any (as described in section 4)
+
+ - The tables from this document of characters that are prohibited as
+ output (as described in section 5)
+
+ - The bidirectional string testing used, if any (as described in
+ section 6)
+
+ - Any additional characters that are prohibited as output specific to
+ the profile
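The required contents of a profile listed above can be pictured as a small record. The following is only an illustrative sketch: the class name `StringprepProfile`, its field names, and the example values are assumptions made for this note and do not come from the RFC or from any library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class StringprepProfile:
    """Hypothetical container for the items a profile MUST state."""
    applicability: str                        # intended applicability
    repertoire: str = "Unicode 3.2"           # input/output repertoire
    mapping_tables: tuple = ()                # tables from appendix B, e.g. ("B.1", "B.2")
    extra_mappings: dict = field(default_factory=dict)  # profile-specific mappings
    normalization: Optional[str] = None       # None (no normalization) or "KC"
    prohibited_tables: tuple = ()             # tables from appendix C
    check_bidi: bool = False                  # bidirectional testing, if any
    extra_prohibited: frozenset = frozenset() # profile-specific prohibited characters

    def __post_init__(self):
        # Section 4 allows only "no normalization" or form KC.
        if self.normalization not in (None, "KC"):
            raise ValueError("normalization must be None or 'KC'")


# Example values are invented for illustration only.
example = StringprepProfile(
    applicability="hypothetical user-name field",
    mapping_tables=("B.1", "B.2"),
    normalization="KC",
    prohibited_tables=("C.1.2", "C.2.1", "C.2.2"),
    check_bidi=True,
)
```

Writing the profile down as data makes it easy to see at a glance which of the eight required items a draft profile has not yet specified.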
+
+ Each profile MUST state the character repertoire on which the profile
+ will operate. Appendix A lists the Unicode repertoires that can be
+ selected. No repertoire is ever complete, and it is expected that
+ characters will be added to the Unicode repertoire for the
+ foreseeable future. Section 7 of this document describes how to
+ handle characters that are assigned in later versions of the Unicode
+ repertoire. Subsections of appendix A also list unassigned code
+ points for each repertoire.
+
+ This document is for Unicode version 3.2, and should not be
+ considered to automatically apply to later Unicode versions. The
+ IETF, through an explicit standards action, may update this document
+ as appropriate to handle later Unicode versions.
+
+ This document lists the unassigned code points in the range 0 to
+ 10FFFF for Unicode 3.2 in appendix A. The list in appendix A MUST be
+ used by implementations of this specification. If there are any
+ discrepancies between the list in appendix A and the Unicode 3.2
+ specification, the list in appendix A always takes precedence.
+
+ Each profile of stringprep MUST be registered with IANA. The
+ registration procedure is described in the IANA Considerations
+ section (section 10); in short, the IESG must review each profile of
+ stringprep.
+ Protocol developers are strongly encouraged to look through the IANA
+ profile registry when creating new profiles for stringprep, and to
+ re-use logic from earlier profiles where possible in new profiles.
+ In some cases, an existing profile can be reused by a different
+ protocol.
+
+2. Preparation Overview
+
+ The steps for preparing strings are:
+
+ 1) Map -- For each character in the input, check if it has a mapping
+ and, if so, replace it with its mapping. This is described in
+ section 3.
+
+ 2) Normalize -- Possibly normalize the result of step 1 using Unicode
+ normalization. This is described in section 4.
+
+ 3) Prohibit -- Check for any characters that are not allowed in the
+ output. If any are found, return an error. This is described in
+ section 5.
+
+ 4) Check bidi -- Possibly check for right-to-left characters, and if
+ any are found, make sure that the whole string satisfies the
+ requirements for bidirectional strings. If the string does not
+ satisfy the requirements for bidirectional strings, return an
+ error. This is described in section 6.
+
+ The above steps MUST be performed in the order given to comply with
+ this specification.
+
+ The mappings described in section 3, and the optional Unicode
+ normalization described in section 4, can be one-to-none, one-to-one,
+ one-to-many, many-to-one, or many-to-many. That is, some characters
+ might be eliminated or replaced by more than one character, and the
+ output of this step might be shorter or longer than the input.
+ Because of this, the system using stringprep MUST be prepared to
+ receive a longer or shorter string than the one input in the
+ stringprep algorithm.
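The first three steps above can be sketched with Python's standard `stringprep` and `unicodedata` modules, which expose the RFC 3454 tables and the Unicode 3.2 database. This is a minimal sketch, not a complete profile: the function name `prepare` and the particular choice of tables (B.1, B.2, form KC, and C.1.2 through C.9) are assumptions for illustration, and step 4 (the bidi check of section 6) is deliberately left out.

```python
import stringprep
import unicodedata

# Prohibition tables chosen for this hypothetical profile.
_PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c21_c22,
    stringprep.in_table_c3, stringprep.in_table_c4,
    stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def prepare(text):
    """Map, normalize, and check a string; raise ValueError on error."""
    # Step 1: map. B.1 characters are deleted, the rest are case-folded
    # with B.2. One input character may become zero, one, or several.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    # Step 2: normalize with form KC, using the Unicode 3.2 database.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    # Step 3: prohibit. Either an error or a string comes back, never both.
    for ch in normalized:
        if any(in_table(ch) for in_table in _PROHIBITED):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    # Step 4 (bidi check, section 6) is omitted from this sketch.
    return normalized
```

With these tables, `prepare("Stra\u00dfe")` yields a result that is longer than its input, which is why callers have to be prepared for the output length to differ.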
+
+3. Mapping
+
+ Each character in the input stream MUST be checked against a mapping
+ table. The mapping table SHOULD come from this document, although
+ the mapping table MAY be added to or altered by the profile. The
+ mapping tables are subsections of appendix B.
+
+ The lists in appendix B MUST be used by implementations of this
+ specification. If there are any discrepancies between the lists in
+ appendix B and the subsections below, the lists in appendix B always
+ take precedence.
+
+ For any individual character, the mapping table MAY specify that a
+ character be mapped to nothing, or mapped to one other character, or
+ mapped to a string of other characters.
+
+ Mapped characters are not re-scanned during the mapping step. That
+ is, if character A at position X is mapped to character B, character
+ B which is now at position X is not checked against the mapping
+ table.
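The "no re-scan" rule can be illustrated with a toy table. The table below is invented for this note and is not one of the appendix B tables; `apply_mapping` is likewise a hypothetical helper.

```python
def apply_mapping(text, table):
    """Map each input character once; mapped output is not looked up again."""
    return "".join(table.get(ch, ch) for ch in text)


# Toy table: A maps to B, B maps to C, X maps to nothing, Y maps to a string.
toy_table = {"A": "B", "B": "C", "X": "", "Y": "AB"}

result = apply_mapping("ABXY", toy_table)
```

Here `result` is `"BCAB"`: the B produced from A is not mapped on to C, and the "AB" produced from Y is left alone, because only the original input characters are checked against the table.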
+
+3.1 Commonly mapped to nothing
+
+ The following characters are simply deleted from the input (that is,
+ they are mapped to nothing) because their presence or absence in
+ protocol identifiers should not make two strings different. They are
+ listed in Table B.1.
+
+ Some characters are only useful in line-based text, and are otherwise
+ invisible and ignored.
+
+ 00AD; SOFT HYPHEN
+ 1806; MONGOLIAN TODO SOFT HYPHEN
+ 200B; ZERO WIDTH SPACE
+ 2060; WORD JOINER
+ FEFF; ZERO WIDTH NO-BREAK SPACE
+
+ Some characters affect glyph choice and glyph placement, but do not
+ bear semantics.
+
+ 034F; COMBINING GRAPHEME JOINER
+ 180B; MONGOLIAN FREE VARIATION SELECTOR ONE
+ 180C; MONGOLIAN FREE VARIATION SELECTOR TWO
+ 180D; MONGOLIAN FREE VARIATION SELECTOR THREE
+ 200C; ZERO WIDTH NON-JOINER
+ 200D; ZERO WIDTH JOINER
+ FE00; VARIATION SELECTOR-1
+ FE01; VARIATION SELECTOR-2
+ FE02; VARIATION SELECTOR-3
+ FE03; VARIATION SELECTOR-4
+ FE04; VARIATION SELECTOR-5
+ FE05; VARIATION SELECTOR-6
+ FE06; VARIATION SELECTOR-7
+ FE07; VARIATION SELECTOR-8
+ FE08; VARIATION SELECTOR-9
+ FE09; VARIATION SELECTOR-10
+ FE0A; VARIATION SELECTOR-11
+ FE0B; VARIATION SELECTOR-12
+ FE0C; VARIATION SELECTOR-13
+ FE0D; VARIATION SELECTOR-14
+ FE0E; VARIATION SELECTOR-15
+ FE0F; VARIATION SELECTOR-16
+
+3.2 Case folding
+
+ If a profile is going to map characters for case-insensitive
+ comparison, that profile SHOULD map using either appendix B.2 or
+ appendix B.3. Appendix B.2 is for profiles that also use Unicode
+ normalization form KC, while appendix B.3 is for profiles that do
+ not use Unicode normalization. These tables map from uppercase to
+ lowercase characters. Note that this could have been "change all
+ lowercase characters into uppercase characters". However, the
+ upper-to-lower folding was chosen because there is a tradition of
+ using lowercase in current Internet applications and protocols.
+
+ If a profile creates its own mapping tables for case folding, they
+ SHOULD be based on [UTR21], and SHOULD map from uppercase characters
+ to lowercase. The "CaseFolding.txt" file from the Unicode database
+ SHOULD be used to prepare the mapping table. The profile SHOULD do
+ full case mapping (that is, using statuses C, F, and I).
+
+ If the profile is using Unicode normalization form KC (as described
+ in section 4 of this document), it is important to note that there
+ are some characters that do not have mappings in [UTR21] but still
+ need processing. These characters include a few Greek characters and
+ many symbols that contain Latin characters. The list of characters
+ to add to the mapping table can be determined by the following
+ algorithm:
+
+ b = NormalizeWithKC(Fold(a));
+ c = NormalizeWithKC(Fold(b));
+ if c is not the same as b, add a mapping for "a to c".
+
+ Because NormalizeWithKC(Fold(c)) always equals c, the table is stable
+ from that point on.
+
+ Appendix B.3 is derived from the CaseFolding-3.txt file associated
+ with Unicode 3.2; appendix B.2 is based on appendix B.3 with the
+ additional characters added from the algorithm above.
+
+ Authors of profiles of this document need to consider the effects of
+ changing the mapping of any currently-assigned character when
+ updating their profiles. Adding a new mapping for a currently-
+ assigned character, or changing an existing mapping, could cause a
+ variance between the behavior of systems that have been updated and
+ systems that have not been updated.
+
+4. Normalization
+
+ The output of the mapping step is optionally normalized using one of
+ the Unicode normalization forms, as described in [UAX15]. A profile
+ can specify one of two options for Unicode normalization:
+
+ - no normalization
+
+ - Unicode normalization with form KC
+
+ A profile MAY choose to do no normalization. However, such a profile
+ can easily yield results that will be surprising to typical users,
+ depending on the input mechanism they use. For example, some input
+ mechanisms enter compatibility characters that look exactly like the
+ underlying characters, but have different code points. Another
+ example of where Unicode normalization helps create predictable
+ results is with characters that have multiple combining diacritics:
+ normalization orders those diacritics in a predictable fashion.
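Both effects can be seen with a short Python sketch that uses the Unicode 3.2 database from the standard `unicodedata` module. The helper name `nfkc` and the sample strings are chosen for this example only.

```python
import unicodedata


def nfkc(s):
    """Normalize with form KC using the Unicode 3.2 data."""
    return unicodedata.ucd_3_2_0.normalize("NFKC", s)


# A compatibility character: LATIN SMALL LIGATURE FI looks like "fi".
ligature = "\ufb01"

# The same base letter with two combining marks entered in either order:
# COMBINING DOT ABOVE (U+0307) and COMBINING DOT BELOW (U+0323).
above_then_below = "q\u0307\u0323"
below_then_above = "q\u0323\u0307"
```

Before normalization, `above_then_below` and `below_then_above` are different code point sequences; after `nfkc` they are identical, and `nfkc(ligature)` is the two-letter string "fi".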
+
+ On the other hand, Unicode normalization requires fairly large tables
+ and somewhat complicated character reordering logic. The size and
+ complexity should not be considered daunting except in the most
+ restricted of environments, and needs to be weighed against the
+ problems of user surprise from comparing unnormalized strings. Note
+ that the tables used for normalization are not given in this
+ document, but instead must be derived from the Unicode database, as
+ described in [UAX15].
+
+ There is a third form of normalization, Unicode normalization with
+ form C. If a profile is going to use a Unicode normalization, it
+ MUST use Unicode normalization form KC. Form KC maps many
+ "compatibility characters" to their equivalents. Some user interface
+ systems make it possible to enter compatibility characters instead of
+ the base equivalents. Thus, using form KC instead of form C will
+ cause more strings that users would expect to match to actually
+ match.
+
+ A profile that specifies Unicode normalization MUST use the
+ normalization in [UAX15] that is associated with the version of the
+ Unicode character set specified for the profile.
+
+ The composition process described in [UAX15] requires a fixed
+ composition version of Unicode to ensure that strings normalized
+ under one version of Unicode remain normalized under all future
+ versions of Unicode.
+
+ The IETF is relying on Unicode not to change the normalization of
+ currently-assigned characters in future versions of normalization.
+ If a future version of the normalization tables changes the
+ normalized value of an existing character, authors of profiles of
+ this document have to look at the changes very carefully before they
+ update their normalization tables. Such a change could cause a
+ variance between the behavior of systems that have been updated and
+ systems that have not been updated.
+
+5. Prohibited Output
+
+ Before the text can be emitted, it MUST be checked for prohibited
+ code points. There are a variety of prohibited code points, as
+ described in this section. A profile of this document MAY use all or
+ some of the tables in appendix C.
+
+ The stringprep process never emits both an error and a string. If an
+ error is detected during the checking for prohibited code points,
+ only an error is returned.
+
+ Note that the subsections below describe how the tables in appendix C
+ were formed. They are here for people who want to understand more,
+ but they should be ignored by implementors. Implementations that use
+ tables MUST map based on the tables themselves, not based on the
+ descriptions in this section of how the tables were created.
+
+ The lists in appendix C MUST be used by implementations of this
+ specification. If there are any discrepancies between the lists in
+ appendix C and subsections below, the lists in appendix C always take
+ precedence.
+
+ Some code points listed in one section may also appear in other
+ sections.
+
+ It is important to note that a profile of this document MAY prohibit
+ additional characters.
+
+ Each subsection of this section has a matching subsection in appendix
+ C. For example, the characters listed in section 5.1 are listed in
+ appendix C.1.
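A prohibition check along these lines might look as follows in Python. It is a sketch under stated assumptions: the profile is imagined to select the space tables (C.1.1 and C.1.2) and the control tables (C.2.1 and C.2.2), the function name `check_prohibited` and its `extra_prohibited` parameter are hypothetical, and the table tests come from the standard `stringprep` module.

```python
import stringprep


def check_prohibited(text, extra_prohibited=frozenset()):
    """Return text unchanged, or raise ValueError if it holds a prohibited character."""
    for ch in text:
        if (
            stringprep.in_table_c11_c12(ch)      # space characters
            or stringprep.in_table_c21_c22(ch)   # control characters
            or ch in extra_prohibited            # profile-specific additions
        ):
            # Only an error is produced; no partial string is returned.
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return text
```

Raising an exception rather than returning a flag alongside the text mirrors the rule that the process never emits both an error and a string.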
+
+5.1 Space characters
+
+ Space characters can make accurate visual transcription of strings
+ nearly impossible and could lead to user entry errors in many ways.
+ Note that the list below is split into two tables in appendix C:
+ Table C.1.1 contains the ASCII code points, while Table C.1.2
+ contains the non-ASCII code points. Most profiles of this document
+ that want to prohibit space characters will want to include both
+ tables.
+
+ 0020; SPACE
+ 00A0; NO-BREAK SPACE
+ 1680; OGHAM SPACE MARK
+ 2000; EN QUAD
+ 2001; EM QUAD
+ 2002; EN SPACE
+ 2003; EM SPACE
+ 2004; THREE-PER-EM SPACE
+ 2005; FOUR-PER-EM SPACE
+ 2006; SIX-PER-EM SPACE
+ 2007; FIGURE SPACE
<<Diff was trimmed, longer than 597 lines>>