This file contains all the changes in documentation in the package java.text as colored differences.
Deletions are shown like this, and additions are shown like this.
If no deletions or additions are shown in an entry, the HTML tags will be what has changed. The new HTML tags are shown in the differences. If no documentation existed, and then some was added in a later version, this change is noted in the appropriate class pages of differences, but the change is not shown on this page. Only changes in existing text are shown here. Similarly, documentation which was inherited from another class or interface is not shown here.
Note that an HTML error in the new documentation may cause the display of other documentation changes to be presented incorrectly. For instance, failure to close a <code> tag will cause all subsequent paragraphs to be displayed differently.
Create Bidi from the given paragraph of text.
The RUN_DIRECTION attribute in the text, if present, determines the base direction (left-to-right or right-to-left). If not present, the base direction is computed using the Unicode Bidirectional Algorithm, defaulting to left-to-right if there are no strong directional characters in the text. This attribute, if present, must be applied to all the text in the paragraph.
The BIDI_EMBEDDING attribute in the text, if present, represents embedding level information. Negative values from -1 to -62 indicate overrides at the absolute value of the level. Positive values from 1 to 62 indicate embeddings. Where values are zero or not defined, the base embedding level as determined by the base direction is assumed.
The NUMERIC_SHAPING attribute in the text, if present, converts European digits to other decimal digits before running the bidi algorithm. This attribute, if present, must be applied to all the text in the paragraph.
@param paragraph a paragraph of text with optional character and paragraph attribute information
@see TextAttribute#BIDI_EMBEDDING
@see TextAttribute#NUMERIC_SHAPING
@see TextAttribute#RUN_DIRECTION
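As a rough illustration of how RUN_DIRECTION affects the base direction, the sketch below builds one Bidi from an attributed paragraph and one from plain text. The class and method names (BidiParagraphSketch, rightToLeft, unattributed) are hypothetical and only serve the example; TextAttribute here is java.awt.font.TextAttribute.

```java
import java.awt.font.TextAttribute;
import java.text.AttributedString;
import java.text.Bidi;

// Hypothetical helper class for illustration; not part of java.text.
class BidiParagraphSketch {

    // Forces a right-to-left base direction by applying RUN_DIRECTION
    // to all the text in the paragraph (text must be non-empty).
    static Bidi rightToLeft(String text) {
        AttributedString paragraph = new AttributedString(text);
        paragraph.addAttribute(TextAttribute.RUN_DIRECTION,
                               TextAttribute.RUN_DIRECTION_RTL);
        return new Bidi(paragraph.getIterator());
    }

    // No attributes: the base direction is computed from the text itself.
    static Bidi unattributed(String text) {
        return new Bidi(new AttributedString(text).getIterator());
    }

    public static void main(String[] args) {
        // Base level 1 means a right-to-left paragraph.
        System.out.println(rightToLeft("abc 123").getBaseLevel());
        // The first strong character is Latin, so the base is left-to-right.
        System.out.println(unattributed("abc 123").baseIsLeftToRight());
    }
}
```

The attribute is applied to the whole string because, as stated above, RUN_DIRECTION must cover all the text in the paragraph.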
This class represents the set of symbols (such as the decimal separator, the grouping separator, and so on) needed by DecimalFormat to format numbers. DecimalFormat creates for itself an instance of DecimalFormatSymbols from its locale data. If you need to change any of these symbols, you can get the DecimalFormatSymbols object from your DecimalFormat and modify it.
@see java.util.Locale
@see DecimalFormat
@version 1.37 01/16/02 (previously 1.35 12/03/01)
@author Mark Davis
@author Alan Liu
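A minimal sketch of the "get the symbols and modify them" workflow follows. The class and method names are hypothetical. Note that getDecimalFormatSymbols() returns a copy, so the sketch passes the modified object back with setDecimalFormatSymbols(); the pattern string and the chosen separators are assumptions made for the example.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper class for illustration.
class SymbolsSketch {

    // Formats a value with the decimal and grouping separators swapped
    // relative to the US defaults.
    static String formatWithSwappedSeparators(double value) {
        DecimalFormat format = new DecimalFormat(
                "#,##0.00", DecimalFormatSymbols.getInstance(Locale.US));

        // The getter hands back a copy of the symbols, so changes must be
        // written back to the formatter explicitly.
        DecimalFormatSymbols symbols = format.getDecimalFormatSymbols();
        symbols.setDecimalSeparator(',');
        symbols.setGroupingSeparator('.');
        format.setDecimalFormatSymbols(symbols);

        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatWithSwappedSeparators(1234567.891)); // 1.234.567,89
    }
}
```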
The RuleBasedCollator class is a concrete subclass of Collator that provides a simple, data-driven table collator. With this class you can create a customized table-based Collator. RuleBasedCollator maps characters to sort keys.
RuleBasedCollator has the following restrictions for efficiency (other subclasses may be used for more complex languages):
- If a special collation rule controlled by a <modifier> is specified, it applies to the whole collator object.
- All non-mentioned Unicode characters are at the end of the collation order.
The collation table is composed of a list of collation rules, where each rule is of one of three forms:
    <modifier>
    <relation> <text-argument>
    <reset> <text-argument>
The definitions of the rule elements are as follows:
- Text-Argument: A text-argument is any sequence of characters, excluding special characters (that is, common whitespace characters [0009-000D, 0020] and rule syntax characters [0021-002F, 003A-0040, 005B-0060, 007B-007E]). If those characters are desired, you can put them in single quotes (e.g. ampersand => '&'). Note that unquoted white space characters are ignored; e.g. b c is treated as bc.
- Modifier: There are currently two modifiers that turn on special collation rules.
- '@' : Turns on backwards sorting of accents (secondary differences), as in French.
- '!' : Turns on Thai/Lao vowel-consonant swapping. If this rule is in force when a Thai vowel of the range \U0E40-\U0E44 precedes a Thai consonant of the range \U0E01-\U0E2E, OR a Lao vowel of the range \U0EC0-\U0EC4 precedes a Lao consonant of the range \U0E81-\U0EAE, then the vowel is placed after the consonant for collation purposes.
'@' : Indicates that accents are sorted backwards, as in French.
- Relation: The relations are the following:
- '<' : Greater as a letter difference (primary)
- ';' : Greater as an accent difference (secondary)
- ',' : Greater as a case difference (tertiary)
- '=' : Equal
- Reset: There is a single reset, which is used primarily for contractions and expansions, but which can also be used to add a modification at the end of a set of rules.
'&' : Indicates that the next rule follows the position to where the reset text-argument would be sorted.
This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:
    a < b < c
    a < b & b < c
    a < c & a < b
Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent:
    a < b & a < c
    a < c & a < b
Either the text-argument must already be present in the sequence, or some initial substring of the text-argument must be present (e.g. "a < b & ae < e" is valid, since "a" is present in the sequence before "ae" is reset). In this latter case, "ae" is not entered and treated as a single character; instead, "e" is sorted as if it were expanded to two characters: "a" followed by an "e". This difference appears in natural languages: in traditional Spanish "ch" is treated as though it contracts to a single character (expressed as "c < ch < d"), while in traditional German a-umlaut is treated as though it expanded to two characters (expressed as "a,A < b,B ... &ae;\u00e3&AE;\u00c3"). [\u00e3 and \u00c3 are, of course, the escape sequences for a-umlaut.]
For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all the text-arguments up to the first "<" are ignorable. For example, ", - < a < b" makes "-" an ignorable character, as we saw earlier in the word "black-birds". In the samples for different languages, you see that most accents are ignorable.
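The contraction case mentioned for traditional Spanish can be tried out with a small rule string. The following is only a sketch: the class and method names are hypothetical, and the rule string is a shortened fragment written with the leading "<" that a complete rule needs.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical helper class for illustration.
class ContractionSketch {

    // Builds a tiny collator in which "ch" is a single collation unit
    // placed between "c" and "d".
    static RuleBasedCollator withChContraction() {
        try {
            return new RuleBasedCollator("< a < b < c < ch < d");
        } catch (ParseException e) {
            // The rule string above is fixed, so a failure is a programming error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = withChContraction();
        // "ch" is one unit that sorts before "d".
        System.out.println(collator.compare("ch", "d") < 0);
        // "cz" starts with plain "c", which sorts before the "ch" unit.
        System.out.println(collator.compare("cz", "ch") < 0);
    }
}
```

Under plain code-point ordering "ch" would come before "cz"; with the contraction in place the comparison is decided by "c" against "ch" as whole units.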
Normalization and Accents
RuleBasedCollator automatically processes its rule table to include both pre-composed and combining-character versions of accented characters. Even if the provided rule string contains only base characters and separate combining accent characters, the pre-composed accented characters matching all canonical combinations of characters from the rule string will be entered in the table.
This allows you to use a RuleBasedCollator to compare accented strings even when the collator is set to NO_DECOMPOSITION. There are two caveats, however. First, if the strings to be collated contain combining sequences that may not be in canonical order, you should set the collator to CANONICAL_DECOMPOSITION or FULL_DECOMPOSITION to enable sorting of combining sequences. Second, if the strings contain characters with compatibility decompositions (such as full-width and half-width forms), you must use FULL_DECOMPOSITION, since the rule tables only include canonical mappings.
(For more information, see The Unicode Standard, Version 2.0.)
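To make the decomposition setting concrete, the sketch below compares a pre-composed accented character with its combining-sequence spelling under CANONICAL_DECOMPOSITION. The class and method names are hypothetical, and the choice of Locale.US is an assumption made for the example.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class for illustration.
class DecompositionSketch {

    // Returns true when the two strings collate as equal once the collator
    // canonically decomposes its input.
    static boolean sameUnderCanonicalDecomposition(String left, String right) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(left, right) == 0;
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";   // e with acute, single code point
        String combining = "e\u0301";    // e followed by combining acute
        System.out.println(sameUnderCanonicalDecomposition(precomposed, combining));
        // A plain "e" differs from the accented form at the default strength.
        System.out.println(sameUnderCanonicalDecomposition("e", precomposed));
    }
}
```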
The following are errors:
- A text-argument contains unquoted punctuation symbols (e.g. "a < b-c < d").
- A relation or reset character not followed by a text-argument (e.g. "a < ,b").
- A reset where the text-argument (or an initial substring of the text-argument) is not already in the sequence (e.g. "a < b & e < f").
If you produce one of these errors, a RuleBasedCollator throws a ParseException.
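One way to see how rule errors surface is to wrap the constructor in a small check. This is a sketch only: the class and method names are hypothetical, and the rule strings in main are the fragments from the list written with the leading "<" that a full rule requires.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical helper class for illustration.
class RuleErrorSketch {

    // Returns true if the rule string is accepted by RuleBasedCollator,
    // false if the constructor rejects it with a ParseException.
    static boolean parses(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "< a < b < c < d",      // well-formed
            "< a < b-c < d",        // unquoted punctuation in a text-argument
            "< a < ,b",             // relation not followed by a text-argument
            "< a < b & e < f"       // reset on text not yet in the sequence
        };
        for (String rules : candidates) {
            System.out.println(rules + " -> " + parses(rules));
        }
    }
}
```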
Simple: "< a < b < c < d"
Norwegian: "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J < k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T < u,U< v,V< w,W< x,X< y,Y< z,Z < \u00E5=a\u030A,\u00C5=A\u030A ;aa,AA< \u00E6,\u00C6< \u00F8,\u00D8"
Normally, to create a rule-based Collator object, you will use Collator's factory method getInstance. However, to create a rule-based Collator object with specialized rules tailored to your needs, you construct the RuleBasedCollator with the rules contained in a String object. For example:
    String Simple = "< a< b< c< d";
    RuleBasedCollator mySimple = new RuleBasedCollator(Simple);
Or:
    String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J" +
                       "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T" +
                       "< u,U< v,V< w,W< x,X< y,Y< z,Z" +
                       "< \u00E5=a\u030A,\u00C5=A\u030A" +
                       ";aa,AA< \u00E6,\u00C6< \u00F8,\u00D8";
    RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
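Once constructed, a RuleBasedCollator can be used wherever a Comparator is expected. The sketch below uses a deliberately reversed rule string so the effect of the rules is visible in the sorted output; the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration.
class CustomOrderSketch {

    // Sorts a copy of the input using rules that place c before b before a.
    static List<String> sortReversed(List<String> words) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator("< c < b < a");
        } catch (ParseException e) {
            // The rule string is fixed, so a failure is a programming error.
            throw new IllegalStateException(e);
        }
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortReversed(List.of("a", "b", "c"))); // [c, b, a]
    }
}
```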
Combining Collators is as simple as concatenating strings. Here's an example that combines two Collators from two different locales:
    // Create an en_US Collator object
    RuleBasedCollator en_USCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("en", "US", ""));
    // Create a da_DK Collator object
    RuleBasedCollator da_DKCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("da", "DK", ""));
    // Combine the two
    // First, get the collation rules from en_USCollator
    String en_USRules = en_USCollator.getRules();
    // Second, get the collation rules from da_DKCollator
    String da_DKRules = da_DKCollator.getRules();
    RuleBasedCollator newCollator =
        new RuleBasedCollator(en_USRules + da_DKRules);
    // newCollator has the combined rules
Another more interesting example would be to make changes on an existing table to create a new Collator object. For example, add "&C< ch, cH, Ch, CH" to the en_USCollator object to create your own:
    // Create a new Collator object with additional rules
    String addRules = "&C< ch, cH, Ch, CH";
    RuleBasedCollator myCollator =
        new RuleBasedCollator(en_USCollator.getRules() + addRules);
    // myCollator contains the new rules
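The same extension pattern can be sketched against a small, self-contained base table instead of a locale's rules, which keeps the result easy to reason about. The class and method names are hypothetical, and the base rule string is an assumption made for the example; the rules of the existing collator are read with getRules() before the addition is appended.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical helper class for illustration.
class ExtendRulesSketch {

    // Builds a collator from a rule string, converting the checked exception.
    static RuleBasedCollator fromRules(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    // Creates a new collator from an existing collator's rules plus additions.
    static RuleBasedCollator extend(RuleBasedCollator base, String addRules) {
        return fromRules(base.getRules() + addRules);
    }

    public static void main(String[] args) {
        RuleBasedCollator base = fromRules("< a, A < b, B < c, C < d, D");
        RuleBasedCollator extended = extend(base, "& C < ch, cH, Ch, CH");
        // After the reset on "C", the "ch" forms sort between "C" and "d".
        System.out.println(extended.compare("ch", "d") < 0);
        System.out.println(extended.compare("cz", "ch") < 0);
    }
}
```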
The following example demonstrates how to change the order of non-spacing accents:
    // old rule
    String oldRules = "=\u0301;\u0300;\u0302;\u0308"    // main accents
                    + ";\u0327;\u0303;\u0304;\u0305"    // main accents
                    + ";\u0306;\u0307;\u0309;\u030A"    // main accents
                    + ";\u030B;\u030C;\u030D;\u030E"    // main accents
                    + ";\u030F;\u0310;\u0311;\u0312"    // main accents
                    + "< a , A ; ae, AE ; \u00e6 , \u00c6"
                    + "< b , B < c, C < e, E & C < d, D";
    // change the order of accent characters
    String addOn = "& \u0300 ; \u0308 ; \u0302";
    RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
The last example shows how to put new primary ordering in before the default setting. For example, in a Japanese Collator, you can sort English characters either before or after Japanese characters:
    // get en_US Collator rules
    RuleBasedCollator en_USCollator =
        (RuleBasedCollator) Collator.getInstance(Locale.US);
    // add a few Japanese characters to sort before English characters
    // suppose the last character before the first base letter 'a' in
    // the English collation rule is \u2212
    String jaString = "& \u2212 < \u3041, \u3042 < \u3043, \u3044";
    RuleBasedCollator myJapaneseCollator =
        new RuleBasedCollator(en_USCollator.getRules() + jaString);
@see Collator
@see CollationElementIterator
@version 1.25 07/24/98
@author Helena Shih, Laura Werner, Richard Gillam