Internationalization and Localization in Java - II

Ravindhra

Time is represented as a long integer measured in milliseconds since midnight Greenwich Mean Time (GMT) January 1, 1970. This starting point for time measurement is known as the epoch. This value is signed, so negative values signify time before the beginning of the epoch. The System.currentTimeMillis method returns the current time. This value will express dates into the year A.D. 292,280,995, which should suffice for most purposes.

You can use java.util.Date to hold a time and perform some simple time-related operations. When a new Date object is created, you can specify a long value for its time. If you use the no-arg constructor, the Date object will mark the time of its creation. A Date object can be used for simple operations. For example, the simplest program to print the current time is

import java.util.Date;

class Date2 {
    public static void main(String[] args) {
        Date now = new Date();
        System.out.println(now);
    }
}

This program will produce output such as the following:

Sun Mar 20 08:48:38 GMT+10:00 2005

Note that this is not localized output. No matter what the default locale, the date will be in this format, adjusted for the current time zone.

You can compare two dates with the before and after methods, which return true if the object on which they are invoked is before or after the other date. Or you can compare the long values you get from invoking getTime on the two objects. The method setTime lets you change the time to a different long.
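As a minimal sketch of these comparisons (the class name DateCompareDemo and the helper earlierByMillis are made up for this illustration), the following program compares two Date objects both ways and then moves one of them with setTime:

```java
import java.util.Date;

class DateCompareDemo {
    // Hypothetical helper: compares the raw millisecond values directly.
    static boolean earlierByMillis(Date a, Date b) {
        return a.getTime() < b.getTime();
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);           // the epoch itself
        Date later = new Date(86_400_000L);  // exactly one day after the epoch

        System.out.println(epoch.before(later));            // true
        System.out.println(later.after(epoch));             // true
        System.out.println(earlierByMillis(epoch, later));  // true

        later.setTime(-1L);                  // one millisecond before the epoch
        System.out.println(later.before(epoch));            // true
    }
}
```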

The Date class provides no support for localization and has effectively been replaced by the more sophisticated and locale-sensitive Calendar and DateFormat classes.

Calendars

Calendars mark the passage of time. Most of the world uses the same calendar, commonly called the Gregorian calendar after Pope Gregory XIII, under whose auspices it was first instituted. Many other calendars exist in the world, and the calendar abstractions are designed to express such variations. A given moment in time is expressed as a date according to a particular calendar, and the same moment can be expressed as different dates by different calendars. The calendar abstraction is couched in the following form:

An abstract Calendar class that represents various ways of marking time

An abstract TimeZone class that represents time zone offsets and other adjustments, such as daylight saving time

An abstract java.text.DateFormat class that defines how one can format and parse date and time strings

Because the Gregorian calendar is commonly used, you also have the following concrete implementations of the abstractions:

A GregorianCalendar class

A SimpleTimeZone class for use with GregorianCalendar

A java.text.SimpleDateFormat class that formats and parses Gregorian dates and times

For example, the following code creates a GregorianCalendar object representing midnight (00:00:00), October 26, 1972, in the local time zone, then prints its value:

Calendar cal =
    new GregorianCalendar(1972, Calendar.OCTOBER, 26);
System.out.println(cal.getTime());

The method getTime returns a Date object for the calendar object's time, which was set by converting a year, month, and date into a millisecond-measured long. The output would be something like this (depending on your local time zone of course):

Thu Oct 26 00:00:00 GMT+10:00 1972

You can also work directly with the millisecond time value by using getTimeInMillis and setTimeInMillis. These are equivalent to working with a Date object; for example, getTimeInMillis is equivalent to invoking getTime().getTime().
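The equivalence can be checked with a small sketch. The class CalendarMillisDemo and its helper utcMidnight are hypothetical names, and the calendar is pinned to the UTC time zone (an assumption made here only so that the millisecond value does not depend on where the program runs):

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class CalendarMillisDemo {
    // Hypothetical helper: a calendar set to midnight UTC on the given date.
    static Calendar utcMidnight(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();                 // drop the current time of day
        cal.set(year, month, day);
        return cal;
    }

    public static void main(String[] args) {
        Calendar cal = utcMidnight(1972, Calendar.OCTOBER, 26);
        long viaCalendar = cal.getTimeInMillis();
        long viaDate = cal.getTime().getTime();
        System.out.println(viaCalendar == viaDate);   // true

        cal.setTimeInMillis(0L);                      // back to the epoch
        System.out.println(cal.get(Calendar.YEAR));   // 1970
    }
}
```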

The abstract Calendar class provides a large set of constants that are useful in many calendars, such as Calendar.AM and Calendar.PM for calendars that use 12-hour clocks. Some constants are useful only for certain calendars, but no calendar class is required to use such constants. In particular, the month names in Calendar (such as Calendar.JUNE) are names for the various month numbers (such as 5; month numbers start at 0), with a special month UNDECIMBER for the thirteenth month that many calendars have. But no calendar is required to use these constants.

Each Calendar object represents a particular moment in time on that calendar. The Calendar class provides only constructors that create an object for the current time, either in the default locale and time zone or in specified ones.

Calendar objects represent a moment in time, but they are not responsible for displaying the date. That locale-sensitive procedure is the job of the DateFormat class, which will soon be described.

You can obtain a calendar object for a locale by invoking one of the static Calendar.getInstance methods. With no arguments, getInstance returns an object of the best available calendar type (currently only GregorianCalendar) for the default locale and time zone, set to the current time. The other overloads allow you to specify the locale, the time zone, or both. The static getAvailableLocales method returns an array of Locale objects for which calendars are installed on the system.

With a calendar object in hand, you can manipulate the date. The following example prints the next week of days for a given calendar object:

public static void oneWeek(PrintStream out, Calendar cal) {
    Calendar cur = (Calendar) cal.clone(); // modifiable copy
    int dow = cal.get(Calendar.DAY_OF_WEEK);
    do {
        out.println(cur.getTime());
        cur.add(Calendar.DAY_OF_WEEK, 1);
    } while (cur.get(Calendar.DAY_OF_WEEK) != dow);
}

First we make a copy of the calendar argument so that we can make changes without affecting the calendar we were passed.[1] Instead of assuming that there are seven days in a week (who knows what kind of calendar we were given?), we loop, printing the time and adding one day to that time, until we have printed a week's worth of days. We detect whether a week has passed by looking for the next day whose "day of the week" is the same as that of the original object.

[1] For historical reasons Calendar.clone returns Object not Calendar, so a cast is required.

The Calendar class defines many kinds of calendar fields for calendar objects, such as DAY_OF_WEEK in the preceding code. These calendar fields are constants used in the methods that manipulate parts of the time:

MILLISECOND
SECOND
MINUTE
HOUR
HOUR_OF_DAY
AM_PM
DAY_OF_WEEK
DAY_OF_WEEK_IN_MONTH
DAY_OF_MONTH
DATE
DAY_OF_YEAR
WEEK_OF_MONTH
WEEK_OF_YEAR
MONTH
YEAR
ERA
ZONE_OFFSET
DST_OFFSET
FIELD_COUNT

An int is used to store values for all these calendar field types. You use these constants, or any others defined by a particular calendar class, to specify a calendar field to the following methods (always as the first argument):

get Returns the value of the field

set Sets the value of the field to the provided int

clear Clears the value of the field to "unspecified"

isSet Returns true if the field has been set

add Adds an int amount to the specified field

roll Rolls the field up to the next value if the second boolean argument is true, or down if it is false

getMinimum Gets the minimum valid value for the field

getMaximum Gets the maximum valid value for the field

getGreatestMinimum Gets the highest minimum value for the field; if it varies, this can be different from getMinimum

getLeastMaximum Gets the smallest maximum value for the field; if it varies, this can be different from getMaximum

The greatest minimum and least maximum describe cases in which a value can vary within the overall boundaries. For example, the least maximum value for DAY_OF_MONTH on the Gregorian calendar is 28 because February, the shortest month, can have as few as 28 days. The maximum value is 31 because no month has more than 31 days.
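The following sketch queries those boundaries for DAY_OF_MONTH on a GregorianCalendar and also contrasts add with roll. The class name FieldRangeDemo and the helper yearAfterOneMonth are invented for this example. It relies on roll leaving larger fields untouched, so rolling December forward wraps to January of the same year, whereas add carries into the next year:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class FieldRangeDemo {
    // Hypothetical helper: moves December 15, 1999 forward one month,
    // with either roll or add, and returns the resulting year.
    static int yearAfterOneMonth(boolean useRoll) {
        Calendar cal = new GregorianCalendar(1999, Calendar.DECEMBER, 15);
        if (useRoll) {
            cal.roll(Calendar.MONTH, true);  // wraps to January, year unchanged
        } else {
            cal.add(Calendar.MONTH, 1);      // carries into the next year
        }
        return cal.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        Calendar cal = new GregorianCalendar();
        System.out.println(cal.getMinimum(Calendar.DAY_OF_MONTH));         // 1
        System.out.println(cal.getGreatestMinimum(Calendar.DAY_OF_MONTH)); // 1
        System.out.println(cal.getLeastMaximum(Calendar.DAY_OF_MONTH));    // 28
        System.out.println(cal.getMaximum(Calendar.DAY_OF_MONTH));         // 31

        System.out.println(yearAfterOneMonth(false));  // 2000
        System.out.println(yearAfterOneMonth(true));   // 1999
    }
}
```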

The set method allows you to specify a date by certain calendar fields and then calculate the time associated with that date. For example, you can calculate on which day of the week a particular date falls:

public static int dotw(int year, int month, int date) {
    Calendar cal = new GregorianCalendar();
    cal.set(Calendar.YEAR, year);
    cal.set(Calendar.MONTH, month);
    cal.set(Calendar.DATE, date);
    return cal.get(Calendar.DAY_OF_WEEK);
}

The method dotw calculates the day of the week on the Gregorian calendar for the given date. It creates a Gregorian calendar object, sets the date fields for year, month, and day, and returns the resulting day of the week.

The clear method can be used to reset a field's value to be unspecified. You can use clear with no parameters to clear all calendar fields. The isSet method returns true if a field currently has a value set.

Three variants of set change particular fields you commonly need to manipulate, leaving unspecified fields alone:

public void set(int year, int month, int date)
public void set(int year, int month, int date, int hrs, int min)
public void set(int year, int month, int date, int hrs, int min, int sec)

You can also use setTime to set the calendar's time from a Date object.

A calendar field that is out of range can be interpreted correctly. For example, January 32 can be equivalent to February 1. Whether it is treated as such or as an error depends on whether the calendar is considered to be lenient. A lenient calendar will do its best to interpret values as valid. A strict (non-lenient) calendar will not accept any values out of range, throwing IllegalArgumentException. The setLenient method takes a boolean that specifies whether parsing should be lenient; isLenient returns the current setting.
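A short sketch of the difference, using the hypothetical class LenientDemo and helper acceptsJanuary32. Note that with GregorianCalendar the out-of-range value is only rejected when the time is actually computed (here by getTime), not at the moment set is invoked:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class LenientDemo {
    // Hypothetical helper: reports whether "January 32, 2005" is accepted.
    static boolean acceptsJanuary32(boolean lenient) {
        Calendar cal = new GregorianCalendar();
        cal.clear();
        cal.setLenient(lenient);
        cal.set(2005, Calendar.JANUARY, 32);
        try {
            cal.getTime();   // field values are validated when the time is computed
            return true;
        } catch (IllegalArgumentException e) {
            return false;    // a strict calendar rejects the out-of-range day
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsJanuary32(true));   // true
        System.out.println(acceptsJanuary32(false));  // false

        Calendar cal = new GregorianCalendar();       // lenient by default
        cal.clear();
        cal.set(2005, Calendar.JANUARY, 32);
        System.out.println(cal.get(Calendar.MONTH) == Calendar.FEBRUARY);  // true
        System.out.println(cal.get(Calendar.DAY_OF_MONTH));                // 1
    }
}
```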

A week can start on any day, depending on the calendar. You can discover the first day of the week with the method getFirstDayOfWeek. In a Gregorian calendar for the United States this method would return SUNDAY, whereas Ireland uses MONDAY. You can change this by invoking setFirstDayOfWeek with a valid weekday index.

Some calendars require a minimum number of days in the first week of the year. The method getMinimalDaysInFirstWeek returns that number; the method setMinimalDaysInFirstWeek lets you change it. The minimum number of days in the first week is important when you are trying to determine in which week a particular date falls. For example, in some calendars, if January 1 is a Friday it may be considered part of the last week of the preceding year.
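To see the effect of these two settings, here is a sketch built around January 1, 2010, which fell on a Friday. The class WeekRulesDemo and its helper weekOfYear are hypothetical names. The combination of a Monday start and a four-day minimum is the rule used for ISO 8601 week numbering; it is chosen here only as an illustration:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class WeekRulesDemo {
    // Hypothetical helper: week-of-year of Friday, January 1, 2010
    // under the given week rules.
    static int weekOfYear(int firstDayOfWeek, int minimalDays) {
        Calendar cal = new GregorianCalendar(2010, Calendar.JANUARY, 1);
        cal.setFirstDayOfWeek(firstDayOfWeek);
        cal.setMinimalDaysInFirstWeek(minimalDays);
        return cal.get(Calendar.WEEK_OF_YEAR);
    }

    public static void main(String[] args) {
        // Only three days (Fri, Sat, Sun) of that week lie in 2010, so with a
        // four-day minimum the date belongs to the last week of 2009.
        System.out.println(weekOfYear(Calendar.MONDAY, 4));  // 53
        // With a one-day minimum the same date is in week 1 of 2010.
        System.out.println(weekOfYear(Calendar.MONDAY, 1));  // 1
    }
}
```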

You can compare two Calendar objects by using compareTo since Calendar implements Comparable. If you prefer, you can use the before and after methods to compare the objects.

Time Zones

TimeZone is an abstract class that encapsulates not only offset from GMT but also other offset issues, such as daylight saving time. As with other locale-sensitive classes, you can get the default TimeZone by invoking the static method getDefault. You can change the default time zone by passing setDefault a new TimeZone object to use or null to reset to the original default time zone. Time zones are understood by particular calendar types, so you should ensure that the default calendar and time zone are compatible.

 

Each time zone has a string identifier that is interpreted by the time zone object and can be displayed to the user. These identifiers use a long form consisting of a major and minor regional name, separated by '/'. For example, the following are all valid time zone identifiers: America/New_York, Australia/Brisbane, Africa/Timbuktu. Many time zones have a short form identifier (often just a three-letter acronym), some of which are recognized by TimeZone for backward compatibility. You should endeavor to always use the long form. After all, while many people know that EST stands for "Eastern Standard Time," that doesn't tell you for which country. TimeZone also recognizes generic identifiers expressed as the difference in time from GMT. For example, GMT+10:00 and GMT-4:00 are both valid generic time zone identifiers. You can get an array of all the identifiers available on your system from the static method getAvailableIDs. If you want only those for a given offset from GMT, you can invoke getAvailableIDs with that offset. An offset might, for example, have identifiers for both daylight saving and standard time zones.

You can find the identifier of a given TimeZone object from getID, and you can set it with setID. Setting the identifier changes only the identifier on the time zone; it does not change the offset or other values. You can get the time zone for a given identifier by passing it to the static method getTimeZone.

A time zone can be converted into a displayable form by using one of the getDisplayName methods, similar to those of Locale. These methods allow you to specify whether to use the default locale or a specified one, and whether to use a short or long format. The string returned by the display methods is controlled by a DateFormat object. These objects maintain their own tables of information on how to format different time zones. On a given system they may not maintain information for all the supported time zones, in which case the generic identifier form is used.

Each time zone has a raw offset from GMT, which can be either positive or negative. You can get or set the raw offset by using getRawOffset or setRawOffset, but you should rarely need to do this.

Daylight saving time supplements the raw offset with a seasonal time shift. The value of this shift can be obtained from getDSTSavings; the default implementation returns 3,600,000 (the number of milliseconds in an hour). You can ask whether a time zone ever uses daylight saving time during the year by invoking the method useDaylightTime, which returns a boolean. The method inDaylightTime returns true if the Date argument you pass would fall inside daylight saving time in the zone.

You can obtain the exact offset for a time zone on a given date by specifying that date in milliseconds or by using calendar fields to specify the year and month and so on.

public int getOffset(long date)

Returns the offset from GMT for the given time in this time zone, taking any daylight saving time offset into account.

public abstract int getOffset(int era, int year, int month, int day, int dayOfWeek, int milliseconds)

Returns the offset from GMT for the given time in this time zone, taking any daylight saving time offset into account. All parameters are interpreted relative to the calendar for which the particular time zone implementation is designed. The era parameter represents calendar-specific eras, such as B.C. and A.D. in the Gregorian calendar.
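The sketch below pulls several of these TimeZone methods together. The class ZoneOffsetDemo and the helper offsetHours are hypothetical, and the example assumes that the America/New_York identifier is installed on your system (it is part of the standard time zone data). The two instants are midnight UTC on January 1 and July 1, 2005, written as millisecond values:

```java
import java.util.Date;
import java.util.TimeZone;

class ZoneOffsetDemo {
    static final int MILLIS_PER_HOUR = 3_600_000;

    // Hypothetical helper: total offset from GMT, in whole hours, at an instant.
    static int offsetHours(String zoneId, long instant) {
        TimeZone zone = TimeZone.getTimeZone(zoneId);
        return zone.getOffset(instant) / MILLIS_PER_HOUR;
    }

    public static void main(String[] args) {
        long january = 1104537600000L;  // 2005-01-01 00:00 UTC
        long july = 1120176000000L;     // 2005-07-01 00:00 UTC

        TimeZone ny = TimeZone.getTimeZone("America/New_York");
        System.out.println(ny.getRawOffset() / MILLIS_PER_HOUR);        // -5
        System.out.println(offsetHours("America/New_York", january));   // -5
        System.out.println(offsetHours("America/New_York", july));      // -4
        System.out.println(ny.inDaylightTime(new Date(july)));          // true

        // A generic identifier carries only a fixed offset.
        TimeZone generic = TimeZone.getTimeZone("GMT+10:00");
        System.out.println(generic.getRawOffset() / MILLIS_PER_HOUR);   // 10
    }
}
```

One thing to watch for: getTimeZone does not fail on an identifier it cannot understand; it quietly returns the GMT zone, so a misspelled identifier can go unnoticed.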

GregorianCalendar and SimpleTimeZone

The GregorianCalendar class is a concrete subclass of Calendar that reflects UTC (Coordinated Universal Time), although it cannot always do so exactly. Imprecise behavior is inherited from the time mechanisms of the underlying system.[2] Parts of a date are specified in UTC standard units and ranges. Here are the ranges for GregorianCalendar:

[2] Almost all modern systems assume that one day is 24*60*60 seconds. In UTC, about once a year an extra second, called a leap second, is added to a day to account for the wobble of the Earth. Most computer clocks are not accurate enough to reflect this distinction, so neither is the Date class. Some computer standards are defined in GMT, which is the "civil" name for the standard; UT is the scientific name for the same standard. The distinction between UTC and UT is that UTC is based on an atomic clock and UT is based on astronomical observations. For almost all practical purposes, this is an invisibly fine hair to split.

YEAR 1 to 292278994

MONTH 0 to 11

DATE Day of the month, 1 to 31

HOUR_OF_DAY 0 to 23

MINUTE 0 to 59

SECOND 0 to 59

MILLISECOND 0 to 999

The GregorianCalendar class supports several constructors:

public GregorianCalendar()

Creates a GregorianCalendar object that represents the current time in the default time zone with the default locale.

public GregorianCalendar(int year, int month, int date, int hrs, int min, int sec)

Creates a GregorianCalendar object that represents the given date in the default time zone with the default locale.

public GregorianCalendar(int year, int month, int date, int hrs, int min)

Equivalent to GregorianCalendar(year, month, date, hrs, min, 0), that is, the beginning of the specified minute.

public GregorianCalendar(int year, int month, int date)

Equivalent to GregorianCalendar(year, month, date, 0, 0, 0), that is, midnight on the given date (which is considered to be the start of the day).

public GregorianCalendar(Locale locale)

Creates a GregorianCalendar object that represents the current time in the default time zone with the given locale.

public GregorianCalendar(TimeZone timeZone)

Creates a GregorianCalendar object that represents the current time in the given timeZone with the default locale.

public GregorianCalendar(TimeZone zone, Locale locale)

Creates a GregorianCalendar object that represents the current time in the given timeZone with the given locale.

In addition to the methods it inherits from Calendar, GregorianCalendar provides an isLeapYear method that returns true if the passed-in year is a leap year in that calendar.

The Gregorian calendar was preceded by the Julian calendar in many places. In a GregorianCalendar object, the default date at which this change happened is midnight local time on October 15, 1582. This is when the first countries switched, but others changed later. The getGregorianChange method returns the time the calendar is currently using for the change as a Date. You can set a calendar's change-over time by using setGregorianChange with a Date object.
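The change-over date affects isLeapYear, as the following sketch suggests. The class LeapYearDemo and its helper isLeap are hypothetical names. Passing new Date(Long.MIN_VALUE) to setGregorianChange is the documented way to obtain a calendar that applies the Gregorian rules to all dates:

```java
import java.util.Date;
import java.util.GregorianCalendar;

class LeapYearDemo {
    // Hypothetical helper: asks a GregorianCalendar whether a year is a leap
    // year, optionally treating every date as Gregorian.
    static boolean isLeap(int year, boolean pureGregorian) {
        GregorianCalendar cal = new GregorianCalendar();
        if (pureGregorian) {
            cal.setGregorianChange(new Date(Long.MIN_VALUE));
        }
        return cal.isLeapYear(year);
    }

    public static void main(String[] args) {
        System.out.println(isLeap(2000, false));  // true: divisible by 400
        System.out.println(isLeap(1900, false));  // false: century not divisible by 400
        // 1500 precedes the default change-over, so the Julian rule
        // (every fourth year) applies.
        System.out.println(isLeap(1500, false));  // true
        System.out.println(isLeap(1500, true));   // false under Gregorian rules
    }
}
```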

The SimpleTimeZone class is a concrete subclass of TimeZone that expresses values for Gregorian calendars. It does not handle historical complexities, but instead projects current practices onto all times. For historical dates that precede the use of daylight saving time, for example, you will want to use a calendar with a time zone you have selected that ignores daylight saving time. For future dates, SimpleTimeZone is probably as good a guess as any.

Date and time formatting is a separate issue from calendars, although they are closely related. Formatting is localized in a different way. Not only are the names of days and months different in different locales that share the same calendar, but also the order in which a date's components are expressed changes. In the United States it is customary in short dates to put the month before the date, so that July 5 is written as 7/5. In many European countries the date comes first, so 5 July becomes 5/7 or 5.7 or …

In the previous sections the word "date" meant a number of milliseconds since the epoch, which could be interpreted as year, month, day-of-month, hours, minutes, and seconds information. When dealing with the formatting classes you must distinguish between dates, which deal with year, month, and day-of-month information, and times, which deal with hours, minutes, and seconds.

Date and time formatting issues are text issues, so the classes for formatting are in the java.text package, though the java.util.Formatter class also supports some localized date formatting.

DateFormat provides several ways to format and parse dates and times. It is a subclass of the general Format class, discussed earlier in this article. There are three kinds of formatters, each returned by different static methods: date formatters from getDateInstance, time formatters from getTimeInstance, and date/time formatters from getDateTimeInstance. Each of these formatters understands four formatting styles: SHORT, MEDIUM, LONG, and FULL, which are constants defined in DateFormat. And for each of them you can either use the default locale or specify one. For example, to get a medium date formatter in the default locale, you would use

Format fmt = DateFormat.getDateInstance(DateFormat.MEDIUM);


To get a date and time formatter that uses dates in short form and times in full form in a Japanese locale, you would use

Locale japan = new Locale("ja", "JP");
Format fmt = DateFormat.getDateTimeInstance(
    DateFormat.SHORT, DateFormat.FULL, japan);

For all the various "get instance" methods, if both formatting style and locale are specified the locale is the last parameter. The date/time methods require two formatting styles: the first for the date part, the second for the time. The simplest getInstance method takes no arguments and returns a date/time formatter for short formats in the default locale. The getAvailableLocales method returns an array of Locale objects for which date and time formatting is configured.

The following list shows how each formatting style is expressed for the same date. The output is from a date/time formatter for U.S. locales, with the same formatting mode used for both dates and times:

FULL:    Friday, August 29, 1986 5:00:00 PM EDT
LONG:    August 29, 1986 5:00:00 PM EDT
MEDIUM:  Aug 29, 1986 5:00:00 PM
SHORT:   8/29/86 5:00 PM
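If you want to see how the styles come out on your own system, a sketch along the following lines may help. The class DateStyleDemo and the helper format are hypothetical names, and the formatter's time zone is pinned to UTC purely so that the printed day does not depend on the machine's default zone. The exact text differs between locales and between versions of the runtime's locale data, so no expected output is shown:

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateStyleDemo {
    // Hypothetical helper: formats the date part of `when` in the given
    // style and locale, interpreting the instant in UTC.
    static String format(int style, Locale locale, Date when) {
        DateFormat fmt = DateFormat.getDateInstance(style, locale);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    public static void main(String[] args) {
        Date when = new Date(525718800000L);  // August 29, 1986, 17:00 UTC
        int[] styles = {
            DateFormat.FULL, DateFormat.LONG, DateFormat.MEDIUM, DateFormat.SHORT
        };
        for (Locale locale : new Locale[] { Locale.US, Locale.FRANCE }) {
            for (int style : styles) {
                System.out.println(locale + ": " + format(style, locale, when));
            }
        }
    }
}
```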

Each DateFormat object has an associated calendar and time zone set by the "get instance" method that created it. They are returned by getCalendar and getTimeZone, respectively. You can set these values by using setCalendar and setTimeZone. Each DateFormat object has a reference to a NumberFormat object for formatting numbers. You can use the methods getNumberFormat and setNumberFormat.

You format dates with one of several format methods based on the formatting parameters described earlier:

public final String format(Date date)

Returns a formatted string for date.

public abstract StringBuffer format(Date date, StringBuffer appendTo, FieldPosition pos)

Adds the formatted string for date to the end of appendTo.

public abstract StringBuffer format(Object obj, StringBuffer appendTo, FieldPosition pos)

Adds the formatted string for obj to the end of appendTo. The object can be either a Date or a Number whose longValue is a time in milliseconds.

The pos argument is a FieldPosition object that tracks the starting and ending index for a specific field within the formatted output. You create a FieldPosition object by passing an integer code that represents the field that the object should track. These codes are static fields in DateFormat, such as MINUTE_FIELD or MONTH_FIELD. Suppose you construct a FieldPosition object pos with MINUTE_FIELD and then pass it as an argument to a format method. When format returns, the getBeginIndex and getEndIndex methods of pos will return the start and end indices of the characters representing minutes within the formatted string. A specific formatter could also use the FieldPosition object to align the represented field within the formatted string. To make that happen, you would first invoke the setBeginIndex and setEndIndex methods of pos, passing the indices where you would like that field to start and end in the formatted string. Exactly how the formatter aligns the formatted text depends on the formatter implementation.
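Here is a small sketch of the tracking use of FieldPosition. The class FieldPositionDemo and helper yearText are hypothetical names; the example asks a long-style U.S. date formatter where the year landed in its output and cuts that piece out of the formatted text:

```java
import java.text.DateFormat;
import java.text.FieldPosition;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class FieldPositionDemo {
    // Hypothetical helper: returns just the characters that represent the
    // year in the formatted date.
    static String yearText(Date when) {
        DateFormat fmt = DateFormat.getDateInstance(DateFormat.LONG, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));

        FieldPosition pos = new FieldPosition(DateFormat.YEAR_FIELD);
        StringBuffer out = fmt.format(when, new StringBuffer(), pos);

        // After format returns, pos holds the start and end indices of the year.
        return out.substring(pos.getBeginIndex(), pos.getEndIndex());
    }

    public static void main(String[] args) {
        Date when = new Date(525718800000L);  // August 29, 1986, 17:00 UTC
        System.out.println(yearText(when));   // 1986
    }
}
```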

A DateFormat object can also be used to parse dates. Date parsing can be lenient or not, depending on your preference. Lenient date parsing is as forgiving as it can be, whereas strict parsing requires the format and information to be proper and complete. The default is to be lenient. You can use setLenient to set leniency to be true or false. You can test leniency via isLenient.

The parsing methods are

public Date parse(String text) throws ParseException

Tries to parse text into a date and/or time. If successful, a Date object is returned; otherwise, a ParseException is thrown.

public abstract Date parse(String text, ParsePosition pos)

Tries to parse text into a date and/or time. If successful, a Date object is returned; otherwise, returns a null reference. When the method is called, pos is the position at which to start parsing; at the end it will either be positioned after the parsed text or will remain unchanged if an error occurred.

public Object parseObject(String text, ParsePosition pos)

Returns the result of parse(text, pos). This method is provided to fulfill the generic contract of Format.
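The sketch below contrasts lenient and strict parsing with the ParsePosition form of parse. The class ParseDemo and helper parseShortUsDate are hypothetical names. It assumes that the short U.S. date style accepts month/day/two-digit-year text such as 8/29/86, as in the style list shown earlier:

```java
import java.text.DateFormat;
import java.text.ParsePosition;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class ParseDemo {
    // Hypothetical helper: parses a short U.S. date; returns null on failure.
    static Date parseShortUsDate(String text, boolean lenient) {
        DateFormat fmt = DateFormat.getDateInstance(DateFormat.SHORT, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(lenient);
        ParsePosition pos = new ParsePosition(0);  // start at the first character
        return fmt.parse(text, pos);               // null instead of an exception
    }

    public static void main(String[] args) {
        // February 1986 had 28 days, so "February 30" is out of range.
        Date lenient = parseShortUsDate("2/30/86", true);
        Date strict = parseShortUsDate("2/30/86", false);
        Date march2 = parseShortUsDate("3/2/86", false);

        System.out.println(strict == null);          // true: rejected
        System.out.println(lenient.equals(march2));  // true: rolled into March
    }
}
```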

The class java.text.SimpleDateFormat is a concrete implementation of DateFormat that is used in many locales. If you are writing a DateFormat class, you may find it useful to extend SimpleDateFormat. SimpleDateFormat uses methods in the DateFormatSymbols class to get localized strings and symbols for date representation. When formatting or parsing dates, you should usually not create SimpleDateFormat objects; instead, you should use one of the "get instance" methods to return an appropriate formatter.

DateFormat has protected fields calendar and numberFormat that give direct access to the values publicly manipulated with the set and get methods.

Using Formatter with Dates and Times

The java.util.Formatter class supports the formatting of date and time information using a supplied Date or Calendar object, or a date represented as a long (or Long). Using the available format conversions you can extract information about that date/time, including things like the day of the month, the day of the week, the year, the hour of the day, and so forth.

The output of the formatter is localized according to the locale associated with that formatter, so things like the name of the day and month will be in the correct language; however, digits themselves are not localized. Unlike DateFormat, a formatter cannot help you with localization issues such as knowing whether the month or the day should come first in a date; it simply provides access to each individual component and your program must combine them in the right way.

A date/time conversion is indicated by a format conversion of t (or T for uppercase output), followed by various suffixes that indicate what is to be output and in what form. The following table lists the conversion suffixes related to times:

H Hour of the day for 24-hour clock format. Two digits: 00 to 23

I Hour of the day for 12-hour clock format. Two digits: 01 to 12

k Hour of the day for 24-hour clock format: 0 to 23

l Hour of the day for 12-hour clock format: 1 to 12

M Minute within the hour. Two digits: 00 to 59

S Seconds within the minute. Two digits: 00 to 60 (60 is a leap second)

L Milliseconds within the second. Three digits: 000 to 999

N Nanoseconds within the second. Nine digits: 000000000 to 999999999

p Locale specific AM or PM marker.

z Numeric offset from GMT (as per RFC 822). E.g. +1000

Z String representing the abbreviation for the time zone

s Seconds since the epoch.

Q Milliseconds since the epoch.

So, for example, the following code will print out the current time in the familiar hh:mm:ss format:

System.out.printf("%1$tH:%1$tM:%1$tS %n", new Date());

The conversion suffixes that deal with dates are

B Full month name

b Abbreviated month name

h Same as 'b'

A Full name of the day of the week

a Short name of the day of the week

C The four digit year divided by 100. Two digits: 00 to 99

Y Year. Four digits: 0000 to 9999

y Year. Two digits: 00 to 99

j Day of the year. Three digits: 001 to 999

m Month in year. Two digits: 01 to 99

d Day of month. Two digits: 01 to 99

e Day of month: 1 to 99

Naturally, the valid range for day of month, month of year, and so forth, depends on the calendar that is being used. To continue the example, the following code will print the current date in the common mm/dd/yy format:

System.out.printf("%1$tm/%1$td/%1$ty %n", new Date());

As you can see, all the information about a date or time can be extracted and you can combine the pieces in whatever way you need. Doing so, however, is rather tedious both for the writer and any subsequent readers of the code. To ease the tedium a third set of conversion suffixes provides convenient shorthands for common combinations of the other conversions:

R Time in 24-hour clock hh:mm format ("%tH:%tM")
T Time in 24-hour clock hh:mm:ss format ("%tH:%tM:%tS")
r Time in 12-hour clock h:mm:ss am/pm format ("%tI:%tM:%tS %Tp")
D Date in mm/dd/yy format ("%tm/%td/%ty")
F Complete date in ISO 8601 format ("%tY-%tm-%td")
c Long date and time format ("%ta %tb %td %tT %tZ %tY")

So the previous examples could be combined in the more compact and somewhat more readable

System.out.printf("%1$tT %1$tD %n", new Date());

As with all format conversions a width can be specified before the conversion indicator, to specify the minimum number of characters to output. If the converted value is smaller than the width then the output is padded with spaces. The only format flag that can be specified with the date/time conversions is the '-' flag for left-justification; if this flag is given then a width must be supplied as well.
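To tie the conversions together, here is a sketch that formats one fixed moment in several ways. The class FormatterDemo and its helpers sample and render are hypothetical names. A Calendar in the UTC zone is used instead of new Date() (an assumption made for this illustration) so that the results do not depend on the machine's default time zone:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class FormatterDemo {
    // Hypothetical helper: 17:00:00 on August 29, 1986, in UTC.
    static Calendar sample() {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.US);
        cal.clear();
        cal.set(1986, Calendar.AUGUST, 29, 17, 0, 0);
        return cal;
    }

    // Hypothetical helper: applies one format string to the sample moment.
    static String render(Locale locale, String pattern) {
        return String.format(locale, pattern, sample());
    }

    public static void main(String[] args) {
        System.out.println(render(Locale.US, "%1$tH:%1$tM:%1$tS"));  // 17:00:00
        System.out.println(render(Locale.US, "%tT"));                // 17:00:00
        System.out.println(render(Locale.US, "%tD"));                // 08/29/86
        System.out.println(render(Locale.US, "%tF"));                // 1986-08-29
        System.out.println(render(Locale.US, "%tA"));                // Friday
        // Width 10 with left-justification pads the name on the right.
        System.out.println(render(Locale.US, "%-10tA|"));            // Friday    |
        // The month name follows the locale passed to the formatter.
        System.out.println(render(Locale.FRANCE, "%tB"));
    }
}
```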

Internationalization and Localization for Text

The package java.text provides several types for localizing text behavior, such as collation (comparing strings), and formatting and parsing text, numbers, and dates. You have already learned about dates in detail so in this section we look at general formatting and parsing, and collation.

Collation

Comparing strings in a locale-sensitive fashion is called collation. The central class for collation is Collator, which provides a compare method that takes two strings and returns an int less than, equal to, or greater than zero as the first string is less than, equal to, or greater than the second.

As with most locale-sensitive classes, you get the best available Collator object for a locale from a getInstance method, either passing a specific Locale object or specifying no locale and so using the default locale. For example, you get the best available collator to sort a set of Russian-language strings like this:

Locale russian = new Locale("ru", "");
Collator coll = Collator.getInstance(russian);

You then can use coll.compare to determine the order of strings. A Collator object takes locality, not Unicode equivalence, into account when comparing. For example, in a French-speaking locale, the characters ç and c are considered equivalent for sorting purposes. A naïve sort that used String.compareTo would put all strings starting with ç after all those starting with c (indeed, it would put them after z), but in a French locale this would be wrong. They should be sorted according to the characters that follow the initial c or ç characters in the strings.
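The difference is easy to observe with a three-word list. In this sketch the class CollatorSortDemo and the helper sorted are hypothetical names, and ç is written as the escape \u00e7 so that the source compiles regardless of file encoding:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class CollatorSortDemo {
    // Hypothetical helper: returns a sorted copy, leaving the input untouched.
    static List<String> sorted(List<String> words, Comparator<? super String> order) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("zone", "\u00e7a", "cb");

        // Plain String ordering compares char values, so the c-cedilla sorts after z.
        System.out.println(sorted(words, Comparator.<String>naturalOrder()));

        // A French collator orders the c-cedilla word by the letter that follows it.
        Collator french = Collator.getInstance(Locale.FRENCH);
        System.out.println(sorted(words, french));
    }
}
```

The first line printed has the ç word last; the second has it first, ahead of "cb".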

Determining collation factors for a string can be expensive. A CollationKey object examines a string once, so you can compare precomputed keys instead of comparing strings with a Collator. The method Collator.getCollationKey returns a key for a string. For example, because Collator implements the interface Comparator, you could use a Collator to maintain a sorted set of strings:

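import java.text.Collator;
import java.util.Iterator;
import java.util.TreeSet;

class CollatorSorting {
    private TreeSet<String> sortedStrings;

    CollatorSorting(Collator collator) {
        // Collator implements Comparator, so it can order the set directly
        sortedStrings = new TreeSet<String>(collator);
    }

    void add(String str) { sortedStrings.add(str); }

    Iterator<String> strings() { return sortedStrings.iterator(); }
}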

Each time a new string is inserted in sortedStrings, the Collator is used as a Comparator, with its compare method invoked on various elements of the set until the TreeSet finds the proper place to insert the string. This results in several comparisons. You can make this quicker at the cost of space by creating a TreeMap that uses a CollationKey to map to the original string. CollationKey implements the interface Comparable with a compareTo method that can be much more efficient than using Collator.compare.

class CollationKeySorting {
    private TreeMap<CollationKey, String> sortedStrings;
    private Collator collator;

    CollationKeySorting(Collator collator) {
        this.collator = collator;
        sortedStrings = new TreeMap<CollationKey, String>();
    }

    void add(String str) {
        sortedStrings.put(collator.getCollationKey(str), str);
    }

    Iterator<String> strings() {
        return sortedStrings.values().iterator();
    }
}

Formatting and Parsing

The abstract Format class provides methods to format and parse objects according to a locale. Format declares a format method that takes an object and returns a formatted String, throwing IllegalArgumentException if the object is not of a type known to the formatting object. Format also declares a parseObject method that takes a String and returns an object initialized from the parsed data, throwing ParseException if the string is not understood. Each of these methods is implemented as appropriate for the particular kind of formatting. The package java.text provides three Format subclasses:

DateFormat was discussed in the previous section.

MessageFormat helps you localize output when printing messages that contain values from your program. Because word order varies among languages, you cannot simply use a localized string concatenated with your program's values. For example, the English phrase "a fantastic menu" would in French have the word order "un menu fantastique." A message that took adjectives and nouns from lists and displayed them in such a phrase could use a MessageFormat object to localize the order.

NumberFormat is an abstract class that defines a general way to format and parse various kinds of numbers for different locales. It has two subclasses: ChoiceFormat to choose among alternatives based on number (such as picking between a singular or plural variant of a word); and DecimalFormat to format and parse decimal numbers. (The formatting capabilities of NumberFormat are more powerful than those provided by java.util.Formatter.)
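The word-order problem described for MessageFormat, and the singular/plural selection described for ChoiceFormat, can be sketched as follows. This is an illustrative sketch only: the class name, the two patterns, and the file-count pattern are assumptions made up for the example. A real program would normally load such patterns from a resource bundle rather than hard-coding them.

```java
import java.text.ChoiceFormat;
import java.text.MessageFormat;
import java.util.Locale;

class MessageFormatDemo {
    // Hypothetical patterns: {0} is the adjective, {1} is the noun.
    static final String ENGLISH_PATTERN = "a {0} {1}";
    static final String FRENCH_PATTERN = "un {1} {0}";

    // The pattern, not the program, decides the order of the arguments.
    static String phrase(String pattern, Locale locale, String adjective, String noun) {
        MessageFormat fmt = new MessageFormat(pattern, locale);
        return fmt.format(new Object[] { adjective, noun });
    }

    // ChoiceFormat picks a variant based on the range the number falls in.
    static String fileCount(int n) {
        ChoiceFormat choice = new ChoiceFormat("0#no files|1#one file|1<many files");
        return choice.format(n);
    }

    public static void main(String[] args) {
        System.out.println(phrase(ENGLISH_PATTERN, Locale.ENGLISH, "fantastic", "menu"));
        System.out.println(phrase(FRENCH_PATTERN, Locale.FRENCH, "fantastique", "menu"));
        System.out.println(fileCount(0) + ", " + fileCount(1) + ", " + fileCount(5));
    }
}
```

Because the argument indices live in the pattern, translating the message means replacing only the pattern string; the calling code stays the same.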

NumberFormat in turn has four different kinds of "get instance" methods. Each method uses either a provided Locale object or the default locale.

getNumberInstance returns a general number formatter/parser. This is the kind of object returned by the generic getInstance method.

getIntegerInstance returns a number formatter/parser that rounds floating-point values to the nearest integer.

getCurrencyInstance returns a formatter/parser for currency values. The Currency object used by a NumberFormat can also be retrieved with the getCurrency method.

getPercentInstance returns a formatter/parser for percentages.
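To make the difference between the four kinds of instance concrete, here is a small sketch that formats values with each one. The class and method names are hypothetical, and Locale.US is chosen only so that the results are easy to read.

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberInstances {
    static String number(double v) {
        return NumberFormat.getNumberInstance(Locale.US).format(v);
    }

    static String integer(double v) {
        return NumberFormat.getIntegerInstance(Locale.US).format(v);
    }

    static String currency(double v) {
        return NumberFormat.getCurrencyInstance(Locale.US).format(v);
    }

    static String percent(double v) {
        return NumberFormat.getPercentInstance(Locale.US).format(v);
    }

    public static void main(String[] args) {
        double value = 1234.567;
        System.out.println(number(value));   // 1,234.567
        System.out.println(integer(value));  // 1,235
        System.out.println(currency(value)); // $1,234.57
        System.out.println(percent(0.25));   // 25%
    }
}
```

Note that the percent formatter multiplies by 100, so the fraction 0.25 is displayed as 25%.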

Here is a method you can use to print a number using the format for several different locales:

public void reformat(double num, String[] locales) {
    for (String loc : locales) {
        Locale pl = parseLocale(loc);
        NumberFormat fmt = NumberFormat.getInstance(pl);
        System.out.print(fmt.format(num));
        System.out.println("\t" + pl.getDisplayName());
    }
}

public static Locale parseLocale(String desc) {
    StringTokenizer st = new StringTokenizer(desc, "_");
    String lang = "", ctry = "", var = "";
    try {
        lang = st.nextToken();
        ctry = st.nextToken();
        var = st.nextToken();
    } catch (java.util.NoSuchElementException e) {
        ; // fine, let the others default
    }
    return new Locale(lang, ctry, var);
}

The first argument to reformat is the number to format; the second is an array of strings that specify locales. We use a StringTokenizer to break each locale string into its constituent components. For example, cy_GB will be broken into the language cy (Welsh), the country GB (United Kingdom), and the empty variant "". We create a Locale object from each result, get a number formatter for that locale, and then print the formatted number and the locale. When run with the number 5372.97 and the locale arguments en_US, lv, it_CH, and lt, reformat prints:

5,372.97        English (United States)
5 372,97        Latvian
5'372.97        Italian (Switzerland)
5.372,97        Lithuanian

A similar method can be written that takes a locale and a number formatted in that locale, uses the parse method to get a Number object, and prints the resulting value formatted according to a list of other locales:

public void parseAndReformat(String locale, String number,
                             String[] locales)
    throws ParseException
{
    Locale loc = LocalNumber.parseLocale(locale);
    NumberFormat parser = NumberFormat.getInstance(loc);
    Number num = parser.parse(number);
    for (String str : locales) {
        Locale pl = LocalNumber.parseLocale(str);
        NumberFormat fmt = NumberFormat.getInstance(pl);
        System.out.println(fmt.format(num));
    }
}

When run with the original locale it_CH, the number string "5'372.97" and the locale arguments en_US, lv, and lt, parseAndReformat prints:

 

5,372.97
5 372,97
5.372,97
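One detail worth knowing when parsing: the single-argument parse method may stop at the first character it does not understand and return what it has parsed so far, without reporting an error. If you need to reject strings with trailing garbage, you can use the overload that takes a ParsePosition and check how far parsing got. The following is a sketch under that assumption; parseStrict is a hypothetical helper, not a method of java.text.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

class StrictNumberParsing {
    // Hypothetical helper: accepts the text only if all of it is a number.
    static Number parseStrict(String text, Locale locale) throws ParseException {
        NumberFormat parser = NumberFormat.getInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number result = parser.parse(text, pos);
        if (result == null) {
            // Nothing could be parsed at all
            throw new ParseException("Unparseable number: " + text, pos.getErrorIndex());
        }
        if (pos.getIndex() != text.length()) {
            // A number was found, but characters remain after it
            throw new ParseException("Trailing characters in: " + text, pos.getIndex());
        }
        return result;
    }

    public static void main(String[] args) throws ParseException {
        // German uses '.' for grouping and ',' as the decimal separator
        System.out.println(parseStrict("1.234,56", Locale.GERMANY));
        // The lenient parse stops after "12" and ignores the rest
        System.out.println(NumberFormat.getInstance(Locale.US).parse("12abc"));
    }
}
```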

Text Boundaries

Parsing requires finding boundaries in text. The class BreakIterator provides a locale-sensitive tool for locating such break points. It has four kinds of "get instance" methods that return specific types of BreakIterator objects:

getCharacterInstance returns an iterator that shows valid breaks in a string for individual characters (not necessarily a char).

getWordInstance returns an iterator that shows word breaks in a string.

getLineInstance returns an iterator that shows where it is proper to break a line in a string, for purposes such as wrapping text.

getSentenceInstance returns an iterator that shows where sentence breaks occur in a string.

The following code prints each break shown by a given BreakIterator:

static void showBreaks(BreakIterator breaks, String str) {
    breaks.setText(str);
    int start = breaks.first();
    int end = breaks.next();
    while (end != BreakIterator.DONE) {
        System.out.println(str.substring(start, end));
        start = end;
        end = breaks.next();
    }
    System.out.println(str.substring(start)); // the last
}

A BreakIterator is a different style of iterator from the usual java.util.Iterator objects you have seen. It provides several methods for iterating forward and backward within a string, looking for different break positions.
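As an illustration of iterating in both directions, the sketch below collects the words of a string with a word iterator, first forward using first and next, then backward using last and previous. The class and method names are hypothetical. A word iterator also reports the spaces and punctuation between words as separate pieces, so the sketch keeps only pieces that start with a letter or digit; that filter is a simplification chosen for the example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBreaks {
    // Walks forward from the first boundary to the last.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            addIfWord(result, text.substring(start, end));
        }
        return result;
    }

    // Walks backward from the last boundary to the first.
    static List<String> wordsBackward(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int end = it.last();
        for (int start = it.previous(); start != BreakIterator.DONE; end = start, start = it.previous()) {
            addIfWord(result, text.substring(start, end));
        }
        return result;
    }

    // Simplified filter: skip pieces that are only spaces or punctuation.
    private static void addIfWord(List<String> result, String piece) {
        if (!piece.isEmpty() && Character.isLetterOrDigit(piece.codePointAt(0))) {
            result.add(piece);
        }
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.US));
        System.out.println(wordsBackward("Hello, world!", Locale.US));
    }
}
```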

You should always use these boundary classes when breaking up text because the issues involved are subtle and widely varying. For example, the logical characters used in these classes are not necessarily equivalent to a single char. Unicode characters can be combined, so it can take more than one 16-bit Unicode value to constitute a logical character. And word breaks are not necessarily spaces; some languages do not even use spaces.
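The difference between a char and a logical character can be seen with a character iterator. In the sketch below (the class and method names are hypothetical), the string holds the letter e followed by a combining acute accent and then the letter x: three char values, but only two characters as a reader would see them.

```java
import java.text.BreakIterator;

class LogicalCharacters {
    // Counts the boundaries a character iterator reports, one per logical character.
    static int count(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int n = 0;
        it.first();
        while (it.next() != BreakIterator.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "e\u0301x"; // 'e', COMBINING ACUTE ACCENT, 'x'
        System.out.println("chars: " + s.length());             // 3
        System.out.println("logical characters: " + count(s));  // 2
    }
}
```

Code that truncates or reverses a string by char index could separate the accent from its base letter, which is why the text recommends going through the iterator instead.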

This concludes our discussion on Localization using Java.







