icuSetCollate              package:base              R Documentation

_S_e_t_u_p _C_o_l_l_a_t_i_o_n _b_y _I_C_U

_D_e_s_c_r_i_p_t_i_o_n:

     Controls the way collation is done by ICU (an optional part of the
     R build).

_U_s_a_g_e:

     icuSetCollate(...)

_A_r_g_u_m_e_n_t_s:

     ...: Named arguments, see 'Details'.

_D_e_t_a_i_l_s:

     Optionally, R can be built to collate character strings by ICU
     (<URL: http://www.icu-project.org>).  For such systems,
     'icuSetCollate' can be used to tune the way collation is done.  On
     other builds, calling this function does nothing except issue a
     warning.
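
     Because the call only has an effect on ICU-enabled builds, a script
     may want to check for ICU before calling it.  The following is a
     minimal sketch, assuming that 'capabilities("ICU")' reports ICU
     support on the running build; 'set_icu_if_available' is a
     hypothetical helper name, not part of R.

```r
# Hypothetical helper (not part of base R): apply ICU collation settings
# only when this build of R reports ICU support, so that no warning is
# raised on other builds.
set_icu_if_available <- function(...) {
  has_icu <- unname(capabilities("ICU"))
  if (has_icu) icuSetCollate(...)
  has_icu
}

# TRUE if the settings were passed to ICU, FALSE if the call was skipped.
used_icu <- set_icu_if_available(locale = "da_DK")
```

     Note that a successful call changes the collation used by the
     rest of the session.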

     Possible arguments are

     '_l_o_c_a_l_e': A character string such as '"da_DK"' giving the country
          whose collation rules are to be used.  If present, this
          should be the first argument.

     '_c_a_s_e__f_i_r_s_t': '"upper"', '"lower"' or '"default"', asking for
          upper- or lower-case characters to be sorted first.  The
          default is usually lower-case first, but not in all languages
          (see the Danish example).

     '_a_l_t_e_r_n_a_t_e__h_a_n_d_l_i_n_g': Controls the handling of 'variable'
          characters (mainly punctuation and symbols). Possible values
          are '"non_ignorable"' (primary strength) and '"shifted"'
          (quaternary strength).

     '_s_t_r_e_n_g_t_h': Which components should be used?  Possible values
          '"primary"', '"secondary"', '"tertiary"' (default),
          '"quaternary"' and '"identical"'. 

     '_f_r_e_n_c_h__c_o_l_l_a_t_i_o_n': In a French locale the way accents affect
          collation is from right to left, whereas in most other
          locales it is from left to right.  Possible values '"on"',
          '"off"' and '"default"'.

     '_n_o_r_m_a_l_i_z_a_t_i_o_n': Should strings be normalized? Possible values
          '"on"' and '"off"' (default). This affects the collation of
          composite characters.

     '_c_a_s_e__l_e_v_e_l': An additional level between secondary and tertiary,
          used to distinguish large and small Japanese Kana characters.
          Possible values '"on"' and '"off"' (default).

     '_h_i_r_a_g_a_n_a__q_u_a_t_e_r_n_a_r_y': Possible values '"on"' (sort Hiragana first
          at quaternary level) and '"off"'.

     Only the first three are likely to be of interest except to those
     with a detailed understanding of collation and specialized
     requirements.

     Some examples are 'case_level="on", strength="primary"' to ignore
     accent differences, and 'alternate_handling="shifted"' to ignore
     space and punctuation characters.
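
     The two combinations just mentioned can be tried as in the sketch
     below.  The sample strings, the '"en_US"' locale and the helper
     'sort_with' are assumptions made for illustration; the resulting
     order depends on the ICU version and locale data, so no output is
     shown.

```r
# Sample strings differing by accent, case and punctuation (illustrative).
x <- c("resume", "r\u00e9sum\u00e9", "Resume", "co-op", "coop")

# Hypothetical helper: apply the given ICU settings, then sort.  On builds
# without ICU the settings are skipped and the default order is returned.
sort_with <- function(x, ...) {
  if (unname(capabilities("ICU"))) icuSetCollate(...)
  sort(x)
}

# Primary strength plus the case level: accent differences are ignored.
ignore_accents <- sort_with(x, locale = "en_US",
                            case_level = "on", strength = "primary")

# Shifted handling: space and punctuation characters are ignored.
ignore_punct <- sort_with(x, locale = "en_US",
                          alternate_handling = "shifted")
```

     Both calls change the collation settings for the rest of the
     session, so later calls to 'sort' are affected as well.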

_S_e_e _A_l_s_o:

     Comparison, 'sort'

     The ICU user guide chapter on collation (<URL:
     http://www.icu-project.org/userguide/Collate_Intro.html>).

_E_x_a_m_p_l_e_s:

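     ## "Aarhus" and "aarhus" differ only by case.
     x <- c("Aarhus", "aarhus", "safe", "test", "Zoo")
     sort(x)
     ## Ask for upper-case, then lower-case, characters to sort first.
     icuSetCollate(case_first="upper"); sort(x)
     icuSetCollate(case_first="lower"); sort(x)

     ## Use Danish, then Estonian, collation rules.
     icuSetCollate(locale="da_DK", case_first="default"); sort(x)
     icuSetCollate(locale="et_EE"); sort(x)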

