uconv
The uconv tool is a command-line executable that converts text files from one character encoding to another. The input can be any text file, for example XML files or log files.
The utility allows the use of callbacks to handle invalid characters in the input, or characters that cannot be transcoded to the destination encoding. Such callbacks perform a specific action whenever such a character is encountered in the input. For example, you can use a specific substitution character to represent all problem characters, or replace the problem character with a string.
You can specify a general callback for both input and output using the --callback option on the command line, or use different callbacks for reading input and transcoding output (with the --from-callback and --to-callback options).
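As a rough analogy for how input-side and output-side callbacks differ, the following sketch uses Python's built-in codec error handlers. These are not ICU callbacks and uconv is not invoked; the mapping of "strict", "ignore", and "replace" to the stop, skip, and substitute behaviors is an approximation for illustration only.

```python
# Analogy only: Python codec error handlers, not ICU callbacks.

# Input-side problem: 0xE9 followed by a space is not valid UTF-8.
data = b"caf\xe9 ok"

# Comparable to "stop": decoding fails on the invalid byte.
try:
    data.decode("utf-8", errors="strict")
    stopped = False
except UnicodeDecodeError:
    stopped = True

# Comparable to "skip": the invalid byte is dropped.
skipped = data.decode("utf-8", errors="ignore")

# Comparable to "substitute": the invalid byte becomes U+FFFD.
substituted = data.decode("utf-8", errors="replace")

# Output-side problem: U+00E9 cannot be represented in ASCII.
text = "caf\u00e9"
out_skipped = text.encode("ascii", errors="ignore")
out_substituted = text.encode("ascii", errors="replace")
# Python's handler writes a decimal XML character reference here.
out_escaped = text.encode("ascii", errors="xmlcharrefreplace")

print(stopped, repr(skipped), repr(substituted))
print(out_skipped, out_substituted, out_escaped)
```

The first three cases correspond to problems found while reading input, the last three to characters that cannot be written in the destination encoding, which is the distinction that --from-callback and --to-callback make.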
The uconv tool is included with the open-source ICU package at: http://site.icu-project.org/
The utility is located in: %SWISRSDK%\amd64\bin
Usage
uconv file1 [file2 ...]
[-b, --block-size size] [--callback callback] [--canon]
[--fallback | --no-fallback]
[-f, --from-code code] [--from-callback callback | -i] [-h, -?, --help]
[-l, --list | --list-code code | --default-code | -L, --list-transliterators] [-o, --output file]
[-s, --silent]
[-t, --to-code code] [--to-callback callback | -c] [-v, --verbose]
[-V, --version]
[-x transliteration]
Options
file1 [file2 ...]
One or more files that are to be converted.
-b, --block-size size
Reads input in blocks of size bytes at a time. If no size is specified, the default is 4096.
--callback callback
Specifies a global callback to be used for problem characters in both the input and the output files. If no callback is specified, problem characters stop the conversion. See Callbacks for details.
--canon
This option complements the listing options for the uconv utility:
- If used with -l, --list or --default-code, the list of encodings is produced in a format compatible with the ICU standard converter aliases format.
- If used with -L or --list-transliterators, only one transliterator name is printed per line.
--fallback | --no-fallback
Specifies whether to use the fallback mapping when transcoding from Unicode to the destination encoding (--fallback) or not (--no-fallback, the default).
-f, --from-code code
Specifies the original encoding of the input file.
--from-callback callback
Specifies the callback to use if a problem character is encountered in the input file (see Callbacks). Cannot be used with the -i option.
-h, -?, --help
Displays information about the uconv utility and how to use it.
-i
Ignores problem characters in the input. This option is the equivalent of "--from-callback skip". Cannot be used at the same time as --from-callback.
-l, --list | --list-code code | --default-code | -L, --list-transliterators
Only one of these listing options can be used at a time. Each of them lists information about the encoding options:
- -l or --list provides a list of all available encodings.
- --list-code code lists only that encoding, if it is supported.
- --default-code lists the current default encoding.
- -L or --list-transliterators lists all available transliterators.
It is recommended that you use --canon with -l or -L, as the output will otherwise all be printed on a single line, making it difficult to read.
-o, --output file
Writes the transcoded output to the specified file.
-s, --silent
Suppresses messages during transcoding.
-t, --to-code code
Specifies the desired encoding for the output file.
--to-callback callback
Specifies the callback to use if a problem character is encountered when writing the output file (see Callbacks). Cannot be used with -c.
-c
Ignores problem characters in the output. This option is the equivalent of "--to-callback skip". Cannot be used at the same time as --to-callback.
-v, --verbose
Displays extra messages during transcoding.
-V, --version
Displays version information about the uconv utility.
-x transliteration
Runs the given transliteration on the transcoded Unicode data, and uses the transliterated data as input to be transcoded to the output encoding.
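The mutual-exclusion rules described above (-i versus --from-callback, and -c versus --to-callback) can be checked before the tool is launched. The following is a minimal sketch of a wrapper that assembles a uconv argument list; the function build_uconv_args and its parameter names are hypothetical and are not part of ICU or uconv. It only builds the list and does not run the tool.

```python
from typing import List, Optional


def build_uconv_args(
    files: List[str],
    from_code: Optional[str] = None,
    to_code: Optional[str] = None,
    from_callback: Optional[str] = None,
    to_callback: Optional[str] = None,
    ignore_input_errors: bool = False,   # maps to -i
    ignore_output_errors: bool = False,  # maps to -c
    output: Optional[str] = None,
) -> List[str]:
    """Assemble a uconv argument list (hypothetical helper, not part of ICU)."""
    # Enforce the documented restrictions before building the command.
    if ignore_input_errors and from_callback is not None:
        raise ValueError("-i cannot be used with --from-callback")
    if ignore_output_errors and to_callback is not None:
        raise ValueError("-c cannot be used with --to-callback")

    args = ["uconv"]
    if from_code:
        args += ["-f", from_code]
    if to_code:
        args += ["-t", to_code]
    if from_callback:
        args += ["--from-callback", from_callback]
    if ignore_input_errors:
        args.append("-i")
    if to_callback:
        args += ["--to-callback", to_callback]
    if ignore_output_errors:
        args.append("-c")
    if output:
        args += ["-o", output]
    return args + list(files)


# File names here are placeholders.
cmd = build_uconv_args(
    ["in.txt"],
    from_code="utf-8",
    to_code="iso-8859-1",
    to_callback="escape-xml",
    output="out.txt",
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run if uconv is on the search path; that step is left out so the sketch runs without the tool installed.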
Callbacks
escape, escape-icu
Replaces problematic characters with a string of the format %Uhhhh for plane 0 characters and %Uhhhh%Uhhhh for higher-plane characters, where hhhh is the hexadecimal value of one of the character's UTF-16 code units.
escape-c
Replaces problematic characters with a string of the format \uhhhh for plane 0 characters, and \Uhhhhhhhh for higher-plane characters, where hhhh and hhhhhhhh are the hexadecimal values of the Unicode codepoint.
escape-java
Replaces problematic characters with a string of the format \uhhhh for plane 0 characters, and \uhhhh\uhhhh for higher-plane characters, where hhhh is the hexadecimal value of one of the character's UTF-16 code units.
escape-xml, escape-xml-hex
Replaces problematic characters with a string of the format &#xhhhh;, where hhhh is the hexadecimal value of the Unicode codepoint.
escape-xml-dec
Replaces problematic characters with a string of the format &#nnnn;, where nnnn is the decimal value of the Unicode codepoint.
escape-unicode
Replaces problematic characters with a string of the format {U+hhhh}, where hhhh is the hexadecimal value of the Unicode codepoint. The length of the hexadecimal string can vary from 4 to 6 digits.
skip
Ignores the invalid data.
stop
Stops the transcoding and generates an error when a problematic character is encountered. This is the default callback.
substitute
Writes the encoding's substitute sequence, or the Unicode replacement character U+FFFD when transcoding to Unicode.
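The difference between the code-unit based formats (escape, escape-java) and the codepoint based formats (escape-c, escape-unicode) is easiest to see with a character outside plane 0. The sketch below reproduces the string formats as described above in plain Python; it does not call ICU, the function names are made up for illustration, and the use of uppercase hexadecimal digits is an assumption.

```python
def utf16_units(ch: str) -> list:
    """Return the UTF-16 code units of a single character as integers."""
    raw = ch.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def escape_icu(ch: str) -> str:
    # One %Uhhhh group per UTF-16 code unit.
    return "".join("%U{:04X}".format(u) for u in utf16_units(ch))


def escape_java(ch: str) -> str:
    # One \uhhhh group per UTF-16 code unit.
    return "".join("\\u{:04X}".format(u) for u in utf16_units(ch))


def escape_c(ch: str) -> str:
    # \uhhhh for plane 0, \Uhhhhhhhh for higher planes (codepoint value).
    cp = ord(ch)
    return "\\u{:04X}".format(cp) if cp <= 0xFFFF else "\\U{:08X}".format(cp)


def escape_unicode(ch: str) -> str:
    # {U+hhhh} with 4 to 6 hexadecimal digits (codepoint value).
    return "{{U+{:04X}}}".format(ord(ch))


bmp = "\u00e9"          # U+00E9, a plane 0 character
astral = "\U0001F600"   # U+1F600, a higher-plane character

for name, fn in [("escape", escape_icu), ("escape-c", escape_c),
                 ("escape-java", escape_java), ("escape-unicode", escape_unicode)]:
    print(name, fn(bmp), fn(astral))
```

For U+1F600 the code-unit formats emit the surrogate pair D83D DE00, while the codepoint formats emit 1F600 directly.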
Example
uconv -f utf-8 -t utf-16 --callback escape-xml-dec myfile.txt