Module std.utf
Encode and decode UTF-8, UTF-16 and UTF-32 strings.
UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'.
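As a small illustration of the supported range, the sketch below probes its boundaries with isValidDchar. The surrogate check reflects how isValidDchar is understood to behave (it rejects U+D800 through U+DFFF); that detail is not stated on this page, so treat it as an assumption.

```d
import std.utf : isValidDchar;

void main()
{
    // Both ends of the supported range are accepted.
    assert(isValidDchar('\u0000'));
    assert(isValidDchar('\U0010FFFF'));

    // One past the upper bound is rejected.
    assert(!isValidDchar(cast(dchar) 0x110000));

    // Assumption: surrogate values are not valid code points on their own.
    assert(!isValidDchar(cast(dchar) 0xD800));
}
```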
Category | Functions |
---|---|
Decode | decode, decodeFront |
Lazy decode | byCodeUnit, byChar, byWchar, byDchar, byUTF |
Encode | encode, toUTF8, toUTF16, toUTF32, toUTFz, toUTF16z |
Length | codeLength, count, stride, strideBack |
Index | toUCSindex, toUTFindex |
Validation | isValidDchar, isValidCodepoint, validate |
Miscellaneous | replacementDchar, UseReplacementDchar, UTFException |
See Also
Wikipedia
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1335
Functions
Name | Description |
---|---|
byCodeUnit(r) | Iterates a range of chars, wchars, or dchars by code unit. |
codeLength(c) | Returns the number of code units required to encode the code point c when C is the character type used to encode it. |
codeLength(input) | Returns the number of code units required to encode input in a string whose character type is C. This is particularly useful when slicing one string with the length of another and the two string types use different character types. |
count(str) | Returns the total number of code points encoded in str. |
decode(str, index) | Decodes and returns the code point starting at str[index]. index is advanced to one past the decoded code point. If the code point is not well-formed, then a UTFException is thrown and index remains unchanged. |
decodeBack(str, numCodeUnits) | decodeBack is a variant of decode which specifically decodes the last code point. Unlike decode, decodeBack accepts any bidirectional range of code units (rather than just a string or random access range). It also takes the range by ref and pops off the elements as it decodes them. If numCodeUnits is passed in, it is set to the number of code units in the code point that was decoded. |
decodeFront(str, numCodeUnits) | decodeFront is a variant of decode which specifically decodes the first code point. Unlike decode, decodeFront accepts any input range of code units (rather than just a string or random access range). It also takes the range by ref and pops off the elements as it decodes them. If numCodeUnits is passed in, it is set to the number of code units in the code point that was decoded. |
encode(buf, c) | Encodes c into the static array buf and returns the actual length of the encoded character (a number between 1 and 4 for char[4] buffers and a number between 1 and 2 for wchar[2] buffers). |
encode(str, c) | Encodes c in str's encoding and appends it to str. |
isValidCodepoint(c) | Checks if a single character forms a valid code point. |
isValidDchar(c) | Checks whether the given Unicode code point is valid. |
stride(str, index) | Calculates the length of the UTF sequence starting at index in str. |
strideBack(str, index) | Calculates the length of the UTF sequence ending one code unit before index in str. |
toUCSindex(str, index) | Given index into str and assuming that index is at the start of a UTF sequence, toUCSindex determines the number of UCS characters up to index. So, index is the index of a code unit at the beginning of a code point, and the return value is how many code points into the string that code point is. |
toUTF16(s) | Encodes the elements of s to UTF-16 and returns a newly GC-allocated wstring of the elements. |
toUTF16z(str) | toUTF16z is a convenience function for toUTFz!(const(wchar)*). |
toUTF32(s) | Encodes the elements of s to UTF-32 and returns a newly GC-allocated dstring of the elements. |
toUTF8(s) | Encodes the elements of s to UTF-8 and returns a newly allocated string of the elements. |
toUTFindex(str, n) | Given a UCS index n into str, returns the UTF index. So, n is how many code points into the string the code point is, and the array index of the code unit is returned. |
validate(str) | Checks whether str is well-formed Unicode. |
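The following is a minimal sketch of how several of the functions listed above fit together on one UTF-8 string. The helper codePoints and the sample string are made up for illustration. The asserted values follow from the UTF-8 lengths of the three characters used (1, 2, and 3 code units).

```d
import std.utf;

/// Hypothetical helper: collects code points by calling decode in a loop.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i); // decode advances i past the code point
    return result;
}

void main()
{
    // 'a' (1 code unit), U+00E9 (2 code units), U+20AC (3 code units)
    string s = "a\u00E9\u20AC";

    // Length
    assert(s.length == 6);                 // code units
    assert(count(s) == 3);                 // code points
    assert(stride(s, 1) == 2);             // sequence starting at index 1
    assert(strideBack(s, s.length) == 3);  // sequence ending before index 6
    assert(codeLength!char(cast(dchar) 0x20AC) == 3);
    assert(codeLength!wchar(s) == 3);

    // Index: code point 2 starts at code unit 3, and vice versa
    assert(toUTFindex(s, 2) == 3);
    assert(toUCSindex(s, 3) == 2);

    // Decode
    assert(codePoints(s) == "a\u00E9\u20AC"d);
    string rest = s;
    size_t n;
    dchar first = decodeFront(rest, n);    // pops 'a' off the front
    assert(first == 'a' && n == 1);
    dchar last = decodeBack(rest, n);      // pops U+20AC off the back
    assert(last == 0x20AC && n == 3);
    assert(rest == "\u00E9");

    // Encode
    char[4] buf;
    immutable len = encode(buf, cast(dchar) 0x20AC);
    assert(buf[0 .. len] == "\u20AC");
    char[] grown;
    encode(grown, cast(dchar) 0xE9);       // appends to grown
    assert(grown == "\u00E9");
    assert(toUTF16(s).length == 3);
    assert(toUTF32(s).length == 3);
    assert(toUTF8("a\u00E9\u20AC"w) == s);

    // Validation of a single code unit
    assert(isValidCodepoint('a'));
    assert(!isValidCodepoint(cast(char) 0x80));
}
```

The string is written with \u escapes so that the byte counts do not depend on how the source file itself is encoded or normalized.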
Classes
Name | Description |
---|---|
UTFException | Exception thrown on errors in std.utf functions. |
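One way to handle UTFException is sketched below. It assumes that validate reports malformed input by throwing UTFException. The helper name isWellFormed and the sample bytes are hypothetical.

```d
import std.utf : UTFException, validate;

/// Hypothetical helper: true if validate accepts s, false if it throws.
bool isWellFormed(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    assert(isWellFormed("hello"));

    // 0xC3 starts a two-byte sequence, but no continuation byte follows.
    immutable(ubyte)[] raw = [0x61, 0xC3];
    assert(!isWellFormed(cast(string) raw));
}
```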
Templates
Name | Description |
---|---|
toUTFz | Returns a C-style zero-terminated string equivalent to str. str must not contain embedded '\0's, as any C function will treat the first '\0' that it sees as the end of the string. If str is empty, then a string containing only '\0' is returned. |
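A short sketch of toUTFz and toUTF16z follows. The helper cLength is hypothetical and uses C's strlen only to show that the returned pointer is zero-terminated.

```d
import core.stdc.string : strlen;
import std.utf : toUTF16z, toUTFz;

/// Hypothetical helper: length of s as seen by a C function.
size_t cLength(string s)
{
    const(char)* p = toUTFz!(const(char)*)(s);
    return strlen(p);
}

void main()
{
    assert(cLength("hello") == 5);

    // An empty string yields a pointer to a lone '\0'.
    assert(cLength("") == 0);

    // toUTF16z is shorthand for toUTFz!(const(wchar)*).
    const(wchar)* w = toUTF16z("hi");
    assert(w[0] == 'h' && w[1] == 'i' && w[2] == '\0');
}
```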
Manifest constants
Name | Type | Description |
---|---|---|
replacementDchar | | Inserted in place of invalid UTF sequences. |
Aliases
Name | Type | Description |
---|---|---|
byChar | byUTF!char | Iterate an input range of characters by char, wchar, or dchar. These aliases simply forward to byUTF with the corresponding C argument. |
byDchar | byUTF!dchar | Iterate an input range of characters by char, wchar, or dchar. These aliases simply forward to byUTF with the corresponding C argument. |
byUTF | byUTF!UC | Iterate an input range of characters by char type C by encoding the elements of the range. |
byWchar | byUTF!wchar | Iterate an input range of characters by char, wchar, or dchar. These aliases simply forward to byUTF with the corresponding C argument. |
UseReplacementDchar | Flag!("useReplacementDchar") | Whether or not to replace invalid UTF with replacementDchar. |
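The lazy ranges and the UseReplacementDchar flag can be sketched as follows. The malformed sample string and the helper decodesStrictly are made up for illustration. The sketch assumes a Phobos version in which byUTF accepts UseReplacementDchar as its second template argument.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.utf : byChar, byDchar, byUTF, byWchar, replacementDchar,
    UseReplacementDchar, UTFException;

/// Hypothetical helper: false if strict decoding throws UTFException.
bool decodesStrictly(string s)
{
    try
    {
        // Walk the whole range so that every sequence is decoded.
        foreach (dchar c; s.byChar.byUTF!(dchar, UseReplacementDchar.no)) {}
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // 'a' (1 UTF-8 code unit), U+00E9 (2 code units), U+20AC (3 code units)
    string s = "a\u00E9\u20AC";
    assert(s.byChar.walkLength == 6);   // UTF-8 code units
    assert(s.byWchar.walkLength == 3);  // UTF-16 code units
    assert(s.byDchar.equal("a\u00E9\u20AC"d));

    // 0xF0 opens a four-byte sequence that is never completed.
    string bad = "hello\xF0betty";

    // With replacement enabled, the bad sequence becomes replacementDchar.
    assert(bad.byChar.byUTF!(dchar, UseReplacementDchar.yes)
        .canFind(replacementDchar));

    // With replacement disabled, decoding throws instead.
    assert(!decodesStrictly(bad));
    assert(decodesStrictly(s));
}
```

Because these ranges are lazy, nothing is decoded until the range is iterated, which is why the helper walks the whole range before reporting success.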
Copyright © 1999-2025 by the D Language Foundation | Page generated by ddox.