Struct std.uni.InversionList

struct InversionList(SP) ;

InversionList is a set of code points represented as an array of open-right [a, b) intervals (see CodepointInterval above). The name comes from the way the representation reads left to right. For instance, the set of all values in [10, 50) and [80, 90), plus the single value 60, looks like this:

10, 50, 60, 61, 80, 90

The way to read this is: start in the negative state, meaning that all numbers smaller than the next one are not present in this set (positive means the contrary). Then switch between positive and negative after each number passed, reading left to right.

This way negative spans until 10, then positive until 50, then negative until 60, then positive until 61, and so on. As seen, this provides space-efficient storage of highly redundant data that comes in long runs, a description that Unicode character properties fit nicely. The technique itself can be seen as a variation on run-length encoding (RLE).
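The reading rule above can be sketched as a small membership test over the flat array. This is an illustration only and makes no claim about how std.uni performs lookups internally; the helper name inMarks is hypothetical. Compiling with dmd -unittest -main runs the checks.

```d
import std.uni : CodepointSet;

// Hypothetical helper for illustration; not part of std.uni.
// Walks the boundaries left to right, flipping state at each one passed.
bool inMarks(const(uint)[] marks, uint x)
{
    bool present = false;       // start "negative": nothing below marks[0] is in the set
    foreach (m; marks)
    {
        if (x < m)
            break;              // x lies before this boundary, so the state is final
        present = !present;     // passed a boundary: switch positive/negative
    }
    return present;
}

unittest
{
    immutable uint[] marks = [10, 50, 60, 61, 80, 90];

    assert(!inMarks(marks, 9));   // negative until 10
    assert(inMarks(marks, 10));   // positive from 10 ...
    assert(!inMarks(marks, 50));  // ... until 50 (open-right)
    assert(inMarks(marks, 60));   // the single value 60
    assert(!inMarks(marks, 61));

    // Cross-check against the library type built from the same values.
    auto set = CodepointSet(10, 50, 60, 61, 80, 90);
    foreach (uint x; 0 .. 100)
        assert(inMarks(marks, x) == set[x]);
}
```

A linear walk keeps the sketch close to the prose; since the array is sorted, a binary search for the number of boundaries not exceeding x would give the same answer from its parity.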

Sets are value types (just like int), thus they are never aliased.

Constructors

Name Description
this (set) Construct from another code point set of any type.
this (intervals) Construct a set from a forward range of code point intervals.
this () Construct a set from plain values of code point intervals.
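A brief sketch of the three constructor forms, assuming the default alias CodepointSet and the CodepointInterval type from std.uni. The variable names are illustrative.

```d
import std.uni;

unittest
{
    // Plain values: consecutive pairs are read as [a, b) intervals.
    auto letters = CodepointSet('a', 'z' + 1, 'A', 'Z' + 1);
    assert(letters['q'] && letters['Q'] && !letters['0']);

    // A forward range (here an array) of code point intervals.
    auto hexDigits = CodepointSet([CodepointInterval('0', '9' + 1),
                                   CodepointInterval('a', 'f' + 1)]);
    assert(hexDigits['7'] && hexDigits['c'] && !hexDigits['g']);

    // Another code point set.
    auto copy = CodepointSet(letters);
    assert(copy == letters);
}
```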

Properties

Name Type Description
byCodepoint [get] auto A range that spans each code point in this set.
byInterval [get] auto Get range that spans all of the code point intervals in this InversionList.
empty [get] bool True if this set doesn't contain any code points.
inverted [get] auto Obtains a set that is the inversion of this set.
length [get] size_t Number of code points in this set.
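The properties above might be exercised as in the following sketch. It assumes the full code point range is [0, 0x110000) and that intervals compare equal to two-element tuples.

```d
import std.algorithm.comparison : equal;
import std.typecons : tuple;
import std.uni;

unittest
{
    auto set = CodepointSet('A', 'D' + 1, 'a', 'd' + 1);

    assert(!set.empty);
    assert(set.length == 8);   // four upper-case plus four lower-case code points

    // Iterate code point by code point, or interval by interval.
    assert(set.byCodepoint.equal("ABCDabcd"d));
    assert(set.byInterval.equal([tuple('A', 'E'), tuple('a', 'e')]));

    // A set and its inversion together cover every code point and share none.
    assert((set | set.inverted) == CodepointSet(0, 0x110000));
    assert((set & set.inverted).empty);

    // A default-initialized set is empty.
    CodepointSet none;
    assert(none.empty && none.length == 0);
}
```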

Methods

Name Description
add (a, b) Add an interval [a, b) to this set.
opBinary (rhs) Sets support natural syntax for set algebra, namely:

Operator Math notation Description
& a ∩ b intersection
| a ∪ b union
- a ∖ b subtraction
~ a ~ b symmetric set difference, i.e. (a ∪ b) ∖ (a ∩ b)
opBinaryRight (ch) Tests the presence of codepoint ch in this set, the same as opIndex.
opIndex (val) Tests the presence of code point val in this set.
opOpAssign (rhs) The 'op=' versions of the above overloaded operators.
opUnary () Obtains a set that is the inversion of this set.
toSourceCode (funcName) Generates a string with D source code for a unary function named funcName that takes a single dchar argument. If funcName is empty, the code is adjusted to be a lambda function.
toString (sink, fmt) Obtain a textual representation of this InversionList in the form of open-right intervals.

Example

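import std.uni;

auto a = CodepointSet('a', 'z'+1);   // lowercase ASCII letters
auto b = CodepointSet('A', 'Z'+1);   // uppercase ASCII letters
auto c = a;                          // c is an independent copy (value semantics)
a = a | b;                           // union of both sets; c is unaffected
assert(a == CodepointSet('A', 'Z'+1, 'a', 'z'+1));
assert(a != c);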
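The remaining operators and methods from the table above can be sketched the same way. The two overlapping sets here are chosen only so that each result is easy to verify by hand.

```d
import std.uni;

unittest
{
    auto lo = CodepointSet('a', 'd');   // {a, b, c}
    auto hi = CodepointSet('c', 'f');   // {c, d, e}

    assert((lo & hi) == CodepointSet('c', 'd'));             // intersection: {c}
    assert((lo | hi) == CodepointSet('a', 'f'));             // union: {a .. e}
    assert((lo - hi) == CodepointSet('a', 'c'));             // subtraction: {a, b}
    assert((lo ~ hi) == CodepointSet('a', 'c', 'd', 'f'));   // symmetric difference: {a, b, d, e}

    // Membership through 'in' and through indexing.
    assert('a' in lo);
    assert(lo['a'] && !lo['d']);

    // The op= forms modify the left-hand side; copies stay independent.
    auto acc = lo;
    acc |= hi;
    assert(acc == (lo | hi));
    assert(lo == CodepointSet('a', 'd'));

    // add can be chained.
    CodepointSet digits;
    digits.add('0', '5').add('5', '9' + 1);
    assert(digits['0'] && digits['5'] && digits['9']);
    assert(digits.length == 10);

    // Unary ! gives the same result as the inverted property.
    assert((!lo) == lo.inverted);
}
```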

See also unicode for simpler construction of sets from predefined ones.

Memory usage is 8 bytes per contiguous interval in a set. Value semantics are achieved by using the copy-on-write (COW) technique, and thus it's not safe to cast this type to shared.

Note

It's not recommended to rely on the template parameters or the exact type of a current code point set in std.uni. The type and parameters may change when the standard allocators design is finalized. Use isCodepointSet with templates or just stick with the default alias CodepointSet throughout the whole code base.
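One way to follow this advice in generic code is to constrain templates with isCodepointSet rather than naming a concrete instantiation. The helper containsAll below is hypothetical and serves only to show the constraint.

```d
import std.uni;

// Hypothetical helper for illustration; accepts any code point set type
// without depending on the template parameters of InversionList.
bool containsAll(Set)(Set set, const(dchar)[] chars)
if (isCodepointSet!Set)
{
    foreach (ch; chars)
        if (!set[ch])
            return false;
    return true;
}

unittest
{
    auto lower = CodepointSet('a', 'z' + 1);
    assert(containsAll(lower, "abc"d));
    assert(!containsAll(lower, "abC"d));

    // Types that are not code point sets are rejected at compile time.
    static assert(!__traits(compiles, containsAll(42, "abc"d)));
}
```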

Authors

Dmitry Olshansky

License

Boost License 1.0.