CXStringEdit

From cxwiki

The CXStringEdit class is a string class optimised for simple editing. As compared to the standard CXString, it offers vastly superior general edit performance for short strings, vastly superior append performance even for long strings, and a variety of helper methods for string editing.

A CXStringEdit object nominally stores UTF-8 encoded text with a zero termination byte. Short strings are stored in an internal buffer, while long strings are stored in an allocated buffer. CXStringEdit objects do not distinguish "null" and "empty" strings.
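The split between an internal buffer and an allocated buffer can be pictured with the minimal sketch below. It is not the CX implementation: the class name SmallBufferText, the kInlineCapacity threshold, the UsesHeap() probe and the capacity-doubling policy are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical illustration only; not the CXStringEdit implementation.
class SmallBufferText {
public:
    // Assumed threshold: payloads up to this many bytes stay inline.
    static constexpr std::size_t kInlineCapacity = 23;

    SmallBufferText() { inline_[0] = '\0'; }

    // Appends 'len' bytes and keeps a zero terminator just past the payload.
    void Append(const char* bytes, std::size_t len) {
        if (len == 0) return;
        assert(bytes != nullptr);
        Reserve(length_ + len);
        std::memcpy(Data() + length_, bytes, len);
        length_ += len;
        Data()[length_] = '\0';
    }

    const char* c_str() const { return heap_ ? heap_.get() : inline_; }
    std::size_t Length() const { return length_; }
    bool UsesHeap() const { return heap_ != nullptr; }

private:
    char* Data() { return heap_ ? heap_.get() : inline_; }

    // Moves to (or enlarges) the allocated buffer. Capacity at least doubles,
    // so a long run of appends reallocates only occasionally.
    void Reserve(std::size_t needed) {
        if (needed <= capacity_) return;
        std::size_t grown = capacity_ * 2;
        if (grown < needed) grown = needed;
        std::unique_ptr<char[]> bigger(new char[grown + 1]);
        std::memcpy(bigger.get(), Data(), length_ + 1);  // payload + terminator
        heap_ = std::move(bigger);
        capacity_ = grown;
    }

    char inline_[kInlineCapacity + 1];
    std::unique_ptr<char[]> heap_;
    std::size_t length_ = 0;
    std::size_t capacity_ = kInlineCapacity;
};
```

In this sketch a short payload never touches the heap, and a payload that outgrows the inline buffer is copied once into an allocated buffer that then grows geometrically.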

 

In practice, CXStringEdit objects can contain any binary data at all with little overhead:

  • The UTF-8 encoding of the payload is not checked or enforced, except by functions which explicitly deal with UTF-8 glyphs. 
  • While they do have a guaranteed zero terminator byte, nothing prevents additional zero bytes within the payload. 
  • The zero terminator byte is not considered part of the payload, so will not be accidentally appended to a "non-zero-terminated" binary payload. 
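The difference between the stored payload length and what a C string function sees can be illustrated as follows. This is a sketch that uses std::string as a stand-in, since it also tracks its length separately from its terminator; PayloadLengths and MeasurePayload are hypothetical names and are not part of CX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// std::string is used as a stand-in here; it is not CXStringEdit.
struct PayloadLengths {
    std::size_t stored;     // bytes in the payload, embedded zeros included
    std::size_t asCString;  // bytes a C string function sees before the first zero
};

inline PayloadLengths MeasurePayload(const std::string& payload) {
    // The terminator sits just past the payload and is not counted in size().
    assert(payload.c_str()[payload.size()] == '\0');
    return PayloadLengths{payload.size(), std::strlen(payload.c_str())};
}
```

For a five-byte payload consisting of 'a', 'b', a zero byte, 'c', 'd', the stored length is 5 while strlen() stops at 2. For binary data it is therefore safer to use the length-taking forms, such as the (pointer, length) constructor or AddChars().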

 

Construction

Various constructors are available to allow a CXStringEdit to be built from a C string, other string classes, etc.

// Construct an empty string.
CXStringEdit(void);

// Construct a string as a copy of an input string object or character string.
CXStringEdit(const CXStringEdit&);
CXStringEdit(CXStringEdit&&);
CXStringEdit(const char* __nullable op);
CXStringEdit(const char* __nonnull op, const char* __nonnull end);
CXStringEdit(const char* __nullable op, size_t len);
CXStringEdit(const CXString& op);
CXStringEdit(const CXStringArgument& op);
CXStringEdit(NSString* __nullable str);

 

Comparison

Bytewise comparison operators are available. If case-insensitive operations are required, the CXStringUtils functions should be used.

// Byte-for-byte equality test operators.
bool operator==(const CXStringArgument& str) const;
bool operator!=(const CXStringArgument& str) const;

// Byte-for-byte sort operators.
bool operator<(const CXStringArgument &other) const;
bool operator<=(const CXStringArgument &other) const;
bool operator>(const CXStringArgument &other) const;
bool operator>=(const CXStringArgument &other) const;
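As an illustration of what byte-for-byte ordering implies, the sketch below compares two std::string stand-ins with memcmp. ByteCompare and ByteCompareExamples are hypothetical helpers, and the details assumed here (bytes compared as unsigned values, a prefix sorting before the longer string) are assumptions about typical bytewise ordering rather than documented CX behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Three-way bytewise comparison over std::string stand-ins (not the CX
// operators). Returns <0, 0 or >0. Bytes compare as unsigned values, and a
// string that is a prefix of another sorts first.
inline int ByteCompare(const std::string& a, const std::string& b) {
    const std::size_t common = std::min(a.size(), b.size());
    const int r = (common == 0) ? 0 : std::memcmp(a.data(), b.data(), common);
    if (r != 0) return r;
    if (a.size() == b.size()) return 0;
    return (a.size() < b.size()) ? -1 : 1;
}

inline void ByteCompareExamples() {
    assert(ByteCompare("abc", "abc") == 0);
    assert(ByteCompare("Zebra", "apple") < 0);  // 'Z' (0x5A) < 'a' (0x61)
    assert(ByteCompare("abc", "abcd") < 0);     // prefix sorts first
    assert(ByteCompare("z", "\xC3\xA9") < 0);   // 0x7A < 0xC3 (UTF-8 lead byte)
}
```

Note that under such an ordering all uppercase ASCII letters sort before all lowercase ones, and every multi-byte UTF-8 sequence sorts after every ASCII character, which is why case-insensitive or locale-aware work belongs in separate utility functions.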

 


Accessors

A variety of simple accessors are provided to give read and write access to the payload.

// Returns a non-const, non-null internal C pointer to our payload.
// This allows direct manipulation of any character within the string.
// You may write any data in [0 .. Length()-1] inclusive and may
// also read (but not write) the zero termination byte at
// [Length()]. This pointer becomes invalid if you delete the
// CXStringEdit object, or modify its Length in any way, whether
// directly or indirectly. You may write zero bytes into the payload,
// but doing so does not affect the payload's length. If you wish to
// edit in a manner that reduces the payload's length, you should
// perform the character edits first and then use SetLength() to
// adjust the length.
char* __nonnull GetBufferUnsafe(void);

// Equivalent to GetBufferUnsafe() with the exception that the zero
// termination byte is not present.
char* __nonnull GetBufferUnsafeUnterminated(void);

// Equivalent to GetBufferUnsafe() with the exception that the zero
// termination byte is not present and modifications to the payload
// are not permitted.
const char* __nonnull GetBufferUnsafeUnterminated(void) const;

// Returns an internal C String pointer to this object's payload. The
// pointer remains valid until this object is deleted or its length
// is changed (implicitly or explicitly).
const char* __nonnull c_str(void) const;

// (Cocoa or iOS only) Returns an NSString object which represents
// the same string as this object. Since NSString cannot be used
// to store arbitrary data, any invalid UTF-8 will be converted to
// placeholder characters.
operator NSString* __nonnull (void) const;

// Returns a const reference to any character in the payload or
// the zero terminator. Valid indices are in the range 
// [0..Length()] inclusive.
// Indices are expressed in Bytes, not Glyphs.
const char& operator[] (int index) const;

// Returns a non-const reference to any character in the payload.
// Valid indices are in the range [0..Length()-1] inclusive.
// Indices are expressed in Bytes, not Glyphs.
char& operator[] (int index);

// Returns the length of the payload in Bytes, not including 
// the zero termination.
size_t Length(void) const;

// Returns true if the payload is empty (its length is zero bytes).
bool IsEmpty(void) const;
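The "edit the characters first, then adjust the length" rule described for GetBufferUnsafe() can be sketched as follows. std::string stands in for CXStringEdit here, with &s[0] playing the role of GetBufferUnsafe() and resize() playing the role of SetLength(); StripSpacesInPlace is a hypothetical helper written for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes every space from 's' by compacting bytes in place.
// std::string is a stand-in for CXStringEdit in this sketch.
inline void StripSpacesInPlace(std::string& s) {
    if (s.empty()) return;
    char* buf = &s[0];  // plays the role of GetBufferUnsafe()
    const std::size_t oldLength = s.size();
    std::size_t out = 0;
    for (std::size_t in = 0; in < oldLength; ++in) {
        if (buf[in] != ' ') buf[out++] = buf[in];  // writes stay inside the payload
    }
    assert(out <= oldLength);
    s.resize(out);  // length change comes last; 'buf' must not be used after this
}
```

All writes happen while the pointer is still valid, and the single length change is the final step, after which the pointer is discarded.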

 

 

Helpers

Helper methods are available for composing a string, editing the string, and working with glyphs (Unicode codepoints) as opposed to bytes.

// Replaces this string with an empty string. Does not deallocate
// the string buffer.
void Clear(void);

// Appends the specified character to the end of this string. This
// does not perform any extra encoding, and can be used to append
// arbitrary data.
void AddChar(char ch);

// Appends the specified character string to the end of this string.
void AddChars(const char* __nullable ch, size_t len);

// Appends the specified string to the end of this string.
void Add(CXStringArgument str);

// Appends the output of the printf-style format tokens to the end
// of this string.
void Addf(const char* __nonnull format, ...) CX_PRINTF_ARGS(2, 3);

// Appends the output of the printf-style format tokens to the end
// of this string.
void Addv(const char* __nonnull format, va_list arg);

// Appends a trivial text encoding of 'value' to the end of this
// string. The encoding will contain at least 'minDigits' characters.
void AddInt(sint32 value, int minDigits = 0);

// Appends a trivial text encoding of 'value' to the end of this
// string. The encoding will contain at least 'minDigits' characters.
void AddInt64(sint64 value, int minDigits = 0);

// Appends a trivial text encoding of 'value' to the end of this
// string. The encoding will contain at least 'minDigits' characters.
void AddUInt64(uint64 value, int minDigits = 0);

// Returns a new string which contains the concatenation of this
// string and 'other'.
CXStringEdit operator+(const CXStringArgument& other) const;

// Erases the specified character range. The supplied index and
// length refer to byte offsets, not to glyph indices. Out-of-
// bounds indices are clamped.
void Del(size_t startIndex, size_t length);

// Replaces the specified character range with the specified text
// string. The supplied index and length refer to byte offsets,
// not to glyph indices. Out-of-bounds indices are clamped.
void Replace(size_t startIndex, size_t oldLength, const CXStringArgument& newText);

// Returns a copy of the specified characters. The supplied indices
// refer to byte offsets, not to glyph indices. Out-of-bounds 
// indices are clamped.
CXStringEdit Copy(signed_size_t startIndex, signed_size_t endIndex) const;
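One plausible reading of "out-of-bounds indices are clamped" for the byte-range edits above is sketched here on std::string. DelClamped and ReplaceClamped are hypothetical stand-ins for Del() and Replace(); the exact clamping rule used by CX is not stated on this page, so the rule shown (start limited to the length, count limited to the bytes remaining) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Erases up to 'length' bytes starting at 'start', clamping both to the payload.
inline void DelClamped(std::string& s, std::size_t start, std::size_t length) {
    start = std::min(start, s.size());
    length = std::min(length, s.size() - start);
    assert(start + length <= s.size());
    s.erase(start, length);
}

// Replaces up to 'oldLength' bytes starting at 'start' with 'newText'.
// A start at or past the end degenerates into an append.
inline void ReplaceClamped(std::string& s, std::size_t start,
                           std::size_t oldLength, const std::string& newText) {
    start = std::min(start, s.size());
    oldLength = std::min(oldLength, s.size() - start);
    assert(start + oldLength <= s.size());
    s.replace(start, oldLength, newText);
}
```

Under this reading, deleting 100 bytes from offset 5 of "hello world" leaves "hello", and a delete that starts past the end does nothing rather than failing.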

// Converts all glyphs in this string to lowercase.
// Note: This currently only supports ASCII glyphs.
void ToLower(void);

// Converts all glyphs in this string to uppercase.
// Note: This currently only supports ASCII glyphs.
void ToUpper(void);

// Returns the byte index of the first match of the specified character at
// or after the specified startPosition. Returns -1 if there is no match.
signed_size_t Find(char ch, size_t startPosition = 0);

// Adds the UTF-8 bytes which represent the specified Unicode codepoint
// to the end of this string.
bool AddGlyph(uint glyph);
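For reference, the standard UTF-8 encoding of a single codepoint looks like the sketch below, written against std::string. AppendUtf8 is a hypothetical stand-in for AddGlyph(). The meaning of the bool returned by AddGlyph() is not documented on this page; the sketch assumes it reports whether the codepoint was encodable, and rejects surrogates and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends the UTF-8 bytes for 'cp' to 'out'. Returns false and leaves 'out'
// unchanged for surrogates and for values above U+10FFFF (assumed behaviour).
inline bool AppendUtf8(std::string& out, std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    if (cp < 0x80) {                                   // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                           // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                         // 3 bytes: 1110xxxx + 2 trailing
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                           // 4 bytes: 11110xxx + 3 trailing
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

inline void AppendUtf8Examples() {
    std::string s;
    assert(AppendUtf8(s, 0x41) && s == "A");                 // U+0041, 1 byte
    assert(AppendUtf8(s, 0x20AC) && s == "A\xE2\x82\xAC");   // U+20AC, 3 bytes
    assert(!AppendUtf8(s, 0xD800));                          // surrogate rejected
}
```

The number of bytes appended ranges from one to four per glyph, which is why the byte-indexed and glyph-indexed methods on this page are not interchangeable.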

// Returns the Unicode codepoint of the front glyph in this string. Returns 0 
// if the string does not start with a valid UTF-8 sequence.
uint GetFirstGlyph(void) const;

// Returns the Unicode codepoint of the back glyph in this string. 
// Returns 0 if the string does not end with a valid UTF-8 sequence.
uint GetLastGlyph(void) const;

// Deletes the front glyph from this string. If the string does not    
// start with a valid UTF-8 sequence, this will delete characters until
// the string starts with a valid UTF-8 sequence or until the string is 
// empty. Does nothing if the string is empty.
void DeleteFirstGlyph(void);

// Deletes the back glyph from this string. If the string does not end
// with a valid UTF-8 sequence, this will delete characters until the
// string ends with a valid UTF-8 sequence or until the string is empty.
// Does nothing if the string is empty.
void DeleteLastGlyph(void);

// Returns a copy of the specified glyphs. The supplied indices refer to
// glyph indices, not byte offsets. Out-of-bounds indices are clamped.
CXStringEdit CopyGlyphs(signed_size_t startGlyphIndex, signed_size_t endGlyphIndex) const;

// Returns the glyph (Unicode codepoint) at the specified index. The
// supplied index refers to the glyph index, not a byte offset. Returns
// 0 if the index is out of bounds.
uint32 GetIndexedGlyph(signed_size_t glyphIndex) const;

// Returns the total number of glyphs in the string. Any bytes present
// which do not form valid UTF-8 encodings are each counted as one glyph.
size_t CountGlyphs(void) const;
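The counting rule described for CountGlyphs(), where each byte that does not belong to a valid sequence counts as one glyph, can be sketched as below. SequenceLength, CountGlyphsSketch and CountGlyphsExamples are hypothetical helpers operating on std::string_view (C++17). As a stated simplification, the validity check only inspects the lead byte and the continuation bytes; it does not reject overlong encodings or surrogates, which a stricter decoder might.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns the byte length (1 to 4) of the UTF-8 sequence starting at s[i], or
// 0 if the bytes there do not form one. Simplified: only the lead byte and
// the continuation-byte markers are checked.
inline std::size_t SequenceLength(std::string_view s, std::size_t i) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    if (lead < 0x80) len = 1;
    else if ((lead & 0xE0) == 0xC0) len = 2;
    else if ((lead & 0xF0) == 0xE0) len = 3;
    else if ((lead & 0xF8) == 0xF0) len = 4;
    else return 0;                     // stray continuation or invalid lead byte
    if (i + len > s.size()) return 0;  // sequence truncated by end of payload
    for (std::size_t k = 1; k < len; ++k) {
        if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return 0;
    }
    return len;
}

// Counts glyphs; any byte that does not start a valid sequence counts as one.
inline std::size_t CountGlyphsSketch(std::string_view s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t len = SequenceLength(s, i);
        i += (len == 0) ? 1 : len;
        ++count;
    }
    return count;
}

inline void CountGlyphsExamples() {
    assert(CountGlyphsSketch("abc") == 3);
    assert(CountGlyphsSketch("h\xC3\xA9llo") == 5);  // 6 bytes, 5 glyphs
    assert(CountGlyphsSketch("\x80\x80") == 2);      // two stray bytes, two glyphs
}
```

A glyph-indexed accessor built on the same walk has to scan from the start of the string, so glyph-based calls are expected to cost more than the byte-indexed ones on long strings.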