CXStringEdit

From cxwiki

Revision as of 03:43, 25 February 2018

The CXStringEdit class is a string class optimised for simple editing. As compared to the standard CXString, it offers vastly superior general edit performance for short strings, vastly superior append performance even for long strings, and a variety of helper methods for string editing.

A CXStringEdit object nominally stores UTF-8 encoded text with a zero termination byte. Short strings are stored in an internal buffer, while long strings are stored in an allocated buffer. CXStringEdit objects do not distinguish "null" and "empty" strings.
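The split between an internal buffer for short strings and an allocated buffer for long strings can be illustrated with a small sketch. This is not the CXStringEdit implementation: the class name SmallBufferString, its methods, and the 15-byte threshold are all hypothetical, and the real class may differ in every detail. It only shows why appends to short strings can avoid allocation while the zero terminator is still maintained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in, NOT the real CXStringEdit.
class SmallBufferString {
public:
    // Assumed threshold; the real class does not document its value here.
    static constexpr std::size_t kInlineCapacity = 15;

    void Append(const char* data, std::size_t len) {
        if (len == 0) return;
        if (!onHeap_ && length_ + len <= kInlineCapacity) {
            // Short string: stays in the inline buffer, no allocation.
            std::memcpy(inline_ + length_, data, len);
            length_ += len;
            inline_[length_] = '\0';
            return;
        }
        if (!onHeap_) {
            // First spill: move the existing payload to allocated storage.
            heap_.assign(inline_, inline_ + length_);
            onHeap_ = true;
        } else {
            heap_.pop_back();  // drop the old zero terminator
        }
        heap_.insert(heap_.end(), data, data + len);
        heap_.push_back('\0');  // keep the guaranteed terminator
        length_ += len;
    }

    const char* c_str() const { return onHeap_ ? heap_.data() : inline_; }
    std::size_t Length() const { return length_; }
    bool UsesHeap() const { return onHeap_; }

private:
    char inline_[kInlineCapacity + 1] = {};  // payload plus terminator
    std::vector<char> heap_;                 // used only for long strings
    std::size_t length_ = 0;                 // payload bytes, terminator excluded
    bool onHeap_ = false;
};

// Usage example for the sketch above.
inline void SmallBufferExample() {
    SmallBufferString s;
    s.Append("abc", 3);
    assert(!s.UsesHeap());                 // short: inline storage
    s.Append("0123456789abcdef", 16);
    assert(s.UsesHeap());                  // long: allocated storage
    assert(s.Length() == 19);
}
```

Because an empty inline buffer already holds a terminator, this design also has no separate "null" state, which matches the behaviour described above.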

 

In practice, CXStringEdit objects can contain any binary data at all with little overhead:

  • The UTF-8 encoding of the payload is not checked or enforced, except by functions which explicitly deal with UTF-8 glyphs. 
  • While they do have a guaranteed zero terminator byte, nothing prevents additional zero bytes within the payload. 
  • The zero terminator byte is not considered part of the payload, so will not be accidentally appended to a "non-zero-terminated" binary payload. 
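The points above can be sketched with std::string as a stand-in, since it shares the relevant property of a length-tracked payload followed by a terminator. With CXStringEdit the equivalent would presumably use the (const char*, size_t) constructor and Length(), but that mapping is an assumption and the code below does not use the CX classes at all.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Builds a 4-byte payload containing an embedded zero byte and a byte
// (0xFF) that can never appear in valid UTF-8.
inline std::string MakeBinaryPayload() {
    const char raw[] = {'a', '\0', 'b', '\xFF'};
    return std::string(raw, sizeof(raw));
}

inline void BinaryPayloadExample() {
    const std::string s = MakeBinaryPayload();
    assert(s.size() == 4);                 // payload length counts every byte
    assert(std::strlen(s.c_str()) == 1);   // C-string view stops at the first zero
    assert(s.c_str()[s.size()] == '\0');   // terminator sits after the payload
}
```

The practical consequence is that code handling binary payloads should rely on the stored length rather than scanning for the terminator.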

 

Construction

Various constructors are available to allow a CXStringEdit to be built from a C string, other string classes, etc.

// Construct an empty string.
CXStringEdit(void);

// Construct a string as a copy of an input string object or character string.
CXStringEdit(const CXStringEdit&);
CXStringEdit(CXStringEdit&&);
CXStringEdit(const char* __nullable op);
CXStringEdit(const char* __nonnull op, const char* __nonnull end);
CXStringEdit(const char* __nullable op, size_t len);
CXStringEdit(const CXString& op);
CXStringEdit(const CXStringArgument& op);
CXStringEdit(NSString* __nullable str);

 

Comparison

Bytewise comparison operators are available. If case-insensitive operations are required, the CXStringUtils functions should be used.

// Byte-for-byte equality test operators.
bool operator==(const CXStringArgument& str) const;
bool operator!=(const CXStringArgument& str) const;

// Byte-for-byte sort operators.
bool operator<(const CXStringArgument &other) const;
bool operator<=(const CXStringArgument &other) const;
bool operator>(const CXStringArgument &other) const;
bool operator>=(const CXStringArgument &other) const;
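To show what a byte-for-byte ordering implies, here is a minimal sketch using std::string as a stand-in. Both BytewiseCompare and EqualsIgnoreAsciiCase are hypothetical helpers written for this example; they are not part of CXStringEdit or CXStringUtils, and treating bytes as unsigned values is an assumption about how the real operators order strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: orders two strings by unsigned byte value, with the
// shorter string first when one is a prefix of the other.
inline int BytewiseCompare(const std::string& a, const std::string& b) {
    const std::size_t common = std::min(a.size(), b.size());
    const int r = common ? std::memcmp(a.data(), b.data(), common) : 0;
    if (r != 0) return r < 0 ? -1 : 1;
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

inline char AsciiLower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical helper: equality that ignores case for ASCII letters only.
inline bool EqualsIgnoreAsciiCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (AsciiLower(a[i]) != AsciiLower(b[i])) return false;
    }
    return true;
}

inline void ComparisonExample() {
    // 'Z' (0x5A) sorts before 'a' (0x61), so bytewise order is not
    // dictionary order.
    assert(BytewiseCompare("Zebra", "apple") < 0);
    assert(BytewiseCompare("Apple", "apple") != 0);
    assert(EqualsIgnoreAsciiCase("Apple", "APPLE"));
}
```

This is why a case-insensitive comparison has to be requested explicitly rather than expected from the operators.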

 


Accessors

A variety of simple accessors are provided to give read and write access to the payload.

//
char* __nonnull GetBufferUnsafe(void);

//
char* __nonnull GetBufferUnsafeUnterminated(void);

//
const char* __nonnull GetBufferUnsafeUnterminated(void) const;

//
const char* __nonnull c_str(void) const;

//
operator NSString* __nonnull (void) const;

//
const char& operator[] (int index) const;

//
char& operator[] (int index);

//
size_t Length(void) const;

//
bool IsEmpty(void) const;

 

 

Helpers

Helper methods are available for composing a string, editing the string, and working with glyphs (unicode codepoints) as opposed to bytes.

// Replaces this string with an empty string. Does not deallocate
// the string buffer.
void Clear(void);

// Appends the specified character to the end of this string. This
// does not perform any extra encoding, and can be used to append
// arbitrary data.
void AddChar(char ch);

// Appends the specified character string to the end of this string.
void AddChars(const char* __nullable ch, size_t len);

// Appends the specified string to the end of this string.
void Add(CXStringArgument str);

// Appends the output of the printf-style format tokens to the end
// of this string.
void Addf(const char* __nonnull format, ...) CX_PRINTF_ARGS(2, 3);

// Appends the output of the printf-style format tokens to the end
// of this string.
void Addv(const char* __nonnull format, va_list arg);

// Appends a trivial text encoding of 'value' to the end of this
// string. The encoding will contain at least 'minDigits' characters.
void AddInt(sint32 value, int minDigits = 0);

// Appends a trivial text encoding of 'value' to the end of this
// string. The encoding will contain at least 'minDigits' characters.
void AddInt64(sint64 value, int minDigits = 0);

// Appends a trivial text encoding of 'value' to the end of this
// string. The encoding will contain at least 'minDigits' characters.
void AddUInt64(uint64 value, int minDigits = 0);

// Returns a new string which contains the concatenation of this
// string and 'other'.
CXStringEdit operator+(const CXStringArgument& other) const;

// Erases the specified character range. The supplied index and
// length refer to byte offsets, not to glyph indices. Out-of-
// bounds indices are clamped.
void Del(size_t startIndex, size_t length);

// Replaces the specified character range with the specified text
// string. The supplied index and length refer to byte offsets,
// not to glyph indices. Out-of-bounds indices are clamped.
void Replace(size_t startIndex, size_t oldLength, const CXStringArgument& newText);

// Returns a copy of the specified characters. The supplied indices
// refer to byte offsets, not to glyph indices. Out-of-bounds 
// indices are clamped.
CXStringEdit Copy(signed_size_t startIndex, signed_size_t endIndex) const;

// Converts all glyphs in this string to lowercase.
// Note: This currently only supports ASCII glyphs.
void ToLower(void);

// Converts all glyphs in this string to uppercase.
// Note: This currently only supports ASCII glyphs.
void ToUpper(void);

// Returns the byte index of the first match of the specified character at
// or after 'startPosition'. Returns -1 if there is no match.
signed_size_t Find(char ch, size_t startPosition = 0);

// Appends the UTF-8 encoding of the specified unicode codepoint to the end of this string.
bool AddGlyph(uint glyph);

// Returns the unicode codepoint of the front glyph in this string. Returns 0 
// if the string does not start with a valid UTF-8 sequence.
uint GetFirstGlyph(void) const;

// Returns the unicode codepoint of the back glyph in this string. 
// Returns 0 if the string does not end with a valid UTF-8 sequence.
uint GetLastGlyph(void) const;

// Deletes the front glyph from this string. If the string does not start 
// with a valid UTF-8 sequence, this will delete characters until the string
// starts with a valid UTF-8 sequence or until the string is empty.
// Does nothing if the string is empty.
void DeleteFirstGlyph(void);

// Deletes the back glyph from this string. If the string does not end
// with a valid UTF-8 sequence, this will delete characters until the string
// ends with a valid UTF-8 sequence or until the string is empty.
// Does nothing if the string is empty.
void DeleteLastGlyph(void);

// Returns a copy of the specified glyphs. The supplied indices refer to
// glyph indices, not byte offsets. Out-of-bounds indices are clamped.
CXStringEdit CopyGlyphs(signed_size_t startGlyphIndex, signed_size_t endGlyphIndex) const;

// Returns the glyph (unicode codepoint) at the specified index. The
// supplied index refers to the glyph index, not a byte offset. Returns
// 0 if the index is out of bounds.
uint32 GetIndexedGlyph(signed_size_t glyphIndex) const;

// Returns the total number of glyphs in the string. Any bytes present
// which do not form valid UTF-8 encodings are each counted as one glyph.
size_t CountGlyphs(void) const;
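
The difference between byte offsets and glyph indices can be illustrated with a minimal sketch built on std::string. The functions AppendUtf8, ValidSequenceLength and CountGlyphsSketch are hypothetical and written for this example only; they are not the CXStringEdit implementation. Rejecting surrogates and overlong encodings, and returning false for an unencodable codepoint, are assumptions about what counts as "valid" here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of 'cp'. Returns false (appending nothing)
// if 'cp' is a surrogate or beyond U+10FFFF.
inline bool AppendUtf8(std::string& out, std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return true;
}

// Returns the byte length of the valid UTF-8 sequence starting at s[i],
// or 0 if the bytes there do not form one. Requires i < s.size().
inline std::size_t ValidSequenceLength(const std::string& s, std::size_t i) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    std::uint32_t cp = 0, minCp = 0;
    if (b0 < 0x80) return 1;
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; minCp = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minCp = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minCp = 0x10000; }
    else return 0;  // stray continuation byte or invalid lead byte
    if (i + len > s.size()) return 0;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Counts glyphs; each byte that is not part of a valid sequence counts as
// one glyph, following the CountGlyphs description above.
inline std::size_t CountGlyphsSketch(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count) {
        const std::size_t len = ValidSequenceLength(s, i);
        i += (len != 0) ? len : 1;
    }
    return count;
}

inline void GlyphExample() {
    std::string s;
    AppendUtf8(s, 'a');      // 1 byte
    AppendUtf8(s, 0x00E9);   // 2 bytes
    AppendUtf8(s, 0x20AC);   // 3 bytes
    AppendUtf8(s, 0x1F600);  // 4 bytes
    assert(s.size() == 10);
    assert(CountGlyphsSketch(s) == 4);
}
```

In this example the fourth glyph starts at byte offset 6, which is why byte-based methods such as Del and Copy should not be given glyph indices for non-ASCII text.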