Difference between revisions of "CXStringEdit"

From cxwiki


Revision as of 19:59, 24 February 2018

The CXStringEdit class is a string class optimised for simple editing. As compared to the standard CXString, it offers vastly superior general edit performance for short strings, vastly superior append performance even for long strings, and a variety of helper methods for string editing.

A CXStringEdit object nominally stores UTF-8 encoded text with a zero termination byte. Short strings are stored in an internal buffer, while long strings are stored in an allocated buffer. CXStringEdit objects do not distinguish "null" and "empty" strings.
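The split between an internal buffer for short strings and an allocated buffer for long ones is commonly called a small-buffer optimisation. The following is a minimal sketch of that general technique, not the CXStringEdit implementation: the class name `SmallBufferString`, the 15-byte inline capacity, and the doubling growth policy are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical illustration only; not the CXStringEdit implementation.
class SmallBufferString {
public:
    static constexpr std::size_t kInlineCapacity = 15; // assumed threshold

    SmallBufferString() { m_inline[0] = '\0'; }

    // Appends len bytes, moving to a heap buffer once the inline one is full.
    // The source bytes must not point into this object's own buffer.
    void Append(const char* bytes, std::size_t len) {
        if (len == 0) return;
        const std::size_t newLength = m_length + len;
        if (newLength > Capacity()) {
            // Grow geometrically so repeated appends stay cheap on average.
            std::size_t newCapacity = Capacity() * 2;
            if (newCapacity < newLength) newCapacity = newLength;
            std::unique_ptr<char[]> bigger(new char[newCapacity + 1]);
            std::memcpy(bigger.get(), Data(), m_length);
            m_heap = std::move(bigger);
            m_heapCapacity = newCapacity;
        }
        std::memcpy(MutableData() + m_length, bytes, len);
        m_length = newLength;
        MutableData()[m_length] = '\0'; // keep the zero terminator in place
    }

    const char* c_str() const { return Data(); }
    std::size_t Length() const { return m_length; }
    bool IsInline() const { return !m_heap; }

private:
    std::size_t Capacity() const {
        if (m_heap) return m_heapCapacity;
        return kInlineCapacity;
    }
    const char* Data() const {
        if (m_heap) return m_heap.get();
        return m_inline;
    }
    char* MutableData() {
        if (m_heap) return m_heap.get();
        return m_inline;
    }

    char m_inline[kInlineCapacity + 1];  // short strings live here
    std::unique_ptr<char[]> m_heap;      // long strings live here
    std::size_t m_heapCapacity = 0;
    std::size_t m_length = 0;
};
```

In this sketch a short string never touches the allocator, which is one plausible reason edits on short strings can be fast, and geometric growth keeps repeated appends to long strings inexpensive on average.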

 

In practice, CXStringEdit objects can contain any binary data at all with little overhead:

  • The UTF-8 encoding of the payload is not checked or enforced, except by functions which explicitly deal with UTF-8 glyphs. 
  • While they do have a guaranteed zero terminator byte, nothing prevents additional zero bytes within the payload. 
  • The zero terminator byte is not considered part of the payload, so will not be accidentally appended to a "non-zero-terminated" binary payload. 
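The points above can be illustrated with any length-tracked, zero-terminated buffer. The sketch below uses `std::string` as a stand-in (it is not CXStringEdit, and the helper names are hypothetical) to show the difference between the payload length and the length seen by C string functions.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// std::string stands in for a length-tracked buffer here; it is not CXStringEdit.
std::string MakeBinaryPayload() {
    const char raw[] = {'a', '\0', 'b', '\0', 'c'};
    // Passing an explicit length keeps the embedded zero bytes in the payload.
    return std::string(raw, sizeof(raw));
}

// Length as tracked by the container: counts every payload byte.
std::size_t PayloadLength(const std::string& s) {
    return s.size();
}

// Length as seen by C string functions: stops at the first zero byte.
std::size_t CStringLength(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Code that treats the payload as binary data should therefore rely on the tracked length rather than on scanning for a terminator.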

 

Construction

Various constructors are available to allow a CXStringEdit to be built from a C string, other string classes, etc.

// Construct an empty string.
CXStringEdit(void);

// Construct a string as a copy of an input string object or character string.
CXStringEdit(const CXStringEdit&);
CXStringEdit(CXStringEdit&&);
CXStringEdit(const char* __nullable op);
CXStringEdit(const char* __nonnull op, const char* __nonnull end);
CXStringEdit(const char* __nullable op, size_t len);
CXStringEdit(const CXString& op);
CXStringEdit(const CXStringArgument& op);
CXStringEdit(NSString* __nullable str);
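The three raw-pointer constructors correspond to three common ways of delimiting a source buffer: a zero-terminated pointer, a begin/end pair, and a pointer plus length. A minimal sketch of those conventions follows, using `std::string` and hypothetical helper names. Treating a null pointer as an empty string is an assumption here, chosen to match the statement that "null" and "empty" strings are not distinguished.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers mirroring the three raw-pointer constructor shapes.

// Zero-terminated input; a null pointer is assumed to mean "empty".
std::string FromCString(const char* text) {
    return text ? std::string(text) : std::string();
}

// Half-open range [begin, end); both pointers must be valid.
std::string FromRange(const char* begin, const char* end) {
    return std::string(begin, end);
}

// Pointer plus byte count; embedded zero bytes are preserved.
std::string FromLength(const char* text, std::size_t len) {
    return (text && len) ? std::string(text, len) : std::string();
}
```

The range and length forms are the ones to use when the source is not zero-terminated or contains embedded zero bytes.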

 

Comparison

Bytewise comparison operators are available. If case-insensitive operations are required, the CXStringUtils functions should be used.

// Byte-for-byte equality test operators.
bool operator==(const CXStringArgument& str) const;
bool operator!=(const CXStringArgument& str) const;

// Byte-for-byte sort operators.
bool operator<(const CXStringArgument &other) const;
bool operator<=(const CXStringArgument &other) const;
bool operator>(const CXStringArgument &other) const;
bool operator>=(const CXStringArgument &other) const;
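Bytewise ordering means that upper-case ASCII letters sort before lower-case ones, and that strings differing only in case are unequal. The sketch below illustrates this with `std::string`. Both function names are hypothetical, and the ASCII-only case folding is an assumption that stands in for whatever the CXStringUtils functions actually do.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Bytewise ordering: compares unsigned byte values; on a shared prefix the
// shorter string sorts first.
bool BytewiseLess(const std::string& a, const std::string& b) {
    const std::size_t common = std::min(a.size(), b.size());
    const int order = std::memcmp(a.data(), b.data(), common);
    if (order != 0) return order < 0;
    return a.size() < b.size();
}

// ASCII-only case-insensitive equality (hypothetical stand-in helper).
bool EqualsIgnoreAsciiCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const unsigned char ca = static_cast<unsigned char>(a[i]);
        const unsigned char cb = static_cast<unsigned char>(b[i]);
        if (std::tolower(ca) != std::tolower(cb)) return false;
    }
    return true;
}
```

With these definitions "Zebra" sorts before "apple", because the byte value of 'Z' is lower than that of 'a'.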

 


Accessors

A variety of simple accessors are provided to give read and write access to the payload.

//
char* __nonnull GetBufferUnsafe(void);

//
char* __nonnull GetBufferUnsafeUnterminated(void);

//
const char* __nonnull GetBufferUnsafeUnterminated(void) const;

//
const char* __nonnull c_str(void) const;

//
operator NSString* __nonnull (void) const;

//
const char& operator[] (int index) const;

//
char& operator[] (int index);

//
size_t Length(void) const;

//
bool IsEmpty(void) const;

 

 

Helpers

Helper methods are available for composing a string, editing the string, and working with glyphs (unicode codepoints) as opposed to bytes.

//
void Clear(void);

//
void AddChar(char ch);

//
void AddChars(const char* __nullable ch, size_t len);

//
void Add(CXStringArgument str);



//
void Addf(const char* __nonnull format, ...) CX_PRINTF_ARGS(2, 3);

//
void Addv(const char* __nonnull format, va_list arg);

//
void AddInt(sint32 value, int minDigits = 0);

//
void AddInt64(sint64 value, int minDigits = 0);

//
void AddUInt64(uint64 value, int minDigits = 0);

//
CXStringEdit operator+(const CXStringArgument& other) const;

//
void Del(size_t startIndex, size_t length);

//
void Replace(size_t startIndex, size_t oldLength, const CXStringArgument& newText);

//
CXStringEdit Copy(signed_size_t startIndex, signed_size_t endIndex) const;

//
void ToLower(void);

//
void ToUpper(void);

// Returns the byte index of the first match of the specified character at or after the specified startPosition. Returns -1 if no match.
signed_size_t Find(char ch, size_t startPosition = 0);

// Appends the UTF-8 bytes that represent the specified Unicode codepoint to the end of this string.
bool AddGlyph(uint glyph);

// Returns the unicode codepoint of the front glyph in this string. Returns 0 
// if the string does not start with a valid UTF-8 sequence.
uint GetFirstGlyph(void) const;

// Returns the unicode codepoint of the back glyph in this string.
// Returns 0 if the string does not end with a valid UTF-8 sequence.
uint GetLastGlyph(void) const;

// Deletes the front glyph from this string. If the string does not start 
// with a valid UTF-8 sequence, this will delete characters until the string
// starts with a valid UTF-8 sequence or until the string is empty.
// Does nothing if the string is empty.
void DeleteFirstGlyph(void);

// Deletes the back glyph from this string. If the string does not end
// with a valid UTF-8 sequence, this will delete characters until the string
// ends with a valid UTF-8 sequence or until the string is empty.
// Does nothing if the string is empty.
void DeleteLastGlyph(void);

// Returns a copy of the specified glyphs. The supplied indices refer to
// glyph indices, not byte offsets. Out-of-bounds indices are clamped.
CXStringEdit CopyGlyphs(signed_size_t startGlyphIndex, signed_size_t endGlyphIndex) const;

// Returns the glyph (unicode codepoint) at the specified index. The
// supplied index refers to the glyph index, not a byte offset. Returns
// 0 if the index is out of bounds.
uint32 GetIndexedGlyph(signed_size_t glyphIndex) const;

// Returns the total number of glyphs in the string. Any bytes present
// which do not form valid UTF-8 encodings are each counted as one glyph.
size_t CountGlyphs(void) const;
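The glyph helpers rest on two operations: encoding a codepoint as UTF-8 bytes, and walking a byte buffer one glyph at a time. The following sketch shows both on a `std::string`. The function names are hypothetical, and the validation rules (rejecting surrogates, overlong forms, and values above U+10FFFF) are assumptions; the only behaviour taken from the description above is that each byte that does not form a valid encoding counts as one glyph.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of cp to out. Returns false (and appends
// nothing) for surrogates and values above U+10FFFF; this is an assumed policy.
bool AppendCodepoint(std::string& out, std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Returns the byte length of the valid UTF-8 sequence starting at s[i],
// or 0 if the bytes there do not form one.
std::size_t SequenceLength(const std::string& s, std::size_t i) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else return 0; // stray continuation byte or invalid lead byte
    if (i + len > s.size()) return 0; // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char c = static_cast<unsigned char>(s[i + k]);
        if ((c & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong encodings, surrogates, and out-of-range values.
    static const std::uint32_t kMinimum[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMinimum[len] || cp > 0x10FFFF) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    return len;
}

// Counts glyphs; each byte that is not part of a valid sequence counts as one.
std::size_t CountGlyphsSketch(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count) {
        const std::size_t len = SequenceLength(s, i);
        i += (len != 0) ? len : 1;
    }
    return count;
}
```

Because glyphs occupy between one and four bytes, a glyph index has to be resolved by scanning from the start of the buffer in a scheme like this one, so glyph-indexed access costs time proportional to the string length while byte-indexed access does not.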