linebreakdef.h File Reference

Definitions of internal data structures, declarations of global variables, and function prototypes for the line breaking algorithm.

Data Structures

struct  LineBreakProperties
 Struct for entries of line break properties.
struct  LineBreakPropertiesLang
 Struct for association of language-specific line breaking properties with language names.

Defines

#define EOS   0xFFFF
 Constant value to mark the end of string.

Typedefs

typedef utf32_t(* get_next_char_t)(const void *, size_t, size_t *)
 Abstract function interface for lb_get_next_char_utf8, lb_get_next_char_utf16, and lb_get_next_char_utf32.

Enumerations

enum  LineBreakClass {
  LBP_Undefined, LBP_OP, LBP_CL, LBP_CP,
  LBP_QU, LBP_GL, LBP_NS, LBP_EX,
  LBP_SY, LBP_IS, LBP_PR, LBP_PO,
  LBP_NU, LBP_AL, LBP_ID, LBP_IN,
  LBP_HY, LBP_BA, LBP_BB, LBP_B2,
  LBP_ZW, LBP_CM, LBP_WJ, LBP_H2,
  LBP_H3, LBP_JL, LBP_JV, LBP_JT,
  LBP_AI, LBP_BK, LBP_CB, LBP_CR,
  LBP_LF, LBP_NL, LBP_SA, LBP_SG,
  LBP_SP, LBP_XX
}
 Line break classes.

Functions

utf32_t lb_get_next_char_utf8 (const utf8_t *s, size_t len, size_t *ip)
 Gets the next Unicode character in a UTF-8 sequence.
utf32_t lb_get_next_char_utf16 (const utf16_t *s, size_t len, size_t *ip)
 Gets the next Unicode character in a UTF-16 sequence.
utf32_t lb_get_next_char_utf32 (const utf32_t *s, size_t len, size_t *ip)
 Gets the next Unicode character in a UTF-32 sequence.
void set_linebreaks (const void *s, size_t len, const char *lang, char *brks, get_next_char_t get_next_char)
 Sets the line breaking information for a generic input string.

Variables

LineBreakProperties lb_prop_default []
 Default line breaking properties, as taken from the Unicode Web site.
LineBreakPropertiesLang lb_prop_lang_map []
 Association data of language-specific line breaking properties with language names.


Detailed Description

Definitions of internal data structures, declarations of global variables, and function prototypes for the line breaking algorithm.

Version:
2.1, 2011/05/07
Author:
Wu Yongwei

Define Documentation

#define EOS   0xFFFF

Constant value to mark the end of string.

The value 0xFFFF (U+FFFF) is a Unicode noncharacter, so it cannot be confused with a character in valid input.


Typedef Documentation

typedef utf32_t(* get_next_char_t)(const void *, size_t, size_t *)

Abstract function interface for lb_get_next_char_utf8, lb_get_next_char_utf16, and lb_get_next_char_utf32.
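
The following is a minimal sketch of how code can be written against this function-pointer interface so that it works with any encoding. The typedefs for utf32_t and EOS are local stand-ins for the library's own definitions, and demo_next_utf32 and demo_count_chars are hypothetical helpers that are not part of liblinebreak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the library's definitions (assumptions). */
typedef uint32_t utf32_t;
#define EOS 0xFFFF

typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);

/* Hypothetical UTF-32 reader that follows the documented contract:
 * return the character at *ip and advance, or EOS at end of input. */
static utf32_t demo_next_utf32(const void *s, size_t len, size_t *ip)
{
    const utf32_t *p = (const utf32_t *)s;
    assert(*ip <= len);
    if (*ip == len)
        return EOS;
    return p[(*ip)++];
}

/* Counts characters through the callback, without knowing the encoding. */
static size_t demo_count_chars(const void *s, size_t len, get_next_char_t next)
{
    size_t i = 0, n = 0;
    while (next(s, len, &i) != EOS)
        ++n;
    return n;
}
```

A caller would pass the reader that matches its buffer, for example demo_count_chars(text, len, demo_next_utf32).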


Enumeration Type Documentation

enum LineBreakClass

Line break classes.

This is a direct mapping of Table 1 of Unicode Standard Annex 14, Revision 26.

Enumerator:
LBP_Undefined  Undefined.
LBP_OP  Opening punctuation.
LBP_CL  Closing punctuation.
LBP_CP  Closing parenthesis.
LBP_QU  Ambiguous quotation.
LBP_GL  Glue.
LBP_NS  Non-starters.
LBP_EX  Exclamation/Interrogation.
LBP_SY  Symbols allowing break after.
LBP_IS  Infix separator.
LBP_PR  Prefix.
LBP_PO  Postfix.
LBP_NU  Numeric.
LBP_AL  Alphabetic.
LBP_ID  Ideographic.
LBP_IN  Inseparable characters.
LBP_HY  Hyphen.
LBP_BA  Break after.
LBP_BB  Break before.
LBP_B2  Break on either side (but not pair).
LBP_ZW  Zero-width space.
LBP_CM  Combining marks.
LBP_WJ  Word joiner.
LBP_H2  Hangul LV.
LBP_H3  Hangul LVT.
LBP_JL  Hangul L Jamo.
LBP_JV  Hangul V Jamo.
LBP_JT  Hangul T Jamo.
LBP_AI  Ambiguous (alphabetic or ideograph).
LBP_BK  Break (mandatory).
LBP_CB  Contingent break.
LBP_CR  Carriage return.
LBP_LF  Line feed.
LBP_NL  Next line.
LBP_SA  South-East Asian.
LBP_SG  Surrogates.
LBP_SP  Space.
LBP_XX  Unknown.


Function Documentation

utf32_t lb_get_next_char_utf16 ( const utf16_t *  s,
size_t  len,
size_t *  ip 
)

Gets the next Unicode character in a UTF-16 sequence.

The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-16 surrogate pair.

Parameters:
[in] s input UTF-16 string
[in] len length of the string in 16-bit words
[in,out] ip pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered
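
To illustrate the contract described above, here is a sketch of a UTF-16 reader that combines surrogate pairs and leaves the index untouched when the input ends in the middle of a pair. It is not the library's source: sketch_next_utf16 is a hypothetical name, the typedefs and EOS are local stand-ins, and passing unpaired surrogates through unchanged is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the library's definitions (assumptions). */
typedef uint16_t utf16_t;
typedef uint32_t utf32_t;
#define EOS 0xFFFF

static utf32_t sketch_next_utf16(const utf16_t *s, size_t len, size_t *ip)
{
    utf16_t hi, lo;

    assert(*ip <= len);
    if (*ip == len)
        return EOS;
    hi = s[*ip];
    if (hi < 0xD800 || hi > 0xDBFF) {   /* not a high surrogate */
        ++*ip;
        return hi;
    }
    if (*ip + 1 == len)                 /* pair cut off: index is not advanced */
        return EOS;
    lo = s[*ip + 1];
    if (lo < 0xDC00 || lo > 0xDFFF) {   /* unpaired high surrogate: pass through */
        ++*ip;
        return hi;
    }
    *ip += 2;                           /* consume both halves of the pair */
    return 0x10000 + (((utf32_t)hi & 0x3FF) << 10) + (lo & 0x3FF);
}
```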

utf32_t lb_get_next_char_utf32 ( const utf32_t *  s,
size_t  len,
size_t *  ip 
)

Gets the next Unicode character in a UTF-32 sequence.

The index will be advanced to the next character.

Parameters:
[in] s input UTF-32 string
[in] len length of the string in 32-bit dwords
[in,out] ip pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered

utf32_t lb_get_next_char_utf8 ( const utf8_t *  s,
size_t  len,
size_t *  ip 
)

Gets the next Unicode character in a UTF-8 sequence.

The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-8 sequence.

Parameters:
[in] s input UTF-8 string
[in] len length of the string in bytes
[in,out] ip pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered
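
The behaviour described above can be sketched as follows: decode one multi-byte sequence per call, and return EOS without moving the index when the sequence is truncated. This is an illustration only, not the library's source. sketch_next_utf8 is a hypothetical name, the typedefs and EOS are local stand-ins, and the sketch assumes that continuation bytes are well formed (it does not validate them).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the library's definitions (assumptions). */
typedef uint8_t  utf8_t;
typedef uint32_t utf32_t;
#define EOS 0xFFFF

static utf32_t sketch_next_utf8(const utf8_t *s, size_t len, size_t *ip)
{
    utf8_t lead;
    size_t need, k;
    utf32_t cp;

    assert(*ip <= len);
    if (*ip == len)
        return EOS;
    lead = s[*ip];
    if (lead < 0xC2 || lead > 0xF4) {   /* ASCII, stray tail byte, or invalid lead */
        ++*ip;
        return lead;
    }
    if (lead < 0xE0)      { need = 2; cp = lead & 0x1F; }
    else if (lead < 0xF0) { need = 3; cp = lead & 0x0F; }
    else                  { need = 4; cp = lead & 0x07; }
    if (*ip + need > len)               /* truncated sequence: index is not advanced */
        return EOS;
    for (k = 1; k < need; ++k)          /* fold in 6 payload bits per tail byte */
        cp = (cp << 6) | (s[*ip + k] & 0x3F);
    *ip += need;
    return cp;
}
```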

void set_linebreaks ( const void *  s,
size_t  len,
const char *  lang,
char *  brks,
get_next_char_t  get_next_char 
)

Sets the line breaking information for a generic input string.

Parameters:
[in] s input string
[in] len length of the input
[in] lang language of the input
[out] brks pointer to the output breaking data, containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR
[in] get_next_char function to get the next UTF-32 character
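
As a rough illustration of how the output buffer might be consumed, the sketch below scans a brks array for the next position that permits a break. It assumes one entry per input code unit, with each entry describing the position after that unit. The DEMO_* constants are placeholders for the LINEBREAK_* values, which are defined outside this header, and demo_next_break is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholders for LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK,
 * LINEBREAK_NOBREAK, and LINEBREAK_INSIDEACHAR (values are assumptions). */
enum { DEMO_MUSTBREAK, DEMO_ALLOWBREAK, DEMO_NOBREAK, DEMO_INSIDEACHAR };

/* Returns the first index >= from whose entry requires or allows a break,
 * or len if no such entry exists. */
static size_t demo_next_break(const char *brks, size_t len, size_t from)
{
    size_t i;

    assert(from <= len);
    for (i = from; i < len; ++i)
        if (brks[i] == DEMO_MUSTBREAK || brks[i] == DEMO_ALLOWBREAK)
            return i;
    return len;
}
```

A line-wrapping loop would call this repeatedly, starting each search just past the previous result.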


Variable Documentation

struct LineBreakProperties lb_prop_default[]

Default line breaking properties, as taken from the Unicode Web site.

struct LineBreakPropertiesLang lb_prop_lang_map[]

Association data of language-specific line breaking properties with language names.

This is the definition for the static data in this file. If you want more flexibility, or do not need the data here, you may want to redefine lb_prop_lang_map in your C source file.


Generated on Sat May 14 15:01:39 2011 for liblinebreak by  doxygen 1.5.2