Copyright ©1995 by NeXT Computer, Inc.  All Rights Reserved.

IXLexemeExtraction



Adopted By: no NEXTSTEP classes
Declared In: indexing/IXAttributeReader.h



Protocol Description

IXLexemeExtraction defines methods implemented by readers, which are objects that lexically analyze a stream of text for consumption by a parser, such as an IXAttributeParser.  IXAttributeReader subclasses that conform to this protocol are called custom readers, as they implement this protocol to customize certain aspects of lexical analysis.



Method Types

Lexing a stream                 getLexeme:inLength:fromStream:
Manipulating a word/lexeme      foldCase:inLength:



Instance Methods

foldCase:inLength:
- (unsigned int)foldCase:(char *)aString inLength:(unsigned int)aLength

Converts all characters in aString to lowercase, according to the rules of the language being read.  aLength is the length of the buffer in which aString resides, not the length of the string itself, which is null-terminated.  Returns the length of the converted string.
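
The buffer contract described above can be sketched in plain C.  This is only an illustration under stated assumptions: fold_case_ascii is a hypothetical name, it is a C function standing in for the Objective-C method, and it folds case with the C library's tolower(), which handles single-byte characters in the current C locale rather than the rules of an arbitrary language.

```c
#include <ctype.h>

/* Hypothetical plain-C stand-in for foldCase:inLength: (not NeXT code).
 * Lowercases the null-terminated string in str in place.  buf_len is the
 * size of the buffer holding str, so the loop never reads past it even if
 * the terminator is missing.  Returns the length of the folded string. */
unsigned int fold_case_ascii(char *str, unsigned int buf_len)
{
    unsigned int i = 0;

    /* Stop at the terminator or at the end of the buffer, whichever
     * comes first. */
    while (i < buf_len && str[i] != '\0') {
        /* Cast to unsigned char: passing a negative char to tolower()
         * is undefined behavior. */
        str[i] = (char)tolower((unsigned char)str[i]);
        i++;
    }
    return i;
}
```

A reader for a language whose case rules change the length of a string would return a length different from the input length, which is presumably why the method reports the resulting length instead of leaving the caller to assume it is unchanged.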



getLexeme:inLength:fromStream:
- (unsigned int)getLexeme:(char *)aString
    inLength:(unsigned int)aLength
    fromStream:(NXStream *)stream

Extracts a lexeme from stream, putting it into aString.  aLength is the length of the string buffer into which the receiver may place the lexeme.  This method should return the actual length of the string put into the buffer.

This method may be implemented by subclasses of IXAttributeReader that need more control over lexeme recognition than IXAttributeReader's simple delimiter map strategy can provide.  This includes readers that need to recognize phrases or idioms (like "joie de vivre") and readers that handle text in non-phonetic alphabets or in streams that contain special escape sequences.  For example, the IXJapaneseReader class developed by Canon uses this method to override the default lexeme recognition so that it can detect embedded escape sequences that denote shifts among the three different Kanji character encodings.
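
As a rough illustration of the extraction contract (fill a caller-supplied buffer, return the number of characters placed in it), here is a minimal sketch in plain C.  Everything in it is an assumption made for the example: get_lexeme is a hypothetical name, a standard FILE * stands in for the NXStream * argument, any non-alphanumeric character is treated as a delimiter, the result is null-terminated, and a return value of 0 is used to mean that no lexeme remains.  The protocol description itself specifies none of these details.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical plain-C stand-in for getLexeme:inLength:fromStream:
 * (not NeXT code).  Reads the next run of alphanumeric characters from
 * stream into buf, which holds buf_len bytes, and returns the number of
 * characters stored, excluding the terminator. */
unsigned int get_lexeme(char *buf, unsigned int buf_len, FILE *stream)
{
    unsigned int n = 0;
    int c;

    if (buf_len == 0)
        return 0;

    /* Skip leading delimiters. */
    while ((c = getc(stream)) != EOF && !isalnum((unsigned char)c))
        ;

    /* Consume the lexeme.  One byte is reserved for the terminator;
     * characters that do not fit are read and discarded. */
    while (c != EOF && isalnum((unsigned char)c)) {
        if (n + 1 < buf_len)
            buf[n++] = (char)c;
        c = getc(stream);
    }

    /* Push back the delimiter that ended the lexeme. */
    if (c != EOF)
        ungetc(c, stream);

    buf[n] = '\0';
    return n;
}
```

A custom reader would replace the alphanumeric test with its own recognition logic, for example by tracking escape sequences or by joining several words into one phrase before returning.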