Enterprise Objects Framework Release 1.0 Copyright ©1994 by NeXT Computer, Inc.  All Rights Reserved.








NSCharacterSet Class Cluster



Class Cluster Description

An NSCharacterSet object represents a set of Unicode characters. The NSString and NSScanner classes use NSCharacterSets to group characters together for searching operations, so that they can find any of a particular set of characters during a search. The cluster's two public classes, NSCharacterSet and NSMutableCharacterSet, declare the programmatic interface for static and dynamic character sets, respectively.

The objects you create using these classes are referred to as character set objects (and when no confusion will result, merely as character sets). Because of the nature of class clusters, character set objects are not actual instances of the NSCharacterSet or NSMutableCharacterSet classes but of one of their private subclasses. Although a character set object's class is private, its interface is public, as declared by these abstract superclasses, NSCharacterSet and NSMutableCharacterSet. (See "Class Clusters" in the introduction to the Foundation Kit for more information on class clusters and creating subclasses within a cluster.) The character set classes adopt the NSCopying and NSMutableCopying protocols, making it convenient to convert a character set of one type to the other.



Using a Character Set

Character set objects are value objects, in that they don't perform any tasks. The NSString and NSScanner classes define methods that take NSCharacterSets as arguments so that they can find any of several characters. For example, this code excerpt finds the range of the first uppercase letter in myString:

NSString *myString = @"some text in an NSString...";
NSRange letterRange;

letterRange = [myString rangeOfCharacterFromSet:[NSCharacterSet
uppercaseLetterCharacterSet]];

letterRange.location is equal to the index of the first "N" in "NSString" after rangeOfCharacterFromSet: is invoked. If the first letter of the string were "S" then letterRange.location would be 0.

See the NSScanner class cluster specification for an example using an NSScanner.



Building a Character Set

NSCharacterSet provides methods to quickly create "standard" character sets, such as letters (uppercase or lowercase), decimal digits, whitespace, and so on.  You can use a standard character set as a starting point for building your own custom set by creating an immutable standard set and making a mutable copy of it.  For example, to create a character set containing letters, decimal digits, and basic punctuation, you could use this code:

myCharSet = [[NSCharacterSet alphanumericCharacterSet] mutableCopy];
[myCharSet addCharactersInString:@";:,.";];

You can also start from scratch by using alloc and init to create an empty character set.

If your application frequently uses a custom character set, you'll want to save its definition in a resource file and load that instead of explicitly adding individual characters each time you need to create the set.  You can save a character set by getting its bitmap representation (an NSData object) and saving that object to a file:

NSString *filename = @"/some/file";
NSData *charSetRep = [myCharSet bitmapRepresentation];
[charSetRep writeToFile:filename atomically:YES];

To read a character set file, load it into an NSData object and use characterSetWithBitmapRepresentation::

charSetRep = [NSData dataWithContentsOfFile:filename];
myCharSet = [NSCharacterSet
characterSetWithBitmapRepresentation:charSetRep];


Notes on Unicode Support

The NSCharacterSet classes don't fully support Unicode at this time. Only the low 256 character values, corresponding to the NEXTSTEP character set, are implemented. The definitions of the standard character sets defined by NSCharacterSet will change in the future to include the full set of Unicode characters. String objects created from C strings work properly with character set objects as they're currently implemented, and both will continue to work as NEXTSTEP support for the Unicode character encoding increases.









NSCharacterSet



Inherits From: NSObject
Conforms To: NSCopying
NSMutableCopying
Declared In: foundation/NSCharacterSet.h



Class Description

The NSCharacterSet class declares the programmatic interface for an object that manages a set of Unicode characters (see the NSString class cluster specification for information on Unicode).  NSCharacterSet's two primitive methods--characterIsMember: and bitmapRepresentation--provide the basis for all other instance methods in its interface.  A subclass of NSCharacterSet needs only to override these methods for proper behavior.



Adopted Protocols

NSCopying copyWithZone:
copy
NSMutableCopying mutableCopyWithZone:
mutableCopy



Method Types

Creating a standard character set + alphanumericCharacterSet
+ controlCharacterSet
+ decimalDigitCharacterSet
+ decomposableCharacterSet
+ illegalCharacterSet
+ letterCharacterSet
+ lowercaseLetterCharacterSet
+ nonBaseCharacterSet
+ uppercaseLetterCharacterSet
+ whitespaceCharacterSet
+ whitespaceAndNewlineCharacterSet
Creating a custom character set + characterSetWithRange:
+ characterSetWithCharactersInString:
+ characterSetWithBitmapRepresentation:
Testing set membership characterIsMember:
Inverting a character set invertedSet
Getting a binary representation bitmapRepresentation



Class Methods

alphanumericCharacterSet
+ (NSCharacterSet *)alphanumericCharacterSet

Returns a character set containing the uppercase and lowercase NEXTSTEP alphabetic characters (a-z, A-Z, other alphabetic characters such as é, É, ç, Ç, and so on) and the decimal digit characters (0-9).

See also:  letterCharacterSet, decimalDigitCharacterSet




characterSetWithBitmapRepresentation:

+ (NSCharacterSet *)characterSetWithBitmapRepresentation:(NSData *)data

Returns a character set containing characters determined by the bitmap representation data.  This method is useful for creating a character set object with data from a file or other external data source.

See also:  bitmapRepresentation




characterSetWithRange:

+ (NSCharacterSet *)characterSetWithRange:(NSRange)aRange

Returns a character set containing characters whose Unicode values are given by aRangeaRange.location is the value of the first character, and aRange.location + aRange.length 1 is the value of the last. If aRange.length is 0, an empty character set is returned.

For example, this code excerpt creates a character set object containing the lowercase English alphabetic characters:

NSCharacterSet *lcLetters;

lcLetters = [NSCharacterSet
characterSetWithRange:(NSRange){(unsigned int) 'a', 26}];


characterSetWithCharactersInString:
+ (NSCharacterSet *)characterSetWithCharactersInString:(NSString *)aString

Returns a character set containing the characters in aString. If aString is empty, an empty character set is returned. aString must not be nil.




controlCharacterSet

+ (NSCharacterSet *)controlCharacterSet

Returns a character set containing the control characters (characters with decimal Unicode values 0 to 31 and 127 to 159).




decimalDigitCharacterSet

+ (NSCharacterSet *)decimalDigitCharacterSet

Returns a character set containing only decimal digit characters (0-9).

See also:  alphanumericCharacterSet




decomposableCharacterSet

+ (NSCharacterSet *)decomposableCharacterSet

Returns a character set containing all individual Unicode characters that can also be represented as composed character sequences. Composed character sequences are simply letters with accents for the currently supported subset of Unicode (decimal values 0 through 255). See the NSString class cluster description for a brief introduction to composed character sequences.

See also:  nonBaseCharacterSet




illegalCharacterSet

+ (NSCharacterSet *)illegalCharacterSet

Returns a character set containing the illegal Unicode values.  See The Unicode Standard:  Worldwide Character Encoding for details on illegal Unicode values.




letterCharacterSet

+ (NSCharacterSet *)letterCharacterSet

Returns a character set containing the uppercase and lowercase NEXTSTEP alphabetic characters (a-z, A-Z, other alphabetic characters such as é, É, ç, Ç, and so on).

See also:  alphanumericCharacterSet, lowercaseLetterCharacterSet, uppercaseLetterCharacterSet




lowercaseLetterCharacterSet

+ (NSCharacterSet *)lowercaseLetterCharacterSet

Returns a character set containing only lowercase NEXTSTEP alphabetic characters (a-z, other alphabetic characters such as é, ç, and so on).

See also:  uppercaseLetterCharacterSet, letterCharacterSet




nonBaseCharacterSet

+ (NSCharacterSet *)nonBaseCharacterSet

Returns an empty character set. There are no non-base characters in the subset of Unicode currently supported.

See also:  decomposableCharacterSet




uppercaseLetterCharacterSet

+ (NSCharacterSet *)uppercaseLetterCharacterSet

Returns a character set containing only uppercase NEXTSTEP alphabetic characters (A-Z, other alphabetic characters such as É, Ç, and so on).

See also:  lowercaseLetterCharacterSet, letterCharacterSet




whitespaceAndNewlineCharacterSet

+ (NSCharacterSet *)whitespaceAndNewlineCharacterSet

Returns a character set containing only whitespace characters (space and tab) and the newline character.

See also:  whitespaceCharacterSet




whitespaceCharacterSet

+ (NSCharacterSet *)whitespaceCharacterSet

Returns a character set containing only in-line whitespace characters (space and tab).  This set doesn't contain the newline or carriage return characters.

See also:  whitespaceAndNewlineCharacterSet



Instance Methods

characterIsMember:
(BOOL)characterIsMember:(unichar)aCharacter

Returns YES if aCharacter is in the receiving character set, NO if it isn't.




bitmapRepresentation

(NSData *)bitmapRepresentation

Returns an NSData object encoding the receiving character set in binary format.  This format is suitable for saving to a file or otherwise transmitting or archiving.

A bitmap representation is an byte array of 216 bits (that is, 8192 bytes). The value of the bit at position 2n represents the presence of the character with decimal Unicode value n. To add a character with decimal Unicode value n to a bitmap representation, use a statement such as:

bitmapRep[n >> 3] |= (((unsigned)1) << (n & 7));

To remove that character:

bitmapRep[n >> 3] &= ~(((unsigned)1) << (n & 7));

To test for the presence of that character, use an expression such as:

(bitmapRep[n >> 3] & (((unsigned)1) << (n & 7)))

See also:  + characterSetWithBitmapRepresentation:




invertedSet

(NSCharacterSet *)invertedSet

Returns a character set containing only characters that don't exist in the receiver.  Inverting an immutable character set is much more efficient that inverting a mutable character set.

See also:  invert (NSMutableCharacterSet)









NSMutableCharacterSet



Inherits From: NSCharacterSet : NSObject
Conforms To: NSCopying (NSCharacterSet)
NSMutableCopying (NSCharacterSet)
Declared In: foundation/NSCharacterSet.h



Class Description

The NSMutableCharacterSet class declares the programmatic interface to objects that manage a modifiable set of Unicode characters.  NSMutableCharacterSet defines no primitive methods; subclasses must override all methods declared by this class.



Adopted Protocols

NSCopying copyWithZone:
copy
NSMutableCopying mutableCopyWithZone:
mutableCopy



Method Types

Adding and removing characters addCharactersInRange:
removeCharactersInRange:
addCharactersInString:
removeCharactersInString:
Combining character sets formIntersectionWithCharacterSet:
formUnionWithCharacterSet:
Inverting a character set invert



Instance Methods

addCharactersInRange:
(void)addCharactersInRange:(NSRange)aRange

Adds the characters whose integer values are given by aRange to the receiver.  aRange.location is the value of the first character to add, and aRange.location + aRange.length 1 is the value of the last.  If aRange.length is 0, this method has no effect.

See also:  removeCharactersInRange:, addCharactersInString:




addCharactersInString:

(void)addCharactersInString:(NSString *)aString

Adds the characters in aString to those in the receiver.  If aString is empty, this method has no effect.  aString must not be nil.

See also:  removeCharactersInString:, addCharactersInRange:




formIntersectionWithCharacterSet:

(void)formIntersectionWithCharacterSet:(NSCharacterSet *)otherSet

Modifies the receiver so that it contains only those characters that exist in both the receiver and in otherSet.

See also:  formUnionWithCharacterSet:




formUnionWithCharacterSet:

(void)formUnionWithCharacterSet:(NSCharacterSet *)otherSet

Modifies the receiver so that it contains all characters that exist in either the receiver or otherSet, barring duplicates.

See also:  formIntersectionWithCharacterSet:




invert

(void)invert

Replaces all of the characters in the receiver with all the characters it didn't previously contain. Inverting a mutable character set is much less efficient that inverting an immutable character set.

See also:  invertedSet (NSCharacterSet)




removeCharactersInRange:

(void)removeCharactersInRange:(NSRange)aRange

Removes from the receiver the characters whose integer values are given by aRangeaRange.location is the value of the first character to add, and aRange.location + aRange.length 1 is the value of the last.  If aRange.length is 0, this method has no effect.

See also:  addCharactersInRange:, removeCharactersInString:




removeCharactersInString:

(void)removeCharactersInString:(NSString *)aString

Removes the characters in aString from those in the receiver.  If aString is empty, this method has no effect.  aString must not be nil.

See also:  addCharactersInString:, removeCharactersInRange: