Copyright ©1995 by NeXT Computer, Inc.  All Rights Reserved.

IXWeightingDomain



Inherits From: Object
Declared In: indexing/IXWeightingDomain.h



Class Description

An IXWeightingDomain represents word count, rank, and frequency information for a body of text. It can be used to convert word-count data among several storage formats (domain, histogram, and NEXTSTEP Release 2 WFTable), and to discover information about specific words, or tokens, in the body of text. An IXWeightingDomain doesn't store the body of text whose statistics it represents, and keeps no record of which text it was derived from. It is simply a summary of word frequency information, to be used as needed.
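
The kind of summary described above can be illustrated with a small sketch in plain C (the Indexing Kit is not needed to show the idea). TokenSummary, summary_add, and summary_from_text are hypothetical names, the fixed table sizes are arbitrary, and treating tokens as whitespace-delimited, case-sensitive byte runs is an assumption; the Indexing Kit's own tokenization rules are not described here.

```c
#include <string.h>

#define MAX_UNIQUE    256  /* arbitrary limit for this sketch */
#define MAX_TOKEN_LEN 32   /* arbitrary limit for this sketch */

/* Hypothetical summary: per-token counts only. As with an
   IXWeightingDomain, the source text itself is not kept. */
typedef struct {
    char token[MAX_UNIQUE][MAX_TOKEN_LEN]; /* token bytes, compared by length */
    unsigned int length[MAX_UNIQUE];       /* length of each token in bytes */
    unsigned int count[MAX_UNIQUE];        /* occurrences of each token */
    unsigned int uniqueTokens;
    unsigned int totalTokens;
} TokenSummary;

/* Count one token of aLength bytes. Returns 0 (and drops the token)
   if it is too long or the table is full. */
int summary_add(TokenSummary *s, const char *aToken, unsigned int aLength)
{
    unsigned int i;
    if (aLength >= MAX_TOKEN_LEN) return 0;
    for (i = 0; i < s->uniqueTokens; i++) {
        if (s->length[i] == aLength && memcmp(s->token[i], aToken, aLength) == 0) {
            s->count[i]++;
            s->totalTokens++;
            return 1;
        }
    }
    if (s->uniqueTokens == MAX_UNIQUE) return 0;
    memcpy(s->token[i], aToken, aLength);  /* i == uniqueTokens here */
    s->length[i] = aLength;
    s->count[i] = 1;
    s->uniqueTokens++;
    s->totalTokens++;
    return 1;
}

/* Reset the summary, then tally every whitespace-delimited token in text. */
void summary_from_text(TokenSummary *s, const char *text)
{
    memset(s, 0, sizeof *s);
    while (*text) {
        size_t n;
        while (*text == ' ' || *text == '\t' || *text == '\n') text++;
        n = strcspn(text, " \t\n");
        if (n > 0) summary_add(s, text, (unsigned int)n);
        text += n;
    }
}
```

For the text "the cat saw the dog", this sketch records 5 tokens in total and 4 unique tokens, with "the" counted twice.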

IXAttributeParser uses IXWeightingDomain to compute word peculiarities when parsing text. The peculiarity of a word in a text sample is its frequency in the sample divided by its frequency in the IXWeightingDomain (here called the reference domain), normalized by taking the square root of that quotient. The result measures how frequent the word is in the sample relative to the reference domain. Words that are common in the reference domain receive less significance than they otherwise would, and words that are rare in the reference domain receive more. The effect is to bias the weights with a filter that reduces domain-specific "noise words."
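
Written as formulas, with c(w) the number of occurrences of word w and N the total number of tokens in a body of text (symbols introduced here only for illustration), the frequency and peculiarity definitions used in this class are:

```latex
f(w) = \frac{c(w)}{N},
\qquad
\mathrm{peculiarity}(w) = \sqrt{\frac{f_{\mathrm{sample}}(w)}{f_{\mathrm{ref}}(w)}}
```

With illustrative numbers: a word with sample frequency 0.01 and reference frequency 0.04 has peculiarity sqrt(0.25) = 0.5, so a word common in the reference domain is damped; a word with sample frequency 0.006 and reference frequency 0.0015 has peculiarity sqrt(4) = 2, so a word rare in the reference domain is boosted.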



Instance Variables

unsigned int beenRanked;

unsigned int totalTokens;

unsigned int uniqueTokens;

unsigned int indexCount;

unsigned int totalLength;

void *tokenArray;

unsigned int *tokenIndex;

beenRanked YES if tokens have been ranked.
totalTokens The number of tokens in the sample.
uniqueTokens The number of unique tokens in the sample.
indexCount The number of entries in the token index.
totalLength The total of all the token lengths.
tokenArray Array of tokens with rank and count.
tokenIndex Array of offsets into tokenArray.
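
The private layout of tokenArray and tokenIndex is not documented beyond the descriptions above. Purely to illustrate how an array of offsets can index variable-length tokens, here is a hypothetical C layout. EntryHeader, append_entry, and lookup_entry are invented names, and storing each token's rank, count, and length in a fixed header ahead of its bytes is an assumption, not the Indexing Kit's actual format.

```c
#include <string.h>

/* HYPOTHETICAL entry header; the token's bytes follow it directly. */
typedef struct {
    unsigned int rank;
    unsigned int count;
    unsigned int length;  /* token length in bytes */
} EntryHeader;

/* Append one entry at byte offset *used and record that offset in
   tokenIndex. The caller must size both buffers; no bounds checks here. */
void append_entry(unsigned char *tokenArray, unsigned int *used,
                  unsigned int *tokenIndex, unsigned int *indexCount,
                  const void *aToken, unsigned int aLength,
                  unsigned int rank, unsigned int count)
{
    EntryHeader h;
    h.rank = rank;
    h.count = count;
    h.length = aLength;
    tokenIndex[(*indexCount)++] = *used;
    memcpy(tokenArray + *used, &h, sizeof h);
    memcpy(tokenArray + *used + sizeof h, aToken, aLength);
    *used += (unsigned int)sizeof h + aLength;
}

/* Find aToken by comparing length and bytes; copy its header to *out.
   Returns 1 if found, 0 otherwise. */
int lookup_entry(const unsigned char *tokenArray, const unsigned int *tokenIndex,
                 unsigned int indexCount, const void *aToken, unsigned int aLength,
                 EntryHeader *out)
{
    unsigned int i;
    for (i = 0; i < indexCount; i++) {
        EntryHeader h;
        memcpy(&h, tokenArray + tokenIndex[i], sizeof h);
        if (h.length == aLength &&
            memcmp(tokenArray + tokenIndex[i] + sizeof h, aToken, aLength) == 0) {
            *out = h;
            return 1;
        }
    }
    return 0;
}
```

Headers are read with memcpy so entries need not be aligned, and the lookup is a linear scan because the order of entries in the real tokenIndex is not documented.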



Method Types

Initializing instances
    initFromDomain:
    initFromHistogram:
    initFromWFTable:
Saving domain information
    writeDomain:
    writeHistogram:
    writeWFTable:
Counting tokens
    totalTokens
    uniqueTokens
Retrieving information about tokens
    countForToken:ofLength:
    rankForToken:ofLength:
    frequencyOfToken:ofLength:
    peculiarityOfToken:ofLength:andFrequency:



Instance Methods

countForToken:ofLength:
(unsigned int)countForToken:(void *)aToken ofLength:(unsigned int)aLength

Returns the number of times aToken occurs in the body of text represented by the IXWeightingDomain.  aLength must be the length, in bytes, of aToken.

See also:    rankForToken:ofLength:, frequencyOfToken:ofLength:, peculiarityOfToken:ofLength:andFrequency:



frequencyOfToken:ofLength:
(float)frequencyOfToken:(void *)aToken ofLength:(unsigned int)aLength

Returns the frequency of occurrence of aToken in the body of text represented by the IXWeightingDomain. aLength must be the length, in bytes, of aToken. The frequency is equal to the number of times aToken occurs divided by the total number of tokens in the IXWeightingDomain.
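
That definition reduces to a single division. As a minimal C sketch (token_frequency is a hypothetical helper, not an Indexing Kit function, and returning 0 for an empty domain is an assumption made only to avoid dividing by zero):

```c
/* frequency = occurrences of the token / total tokens in the domain.
   Returns 0 for an empty domain (an assumption; the method's behavior
   in that case is not documented). */
float token_frequency(unsigned int count, unsigned int totalTokens)
{
    if (totalTokens == 0) return 0.0f;
    return (float)count / (float)totalTokens;
}
```

For example, a token that occurs 6 times in a 1,000-token domain has frequency 0.006.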

See also:  peculiarityOfToken:ofLength:andFrequency:, countForToken:ofLength:, rankForToken:ofLength:



initFromDomain:
initFromDomain:(NXStream *)stream

Initializes a newly allocated IXWeightingDomain from stream, which should contain data in domain format as created by the writeDomain: method.

See also:  initFromHistogram:, initFromWFTable:, writeDomain:



initFromHistogram:
initFromHistogram:(NXStream *)stream

Initializes a newly allocated IXWeightingDomain from stream, which should contain data in histogram format as created by the writeHistogram: method.

See also:  initFromDomain:, initFromWFTable:, writeHistogram:



initFromWFTable:
initFromWFTable:(NXStream *)stream

Initializes a newly allocated IXWeightingDomain from stream, which should contain data in the NEXTSTEP Release 2 WFTable format, such as that written by the writeWFTable: method.

See also:  initFromDomain:, initFromHistogram:, writeWFTable:



peculiarityOfToken:ofLength:andFrequency:
(float)peculiarityOfToken:(void *)aToken
ofLength:(unsigned int)aLength
andFrequency:(float)aFrequency

Returns the peculiarity of aToken, given that it occurs with frequency aFrequency in some text sample, relative to the body of text represented by the receiver (the reference domain). aLength must be the length, in bytes, of aToken. The peculiarity is the square root of the quotient aFrequency divided by the token's frequency within the reference domain.

See also:  frequencyOfToken:ofLength:, countForToken:ofLength:, rankForToken:ofLength:



rankForToken:ofLength:
(unsigned int)rankForToken:(void *)aToken ofLength:(unsigned int)aLength

Returns the rank of aToken in the IXWeightingDomain; the rank is the token's position when the set of unique tokens is ordered by decreasing count. aLength must be the length, in bytes, of aToken. The token with the highest count has a rank of 1; the token with the lowest count has a rank equal to the number of unique tokens.
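
A minimal C sketch of this ranking follows. rank_tokens is a hypothetical helper, and breaking ties by keeping the earlier token first is an assumption; how the Indexing Kit orders tokens with equal counts is not documented.

```c
/* counts[i] is the count of unique token i; rank[i] receives its rank,
   from 1 (highest count) to n (lowest count). Each rank is 1 plus the
   number of tokens ordered before token i. Ties keep the earlier index
   first (an assumption). O(n^2), which is fine for a sketch. */
void rank_tokens(const unsigned int *counts, unsigned int *rank, unsigned int n)
{
    unsigned int i, j;
    for (i = 0; i < n; i++) {
        unsigned int before = 0;
        for (j = 0; j < n; j++) {
            if (counts[j] > counts[i] || (counts[j] == counts[i] && j < i))
                before++;
        }
        rank[i] = before + 1;
    }
}
```

With counts {3, 10, 1, 10}, the two tokens counted 10 times receive ranks 1 and 2, the token counted 3 times receives rank 3, and the token counted once receives rank 4, which equals the number of unique tokens.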

See also:  countForToken:ofLength:, frequencyOfToken:ofLength:, peculiarityOfToken:ofLength:andFrequency:



totalTokens
(unsigned int)totalTokens

Returns the total number of tokens in the IXWeightingDomain; that is, the sum of the number of occurrences of each token, over the set of unique tokens.
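
The relationship between the two counts can be sketched in C: given one count per unique token, uniqueTokens is the number of counts and totalTokens is their sum. total_tokens is a hypothetical helper; it accumulates into unsigned long only to make overflow less likely in the sketch, whereas the documented method returns unsigned int.

```c
/* Sum the per-token counts over all unique tokens. */
unsigned long total_tokens(const unsigned int *counts, unsigned int uniqueTokens)
{
    unsigned long total = 0;
    unsigned int i;
    for (i = 0; i < uniqueTokens; i++)
        total += counts[i];
    return total;
}
```

For counts {3, 10, 1, 10}, there are 4 unique tokens and 24 tokens in total.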

See also:  uniqueTokens



uniqueTokens
(unsigned int)uniqueTokens

Returns the number of unique tokens in the IXWeightingDomain.

See also:  totalTokens



writeDomain:
writeDomain:(NXStream *)stream

Writes the IXWeightingDomain to stream in domain format.

See also:  writeHistogram:, writeWFTable:, initFromDomain:



writeHistogram:
writeHistogram:(NXStream *)stream

Writes the IXWeightingDomain to stream in histogram format.

See also:  writeDomain:, writeWFTable:, initFromHistogram:



writeWFTable:
writeWFTable:(NXStream *)stream

Writes the IXWeightingDomain to stream in NEXTSTEP Release 2 WFTable format.

See also:  writeDomain:, writeHistogram:, initFromWFTable: