KinoSearch::Search::Compiler - Query-to-Scorer compiler.
    # (Compiler is an abstract base class.)
    package MyCompiler;
    use base qw( KinoSearch::Search::Compiler );

    sub make_scorer {
        my ( $self, $reader ) = @_;
        return MyScorer->new( compiler => $self, reader => $reader );
    }

    1;
The purpose of the Compiler class is to take a specification in the form of a Query object and compile a Scorer object that can do real work.
The simplest Compiler subclasses -- such as those associated with constant-scoring Query types -- might simply implement a make_scorer() method which passes along verbatim information from the Query to the Scorer's constructor.
However, it is common for the Compiler to perform some calculations which affect its "weight" -- a floating point multiplier that the Scorer will factor into each document's score. If that is the case, then the Compiler subclass may wish to override get_weight(), sum_of_squared_weights(), and apply_norm_factor().
Compiling a Scorer is a two-stage process.
The first stage takes place during the Compiler's constructor, which is where the Query object meets a Searchable object for the first time. Searchables operate on a specific document collection and can report certain statistics about it -- such as the total number of documents in the collection, or the number of documents that contain a particular term. KinoSearch's core Compiler classes plug this information into the classic TF/IDF weighting algorithm to adjust the Compiler's weight; custom subclasses might do something similar.
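As a rough illustration of this first stage, the sketch below derives a raw weight from the two collection-wide statistics mentioned above. It does not use KinoSearch: idf() and raw_weight() are hypothetical helpers, and the formula is the common Lucene-style inverse document frequency, which is an assumption rather than a statement of what KinoSearch computes internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KinoSearch: a Lucene-style inverse
# document frequency built from the number of documents containing the
# term ($doc_freq) and the total number of documents ($doc_max).
sub idf {
    my ( $doc_freq, $doc_max ) = @_;
    return 1 + log( $doc_max / ( 1 + $doc_freq ) );
}

# Stage one (assumed shape): combine the query's boost with the IDF.
sub raw_weight {
    my (%args) = @_;
    return $args{boost} * idf( $args{doc_freq}, $args{doc_max} );
}

my $rare   = raw_weight( boost => 1.0, doc_freq => 9,   doc_max => 1000 );
my $common = raw_weight( boost => 1.0, doc_freq => 499, doc_max => 1000 );
printf "rare: %.3f  common: %.3f\n", $rare, $common;
```

A term found in few documents ends up with a larger weight than one found in many, which is the effect the constructor is meant to capture.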
The second stage of compilation is the make_scorer() factory method, which is where the Compiler meets an IndexReader object. IndexReaders, which are typically associated with a single index on a single machine, are lower-level than Searchables, which may represent a document collection spread out over a search cluster (comprising several indexes/IndexReaders). The Compiler object can use new information supplied by the IndexReader -- such as whether a term is missing from the local index even though it is present within the larger collection represented by the Searchable -- when figuring out what to feed to the Scorer's constructor, or whether make_scorer() should return a Scorer at all.
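The second stage might look something like the following sketch. SketchCompiler is a hypothetical stand-in class, the "reader" is a plain hash mapping terms to local document counts rather than a real IndexReader, and the returned hash ref merely stands in for a Scorer.

```perl
use strict;
use warnings;

package SketchCompiler;    # hypothetical stand-in, not a KinoSearch class

sub new {
    my ( $class, %args ) = @_;
    return bless { term => $args{term}, weight => $args{weight} }, $class;
}

# Stage two: consult the local index before building anything.
sub make_scorer {
    my ( $self, $reader ) = @_;
    my $local_freq = $reader->{ $self->{term} } || 0;

    # The term is absent from this index, so no Scorer is needed.
    return unless $local_freq;

    # A real subclass would construct a Scorer here.
    return {
        term     => $self->{term},
        weight   => $self->{weight},
        doc_freq => $local_freq,
    };
}

package main;

my $compiler = SketchCompiler->new( term => 'kino', weight => 2.5 );
my $hit      = $compiler->make_scorer( { kino   => 3 } );
my $miss     = $compiler->make_scorer( { search => 7 } );
print defined $miss ? "scorer\n" : "no scorer\n";
```

Callers therefore need to check the return value of make_scorer() before using it.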
    my $compiler = MyCompiler->SUPER::new(
        parent     => $my_query,
        searchable => $searcher,
        similarity => $schema->fetch_sim( $my_query->get_field ),
        boost      => undef,
    );
Abstract constructor. It must be invoked through a subclass; an error will be thrown if the invoking class name is the same as this package's name.
Factory method returning a Scorer. May return undef if the Scorer would have matched no documents.
Return the Compiler's numerical weight, a scoring multiplier. By default, returns the object's boost.
Compute and return a raw weighting factor. (This quantity is used by normalize().) By default, simply returns 1.0.
Apply a floating point normalization multiplier. For a TermCompiler, this involves multiplying its own weight by the supplied factor; combining classes such as ORCompiler would apply the factor recursively to their children.
The default implementation is a no-op; subclasses may wish to multiply their internal weight by the supplied factor.
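The difference between a term-level class and a combining class can be sketched with two hypothetical stand-ins. SketchLeaf and SketchOr are illustrative names only and do not inherit from any KinoSearch class.

```perl
use strict;
use warnings;

package SketchLeaf;    # plays the role of a term-level Compiler

sub new {
    my ( $class, $weight ) = @_;
    return bless { weight => $weight }, $class;
}
sub get_weight { return $_[0]{weight} }

# Multiply this object's own weight by the supplied factor.
sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $self->{weight} *= $factor;
    return;
}

package SketchOr;      # plays the role of a combining Compiler

sub new {
    my ( $class, @children ) = @_;
    return bless { children => \@children }, $class;
}

# Pass the factor down to every child, which may itself be a combiner.
sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $_->apply_norm_factor($factor) for @{ $self->{children} };
    return;
}

package main;

my @leaves = ( SketchLeaf->new(2.0), SketchLeaf->new(4.0) );
my $or     = SketchOr->new(@leaves);
$or->apply_norm_factor(0.5);
print join( ", ", map { $_->get_weight } @leaves ), "\n";    # prints "1, 2"
```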
Take a newly minted Compiler object and apply query-specific normalization factors. Should be called at or near the end of construction.
For a TermQuery, the scoring formula is approximately:
( tf_d * idf_t / norm_d ) * ( tf_q * idf_t / norm_q )
normalize() is theoretically concerned with applying the second half of that formula to the Compiler's weight. What actually happens depends on how the Compiler and Similarity methods called internally are implemented.
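One plausible shape for this step, shown purely as a sketch, is to turn sum_of_squared_weights() into a factor and hand that factor to apply_norm_factor(). The class below is a hypothetical stand-in, and the factor 1 / sqrt(sum) is the usual Lucene-style query norm, assumed here rather than taken from the KinoSearch source.

```perl
use strict;
use warnings;

package SketchCompiler;    # hypothetical stand-in, not a KinoSearch class

# Holds the raw weights of its clauses (for example boost * idf per term).
sub new {
    my ( $class, @weights ) = @_;
    return bless { weights => [@weights] }, $class;
}
sub get_weights { return @{ $_[0]{weights} } }

sub sum_of_squared_weights {
    my $self = shift;
    my $sum  = 0;
    $sum += $_**2 for @{ $self->{weights} };
    return $sum;
}

sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $_ *= $factor for @{ $self->{weights} };
    return;
}

# Assumed Lucene-style query norm: 1 / sqrt(sum of squared weights).
sub normalize {
    my $self   = shift;
    my $sum    = $self->sum_of_squared_weights;
    my $factor = $sum > 0 ? 1 / sqrt($sum) : 1;
    $self->apply_norm_factor($factor);
    return;
}

package main;

my $compiler = SketchCompiler->new( 3, 4 );
$compiler->normalize;
my @normalized = $compiler->get_weights;
printf "%.1f %.1f\n", @normalized;    # prints "0.6 0.8"
```

After normalization the squared weights sum to 1, which keeps scores from queries with different numbers of clauses on a comparable scale.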
Accessor for the Compiler's parent Query object.
Accessor for the Compiler's Similarity object.
Return a list of Span objects indicating where in the given field the text that matches the parent query occurs. Each Span's offset and length are measured in Unicode code points. The base class's method returns an empty list.
KinoSearch::Search::Compiler isa KinoSearch::Search::Query isa KinoSearch::Obj.
Copyright 2005-2008 Marvin Humphrey
See KinoSearch version 0.20.