KinoSearch::Docs::Cookbook::CustomQuery - Sample subclass of Query.
Explore KinoSearch's support for custom query types by creating a "PrefixQuery" class to handle trailing wildcards.
    my $prefix_query = PrefixQuery->new(
        field        => 'content',
        query_string => 'foo*',
    );
    my $hits = $searcher->search( query => $prefix_query );
    ...
To add support for a new query type, we need three classes: a Query, a Compiler, and a Scorer.
The PrefixQuery class on its own isn't enough because a Query object's role is limited to expressing an abstract specification for the search. A Query is basically nothing but metadata; execution is left to the Query's companion Compiler and Scorer.
Here's a simplified sketch illustrating how a Searcher's search() method ties together the three classes.
    sub search {
        my ( $self, $query ) = @_;
        my $compiler = $query->make_compiler( searchable => $self );
        # make_scorer() takes the IndexReader as a positional argument.
        my $scorer = $compiler->make_scorer( $self->get_reader );
        my @hits   = $scorer->capture_hits;
        return \@hits;
    }
Our PrefixQuery class will have two attributes: a query string and a field name.
    package PrefixQuery;
    use base qw( KinoSearch::Search::Query );
    use Carp;

    # Inside-out member vars and hand-rolled accessors.
    my %query_string;
    my %field;
    sub get_query_string { my $self = shift; return $query_string{$$self} }
    sub get_field        { my $self = shift; return $field{$$self} }
PrefixQuery's constructor collects and validates the attributes.
    sub new {
        my ( $class, %args ) = @_;
        my $query_string = delete $args{query_string};
        my $field        = delete $args{field};
        my $self         = $class->SUPER::new(%args);
        confess("'query_string' param is required")
            unless defined $query_string;
        confess("Invalid query_string: '$query_string'")
            unless $query_string =~ /\*\s*$/;
        confess("'field' param is required")
            unless defined $field;
        $query_string{$$self} = $query_string;
        $field{$$self}        = $field;
        return $self;
    }
Since this is an inside-out class, we'll need a destructor:
    sub DESTROY {
        my $self = shift;
        delete $query_string{$$self};
        delete $field{$$self};
        $self->SUPER::DESTROY;
    }
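For readers new to the inside-out idiom, the pattern can be sketched in plain Perl without KinoSearch. The Counter class below is hypothetical: it blesses a reference to a scalar holding a unique ID (standing in for whatever value $$self yields for a KinoSearch object) and keeps its one attribute in a file-scoped hash keyed by that ID. The live_objects() method exists only to make the effect of the destructor visible.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, for illustration only

my %count;          # inside-out storage: object ID => attribute value
my $next_id = 1;

sub new {
    my ( $class, %args ) = @_;
    my $id   = $next_id++;
    my $self = bless \$id, $class;    # $$self is the unique ID
    $count{$$self} = defined $args{start} ? $args{start} : 0;
    return $self;
}

sub get_count { my $self = shift; return $count{$$self} }
sub increment { my $self = shift; return ++$count{$$self} }

# Number of objects that still have attribute storage (demo only).
sub live_objects { return scalar keys %count }

# Without this, %count would keep an entry for every object ever made.
sub DESTROY {
    my $self = shift;
    delete $count{$$self};
}

package main;

{
    my $counter = Counter->new( start => 5 );
    $counter->increment;
    my $live = Counter->live_objects;
    print $counter->get_count, " $live\n";    # 6 1
}
my $live = Counter->live_objects;
print "$live\n";                              # 0
```

The attribute hash is invisible outside the file, so callers can only reach the data through the accessors; the price is that cleanup must be done by hand in DESTROY.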
The last thing we'll need is a make_compiler() factory method which kicks out a subclass of Compiler.
    sub make_compiler {
        my $self = shift;
        return PrefixCompiler->new( @_, parent => $self );
    }
PrefixQuery's make_compiler() method will be called internally at search-time by objects which subclass KinoSearch::Search::Searchable -- such as Searchers.
A Searchable is associated with a particular collection of documents. These documents may all reside in one index, as with Searcher, or they may be spread out across multiple indexes on multiple machines, as with KinoSearch::Search::MultiSearcher.
Searchable objects have access to certain statistical information about the collections they represent; for instance, a Searchable can tell you how many documents are in the collection...
    my $maximum_number_of_docs_in_collection = $searchable->max_docs;
... or how many documents a specific term appears in:
    my $term_appears_in_this_many_docs = $searchable->doc_freq(
        field => 'content',
        term  => 'foo',
    );
Such information can be used by sophisticated Compiler implementations to assign more or less heft to individual queries or sub-queries. However, we're not going to bother with weighting for this demo; we'll just assign a fixed score of 1.0 to each matching document.
We don't need to write a constructor, as it will suffice to inherit new() from KinoSearch::Search::Compiler. The only method we need to implement for PrefixCompiler is make_scorer().
    package PrefixCompiler;
    use base qw( KinoSearch::Search::Compiler );

    sub make_scorer {
        my ( $self, $index_reader ) = @_;

        # Acquire a Lexicon and seek it to our query string.
        my $substring = $self->get_parent->get_query_string;
        $substring =~ s/\*\s*$//;    # strip the trailing wildcard
        my $field   = $self->get_parent->get_field;
        my $lexicon = $index_reader->lexicon( field => $field );
        return unless $lexicon;
        $lexicon->seek($substring);

        # Accumulate PostingLists for each matching term.
        my @posting_lists;
        while ( defined( my $term = $lexicon->get_term ) ) {
            # \Q...\E makes the prefix match literally, not as a regex.
            last unless $term =~ /^\Q$substring\E/;
            my $posting_list = $index_reader->posting_list(
                field => $field,
                term  => $term,
            );
            if ($posting_list) {
                push @posting_lists, $posting_list;
            }
            last unless $lexicon->next;
        }
        return unless @posting_lists;

        return PrefixScorer->new( posting_lists => \@posting_lists );
    }
PrefixCompiler gets access to an IndexReader object when make_scorer() gets called. From the IndexReader we acquire a Lexicon, which is an iterator for a field's unique terms; we scan through the Lexicon's terms, acquiring a PostingList for each term that matches our prefix.
Each of these PostingList objects represents a set of documents which match the query.
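The term scan can be tried out without an index. In the sketch below, a sorted Perl array stands in for the Lexicon (an assumption for illustration; a real Lexicon is an iterator driven through seek(), get_term(), and next()), and the hypothetical matching_terms() function collects every term that starts with the prefix. Because the terms are sorted, the loop can stop at the first term past the prefix range.

```perl
use strict;
use warnings;

# Hypothetical helper: return the terms in a sorted list that begin with
# the prefix taken from a query string such as 'foo*'.
sub matching_terms {
    my ( $query_string, @sorted_terms ) = @_;
    ( my $prefix = $query_string ) =~ s/\*\s*$//;    # strip trailing wildcard

    my @matches;
    for my $term (@sorted_terms) {
        next if $term lt $prefix;               # emulate seeking to the prefix
        last unless $term =~ /^\Q$prefix\E/;    # \Q...\E: match literally
        push @matches, $term;
    }
    return @matches;
}

my @terms = sort qw( fob foo food fool foot for fox );
print join( ' ', matching_terms( 'foo*', @terms ) ), "\n";    # foo food fool foot
```

Quoting the prefix with \Q...\E matters whenever the prefix contains regex metacharacters: without it, a query such as 'a.c*' would also match the term 'abcx'.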
The Scorer subclass is the most involved.
    package PrefixScorer;
    use base qw( KinoSearch::Search::Scorer );

    # Inside-out member vars.
    my %doc_nums;
    my %tally;
    my %tick;

    sub new {
        my ( $class, %args ) = @_;
        my $posting_lists = delete $args{posting_lists};
        my $self          = $class->SUPER::new(%args);

        # Cheesy but simple way of interleaving PostingList doc sets.
        my %all_doc_nums;
        for my $posting_list (@$posting_lists) {
            while ( my $doc_num = $posting_list->next ) {
                $all_doc_nums{$doc_num} = undef;
            }
        }
        my @doc_nums = sort { $a <=> $b } keys %all_doc_nums;
        $doc_nums{$$self} = \@doc_nums;

        $tick{$$self}  = -1;
        $tally{$$self} = KinoSearch::Search::Tally->new;
        $tally{$$self}->set_score(1.0);    # fixed score of 1.0

        return $self;
    }

    sub DESTROY {
        my $self = shift;
        delete $doc_nums{$$self};
        delete $tick{$$self};
        delete $tally{$$self};
        $self->SUPER::DESTROY;
    }
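The hash-based interleaving in the constructor can be examined in isolation. In this sketch, array references of document numbers stand in for PostingList objects (an assumption for illustration), and union_doc_nums() is a hypothetical helper. A document that appears under several matching terms ends up in the result only once.

```perl
use strict;
use warnings;

# Hypothetical helper: merge several lists of document numbers into one
# ascending list with duplicates removed.
sub union_doc_nums {
    my @lists = @_;
    my %seen;
    for my $list (@lists) {
        $seen{$_} = undef for @$list;    # hash keys act as a set
    }
    return sort { $a <=> $b } keys %seen;    # numeric, not string, order
}

my @doc_nums = union_doc_nums( [ 2, 10, 31 ], [ 3, 10 ], [ 1, 31, 100 ] );
print "@doc_nums\n";    # 1 2 3 10 31 100
```

The numeric comparison is essential: hash keys are strings, and a default sort would place 10 and 100 ahead of 2. This approach holds every matching document number in memory at once, which is acceptable for a demo but may not suit very large result sets.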
In addition to the constructor and destructor, there are three methods that must be overridden.
next() advances the Scorer to the next valid matching doc.
    sub next {
        my $self     = shift;
        my $doc_nums = $doc_nums{$$self};
        my $tick     = ++$tick{$$self};
        return 0 if $tick >= scalar @$doc_nums;
        return $doc_nums->[$tick];
    }
get_doc_num() returns the current document number, or 0 if the Scorer is exhausted. (Document numbers start at 1, so 0 is a sentinel.)
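    sub get_doc_num {
        my $self     = shift;
        my $tick     = $tick{$$self};
        my $doc_nums = $doc_nums{$$self};
        # Before the first call to next(), $tick is -1; return the sentinel
        # rather than letting the negative index wrap to the last element.
        return 0 if $tick < 0;
        return $tick < scalar @$doc_nums ? $doc_nums->[$tick] : 0;
    }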
tally() returns an object which isa KinoSearch::Search::Tally and conveys the score of the current match. Since we're content to return a fixed score of 1.0, we just return the same Tally object every time.
    sub tally {
        my $self = shift;
        return $tally{$$self};
    }
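A caller can drive an iterator of this shape with a simple loop, relying on 0 as the end-of-results sentinel. The sketch below runs without KinoSearch: ArrayScorer is a hypothetical stand-in that implements only next(), and collect_hits() is a hypothetical helper that pairs each document number with the fixed score of 1.0. It is meant to illustrate the calling convention, not to describe how KinoSearch collects hits internally.

```perl
use strict;
use warnings;

package ArrayScorer;    # hypothetical stand-in, not a KinoSearch class

sub new {
    my ( $class, @doc_nums ) = @_;
    return bless { doc_nums => [@doc_nums], tick => -1 }, $class;
}

# Advance to the next match; return its doc number, or 0 when exhausted.
sub next {
    my $self = shift;
    my $tick = ++$self->{tick};
    return 0 if $tick >= @{ $self->{doc_nums} };
    return $self->{doc_nums}[$tick];
}

package main;

# Hypothetical helper: drain a scorer, pairing each doc with a fixed score.
sub collect_hits {
    my $scorer = shift;
    my @hits;
    while ( my $doc_num = $scorer->next ) {    # 0 ends the loop
        push @hits, { doc_num => $doc_num, score => 1.0 };
    }
    return \@hits;
}

my $hits = collect_hits( ArrayScorer->new( 3, 7, 42 ) );
print join( ', ', map { $_->{doc_num} } @$hits ), "\n";    # 3, 7, 42
```

The loop works only because document numbers start at 1; if 0 were a valid document number, the sentinel would cut iteration short.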
To try out PrefixQuery, insert the FlatQueryParser module (which supports PrefixQuery) into the search.cgi sample app, as described in KinoSearch::Docs::Cookbook::CustomQueryParser.
Copyright 2008 Marvin Humphrey
See KinoSearch version 0.20.