NAME

KinoSearch::Docs::Cookbook::CustomQuery - Sample subclass of Query.

ABSTRACT

Explore KinoSearch's support for custom query types by creating a "PrefixQuery" class to handle trailing wildcards.

    my $prefix_query = PrefixQuery->new(
        field        => 'content',
        query_string => 'foo*',
    );
    my $hits = $searcher->search( query => $prefix_query );
    ...

Query, Compiler, and Scorer

To add support for a new query type, we need three classes: a Query, a Compiler, and a Scorer.

  • PrefixQuery - a subclass of KinoSearch::Search::Query, and the only class that client code will deal with directly.
  • PrefixCompiler - a subclass of KinoSearch::Search::Compiler, whose primary role is to compile a PrefixQuery to a PrefixScorer.
  • PrefixScorer - a subclass of KinoSearch::Search::Scorer, which does the heavy lifting: it applies the query to individual documents and calculates a score for each one.

The PrefixQuery class on its own isn't enough because a Query object's role is limited to expressing an abstract specification for the search. A Query is basically nothing but metadata; execution is left to the Query's companion Compiler and Scorer.

Here's a simplified sketch illustrating how a Searcher's search() method ties together the three classes.

    sub search {
        my ( $self, $query ) = @_;
        my $compiler = $query->make_compiler( searchable => $self );
        my $scorer = $compiler->make_scorer( $self->get_reader );
        my @hits = $scorer->capture_hits;
        return \@hits;
    }

PrefixQuery

Our PrefixQuery class will have two attributes: a query string and a field name.

    package PrefixQuery;
    use base qw( KinoSearch::Search::Query );
    use Carp;
    
    # Inside-out member vars and hand-rolled accessors.
    my %query_string;
    my %field;
    sub get_query_string { my $self = shift; return $query_string{$$self} }
    sub get_field        { my $self = shift; return $field{$$self} }

PrefixQuery's constructor collects and validates the attributes.

    sub new {
        my ( $class, %args ) = @_;
        my $query_string = delete $args{query_string};
        my $field        = delete $args{field};
        my $self         = $class->SUPER::new(%args);
        confess("'query_string' param is required")
            unless defined $query_string;
        confess("Invalid query_string: '$query_string'")
            unless $query_string =~ /\*\s*$/;
        confess("'field' param is required")
            unless defined $field;
        $query_string{$$self} = $query_string;
        $field{$$self}        = $field;
        return $self;
    }
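
To see which query strings the constructor's pattern lets through, here is a minimal standalone sketch. The helper name is_valid_prefix_query is hypothetical and not part of KinoSearch; it only applies the same regex that new() uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): applies the same pattern
# that PrefixQuery->new uses to validate its query_string.
sub is_valid_prefix_query {
    my $query_string = shift;
    return 0 unless defined $query_string;
    return $query_string =~ /\*\s*$/ ? 1 : 0;
}

for my $candidate ( 'foo*', 'foo* ', 'foo', '*foo', 'f*o' ) {
    printf "%-8s %s\n", "'$candidate'",
        is_valid_prefix_query($candidate) ? 'accepted' : 'rejected';
}
```

Note that the pattern only checks for a trailing asterisk, optionally followed by whitespace; a bare '*' with an empty prefix also passes.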

Since this is an inside-out class, we'll need a destructor:

    sub DESTROY {
        my $self = shift;
        delete $query_string{$$self};
        delete $field{$$self};
        $self->SUPER::DESTROY;
    }
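
The reason an inside-out class needs a destructor can be shown without KinoSearch at all. The following is a rough sketch using a hypothetical Widget class, keyed by Scalar::Util's refaddr rather than by $$self; it is an illustration of the general technique, not of KinoSearch's internals.

```perl
use strict;
use warnings;
use Scalar::Util ();

package Widget;    # hypothetical class, for illustration only

# Inside-out storage: one hash per attribute, keyed by object identity.
my %name;

sub new {
    my ( $class, %args ) = @_;
    my $self = bless \do { my $anon }, $class;
    $name{ Scalar::Util::refaddr($self) } = $args{name};
    return $self;
}

sub get_name {
    my $self = shift;
    return $name{ Scalar::Util::refaddr($self) };
}

# Without this, entries would pile up in %name after objects are gone.
sub DESTROY {
    my $self = shift;
    delete $name{ Scalar::Util::refaddr($self) };
}

sub live_count { return scalar keys %name }

package main;

{
    my $widget = Widget->new( name => 'sprocket' );
    print Widget::live_count(), "\n";    # one entry while $widget is alive
}
print Widget::live_count(), "\n";        # entry removed by DESTROY
```

Because the attribute data lives in class-level hashes rather than in the object itself, Perl cannot reclaim it automatically when the object goes away; the destructor has to do it.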

The last thing we'll need is a make_compiler() factory method which returns an instance of our Compiler subclass.

    sub make_compiler {
        my $self = shift;
        return PrefixCompiler->new( @_, parent => $self );
    }

PrefixCompiler

PrefixQuery's make_compiler() method will be called internally at search-time by objects which subclass KinoSearch::Search::Searchable -- such as Searchers.

A Searchable is associated with a particular collection of documents. These documents may all reside in one index, as with Searcher, or they may be spread out across multiple indexes on multiple machines, as with KinoSearch::Search::MultiSearcher.

Searchable objects have access to certain statistical information about the collections they represent; for instance, a Searchable can tell you how many documents are in the collection...

    my $maximum_number_of_docs_in_collection = $searchable->max_docs;

... or how many documents a specific term appears in:

    my $term_appears_in_this_many_docs = $searchable->doc_freq(
        field => 'content',
        term  => 'foo',
    );

Such information can be used by sophisticated Compiler implementations to assign more or less heft to individual queries or sub-queries. However, we're not going to bother with weighting for this demo; we'll just assign a fixed score of 1.0 to each matching document.
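
For a sense of what such weighting might look like, here is a small sketch of an inverse-document-frequency calculation. The function name idf_weight, the formula, and the numbers are all assumptions chosen for illustration; this is a common textbook-style formula, not necessarily the one KinoSearch applies.

```perl
use strict;
use warnings;

# Hypothetical weighting helper; the formula is a common textbook-style
# inverse document frequency, not necessarily the one KinoSearch applies.
sub idf_weight {
    my ( $max_docs, $doc_freq ) = @_;
    return 1 + log( $max_docs / ( $doc_freq + 1 ) );
}

# Stand-in numbers for $searchable->max_docs and $searchable->doc_freq(...).
my $max_docs = 10_000;
for my $doc_freq ( 5, 500, 5_000 ) {
    printf "doc_freq %5d => weight %.3f\n", $doc_freq,
        idf_weight( $max_docs, $doc_freq );
}
```

The rarer the term, the larger the weight, which is the usual motivation for consulting collection-wide statistics at compile time.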

We don't need to write a constructor, as it will suffice to inherit new() from KinoSearch::Search::Compiler. The only method we need to implement for PrefixCompiler is make_scorer().

    package PrefixCompiler;
    use base qw( KinoSearch::Search::Compiler );

    sub make_scorer {
        my ( $self, $index_reader ) = @_;
        
        # Acquire a Lexicon and seek it to our query string.
        my $substring = $self->get_parent->get_query_string;
        $substring =~ s/\*\s*$//;
        my $field = $self->get_parent->get_field;
        my $lexicon = $index_reader->lexicon( field => $field );
        return unless $lexicon;
        $lexicon->seek($substring);
        
        # Accumulate PostingLists for each matching term.
        my @posting_lists;
        while ( defined( my $term = $lexicon->get_term ) ) {
            last unless $term =~ /^\Q$substring\E/;
            my $posting_list = $index_reader->posting_list(
                field => $field,
                term  => $term,
            );
            if ($posting_list) {
                push @posting_lists, $posting_list;
            }
            last unless $lexicon->next;
        }
        return unless @posting_lists;
        
        return PrefixScorer->new( posting_lists => \@posting_lists );
    }

PrefixCompiler gets access to an IndexReader object when make_scorer() gets called. From the IndexReader we acquire a Lexicon, which is an iterator for a field's unique terms; we scan through the Lexicon's terms, acquiring a PostingList for each term that matches our prefix.

Each of these PostingList objects represents a set of documents which match the query.
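
The seek-then-scan pattern can be tried out on plain data. In this sketch a sorted array stands in for the Lexicon, on the assumption that a Lexicon yields terms in sorted order; terms_with_prefix is a hypothetical helper, not a KinoSearch method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Lexicon: a sorted list of a field's unique terms.
my @terms = sort qw( bar food foo fool foot fork zebra );

# Collect every term that starts with $prefix, stopping at the first
# term past the prefix range, much as make_scorer() does.
sub terms_with_prefix {
    my ( $prefix, @sorted_terms ) = @_;
    my @matches;
    for my $term (@sorted_terms) {
        next if $term lt $prefix;                    # emulate seek()
        last unless index( $term, $prefix ) == 0;    # past the prefix range
        push @matches, $term;
    }
    return @matches;
}

print join( ', ', terms_with_prefix( 'foo', @terms ) ), "\n";
```

Because the terms are sorted, all matches for a prefix sit next to each other, so the scan can stop at the first term that fails to match instead of visiting the whole list.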

PrefixScorer

The Scorer subclass is the most involved.

    package PrefixScorer;
    use base qw( KinoSearch::Search::Scorer );
    
    # Inside-out member vars.
    my %doc_nums;
    my %tally;
    my %tick;
    
    sub new {
        my ( $class, %args ) = @_;
        my $posting_lists = delete $args{posting_lists};
        my $self          = $class->SUPER::new(%args);
        
        # Cheesy but simple way of interleaving PostingList doc sets.
        my %all_doc_nums;
        for my $posting_list (@$posting_lists) {
            while ( my $doc_num = $posting_list->next ) {
                $all_doc_nums{$doc_num} = undef;
            }
        }
        my @doc_nums = sort { $a <=> $b } keys %all_doc_nums;
        $doc_nums{$$self} = \@doc_nums;
        
        $tick{$$self}  = -1;
        $tally{$$self} = KinoSearch::Search::Tally->new;
        $tally{$$self}->set_score(1.0);    # fixed score of 1.0
        
        return $self;
    }

    sub DESTROY {
        my $self = shift;
        delete $doc_nums{$$self};
        delete $tick{$$self};
        delete $tally{$$self};
        $self->SUPER::DESTROY;
    }
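
The interleaving step in the constructor can be exercised in isolation. Below is a minimal sketch in which array references stand in for PostingList objects; merge_doc_nums is a hypothetical helper name and the doc numbers are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for PostingLists: each is a list of doc numbers.
my @posting_lists = ( [ 2, 5, 9 ], [ 1, 5, 7 ], [ 9, 12 ] );

# Union the doc numbers via hash keys (which drops duplicates), then
# sort numerically to restore ascending document order.
sub merge_doc_nums {
    my @lists = @_;
    my %seen;
    for my $list (@lists) {
        $seen{$_} = undef for @$list;
    }
    my @sorted = sort { $a <=> $b } keys %seen;
    return @sorted;
}

my @doc_nums = merge_doc_nums(@posting_lists);
print "@doc_nums\n";
```

This approach holds every matching doc number in memory at once, which is fine for a demo; a scorer meant for large indexes would more likely merge the lists lazily.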

In addition to the constructor and destructor, there are three methods that must be overridden.

next() advances the Scorer to the next matching doc and returns its document number, or 0 once the matches are exhausted.

    sub next {
        my $self     = shift;
        my $doc_nums = $doc_nums{$$self};
        my $tick     = ++$tick{$$self};
        return 0 if $tick >= scalar @$doc_nums;
        return $doc_nums->[$tick];
    }

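get_doc_num() returns the current document number, or 0 if the Scorer has not been advanced yet or is exhausted. (Document numbers start at 1, so 0 is a sentinel.)

    sub get_doc_num {
        my $self     = shift;
        my $tick     = $tick{$$self};
        my $doc_nums = $doc_nums{$$self};
        # Guard against the initial tick of -1, which would otherwise
        # index the last element of the array.
        return 0 if $tick < 0 || $tick >= scalar @$doc_nums;
        return $doc_nums->[$tick];
    }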

tally() returns an object which isa KinoSearch::Search::Tally and conveys the score of the current match. Since we're content to return a fixed score of 1.0, we just return the same Tally object every time.

    sub tally {
        my $self = shift;
        return $tally{$$self};
    }
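
To illustrate how calling code might drive these methods, here is a self-contained sketch. MockScorer is a hypothetical hash-based stand-in (not inside-out, and not a KinoSearch class) that mimics only next(); the loop and the fixed score of 1.0 are assumptions modeled on the behavior described above.

```perl
use strict;
use warnings;

package MockScorer;    # hypothetical, mimics PrefixScorer without KinoSearch

sub new {
    my ( $class, @doc_nums ) = @_;
    return bless { doc_nums => [@doc_nums], tick => -1 }, $class;
}

# Advance to the next match; 0 signals exhaustion.
sub next {
    my $self = shift;
    my $tick = ++$self->{tick};
    return 0 if $tick >= @{ $self->{doc_nums} };
    return $self->{doc_nums}[$tick];
}

package main;

my $scorer = MockScorer->new( 3, 8, 21 );
my @hits;
while ( my $doc_num = $scorer->next ) {
    push @hits, { doc_num => $doc_num, score => 1.0 };
}
printf "doc %d scored %.1f\n", $_->{doc_num}, $_->{score} for @hits;
```

The loop relies on 0 never being a valid document number, which is what lets next() double as both iterator and end-of-results signal.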

Usage

To try out PrefixQuery, insert the FlatQueryParser module (which supports PrefixQuery) into the search.cgi sample app, as described in KinoSearch::Docs::Cookbook::CustomQueryParser.

COPYRIGHT

Copyright 2008 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.20.

Copyright © 2004-2008 Marvin Humphrey