On Fri, 27 Jan 2012 17:39:13 +0100, "Uwe Schindler" <...@thetaphi.de
You are creating a TermScorer on a composite IndexReader (that is, not an
atomic reader such as SegmentReader). That is still supported in 3.x, but no
longer allowed in 4.0. The backwards-compatibility layer in 3.x had a bug
before Lucene 3.5, so theoretically your code should work on 3.5:
https://issues.apache.org/jira/browse/LUCENE-3442
But still: null is a valid return value for scorer()! It may return null if
no document can match this query, which means the term does not exist at all.
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uw...@thetaphi.de
---------------------------------------------------------------------
To unsubscribe, e-mail: java...@lucene.apache.org
For additional commands, e-mail: java...@lucene.apache.org
On Fri, 27 Jan 2012 17:46:18 +0100, "Uwe Schindler" <...@thetaphi.de
One addition:
In general, the way you obtain a scorer from a query is not supported (and
does not work correctly for all queries). The right way is *not* to call
query.createWeight(searcher) but to call
searcher.createNormalizedWeight(query) instead.
But that has nothing to do with the null scorer, which is a valid return
value if the term does not exist and no docs can ever match.
On Tue, 31 Jan 2012 12:29:52 +0400, Michael Kazekin <...@mediainsight.info
Uwe, thank you very much for such a detailed answer!
I tried the code you mentioned (searcher.createNormalizedWeight(query)),
but it doesn't work on Lucene 3.5 for me either :(
My Solr server returns the document correctly for the specified term
(field and value); the field is indexed and stored.
I'm really stuck on this, because the API code seems simple and should
behave as expected: the index exists and Solr returns correct results.
Maybe you have some thoughts on it, because my knowledge of "inner
Lucene" is not very good.
My code is:
    File file = new File(luceneDir);
    Preconditions.checkArgument(file.isDirectory(),
        "Lucene directory: " + file.getAbsolutePath() + " does not exist or is not a directory");
    Directory directory = FSDirectory.open(file); //index exists,
    IndexReader reader = SegmentReader.open(directory, true);
    IndexSearcher searcher = new IndexSearcher(reader);
    TermQuery termQuery = new TermQuery(new Term("lang", "en"));
    Weight weight = searcher.createNormalizedWeight(termQuery);
    Scorer scorer = weight.scorer(reader, true, false);
    System.out.println("scorer = " + scorer); //outputs "scorer = null"
On Tue, 31 Jan 2012 09:50:30 +0100, "Uwe Schindler" <...@thetaphi.de
Hi,
As this was originally a Solr index, are you sure that the term is in the
index in exactly *that* spelling (including case)? You should open the index
with the Luke desktop tool and inspect the term index! Solr uses an analyzer
when indexing or searching, so depending on the Solr config, the term might
be "normalized" or changed in some other way inside the index.
TermQuery does not analyze; it looks up the raw term.
Btw: SegmentReader.open(...) is wrong; it must be IndexReader.open(). It only
works because SegmentReader is a subclass and static methods can be called
through subclasses.
Uwe
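The point about raw term lookup can be illustrated with a toy model. This is only a sketch in plain Java, not Lucene code: the class RawTermLookupDemo, its map-based "index", and the lowercasing analyze() step are all hypothetical stand-ins for whatever analyzer the Solr schema actually configures. It shows why a lookup that skips analysis finds nothing when the stored term was normalized at index time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class RawTermLookupDemo {
    // Toy inverted index: term -> list of doc ids (not Lucene's data structure).
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Hypothetical index-time analysis step: lowercases the token.
    static String analyze(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Index-time path: the value is analyzed before it is stored as a term.
    void index(int docId, String value) {
        postings.computeIfAbsent(analyze(value), k -> new ArrayList<>()).add(docId);
    }

    // Query-time path without analysis: the term is looked up exactly as given.
    // Returns null when the term is absent, mirroring the "null is valid" point.
    List<Integer> rawLookup(String term) {
        return postings.get(term);
    }

    public static void main(String[] args) {
        RawTermLookupDemo idx = new RawTermLookupDemo();
        idx.index(0, "EN");
        idx.index(1, "en");
        System.out.println(idx.rawLookup("en")); // prints [0, 1]
        System.out.println(idx.rawLookup("EN")); // prints null
    }
}
```

In this toy model the caller has to check for null before iterating, and has to use the term exactly as it was stored, which is what inspecting the index with Luke would reveal.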
On Wed, 01 Feb 2012 11:42:08 +0400, Michael Kazekin <...@mediainsight.info
Uwe,
I looked them up in Luke, these fields are present, are named the same, and
have proper values, so the problem seems to be somewhere else :((
But anyway, thanks for your help!
On Fri, 27 Jan 2012 20:21:49 +0000, Hany Azzam <...@eecs.qmul.ac.uk
Hi,
I have two indexes. One contains all the documents in the collection and the other contains only the relevant documents. I am using Lucene 4.0 and the new SimilarityBase class to build my retrieval models (similarity functions). One of the retrieval models requires statistics to be computed across both indexes. How can an IndexSearcher use the two indexes at the same time to compute different components of the retrieval model? Is that possible?
Thank you very much,
Hany
On Fri, 27 Jan 2012 15:29:10 -0500, Robert Muir <...@gmail.com
You can make a MultiReader over the two IndexReaders, then make an
IndexSearcher over that MultiReader... or are you trying to do
something else?
--
lucidimagination.com
On Fri, 27 Jan 2012 21:53:51 +0000, Hany Azzam <...@eecs.qmul.ac.uk
Hi Robert,
Thanks for the reply. I am trying to do something different. If I use a MultiReader, then the searching/scoring will take place over the two indexes at the same time. However, in my case the subcomponents of the retrieval model are calculated over separate evidence spaces. For example, the retrieval model calculates something like this:
score := P(query_term | documents) * P(query_term | relevant_documents)
P(query_term | documents) can be estimated using the index over the whole collection of documents. P(query_term | relevant_documents) can be estimated using the index over the relevant documents only (which are known prior to the execution of the query).
The question is: can I do such a calculation, which uses two separate indexes, in one scoring function?
Of course, one option is to use the MultiSimilarity class and combine the scores somehow. However, the retrieval function is more complex than that, and a simple combination using a product or summation won't be feasible.
Any ideas on how to resolve this problem (if possible :))?
Thanks again,
h.
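To make the example formula above concrete, here is a minimal sketch in plain Java, with no Lucene dependency. Everything in it is an assumption for illustration: the class TwoIndexScoreDemo, the TermCounts holder, the sample counts, and in particular the choice of a maximum-likelihood estimate (term occurrences divided by total tokens) for each probability. The actual retrieval model is described as more complex than a plain product.

```java
public class TwoIndexScoreDemo {
    /** Hypothetical per-index counts for one term in one field. */
    static final class TermCounts {
        final long termFreq;    // occurrences of the term in this index
        final long totalTokens; // all token occurrences in this index

        TermCounts(long termFreq, long totalTokens) {
            this.termFreq = termFreq;
            this.totalTokens = totalTokens;
        }
    }

    // Assumed estimator: maximum-likelihood P(term | index) = termFreq / totalTokens.
    // Returns 0 for an empty index to avoid dividing by zero.
    static double probability(TermCounts counts) {
        if (counts.totalTokens <= 0) {
            return 0.0;
        }
        return (double) counts.termFreq / counts.totalTokens;
    }

    // score := P(query_term | documents) * P(query_term | relevant_documents)
    static double score(TermCounts documents, TermCounts relevant) {
        return probability(documents) * probability(relevant);
    }

    public static void main(String[] args) {
        TermCounts documents = new TermCounts(250, 1000); // whole collection (made-up counts)
        TermCounts relevant = new TermCounts(10, 20);     // relevant documents only (made-up counts)
        System.out.println(score(documents, relevant));   // prints 0.125
    }
}
```

Keeping the two sets of counts as separate inputs is the point of the sketch: each probability is estimated from its own evidence space, and only the final step combines them.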
On Fri, 27 Jan 2012 17:10:04 -0500, Robert Muir <...@gmail.com
In this situation, if you want to combine the statistics from
different indexes in your own way, you can look at
IndexSearcher.termStatistics() and
IndexSearcher.collectionStatistics().
These are intended for situations like distributed search, but maybe
you can make use of them.
Here is some pseudocode:
    IndexReader relevant = IndexReader.open(relevantDirectory);
    IndexReader documents = IndexReader.open(documentsDirectory);
    final IndexSearcher relevantSearcher = new IndexSearcher(relevant);
    IndexSearcher documentsSearcher = new IndexSearcher(documents) {
      @Override
      public CollectionStatistics collectionStatistics(String field)
          throws IOException {
        CollectionStatistics documentStats = super.collectionStatistics(field);
        return new CollectionStatistics(...
            someCombinationOf(documentStats + stuff from relevantSearcher));
      }
      // do a similar thing for termStatistics()....
    };
    documentsSearcher.search(...)
--
lucidimagination.com