How does Lucene weigh a term if it appears in the query with different weights (below or above 1)?
Since I use an algorithm to create weighted multi-term queries that I fire at Lucene at query time, it happens that one term appears several times with different weights, e.g. (ignore the Dutch language):
euro^5 geld^5 euro^5
Assuming the answer would lie in either multiplying or adding the weights, I compared the top-10 returned docs for each option. The results seemed to confirm that the weights are multiplied, i.e., that the query above is equal to
euro^25 geld^5
However, using the same test on a query in which weights both below 1 and above 1 occurred, I got different results for all 3 of the following queries:
euro^0.5 geld^5 euro^0.5
euro^0.25 geld^5
euro^1 geld^5
That means that either (or both) my test results were due to chance and the weights are neither summed nor multiplied, or that you cannot combine weights below and above 1. Can anyone help me out? Or where can I find in-depth information on Lucene's mysterious query-time weight handling ways?
For what it's worth: I use the DutchAnalyzer and the QueryParser, with Lucene version 4.2.1.
There are 3 highly useful resources I can point you to:
TFIDFSimilarity documentation - details the default scoring algorithm, in fair detail.
IndexSearcher.explain - creates an Explanation object, showing why a given document got the score it did for a given query. Explanation.toString or Explanation.toHtml should give you a full, human-readable (sort of) output of how the scoring algorithm was applied.
Luke - provides a nice interface for exploring and debugging a Lucene index. It is capable of producing the scoring explanation detailed above. I believe the current official version is still lagging behind Lucene, at 4.0, but you can find some home-grown updates for 4.2 lying around.
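To make the formula in the TFIDFSimilarity documentation a bit more concrete, here is a minimal sketch in plain Java (no Lucene dependency). It assumes the classic formula score = coord * queryNorm * sum(tf * idf^2 * boost * norm), that each repeated clause in a flat OR query is scored as its own term, and that all field norms are 1. The class BoostSketch, its Clause type, and the tf/idf values are hypothetical and only for illustration; the Explanation output from your own index remains the authority.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model of the classic TF-IDF scoring formula; this is not Lucene code. */
final class BoostSketch {

    /** One query clause: a term with its boost, e.g. euro^5. */
    static final class Clause {
        final String term;
        final double boost;

        Clause(String term, double boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    /**
     * Scores one document against a flat OR query.
     * tf maps term -> tf(t in d); idf maps term -> idf(t) and must cover every query term.
     * Field norms are assumed to be 1.
     */
    static double score(Clause[] query, Map<String, Double> tf, Map<String, Double> idf) {
        double sum = 0.0;          // sum of tf * idf^2 * boost over matching clauses
        double sumOfSquares = 0.0; // sum of (idf * boost)^2 over all clauses
        int overlap = 0;           // number of clauses the document matches
        for (Clause c : query) {
            double termIdf = idf.get(c.term);
            sumOfSquares += (termIdf * c.boost) * (termIdf * c.boost);
            Double termTf = tf.get(c.term);
            if (termTf != null && termTf > 0) {
                sum += termTf * termIdf * termIdf * c.boost;
                overlap++;
            }
        }
        double coord = (double) overlap / query.length; // matched clauses / total clauses
        double queryNorm = 1.0 / Math.sqrt(sumOfSquares);
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        // Made-up statistics: both terms equally rare, each document holds one term once.
        Map<String, Double> idf = new HashMap<>();
        idf.put("euro", 1.0);
        idf.put("geld", 1.0);
        Map<String, Double> docEuro = Collections.singletonMap("euro", 1.0);
        Map<String, Double> docGeld = Collections.singletonMap("geld", 1.0);

        Clause[] repeated = { new Clause("euro", 5), new Clause("geld", 5), new Clause("euro", 5) };
        Clause[] summed = { new Clause("euro", 10), new Clause("geld", 5) };

        // How strongly each query prefers the euro-only document over the geld-only one.
        System.out.printf("euro^5 geld^5 euro^5: %.2f%n",
                score(repeated, docEuro, idf) / score(repeated, docGeld, idf));
        System.out.printf("euro^10 geld^5:       %.2f%n",
                score(summed, docEuro, idf) / score(summed, docGeld, idf));
    }
}
```

Under these assumptions the two boosts on euro add up inside the sum (5 + 5 behaves like 10, not 25), but the repeated clause also changes coord and queryNorm. So euro^5 geld^5 euro^5 need not rank documents the same way as euro^10 geld^5: in the toy data the euro-only document outscores the geld-only one by about 4 to 1 for the first query and about 2 to 1 for the second.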