How does Lucene weigh a term if it appears in the query with different weights (below or above 1)?

May 15, 2013

Since I use an algorithm to create weighted multi-term queries that I fire at Lucene at query time, it happens that one term appears several times with different weights, e.g. (ignore the Dutch):

euro^5 geld^5 euro^5

Assuming the answer lies in either multiplying or adding the weights, I compared the top-10 returned docs for each option. The results seemed to confirm that the weights are multiplied, i.e., the query above is equal to:

euro^25 geld^5

However, when I ran the same test on a query in which weights both below and above 1 occurred, I got different results for each of the following 3 queries:

euro^0.5 geld^5 euro^0.5

euro^0.25 geld^5

euro^1 geld^5

That means that either (or both): my test results were due to chance and the weights are neither summed nor multiplied; or you cannot combine weights below and above 1. Can anyone help me out? Or tell me where I can find in-depth information on Lucene's mysterious ways of handling query-time weights?

For what it's worth: I use the DutchAnalyzer, the QueryParser, and Lucene version 4.2.1.

There are 3 highly useful resources I can point you to:

TFIDFSimilarity documentation: describes the default scoring algorithm in fair detail.
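As a rough summary of what that page documents, the classic formula behind Lucene 4.x's DefaultSimilarity (a TFIDFSimilarity subclass) is shown below. It is reproduced from memory, so check it against the javadoc for your exact version; norm(t,d) folds in field boosts and length normalization.

$$
\operatorname{score}(q,d) = \operatorname{coord}(q,d)\cdot\operatorname{queryNorm}(q)\cdot\sum_{t\in q}\Bigl(\operatorname{tf}(t\in d)\cdot\operatorname{idf}(t)^{2}\cdot t.\mathrm{getBoost}()\cdot\operatorname{norm}(t,d)\Bigr)
$$

$$
\operatorname{queryNorm}(q) = \frac{1}{\sqrt{q.\mathrm{getBoost}()^{2}\cdot\sum_{t\in q}\bigl(\operatorname{idf}(t)\cdot t.\mathrm{getBoost}()\bigr)^{2}}}
$$

$$
\operatorname{tf}(t\in d)=\sqrt{\mathit{freq}},\qquad \operatorname{idf}(t)=1+\ln\frac{\mathit{numDocs}}{\mathit{docFreq}+1},\qquad \operatorname{coord}(q,d)=\frac{\mathit{overlap}}{\mathit{maxOverlap}}
$$

If the parser keeps each occurrence of a repeated term as its own clause, that term contributes one summand per occurrence to the sum over t in q, and coord (the fraction of clauses a document matches) changes with the number of clauses as well.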
IndexSearcher.explain: creates an Explanation object showing why a given document scored as it did for a given query. Explanation.toString or Explanation.toHtml should give full, (sort of) human-readable output of how the scoring algorithm was applied.
Luke: provides a nice interface for exploring and debugging a Lucene index, and is capable of producing the scoring explanation detailed above. I believe the current official version is still lagging behind Lucene, at 4.0, but you can find home-grown updates for 4.2 lying around.
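To get a feel for how the default scoring treats a repeated term, here is a minimal, simplified Java model of Lucene 4.x DefaultSimilarity scoring for a flat query of optional term clauses. It is a sketch under stated assumptions, not Lucene code: it assumes (without checking the 4.2.1 source) that QueryParser turns euro^0.5 geld^5 euro^0.5 into a BooleanQuery with one SHOULD clause per occurrence, that field norms and the top-level boost are 1, and the idf values and term frequencies are made up. BoostSketch, parse, rawSum, coord, queryNorm and score are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of Lucene 4.x DefaultSimilarity scoring for a
 * flat query of optional (SHOULD) term clauses. Not Lucene API.
 * Assumptions: each occurrence of a term is its own clause, field norms are 1,
 * and the top-level query boost is 1.
 */
public class BoostSketch {

    /** One query clause such as euro^5: a term plus its boost. */
    static final class Clause {
        final String term;
        final double boost;

        Clause(String term, double boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    /** Splits "euro^5 geld^5 euro^5" into clauses; duplicates are kept, not merged. */
    static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            String[] parts = token.split("\\^");
            double boost = parts.length > 1 ? Double.parseDouble(parts[1]) : 1.0;
            clauses.add(new Clause(parts[0], boost));
        }
        return clauses;
    }

    /** Sum over matching clauses of tf * idf^2 * boost, with tf = sqrt(freq) and norm = 1. */
    static double rawSum(List<Clause> query, Map<String, Integer> termFreqs, Map<String, Double> idf) {
        double sum = 0.0;
        for (Clause c : query) {
            int freq = termFreqs.getOrDefault(c.term, 0);
            if (freq > 0) {
                double i = idf.get(c.term);
                sum += Math.sqrt(freq) * i * i * c.boost;
            }
        }
        return sum;
    }

    /** coord = matching clauses / total clauses. */
    static double coord(List<Clause> query, Map<String, Integer> termFreqs) {
        int overlap = 0;
        for (Clause c : query) {
            if (termFreqs.getOrDefault(c.term, 0) > 0) {
                overlap++;
            }
        }
        return (double) overlap / query.size();
    }

    /** queryNorm = 1 / sqrt(sum over clauses of (idf * boost)^2). */
    static double queryNorm(List<Clause> query, Map<String, Double> idf) {
        double sumOfSquares = 0.0;
        for (Clause c : query) {
            double w = idf.get(c.term) * c.boost;
            sumOfSquares += w * w;
        }
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    /** score = coord * queryNorm * rawSum. */
    static double score(List<Clause> query, Map<String, Integer> termFreqs, Map<String, Double> idf) {
        return coord(query, termFreqs) * queryNorm(query, idf) * rawSum(query, termFreqs, idf);
    }

    public static void main(String[] args) {
        // Made-up statistics, chosen only to illustrate the effect.
        Map<String, Double> idf = Map.of("euro", 2.0, "geld", 2.0);
        Map<String, Integer> doc1 = Map.of("euro", 9); // euro 9 times, no geld
        Map<String, Integer> doc2 = Map.of("geld", 1); // geld once, no euro
        for (String q : List.of("euro^0.5 geld^5 euro^0.5", "euro^0.25 geld^5", "euro^1 geld^5")) {
            List<Clause> clauses = parse(q);
            System.out.printf("%-26s doc1=%.4f doc2=%.4f%n",
                    q, score(clauses, doc1, idf), score(clauses, doc2, idf));
        }
    }
}
```

Under these assumptions the two euro^0.5 clauses contribute exactly what a single euro^1 clause would (their summands are added, not multiplied), yet doc1 outranks doc2 for euro^0.5 geld^5 euro^0.5 but not for euro^1 geld^5, because coord divides by 3 clauses in one case and by 2 in the other. queryNorm also differs between the queries, but it is the same for every document of a given query, so it cannot change the ranking. Whether real 4.2.1 results follow this model is best confirmed with IndexSearcher.explain on your own index.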
