scalability - Develop a distributed full-text search index (AKA inverted index)

- April 15, 2014

I know how to develop a simple inverted index on a single machine. In short, it is a standard hash table kept in memory where:

- key: a word
- value: a list of the word's locations

For an example, see the code here: http://rosettacode.org/wiki/inverted_index#java
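For reference, the single-machine structure described above can be sketched in a few lines of Python. This is only an illustration, not the Rosetta Code implementation; the function name `build_index`, the sample documents, and the choice of `(doc_id, position)` pairs as "locations" are assumptions made for the example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each word to a list of (doc_id, word_position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Lowercase and split on non-word characters; a real tokenizer does more.
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].append((doc_id, pos))
    return index


# Hypothetical sample documents.
docs = {"a.txt": "it is what it is", "b.txt": "what is it"}
index = build_index(docs)
print(index["what"])
```

A lookup is then a plain dictionary access, which is what makes the single-node case easy and the distributed case (where the dictionary no longer fits on one machine) the interesting part.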

Question:

Now I'm trying to make it distributed among n nodes and, in turn:

- make the index horizontally scalable
- apply automatic sharding to the index

I'm especially interested in automatic sharding. Any ideas or links are welcome!

Thanks.

Sharding itself is quite a complex task that is not solved in modern DBs. Typical problems in distributed DBs are the CAP theorem and other low-level, quite challenging tasks such as rebalancing cluster data after adding a new blank node or after a naturally occurring imbalance in the data.
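To make the rebalancing problem concrete, here is a minimal sketch of the most naive sharding scheme: assign each term to a node by `hash(term) % n`. The helper name `shard_for` and the synthetic terms are assumptions for illustration. MD5 is used only because it gives a stable hash across processes, not for security.

```python
import hashlib


def shard_for(term, num_shards):
    """Pick a shard by hashing the term and taking it modulo the shard count."""
    digest = hashlib.md5(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical vocabulary of 1000 terms.
terms = [f"term{i}" for i in range(1000)]

before = {t: shard_for(t, 3) for t in terms}  # cluster of 3 nodes
after = {t: shard_for(t, 4) for t in terms}   # one blank node added

moved = sum(1 for t in terms if before[t] != after[t])
print(f"{moved} of {len(terms)} terms change shard when going from 3 to 4 nodes")
```

With a reasonably uniform hash, roughly three quarters of the terms end up on a different shard after the fourth node is added, so almost the whole index has to be shipped across the network. That is why plain modulo sharding is rarely used when nodes are expected to come and go.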

The best data distribution implemented in a DB that I've seen is in Cassandra. Full-text search is not yet implemented in Cassandra, so you might consider building your distributed index on top of it.
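Cassandra's data distribution is based on consistent hashing over a token ring. The following is a minimal sketch of that general idea, not Cassandra's actual implementation: the class `HashRing`, the node names, and the number of virtual nodes are all assumptions for illustration, and the ring must contain at least one node before lookups.

```python
import bisect
import hashlib


def _hash(key):
    """Stable integer hash of a string (MD5 used for stability, not security)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes  # virtual tokens per physical node
        self._ring = []       # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place several tokens per node on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # The owner is the first token clockwise from the key's hash.
        tokens = [token for token, _ in self._ring]
        idx = bisect.bisect_right(tokens, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


terms = [f"term{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {t: ring.node_for(t) for t in terms}

ring.add_node("node-d")  # a new blank node joins
after = {t: ring.node_for(t) for t in terms}

moved = [t for t in terms if before[t] != after[t]]
print(f"{len(moved)} of {len(terms)} terms moved, all of them to node-d")
```

The useful property is that adding a node only takes keys away from existing nodes and hands them to the new one; no term is shuffled between two old nodes. For an inverted index, the key would be the word and the value its posting list.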

Some other already implemented options are Elasticsearch and SolrCloud. In the example given, one important detail is missing: word stemming. With word stemming you search for any form of a word, such as "sing", "sings", "singer". Lucene and the two previous solutions have it implemented for the majority of languages.
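To show where stemming fits in, here is a deliberately naive suffix-stripping sketch. It is not the algorithm Lucene, Elasticsearch, or SolrCloud use; the suffix list, the minimum stem length, and the function name `naive_stem` are assumptions chosen only so that the three example words collapse to the same key.

```python
# Toy suffix list, checked in order; longer suffixes come first.
SUFFIXES = ("ers", "ing", "er", "s")
MIN_STEM = 3  # do not strip if the remaining stem would be shorter than this


def naive_stem(word):
    """Strip one common English suffix so related word forms share a key."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM:
            return stem
    return word


for w in ("sing", "sings", "singer"):
    print(w, "->", naive_stem(w))
```

The stemmer would be applied both when indexing a document and when parsing a query, so that a search for "sings" hits the same posting list as "singer". A toy like this breaks quickly on real text, which is a good reason to rely on the analyzers that ship with the existing search engines.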
