Composite columns and "IN" relation in Cassandra

- February 15, 2010

I have the following column family in Cassandra for storing time series data in a small number of "wide" rows:

create table data_bucket (
  day_of_year int,
  minute_of_day int,
  event_id int,
  data ascii,
  primary key (day_of_year, minute_of_day, event_id)
)

On the CQL shell, I am able to run a query such as this:

select * from data_bucket where day_of_year = 266 and minute_of_day = 244 and event_id in (4, 7, 11, 1990, 3433)

Essentially, I fix the value of the first component of the composite column name (minute_of_day) and want to select a non-contiguous set of columns based on distinct values of the second component (event_id). Since the "IN" relation is interpreted as an equality relation, this works fine.

Now my question is: how would I accomplish the same type of composite column slicing programmatically, without CQL? So far I have tried the Python client pycassa and the Java client Astyanax, without success.

Any thoughts are welcome.

Edit:

I'm adding the describe output of the column family as seen through cassandra-cli. Since I am looking for a Thrift-based solution, maybe this will help.

ColumnFamily: data_bucket
  Key Validation Class: org.apache.cassandra.db.marshal.Int32Type
  Default column value validator: org.apache.cassandra.db.marshal.AsciiType
  Cells sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.Int32Type)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Populate IO Cache on flush: false
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
    sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

There is no "IN"-type query in the Thrift API. You could perform a series of get queries, one for each composite column value (day_of_year, minute_of_day, event_id).
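As a rough sketch of that per-column approach, the loop here issues one lookup per event_id and collects the hits. The `get_column` function and the in-memory `FAKE_ROW` dict are hypothetical stand-ins for whatever single-column get your Thrift client exposes; they are not part of pycassa or Astyanax.

```python
# Hypothetical stand-in for one wide row (day_of_year = 266).
# Keys are composite column names: (minute_of_day, event_id).
FAKE_ROW = {
    (244, 4): "a",
    (244, 5): "b",
    (244, 7): "c",
    (244, 1990): "d",
    (245, 4): "e",
}


def get_column(row, column_name):
    """Stand-in for a single-column get; returns None if the column is absent."""
    return row.get(column_name)


def get_many(row, minute_of_day, event_ids, getter=get_column):
    """Emulate IN by issuing one get per (minute_of_day, event_id) pair."""
    found = {}
    for event_id in event_ids:
        value = getter(row, (minute_of_day, event_id))
        if value is not None:
            found[event_id] = value
    return found


result = get_many(FAKE_ROW, 244, [4, 7, 11, 1990, 3433])
print(result)
```

Against a real cluster each call is a separate round trip, so the cost grows with the number of event ids requested.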

If your event_ids were sequential (and your question says they are not), you could perform a single get_slice query, passing in the range (e.g., day_of_year, minute_of_day, and a range of event_ids). You could grab bunches of them in this way and filter the response yourself programmatically (e.g., grab all data on the date with event ids between 4 and 3433). That means more data transfer and more processing on the client side, so it is not a great option unless you are really looking for a range.
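The slice-then-filter idea can be illustrated without a cluster. In this sketch the `SAMPLE_COLUMNS` dict is a made-up stand-in for the columns of one row, and the list comprehension plays the role of the get_slice call; only the filtering step would stay the same with a real client.

```python
# Made-up columns of one wide row, keyed by (minute_of_day, event_id).
SAMPLE_COLUMNS = {
    (244, 4): "a",
    (244, 5): "b",
    (244, 7): "c",
    (244, 1990): "d",
    (244, 3000): "x",
    (245, 4): "e",
}


def slice_and_filter(columns, minute_of_day, wanted_event_ids):
    """Fetch one contiguous range, then keep only the wanted event ids.

    Returns (kept, fetched_count) so the overhead of the slice is visible.
    """
    wanted = set(wanted_event_ids)
    if not wanted:
        return {}, 0
    start = (minute_of_day, min(wanted))
    finish = (minute_of_day, max(wanted))
    # Step 1: what a range slice from start to finish would return.
    sliced = [(name, value) for name, value in sorted(columns.items())
              if start <= name <= finish]
    # Step 2: client-side filter on the second component (event_id).
    kept = {name[1]: value for name, value in sliced if name[1] in wanted}
    return kept, len(sliced)


kept, fetched = slice_and_filter(SAMPLE_COLUMNS, 244, [4, 7, 11, 1990, 3433])
print(kept, fetched)
```

The gap between `fetched` and `len(kept)` is the extra data pulled over the wire, which is why this only pays off when the wanted ids are dense within the range.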

So, if you want to use "IN" with Cassandra you will need to switch to a CQL-based solution. If you are considering using CQL in Python, another option is cassandra-dbapi2. This worked for me:

import cql

# Replace settings as appropriate
host = 'localhost'
port = 9160
keyspace = 'keyspace_name'

# Connect
connection = cql.connect(host, port, keyspace, cql_version='3.0.1')
cursor = connection.cursor()
print("connected!")

# Execute CQL
cursor.execute("select * from data_bucket where day_of_year = 266 and minute_of_day = 244 and event_id in (4, 7, 11, 1990, 3433)")
for row in cursor:
    print(str(row))  # process the data here

# Shut down the connection
cursor.close()
connection.close()

(Tested with Cassandra 2.0.1.)
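If the list of event ids is only known at run time, the statement text can be assembled before it is handed to the cursor. This is a minimal sketch; `build_in_query` is a hypothetical helper, not part of cassandra-dbapi2, and it coerces every value to `int` so that nothing but numbers ends up in the statement.

```python
def build_in_query(day_of_year, minute_of_day, event_ids):
    """Build the select statement with an IN list from Python integers."""
    ids = [int(e) for e in event_ids]  # raises on non-numeric input
    if not ids:
        raise ValueError("event_ids must not be empty")
    in_list = ", ".join(str(i) for i in ids)
    return ("select * from data_bucket "
            "where day_of_year = %d and minute_of_day = %d "
            "and event_id in (%s)"
            % (int(day_of_year), int(minute_of_day), in_list))


query = build_in_query(266, 244, [4, 7, 11, 1990, 3433])
print(query)
```

The resulting string can be passed to `cursor.execute` in place of a hard-coded statement.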
