Composite columns and "IN" relation in Cassandra
I have the following column family in Cassandra for storing time series data in a small number of "wide" rows:
create table data_bucket ( day_of_year int, minute_of_day int, event_id int, data ascii, primary key (day_of_year, minute_of_day, event_id) )
In the CQL shell, I am able to run a query such as this:
select * from data_bucket where day_of_year = 266 and minute_of_day = 244 and event_id in (4, 7, 11, 1990, 3433)
Essentially, I fix the value of the first component of the composite column name (minute_of_day) and want to select a non-contiguous set of columns based on distinct values of the second component (event_id). Since the "IN" relation is interpreted as an equality relation, this works fine.
Now my question is: how would I accomplish the same type of composite column slicing programmatically, without CQL? So far I have tried the Python client pycassa and the Java client Astyanax, without success.
Any thoughts would be welcome.
Edit:
I'm adding the describe output of the column family as seen through cassandra-cli. Since I am looking for a Thrift-based solution, maybe this will help.
ColumnFamily: data_bucket
  Key Validation Class: org.apache.cassandra.db.marshal.Int32Type
  Default column value validator: org.apache.cassandra.db.marshal.AsciiType
  Cells sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.Int32Type)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Populate IO Cache on flush: false
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
    sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
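To make the layout in the describe output concrete, here is a rough sketch in plain Python of how the CQL rows would map onto a wide row. It assumes that day_of_year acts as the row key, that the pair (minute_of_day, event_id) forms the composite column name, and that data is the column value. The function name to_wide_rows and the sample values are made up for illustration, and nothing here talks to Cassandra.

```python
from collections import defaultdict


def to_wide_rows(cql_rows):
    """Group (day_of_year, minute_of_day, event_id, data) tuples into wide rows.

    Hypothetical helper: it only models the layout, it does not read from Cassandra.
    """
    wide = defaultdict(dict)
    for day_of_year, minute_of_day, event_id, data in cql_rows:
        # Row key -> {composite column name: column value}
        wide[day_of_year][(minute_of_day, event_id)] = data
    # Sort cells component by component, as a two-int composite comparator would
    return {key: sorted(cells.items()) for key, cells in wide.items()}


# Made-up sample rows
rows = [
    (266, 244, 7, "b"),
    (266, 244, 4, "a"),
    (266, 245, 1, "c"),
]
layout = to_wide_rows(rows)
```

Under these assumptions, all events for one day end up in a single row, ordered first by minute_of_day and then by event_id, which is why a contiguous range of event_id values is cheap to slice while a scattered set is not.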
There is no "IN"-type query in the Thrift API. You could perform a series of get queries, one for each composite column value (day_of_year, minute_of_day, event_id).
If your event_ids were sequential (and your question says they are not), you could perform a single get_slice query, passing in the range (e.g., day_of_year, minute_of_day, and a range of event_ids). You could grab bunches of them in this way and filter the response programmatically (e.g., grab all data on the date with event IDs between 4 and 3433). That means more data transfer and more processing on the client side, so it is not a great option unless you are really looking for a range.
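The client-side filtering step described above could look roughly like the following sketch. It assumes the slice result has already been fetched and is available as a dict keyed by (minute_of_day, event_id) tuples; the dict named sliced and the helper select_events are hypothetical stand-ins, and no client library call is made here.

```python
def select_events(cells, minute_of_day, wanted_event_ids):
    """Keep only the cells whose event_id is in the wanted set.

    cells: dict mapping (minute_of_day, event_id) -> data, e.g. the result
    of a range slice (assumed shape, not a real client return type).
    """
    wanted = set(wanted_event_ids)
    return {
        name: value
        for name, value in cells.items()
        if name[0] == minute_of_day and name[1] in wanted
    }


# Hypothetical stand-in for the columns a range slice (event IDs 4 to 3433) returned
sliced = {
    (244, 4): "a",
    (244, 5): "b",
    (244, 7): "c",
    (244, 3433): "d",
    (245, 4): "e",
}
picked = select_events(sliced, 244, [4, 7, 11, 1990, 3433])
```

Everything in the range is transferred before the filter runs, so the cost grows with the width of the range rather than with the number of IDs you actually want.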
So, if you want to use "IN" with Cassandra, you will need to switch to a CQL-based solution. If you are considering using CQL in Python, another option is cassandra-dbapi2. This worked for me:
import cql

# Replace these settings as appropriate
host = 'localhost'
port = 9160
keyspace = 'keyspace_name'

# Connect
connection = cql.connect(host, port, keyspace, cql_version='3.0.1')
cursor = connection.cursor()
print("connected!")

# Execute the CQL query
cursor.execute("select * from data_bucket where day_of_year = 266 and minute_of_day = 244 and event_id in (4, 7, 11, 1990, 3433)")
for row in cursor:
    print(str(row))  # the row data

# Shut down the connection
cursor.close()
connection.close()
(Tested with Cassandra 2.0.1.)
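The query string in the snippet above is hard-coded. If the event IDs come from a list, one possible way to assemble the same statement is sketched below. The helper name build_in_query is an assumption made for illustration; it only builds a string and coerces every value to int so that nothing but integers can end up in the statement.

```python
def build_in_query(day_of_year, minute_of_day, event_ids):
    """Build the select statement with an IN list (hypothetical helper)."""
    ids = [int(e) for e in event_ids]  # raises ValueError on non-numeric input
    if not ids:
        raise ValueError("event_ids must not be empty")
    return (
        "select * from data_bucket "
        "where day_of_year = %d and minute_of_day = %d and event_id in (%s)"
        % (int(day_of_year), int(minute_of_day), ", ".join(str(i) for i in ids))
    )


query = build_in_query(266, 244, [4, 7, 11, 1990, 3433])
```

The resulting string can be passed to cursor.execute in place of the literal query.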