logging - Combining URI and Refer combinations from Apache with Pig -


I'll start by stating that I am a sysadmin by trade and a Pig newbie, so please be gentle.

I am attempting to use Pig to parse Apache web logs from our CDN. For one application, we have 3 distinct call types that can be gathered from the URI, and 3 different app/version strings (caused by inconsistency in app development). I need to gather them and produce one report detailing the number of each type of call for each app/version.

The call types contain one of the following: valid, wms, tile. The app name in the useragent field can be any of the following:

app%20name/0.0 cfnetwork/609.1.4 darwin/13.0.0"

android app name 0.0.0 (sch-i605 - android 4.1.2, sdk xx)

app name 0.0.0 (iphone os 6.1.3 - iphone, xxx.xx.xxx.xx.xxxx, xxxxxxxx 0.0)"
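One way the three shapes above might be collapsed into a single app name and version is sketched here. This is only a sketch under assumptions: `useragents.txt` is a hypothetical file with one useragent per line, the relation and field names (`uas`, `cleaned`, `keyed`, `appname`, `appversion`) are made up for illustration, and the patterns are written against the sample strings exactly as shown, so they may need adjusting for real log data. It uses the Pig built-ins `REPLACE` and `REGEX_EXTRACT` rather than the piggybank UDFs.

```pig
-- Sketch only: 'useragents.txt', 'uas', 'cleaned' and 'keyed' are hypothetical names.
uas = LOAD 'useragents.txt' AS (useragent:chararray);

-- Decode %20 and drop a leading "android " so that all three shapes
-- begin with the app name.
cleaned = FOREACH uas GENERATE
    REPLACE(REPLACE(useragent, '%20', ' '), '(?i)^android ', '') AS ua;

-- The app name is everything before the first dotted version number;
-- the separator between name and version is either '/' or a space.
keyed = FOREACH cleaned GENERATE
    REGEX_EXTRACT(ua, '^(.+?)[/ ](\\d+(?:\\.\\d+)+).*', 1) AS appname,
    REGEX_EXTRACT(ua, '^(.+?)[/ ](\\d+(?:\\.\\d+)+).*', 2) AS appversion;

DUMP keyed;
```

The intent is that the CFNetwork shape, the Android shape and the iPhone shape all end up with the same app name plus their own version, so they can share one grouping key instead of relying on the first whitespace-delimited token.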

I had this working before I discovered the inconsistency in the useragent naming. It is a hack at best, but it was producing what I needed.

Any help is appreciated.

REGISTER file:/home/hadoop/lib/pig/piggybank.jar;
DEFINE logloader org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader();
DEFINE dayextractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('yyyy-MM-dd');
DEFINE extract org.apache.pig.piggybank.evaluation.string.EXTRACT;

-- Load the combined-format logs and keep only requests from the app's useragents.
logs = LOAD '$input' USING logloader AS (remoteaddr, remotelogname, user, time, method, uri, proto, status, bytes, referer, useragent);
filtered = FILTER logs BY useragent MATCHES '.*mapkit.*' OR useragent MATCHES '.*darwin.*' OR useragent MATCHES '.*android.*';
darwinonly = FOREACH filtered GENERATE dayextractor(time) AS day, uri, bytes, useragent;

-- Split by call type, based on the URI.
filtervalid = FILTER darwinonly BY uri MATCHES '.*valid.*';
filtertile = FILTER darwinonly BY uri MATCHES '.*tile.*';
filterwms = FILTER darwinonly BY uri MATCHES '.*wms.*';

-- The app key is the first whitespace-delimited token of the useragent.
validapptime = FOREACH filtervalid GENERATE day AS validframeday, extract(useragent, '([^\\s]+)') AS validframeapp, bytes AS validbytes;
wmsapptime = FOREACH filterwms GENERATE day AS wmsday, extract(useragent, '([^\\s]+)') AS wmsapp, bytes AS wmsbytes;
tileapptime = FOREACH filtertile GENERATE day AS tileday, extract(useragent, '([^\\s]+)') AS tileapp, bytes AS tilebytes;

-- Count calls and sum bytes per (day, app) for each call type.
groupwms = GROUP wmsapptime BY ($0, $1);
grouptile = GROUP tileapptime BY ($0, $1);
groupvalid = GROUP validapptime BY ($0, $1);
wmsappcount = FOREACH groupwms GENERATE FLATTEN(group), COUNT($1) AS wmsnum, SUM(wmsapptime.wmsbytes) AS wmstotalbytes;
validappcount = FOREACH groupvalid GENERATE FLATTEN(group), COUNT($1) AS validnum, SUM(validapptime.validbytes) AS validtotalbytes;
tileappcount = FOREACH grouptile GENERATE FLATTEN(group), COUNT($1) AS tilenum, SUM(tileapptime.tilebytes) AS tiletotalbytes:int;

-- Combine the three per-type results into one row per (day, app).
y = COGROUP validappcount BY (validframeday, validframeapp), wmsappcount BY (wmsday, wmsapp), tileappcount BY (tileday, tileapp);
z = FOREACH y GENERATE group AS dailyapp, validappcount.validnum, validappcount.validtotalbytes, wmsappcount.wmsnum, wmsappcount.wmstotalbytes, tileappcount.tilenum, tileappcount.tiletotalbytes;
STORE z INTO '$output';
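A possible simplification of the script, offered as a sketch rather than a tested replacement: instead of filtering three times and cogrouping, each request could be tagged with its call type and grouped once. It assumes the `darwinonly` relation from the script (fields `day`, `uri`, `bytes`, `useragent`). The names `tagged`, `bytype`, `report`, `app`, `calltype`, `calls` and `totalbytes` are illustrative, and the app key is still the first whitespace-delimited token, which is exactly the part the inconsistent useragent naming breaks.

```pig
-- Sketch only: 'tagged', 'bytype' and 'report' are illustrative names.
-- Tag each request with its call type instead of filtering three times.
tagged = FOREACH darwinonly GENERATE
    day,
    REGEX_EXTRACT((chararray) useragent, '([^\\s]+)', 1) AS app,
    ((uri MATCHES '.*valid.*') ? 'valid' :
        ((uri MATCHES '.*wms.*') ? 'wms' :
            ((uri MATCHES '.*tile.*') ? 'tile' : 'other'))) AS calltype,
    (long) bytes AS bytes;

-- One group per day, app and call type.
bytype = GROUP tagged BY (day, app, calltype);

-- One output row per group, with the number of calls and the bytes served.
report = FOREACH bytype GENERATE
    FLATTEN(group) AS (day, app, calltype),
    COUNT(tagged) AS calls,
    SUM(tagged.bytes) AS totalbytes;
```

This produces one row per day, app and call type rather than one wide row per day and app, so it is a different report shape from the cogroup version. A URI that matches more than one pattern is counted only under the first match, whereas the three separate filters would count it under each.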

