bash - Command line: combine files at change in part of name and part of file
I am on AIX, using bash, and cannot install additional software at this time, so I am limited to command-line batch processing and maybe custom Java scripts. I have a ton of XML files in different directories. Here is a subset of what it may look like.
root_dir
    pages
        pages_1.xml
    queries
        queries_1.xml
        queries_2.xml
        queries_3.xml
I have put together a script that gets me most of what I want, but I don't know how to do the last piece of the puzzle, or whether it is possible in a batch script. I create a new directory under the root, copy all of the XML files to the new directory, rename them to remove spaces if there are any in the name, and zero-pad the integer so the files can be sorted in alphabetical / numerical order. The new output looks like this:
copy_dir
    pages_001.xml
    queries_001.xml
    queries_002.xml
    queries_003.xml
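The copy-and-rename step described above could look roughly like the sketch below. The helper names `padded_name` and `copy_all` are made up for illustration, and it assumes every file is named `prefix_N.ext` and sits exactly one directory below the root, as in the sample layout.

```shell
#!/bin/sh
# padded_name (hypothetical helper): "queries_2.xml" -> "queries_002.xml".
# Assumes the name has the form prefix_N.ext; spaces are removed first.
padded_name() {
    name=$(printf '%s' "$1" | tr -d ' ')   # drop any spaces in the name
    pfx=${name%_*}                         # text before the last underscore
    tail=${name##*_}                       # e.g. "2.xml"
    num=${tail%%.*}                        # e.g. "2"
    ext=${tail#*.}                         # e.g. "xml"
    num=$(expr "$num" + 0)                 # strip leading zeros ("08" -> 8)
    printf '%s_%03d.%s\n' "$pfx" "$num" "$ext"
}

# copy_all (hypothetical helper): copy root/*/*.xml into dest with padded names.
copy_all() {
    root=$1
    dest=$2
    mkdir -p "$dest"
    for f in "$root"/*/*.xml; do
        [ -f "$f" ] || continue                      # glob matched nothing
        [ "$(dirname "$f")" = "$dest" ] && continue  # skip copies from an earlier run
        cp "$f" "$dest/$(padded_name "$(basename "$f")")"
    done
}
```

Calling `copy_all root_dir root_dir/copy_dir` would then populate `copy_dir` with the zero-padded names.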
I'm almost there. The last piece is that these separate XML files need to be combined into one XML file for each type, so history_001.xml through history_099.xml need to be combined, and queries_001.xml through queries_099.xml need to be combined, but only after a specific point in the file. I have regexes that select the parts of the files I want; I just need to figure out how to loop through each file subset. Maybe I jumped the gun and should do this before moving them, but assuming they are all in one directory, how can I go about this?
Here is an example of the data. All of the XML files carry these same types of information.
pages
<?xml version="1.0"?>
<project name="">
    <rundate></rundate>
    <object_type code="false" firstitem="1" id="5" items="65" name="pages">
        <primary_key>page name</primary_key>
        <secondary_key>language code</secondary_key>
        <secondary_key>page field id</secondary_key>
        <secondary_key>field type</secondary_key>
        <secondary_key>record (table) name</secondary_key>
        <secondary_key>field name</secondary_key>
        <item id="acctg_template_ap"> ... </item>
        <item id="acctg_template_ar"> ... </item>
    </object_type>
</project>
queries
<?xml version="1.0"?>
<project name="">
    <rundate></rundate>
    <object_type code="false" firstitem="1" id="10" items="46" name="queries">
        <primary_key>query name</primary_key>
        <primary_key>user id</primary_key>
        <item id="1099g_all_short. "> ... </item>
        <item id="1099g_all_vouchers. "> ... </item>
    </object_type>
</project>
Regex to pull out the header:
(?:(?!(^\s*i<item)).)*
Regex to pull out the detail:
^(\s*<item id=).*(</item>)
Regex to pull out the footer:
^(\s*</object_type).*
So I'm assuming I want to have a counter and loop through each object type's XML subset: if it is the first loop, pull the header and detail and output them to a new summary file; continue with the other files and concatenate their detail; and if it is the last file, or there is a change to a new object type, output the footer as well. Do you think this is possible using a bash script?
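As a rough illustration of that header / detail / footer plan, here is a minimal single-pass sketch using awk, which ships with the base system. The function name `combine_group` is made up, and it assumes the files are line-oriented, i.e. each `<item ...>` and `</object_type>` starts its own line (the same assumption the `^\s*` regexes above make).

```shell
#!/bin/sh
# combine_group (hypothetical helper): write one combined document to stdout.
# Assumes each <item ...> and </object_type> starts its own line.
combine_group() {
    awk '
        FNR == 1                  { part = "header"; footer = "" }  # new file starts
        /^[ \t]*<item/            { part = "detail" }
        /^[ \t]*<\/object_type/   { part = "footer" }
        part == "header" && NR == FNR { print }                     # header: first file only
        part == "detail"              { print }                     # detail: every file
        part == "footer"              { footer = footer $0 "\n" }   # remember this file footer
        END                           { printf "%s", footer }       # footer: last file only
    ' "$@"
}
```

For example, `combine_group queries_0*.xml > queries_all.xml` would merge one group. Note that the `items="..."` attribute in the copied header would still reflect only the first file.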
This will spit out commands that do the sorting and classification; you provide functions/scripts/whatever to do the right thing with files that are `first`, `middle`, `last`, or `only` in their group. The `first` and `middle` commands have to handle empty argument lists: `middle` for two-element groups, and `first` for groups without a `1`-sequenced file.
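The handlers themselves are left to the reader. One possible set is sketched below; everything in it (the helper names `header`, `detail`, `footer`, the `combined_<prefix>.xml` output name) is an assumption for illustration, and it again assumes each `<item ...>` and `</object_type>` starts its own line.

```shell
#!/bin/sh
# Hypothetical handlers for the generated first/middle/last/only commands.
# Assumes each <item ...> and </object_type> starts its own line.

header() { sed '/^[[:space:]]*<item/,$d' "$1"; }            # everything before the first item
detail() { sed -n '/^[[:space:]]*<item/,$p' "$1" |          # first item onward ...
           sed '/^[[:space:]]*<\/object_type/,$d'; }        # ... minus the footer
footer() { sed -n '/^[[:space:]]*<\/object_type/,$p' "$1"; }

first() {
    [ $# -gt 0 ] || return 0                 # empty list: nothing to start
    out=combined_${1%%_*}.xml                # assumed output name, e.g. combined_queries.xml
    { header "$1"; detail "$1"; } > "$out"
}
middle() {
    for f in "$@"; do                        # empty list: loop body never runs
        detail "$f" >> "$out"
    done
}
last() {
    { detail "$1"; footer "$1"; } >> "$out"
}
only() {
    out=combined_${1%%_*}.xml
    cat "$1" > "$out"                        # a single file is already complete
}
```

With these definitions sourced into the current shell, the command stream the script prints could be saved to a file and sourced as well, so that each `first`, `middle`, `last`, or `only` line runs the matching handler.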
Edit: broke the seds out to one command per line, to handle seds that don't do semicolons.
Run it as e.g. `sh this.sh *_*.*`
    #!/bin/sh
    #
    # spit commands to sort, group, and classify argument filenames
    # sorting by the number between `_` and `.` in their names and
    # grouping by the text before the _.

    { # on GNU/anything this could just go through sort or `ls -v`...
    for f; do
        pfx=${f%%_*}                    # group name: text before the first _
        tail=${f#*_}                    # number and extension, e.g. 2.xml
        sortable=`printf '%s_%03d.%s' "$pfx" "${tail%.*}" "${tail##*.}"`
        [ "$f" != "$sortable" ] \
        && echo mv $f $sortable >&2     # rename commands go out via stderr
        echo $sortable
    done \
    | sort \
    | sed '
        /_0*1\./! H
        // {
            x
            1! {
                y/\n/ /
                p
            }
        }
        $!d
        x
        y/\n/ /
    ' \
    | sed '
        s/\([^ ]*\)\(.*\) \(.*\)/first \1\nmiddle\2\nlast \3/
        t
        s/^/only /
    '
    } 2>&1
The first of the above `sed`s accumulates the groups of one-per-line words, which can be identified by their first line. The second classifies the groups and subs in the right commands. They're separate because the first sed involves a double-pump to handle a widow group, plus they're hairy enough as it is.
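To see the grouping stage on its own, the first sed can be wrapped in a function and fed a sorted list. This is only a sketch: the wrapper name `group_lines` is made up, and the sed program is written with the append-to-hold command `H`, which is what the accumulation described above requires.

```shell
#!/bin/sh
# group_lines (hypothetical wrapper): read sorted names on stdin, print one
# space-separated group per line. A line matching _0*1. opens a new group.
group_lines() {
    sed '
        /_0*1\./! H
        // {
            x
            1! {
                y/\n/ /
                p
            }
        }
        $!d
        x
        y/\n/ /
    '
}

printf '%s\n' pages_001.xml queries_001.xml queries_002.xml queries_003.xml \
| group_lines
# prints:
#   pages_001.xml
#   queries_001.xml queries_002.xml queries_003.xml
```

The second `x` after `$!d` is the double-pump: on the last input line the group still sitting in the hold space has to be swapped back and printed, and if that last line opened a new one-file group, the previous group is printed first and the widow right after it.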