bash - Command line combine files at change in part of name and part of file -

- September 15, 2010

I am on AIX, using bash, and cannot install additional software at this time. I am limited to command-line batch processing and maybe custom Java scripts. I have a ton of XML files in different directories. Here is a subset of what it may look like.

root_dir
    pages
        pages_1.xml
    queries
        queries_1.xml
        queries_2.xml
        queries_3.xml

I have put together a script that gets me most of what I want, but I don't know how to do the last piece of the puzzle, or whether it is possible in a batch script. I create a new directory under root, copy all of the XML files into the new directory, and rename them to remove spaces if there are any in the name, and to zero-pad the integer so the files can be sorted in alphabetical/numerical order. The new output looks like this:
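The copy-and-rename step described above could be sketched roughly as follows. The asker's actual script is not shown, so the function names padded_name and copy_all are hypothetical, and the sketch assumes every file is named <prefix>_<number>.<extension>.

```shell
#!/bin/sh
# Hypothetical helpers, not the asker's script.
# Assumes names of the form <prefix>_<number>.<ext>.

# padded_name FILE: print the normalized name for one file,
# e.g. queries_2.xml -> queries_002.xml
padded_name() {
    base=$(basename "$1" | tr -d ' ')   # drop any spaces in the name
    pfx=${base%_*}                      # text before the last _
    tail=${base##*_}                    # text after the last _
    num=${tail%.*}                      # the number between _ and .
    ext=${tail##*.}                     # the extension
    # strip leading zeros so printf does not read the number as octal
    num=$(echo "$num" | sed 's/^0*\([0-9]\)/\1/')
    printf '%s_%03d.%s\n' "$pfx" "$num" "$ext"
}

# copy_all ROOT COPY_DIR: copy every XML file under ROOT into COPY_DIR
# using the padded names; COPY_DIR itself is skipped by the search.
copy_all() {
    mkdir -p "$2"
    find "$1" -path "$2" -prune -o -name '*.xml' -type f -print |
    while IFS= read -r f; do
        cp "$f" "$2/$(padded_name "$f")"
    done
}
```

With these defined, copy_all root_dir root_dir/copy_dir would fill copy_dir as in the listing that follows. Two files whose names differ only by spaces would overwrite each other, which this sketch does not check for.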

copy_dir
    pages_001.xml
    queries_001.xml
    queries_002.xml
    queries_003.xml

I am almost there. The last piece is that these separate XML files need to be combined into one XML file for each type: history_001.xml through history_099.xml need to be combined, queries_001.xml through queries_099.xml need to be combined, but only after a specific point in each file. I have regexes that select the parts of the files I want; I just need to figure out how to loop through each file subset. Maybe I jumped the gun and should have done this before moving them, but assuming they are all in one directory, how can I go about this?

Here is an example of the data. All of the XML files carry these same types of information.

pages

<?xml version="1.0"?>
<project name="">
  <rundate></rundate>
  <object_type code="false" firstitem="1" id="5" items="65" name="pages">
    <primary_key>page name</primary_key>
    <secondary_key>language code</secondary_key>
    <secondary_key>page field id</secondary_key>
    <secondary_key>field type</secondary_key>
    <secondary_key>record (table) name</secondary_key>
    <secondary_key>field name</secondary_key>
    <item id="acctg_template_ap">
      ...
    </item>
    <item id="acctg_template_ar">
      ...
    </item>
  </object_type>
</project>

queries

<?xml version="1.0"?>
<project name="">
  <rundate></rundate>
  <object_type code="false" firstitem="1" id="10" items="46" name="queries">
    <primary_key>query name</primary_key>
    <primary_key>user id</primary_key>
    <item id="1099g_all_short. ">
      ...
    </item>
    <item id="1099g_all_vouchers. ">
      ...
    </item>
  </object_type>
</project>

Regex to pull out the header

(?:(?!(^\s*<item)).)*

Regex to pull out the detail

^(\s*<item id=).*(</item>)

Regex to pull out the footer

^(\s*</object_type).*

So, assuming I have a counter, I want to loop through each object type's XML subset: on the first file, pull the header and detail and write them to a new summary file; for the files that follow, concatenate only the detail; and on the last file, or at the change to a new object type, write the footer as well. Do you think this is possible using a bash script?
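The loop described above might look something like this. It is only a sketch under some assumptions: every tag sits on its own line as in the samples, the files are already zero-padded and in one directory, and awk (standard on AIX) stands in for the regexes. The function names part, combine and combine_all and the _all.xml output name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch; assumes one tag per line, as in the sample files.

# part MODE FILE: print the header, detail, or footer lines of FILE.
#   header = everything before the first <item ...> line
#   detail = from the first <item ...> line up to, not including, </object_type>
#   footer = the </object_type> line and everything after it
part() {
    awk -v mode="$1" '
        /^[ \t]*<\/object_type/          { state = "footer" }
        state == "" && /^[ \t]*<item /   { state = "detail" }
        (state == "" ? "header" : state) == mode { print }
    ' "$2"
}

# combine OUT FILE...: header of the first file, detail of every file,
# footer of the last file.
combine() {
    out=$1; shift
    part header "$1" > "$out"
    for f in "$@"; do
        part detail "$f" >> "$out"
        last=$f
    done
    part footer "$last" >> "$out"
}

# combine_all DIR: write one DIR/<prefix>_all.xml per <prefix>_NNN.xml group.
combine_all() {
    for pfx in $(ls "$1" | sed -n 's/_[0-9][0-9]*\.xml$//p' | sort -u); do
        combine "$1/${pfx}_all.xml" "$1/$pfx"_[0-9]*.xml
    done
}
```

Running combine_all copy_dir would then produce pages_all.xml, queries_all.xml and so on. The header is copied verbatim from the first file, so attributes such as items="46" are not recalculated for the combined file.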

This will spit out commands after sorting and classification; you provide the functions/scripts/whatever that do the right thing for files that are first, middle, last, or only in their group. The first and middle commands have to handle empty argument lists: middle for two-element groups, and first for groups without a 1-sequenced file.

Edit: broke the seds out to one command per line to handle seds that don't like semicolons.

Run it as e.g. sh this.sh *_*.*

#!/bin/sh
#
# Spit out commands to sort, group, and classify the argument filenames,
# sorting by the number between `_` and `.` in the names, and
# grouping by the text before the _.
{ # sorted through sort; `ls -v` would do on GNU/anything...
for f; do
    pfx=${f%%_*}
    tail=${f#*_}
    sortable=`printf %s_%03d.%s $pfx ${tail%.*} ${tail##*.}`
    [ $f != $sortable ] \
      && echo  mv $f $sortable >&2
    echo $sortable
done \
| sort \
| sed '
    /_0*1\./! H
    // {
       x
       1! {
          y/\n/ /
          p
       }
    }
    $!d
    x
    y/\n/ /
' \
| sed '
    s/\([^ ]*\)\(.*\) \(.*\)/first \1\nmiddle\2\nlast \3/
    t
    s/^/only /
'
} 2>&1

The first of the above seds accumulates groups of one-per-line words whose start can be identified by the first line. The second classifies the groups and subs in the right commands. They're separate because the first sed involves a double-pump to handle a widow group, plus they're hairy enough as it is.
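The answer leaves the first, middle, last and only commands for you to supply. One possible set of handlers, as a sketch only: the helper names out_for, header, detail and footer and the _all.xml output name are hypothetical, and the XML is assumed to have one tag per line as in the samples.

```shell
#!/bin/sh
# Hypothetical handlers for the generated first/middle/last/only commands.
# Assumes one tag per line, as in the sample files.

# out_for FILE: name of the combined file, queries_001.xml -> queries_all.xml
out_for() { echo "${1%_*}_all.xml"; }

# Split one file at the first <item ...> line and at </object_type>.
header() { sed '/^[[:space:]]*<item /,$d' "$1"; }
detail() {
    sed -n '/^[[:space:]]*<item /,$p' "$1" |
    sed '/^[[:space:]]*<\/object_type/,$d'
}
footer() { sed -n '/^[[:space:]]*<\/object_type/,$p' "$1"; }

# first may get no argument (a group without a 1-sequenced file); this
# sketch then does nothing, so such a group ends up without a header.
first() {
    [ $# -eq 0 ] && return 0
    { header "$1"; detail "$1"; } > "$(out_for "$1")"
}

# middle may get no arguments (two-element groups); the loop is then a no-op.
middle() {
    for f in "$@"; do
        detail "$f" >> "$(out_for "$f")"
    done
}

last() { { detail "$1"; footer "$1"; } >> "$(out_for "$1")"; }

# A group of one file is already complete, so just copy it.
only() { cp "$1" "$(out_for "$1")"; }
```

One way to use this would be to save the generated commands to a file, review them, and then source the handlers followed by the command file in the same shell.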
