linux - Bash, Remove empty XML tags
I need help with a couple of questions, using bash tools.

First, I want to remove empty XML tags from a file, e.g.:

    <createofficecode>
      <operatorid>ve</operatorid>
      <officecode>1234</officecode>
      <countrycodelength>0</countrycodelength>
      <areacodelength>3</areacodelength>
      <attributes></attributes>
      <chargearea></chargearea>
    </createofficecode>

to become:

    <createofficecode>
      <operatorid>ve</operatorid>
      <officecode>1234</officecode>
      <countrycodelength>0</countrycodelength>
      <areacodelength>3</areacodelength>
    </createofficecode>

For this I have used the command

    sed -i '/><\//d' file

which is not very strict; it is more of a trick. Something more appropriate would be to find each <pattern></pattern> pair and remove it. Any suggestions?
Second, how can I remove a tag that contains nothing but whitespace, going from:

    <createofficegroup>
      <createofficename>john</createofficename>
      <createofficecode>
      </createofficecode>
    </createofficegroup>

to:

    <createofficegroup>
      <createofficename>john</createofficename>
    </createofficegroup>
Third, how can I do this as a whole thing? From:

    <createofficegroup>
      <createofficename>john</createofficename>
      <createofficecode>
        <operatorid>ve</operatorid>
        <officecode>1234</officecode>
        <countrycodelength>0</countrycodelength>
        <areacodelength>3</areacodelength>
        <attributes></attributes>
        <chargearea></chargearea>
      </createofficecode>
      <createofficesize>
        <chairs></chairs>
        <tables></tables>
      </createofficesize>
    </createofficegroup>

to:

    <createofficegroup>
      <createofficename>john</createofficename>
      <createofficecode>
        <operatorid>ve</operatorid>
        <officecode>1234</officecode>
        <countrycodelength>0</countrycodelength>
        <areacodelength>3</areacodelength>
      </createofficecode>
    </createofficegroup>
Could you reply to the questions individually? Thank you very much!
XMLStarlet is a command-line XML processor. Doing what you want with it is a one-line operation (until the desired recursive behavior is added), and it will work for all variants of XML syntax describing the same input:
The simple version:

    xmlstarlet ed \
      -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
      input.xml
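To make clearer what that XPath expression selects, here is a minimal sketch of a roughly equivalent single pass using Python's standard `xml.etree.ElementTree`. It is an approximation, not XMLStarlet's behavior: the helper names `is_blank` and `strip_empty_once` are made up for illustration, and unlike the XPath version this sketch never removes the root element and drops any text that directly follows a deleted element.

```python
import xml.etree.ElementTree as ET

def is_blank(text):
    """True when text is missing or whitespace-only."""
    return text is None or text.strip() == ""

def strip_empty_once(root):
    """Remove, in one pass, every element with no children and no non-blank text."""
    # Collect the matches before mutating the tree, so the selection is
    # made against the original document (as an XPath evaluation would be).
    doomed = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if len(child) == 0 and is_blank(child.text)
    ]
    for parent, child in doomed:
        parent.remove(child)
    return root

xml = """<createofficecode>
  <operatorid>ve</operatorid>
  <officecode>1234</officecode>
  <attributes></attributes>
  <chargearea></chargearea>
</createofficecode>"""

root = strip_empty_once(ET.fromstring(xml))
remaining = [child.tag for child in root]
print(remaining)  # ['operatorid', 'officecode']
```

Because the matches are chosen before anything is deleted, a parent that only becomes empty as a result of this pass survives it; that is the gap the recursive variant closes.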
The fancy version:

    strip_recursively() {
      local doc last_doc
      IFS= read -r -d '' doc
      while :; do
        last_doc=$doc
        doc=$(xmlstarlet ed \
               -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
               /dev/stdin <<<"$last_doc")
        if [[ $doc = "$last_doc" ]]; then
          printf '%s\n' "$doc"
          return
        fi
      done
    }
    strip_recursively <input.xml

/dev/stdin is used rather than - (at a cost to platform portability) for better portability across releases of XMLStarlet; adjust to taste.
On a system with only older dependencies installed, an XML parser you are more likely to have available is the one bundled with Python.

    #!/usr/bin/env python3
    import sys
    import xml.etree.ElementTree as etree

    doc = etree.parse(sys.stdin)

    def prune(parent):
        """Remove empty leaf elements below parent; return True if any were removed."""
        ever_changed = False
        while True:
            changed = False
            # iterate over a copy so that removing children is safe
            for el in list(parent):
                if len(el) == 0:
                    # only drop a leaf whose text and trailing text are both blank
                    if ((el.text is None or el.text.strip() == '') and
                            (el.tail is None or el.tail.strip() == '')):
                        parent.remove(el)
                        changed = True
                else:
                    changed = prune(el) or changed
            ever_changed = changed or ever_changed
            if not changed:
                return ever_changed

    prune(doc.getroot())
    print(etree.tostring(doc.getroot(), encoding='unicode'))
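As an alternative to repeating passes until nothing changes, the same pruning can be sketched as a single bottom-up (post-order) traversal: children are pruned first, so a parent that ends up empty is caught on the way back up. This is a swapped-in variant rather than the script above, and `is_blank` and `prune_bottom_up` are hypothetical names; the sample input is a shortened form of the third example from the question.

```python
import xml.etree.ElementTree as ET

def is_blank(text):
    """True when text is missing or whitespace-only."""
    return text is None or text.strip() == ""

def prune_bottom_up(element):
    """Prune each child's subtree first, then drop the child if it is left empty."""
    for child in list(element):  # copy: children are removed while iterating
        prune_bottom_up(child)
        # Keep a child that is followed by real text, since removing the
        # element would discard that trailing text as well.
        if len(child) == 0 and is_blank(child.text) and is_blank(child.tail):
            element.remove(child)

xml = """<createofficegroup>
  <createofficename>john</createofficename>
  <createofficecode>
    <operatorid>ve</operatorid>
    <attributes></attributes>
  </createofficecode>
  <createofficesize>
    <chairs></chairs>
    <tables></tables>
  </createofficesize>
</createofficegroup>"""

root = ET.fromstring(xml)
prune_bottom_up(root)
tags = [el.tag for el in root.iter()]
print(tags)
# ['createofficegroup', 'createofficename', 'createofficecode', 'operatorid']
```

Here createofficesize disappears in the same pass as its children, because it is checked only after chairs and tables have been removed.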
xml linux bash sed