Java: read, process and write a UTF-8 file
I am trying to write something that reads a UTF-8 encoded file, which can have encoding errors, processes the content, and writes the result to an output file that is also encoded in UTF-8.
My program should modify the content (a kind of search and replace) and copy all the rest one to one. In other words: if the term to search for equals the term to replace it with, the input and output files should be equal as well.
Generally I am using this code:
in = Paths.get( <filename1> );
out = Paths.get( <filename2> );

Files.deleteIfExists( out );
Files.createFile( out );

CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
decoder.onMalformedInput( CodingErrorAction.IGNORE );
decoder.onUnmappableCharacter( CodingErrorAction.IGNORE );

BufferedReader reader = new BufferedReader(
    new InputStreamReader(
        new FileInputStream( this.in.toFile() ), decoder ) );

CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
encoder.onMalformedInput( CodingErrorAction.IGNORE );
encoder.onUnmappableCharacter( CodingErrorAction.IGNORE );

BufferedWriter writer = new BufferedWriter(
    new OutputStreamWriter(
        new FileOutputStream( this.out.toFile() ), encoder ) );

char[] charBuffer = new char[100];
int readCharCount;
StringBuffer buffer = new StringBuffer();

while( ( readCharCount = reader.read( charBuffer ) ) > 0 )
{
    buffer.append( charBuffer, 0, readCharCount );
    // here goes more code to process the content
    // maybe the buffer must be written to the output on each iteration
}

writer.write( buffer.toString() );
reader.close();
writer.close();
But this is not working. To compare the files I have a little JUnit test, which fails:
byte[] bytesF1 = Files.readAllBytes( Paths.get( <filename1> ) );
byte[] bytesF2 = Files.readAllBytes( Paths.get( <filename2> ) );
assertTrue( bytesF1.equals( bytesF2 ) );
What am I doing wrong, or what can I do to get this working?
Thanks in advance, Philipp
Edit
Apart from making the test work after ensuring that the input file is actually encoded in UTF-8 (a basic error), my real point of interest is this question:
Does the approach above guarantee that defects in the UTF-8 file are copied one to one, or does the process of loading the chars into a StringBuffer change this?
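One way to look into this is a small experiment. The sketch below is not part of the original program: the class name `IgnoreRoundTrip` and the helper `roundTrip` are made up for illustration, and it works on an in-memory byte array instead of files. It decodes with the same `CodingErrorAction.IGNORE` setting as above and encodes the result back to UTF-8. Since `IGNORE` drops malformed input, a defective byte sequence should not survive the round trip, which suggests that defects are not copied one to one with this approach.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not taken from the original program.
public class IgnoreRoundTrip {

    // Decodes the bytes as UTF-8, ignoring malformed input,
    // then encodes the resulting characters back to UTF-8.
    static byte[] roundTrip(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            CharBuffer chars = decoder.decode(ByteBuffer.wrap(input));
            return chars.toString().getBytes(StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed UTF-8, including a two-byte character.
        byte[] valid = "h\u00e4user".getBytes(StandardCharsets.UTF_8);
        // 0xFF never occurs in well-formed UTF-8.
        byte[] broken = { 'a', (byte) 0xFF, 'b' };

        System.out.println(Arrays.equals(valid, roundTrip(valid)));   // true
        System.out.println(Arrays.equals(broken, roundTrip(broken))); // false
        System.out.println(roundTrip(broken).length);                 // 2
    }
}
```

If the defects have to be preserved byte for byte, decoding to chars with `IGNORE` does not look like a suitable basis. Working on the raw bytes would be one alternative to consider.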
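Java arrays don't implement a value-based equals; they inherit Object.equals, which compares references. So this will fail for two separate arrays, even if their contents are identical:
assertTrue( bytesF1.equals( bytesF2 ) );
Consider:
assertArrayEquals(bytesF1, bytesF2);
or
assertTrue(Arrays.equals(bytesF1, bytesF2));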
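The difference can be seen without JUnit in a minimal, self-contained sketch. The class name `ArrayEqualityDemo` and the helper `sameContent` are made up for illustration:

```java
import java.util.Arrays;

// Hypothetical demo class showing reference vs. content comparison of arrays.
public class ArrayEqualityDemo {

    // Compares two byte arrays element by element.
    static boolean sameContent(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] first = { 1, 2, 3 };
        byte[] second = { 1, 2, 3 };

        // equals() on an array is inherited from Object and compares references.
        System.out.println(first.equals(second));      // false
        System.out.println(first.equals(first));       // true

        // Arrays.equals compares length and contents.
        System.out.println(sameContent(first, second)); // true
    }
}
```

In a test, assertArrayEquals is usually the more convenient choice, since it reports the first differing index when the arrays do not match.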