Encoding - Java: read, process and write UTF-8 file


I'm trying to write a program that reads a UTF-8 encoded file, which can contain encoding errors, processes the content, and writes the result to an output file that is also encoded in UTF-8.

My program should modify the content (a kind of search and replace) and copy the rest one to one. In other words: if the search term equals the replacement term, the input file and the output file should be equal as well.
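One possible way to get that "search term equals replacement term gives an identical file" property, sketched here as an assumption and not as the approach the code below uses, is to avoid decoding the file as UTF-8 at all. The raw bytes are viewed through ISO-8859-1, which maps every byte 0x00-0xFF to exactly one char U+0000-U+00FF and back, so malformed UTF-8 sequences survive unchanged. The search and replacement terms are converted the same way (their UTF-8 bytes viewed as ISO-8859-1 chars), so matching happens on byte sequences. The class and method names are hypothetical, and the sketch assumes the whole file fits in memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteExactReplace {

    // Hypothetical helper: byte-level search and replace that never decodes the
    // input as UTF-8. ISO-8859-1 maps each byte to one char (and back) 1:1, so
    // bytes that are not valid UTF-8 are carried through unchanged.
    static byte[] replaceBytes(byte[] input, String search, String replacement) {
        // View the UTF-8 encodings of both terms as "one char per byte" strings.
        String searchBytes = new String(search.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        String replacementBytes = new String(replacement.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        String content = new String(input, StandardCharsets.ISO_8859_1);
        // String.replace(CharSequence, CharSequence) replaces every literal occurrence.
        String result = content.replace(searchBytes, replacementBytes);
        return result.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of "gr" + U+00FC + U+00DF, followed by a stray 0xFF byte,
        // which is never valid in UTF-8.
        byte[] input = {0x67, 0x72, (byte) 0xC3, (byte) 0xBC, (byte) 0xC3, (byte) 0x9F, (byte) 0xFF};

        byte[] unchanged = replaceBytes(input, "gr\u00FC\u00DF", "gr\u00FC\u00DF");
        System.out.println(Arrays.equals(input, unchanged)); // true

        byte[] changed = replaceBytes(input, "gr\u00FC\u00DF", "hallo");
        System.out.println(Arrays.toString(changed)); // [104, 97, 108, 108, 111, -1]
    }
}
```

Because the input is never decoded, the malformed 0xFF byte reaches the output untouched in both calls. For files too large to hold in memory, a streaming variant would have to handle matches that span chunk boundaries.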

Generally I'm using this code:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // in and out are java.nio.file.Path fields of the enclosing class
    in = Paths.get( <filename1> );
    out = Paths.get( <filename2> );
    Files.deleteIfExists( out );
    Files.createFile( out );

    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    decoder.onMalformedInput( CodingErrorAction.IGNORE );
    decoder.onUnmappableCharacter( CodingErrorAction.IGNORE );

    BufferedReader reader = new BufferedReader(
        new InputStreamReader(
            new FileInputStream( this.in.toFile() ), decoder ) );

    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    encoder.onMalformedInput( CodingErrorAction.IGNORE );
    encoder.onUnmappableCharacter( CodingErrorAction.IGNORE );

    BufferedWriter writer = new BufferedWriter(
        new OutputStreamWriter(
            new FileOutputStream( this.out.toFile() ), encoder ) );

    char[] charBuffer = new char[100];
    int readCharCount;
    StringBuffer buffer = new StringBuffer();

    while( ( readCharCount = reader.read( charBuffer ) ) > 0 ) {
        buffer.append( charBuffer, 0, readCharCount );
        // here goes more code to process the content
        // the buffer must be written to the output on each iteration
    }

    writer.write( buffer.toString() );
    reader.close();
    writer.close();

But it's not working. To compare the files I have a little JUnit test, which fails:

    byte[] bytesF1 = Files.readAllBytes( Paths.get( <filename1> ) );
    byte[] bytesF2 = Files.readAllBytes( Paths.get( <filename2> ) );
    assertTrue( bytesF1.equals( bytesF2 ) );

What am I doing wrong, or how can I get this working?

Thanks in advance, Philipp

EDIT

Setting aside that I couldn't make the test pass even after making sure the input file is encoded in UTF-8 (which points to some basic error), the real point of interest and my actual question is:

Does the approach above guarantee that defects in the UTF-8 file are copied one to one, or does the process of loading the chars into a StringBuffer change this?
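To make the question about defects concrete, the following sketch assumes a defect consisting of a single stray 0xFF byte (never valid in UTF-8) and runs it through a decode-then-encode round trip comparable to the reader/writer pair above, once with CodingErrorAction.IGNORE and once with CodingErrorAction.REPLACE. It illustrates that once bytes have been decoded into chars, a malformed sequence cannot be restored: IGNORE silently drops the byte, and REPLACE substitutes U+FFFD, whose UTF-8 encoding is EF BF BD. The class and method names are hypothetical.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeRoundTrip {

    // Decodes the bytes as UTF-8 using the given error action, then encodes the
    // resulting chars back to UTF-8.
    static byte[] roundTrip(byte[] input, CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            CharBuffer chars = decoder.decode(ByteBuffer.wrap(input));
            ByteBuffer encoded = StandardCharsets.UTF_8.newEncoder().encode(chars);
            byte[] result = new byte[encoded.remaining()];
            encoded.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // Thrown when the decoder uses CodingErrorAction.REPORT and meets malformed input.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = {0x61, (byte) 0xFF, 0x62}; // 'a', stray 0xFF, 'b'
        System.out.println(Arrays.toString(roundTrip(input, CodingErrorAction.IGNORE)));
        // [97, 98]
        System.out.println(Arrays.toString(roundTrip(input, CodingErrorAction.REPLACE)));
        // [97, -17, -65, -67, 98]
    }
}
```

Neither output equals the three input bytes, so with either action the output file can only match the input when the input is already valid UTF-8.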

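Java arrays don't implement a value-based equals(): calling equals() on an array is the identity check inherited from Object, so it is true only when both references point to the same array object. This will therefore fail even when the two files are identical: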

    assertTrue( bytesF1.equals( bytesF2 ) );

Consider:

    assertArrayEquals( bytesF1, bytesF2 );

or

    assertTrue( Arrays.equals( bytesF1, bytesF2 ) );
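A self-contained sketch of the corrected comparison outside JUnit, for illustration only; `sameContent` is a hypothetical helper name. It reads both files with Files.readAllBytes, as the test does, and compares the arrays with Arrays.equals, which checks the lengths and every element.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CompareFiles {

    // Hypothetical helper: true if both files contain exactly the same bytes.
    static boolean sameContent(Path first, Path second) {
        try {
            byte[] a = Files.readAllBytes(first);
            byte[] b = Files.readAllBytes(second);
            // a.equals(b) would only be true for the very same array instance;
            // Arrays.equals compares the lengths and every element.
            return Arrays.equals(a, b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x61, (byte) 0xFF, 0x62};
        Path p1 = Files.createTempFile("in", ".txt");
        Path p2 = Files.createTempFile("out", ".txt");
        Files.write(p1, data);
        Files.write(p2, data.clone());

        System.out.println(data.equals(data.clone())); // false: two different array objects
        System.out.println(sameContent(p1, p2));        // true: identical bytes

        Files.delete(p1);
        Files.delete(p2);
    }
}
```

On Java 12 or newer, Files.mismatch(first, second) is an alternative: it returns -1L when the files are identical and otherwise the position of the first mismatch.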
