Monday, March 3, 2014

Avro - a simple example

When moving data from one place to another, or just storing it, there are loads of options, from plain text to specialized binary formats. Somewhere in the middle are XML, JSON, Protocol Buffers, Thrift and a newer entry, Avro. Like protobufs and Thrift, Avro is a binary format, but unlike those two it also stores the schema in the file alongside the data. It is easy to use, and working with it is very similar to working with protobufs, Thrift or even XSD-derived classes.

Follow the steps below (or the simple tutorial on the Avro pages):

Download, or use dependency management (Maven, Gradle, etc.) to get, avro-1.7.6.jar and avro-tools-1.7.6.jar, plus the Jackson JSON library: specifically the jackson-core-asl and jackson-mapper-asl jars (those are the 1.9.x jar names). Avro 1.7.x is built against the Jackson 1.9.x line, so the 2.x jars are not a substitute here. Make sure they're all on the build path.

Create a schema (as in example.avsc):
 {"namespace": "example.avro",  
  "type": "record", "name": "MyExample",  
  "fields": [  
    {"name": "title_of_doc", "type": "string"},  
    {"name": "author_name", "type": ["string", "null"]},  
    {"name": "number_pages", "type": ["int", "null"]}  
  ]  
 }  
... and run the avro command line tool to generate the class:
cd .../workspace/avro-example/
java -jar /path/to/avro-tools-1.7.6.jar compile schema example.avsc .
which will create a MyExample.java file in the example/avro folder.

Move the example/avro folder under src, or move the newly created file into the example.avro package under src (that is, src/example/avro), or otherwise add the new file to the build path.
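As a rough illustration of the naming conventions at work here, the sketch below shows how the namespace and record name from the schema map to the generated file's location, and how a field name such as title_of_doc maps to an accessor name such as setTitleOfDoc. The class AvroNamingSketch and both of its methods are hypothetical helpers written for this post; they mimic the convention for simple names like the ones in example.avsc and are not Avro's own compiler code.

```java
public class AvroNamingSketch {

    // "example.avro" + "MyExample" -> "example/avro/MyExample.java"
    // (namespace dots become folders, record name becomes the class name)
    static String sourcePath(String namespace, String recordName) {
        return namespace.replace('.', '/') + "/" + recordName + ".java";
    }

    // "title_of_doc" -> "setTitleOfDoc"
    // (underscores are dropped and each part is capitalized)
    static String setterName(String fieldName) {
        StringBuilder sb = new StringBuilder("set");
        for (String part : fieldName.split("_")) {
            if (part.isEmpty()) {
                continue; // skip empty pieces from doubled or leading underscores
            }
            sb.append(Character.toUpperCase(part.charAt(0)));
            sb.append(part.substring(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sourcePath("example.avro", "MyExample"));
        System.out.println(setterName("title_of_doc"));
        System.out.println(setterName("number_pages"));
    }
}
```

This is only meant to make the folder layout and the setter names in the example program less surprising; for unusual field names, check the generated source itself.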

Put the schema to use by pulling it in as a class and creating a few instances; note the different ways of constructing them. Then open a DatumWriter and DataFileWriter to write out the data, and a DatumReader and DataFileReader to pull it back in. That should cover the basics! Note that the reader and writer (and their corresponding DataFileReader and DataFileWriter) can have differing schemas, in case you have versioning and want to open a file written with one schema but operate on the data with another.

 package example.avro;  
 import java.io.File;  
 import java.io.IOException;  
 import org.apache.avro.file.DataFileReader;  
 import org.apache.avro.file.DataFileWriter;  
 import org.apache.avro.io.DatumReader;  
 import org.apache.avro.io.DatumWriter;  
 import org.apache.avro.specific.SpecificDatumReader;  
 import org.apache.avro.specific.SpecificDatumWriter;  
 public class AvroEx {  
      public static void main(String args[]){  
           MyExample exmplDoc = new MyExample(); //basic constructor, class from the record name in avsc file  
           exmplDoc.setTitleOfDoc("Testing for fun"); //notice it replaced title_of_doc with TitleOfDoc  
           exmplDoc.setNumberPages(123);  
           MyExample exmplDoc2 = new MyExample("Growing Green Software","Mr Green",322); //alt constructor, arguments in schema field order  
           MyExample exmplDoc3 = MyExample.newBuilder().setTitleOfDoc("Forget Testing") //using builder requires setting   
                     .setAuthorName("Miss Read").setNumberPages(null) //all fields even if null  
                     .build();  
           //Write out an AVRO file  
           File file = new File("Example-out-in.avro");  
           DatumWriter<MyExample> userDatumW = new SpecificDatumWriter<MyExample>(MyExample.class); //serialize in memory  
           DataFileWriter<MyExample> dataFW = new DataFileWriter<MyExample>(userDatumW); //allow different schema if necessary  
           try {  
                dataFW.create(exmplDoc.getSchema(), file);//write schema and records to file  
                dataFW.append(exmplDoc);  
                dataFW.append(exmplDoc2);  
                dataFW.append(exmplDoc3);  
                dataFW.close();  
           } catch (IOException e) {  
                e.printStackTrace();  
           }  
           //Read in AVRO data  
           DatumReader<MyExample> userDR = new SpecificDatumReader<MyExample>(MyExample.class);  
           try {  
                DataFileReader<MyExample> dataFR = new DataFileReader<MyExample>(file,userDR); //schema option again  
                MyExample userReadIn = null;   
                while (dataFR.hasNext()){  
                     userReadIn = dataFR.next(userReadIn);  
                     System.out.println(userReadIn);  
                }  
                dataFR.close();  
           } catch (IOException e) {  
                e.printStackTrace();  
           }  
      }  
 }  

Run the AvroEx.java file as an application. It will create the Avro file, write the three records to it, then read them back in and print them out.
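The earlier note about reading with a different schema from the one the data was written with can be sketched as follows. This is a minimal sketch under a few assumptions of mine: it uses Avro's generic API (GenericRecord) rather than the generated MyExample class, it writes to an in-memory stream instead of a file, and the "newer" reader schema, with number_pages dropped and a publisher field added with a default, is a hypothetical second version invented for illustration. The class name SchemaEvolutionSketch is likewise made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class SchemaEvolutionSketch {

    // Schema the data is written with: same shape as example.avsc.
    static final String WRITER_JSON =
          "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"MyExample\","
        + " \"fields\": ["
        + "  {\"name\": \"title_of_doc\", \"type\": \"string\"},"
        + "  {\"name\": \"author_name\", \"type\": [\"string\", \"null\"]},"
        + "  {\"name\": \"number_pages\", \"type\": [\"int\", \"null\"]}]}";

    // Hypothetical newer version: number_pages removed, publisher added with a default.
    static final String READER_JSON =
          "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"MyExample\","
        + " \"fields\": ["
        + "  {\"name\": \"title_of_doc\", \"type\": \"string\"},"
        + "  {\"name\": \"author_name\", \"type\": [\"string\", \"null\"]},"
        + "  {\"name\": \"publisher\", \"type\": \"string\", \"default\": \"unknown\"}]}";

    static GenericRecord roundTrip() throws IOException {
        // Separate parsers, because both schemas define the same record name.
        Schema writerSchema = new Schema.Parser().parse(WRITER_JSON);
        Schema readerSchema = new Schema.Parser().parse(READER_JSON);

        GenericRecord doc = new GenericData.Record(writerSchema);
        doc.put("title_of_doc", "Forget Testing");
        doc.put("author_name", "Miss Read");
        doc.put("number_pages", 42);

        // Write one record, with the writer schema stored in the container header.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(writerSchema));
        writer.create(writerSchema, out);
        writer.append(doc);
        writer.close();

        // Read it back: the writer schema comes from the header,
        // the schema passed here is the one the program wants to see.
        DataFileStream<GenericRecord> reader = new DataFileStream<GenericRecord>(
                new ByteArrayInputStream(out.toByteArray()),
                new GenericDatumReader<GenericRecord>(readerSchema));
        GenericRecord result = reader.next();
        reader.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        GenericRecord r = roundTrip();
        System.out.println(r.get("title_of_doc") + " / " + r.get("publisher"));
    }
}
```

The record that comes back follows the reader's schema: number_pages is no longer part of it, and publisher is filled in from its default because the written data never had that field. The same idea should carry over to the generated classes by handing the reader a schema other than the one in the file.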