Posts Tagged ‘java’


Using Hadoop to Create SOLR Indexes

September 26, 2010

One of the most challenging projects I faced at work recently was to create an Apache SOLR index of approximately 15 million records. This index had been created only once in the history of the company, using a MySQL database and SOLR’s Data Import Handler (DIH). It had not been attempted since, because the original indexing process was time-consuming (12-14 hours), required human supervision, and had to be restarted from the very beginning if it failed.

For small data sets (say, fewer than 100,000 records) SOLR’s DIH and MySQL work fine. For larger sets like this one, however, the process is too much of a drain on resources. Some members of our team and the architecture team had had success working with large data sets by leveraging the Apache Hadoop project. One of the most attractive aspects of Hadoop is that the processing is distributed, which should reduce the total time to index. Hadoop also has a robust failover system, which would remove the need for human supervision. We designed a data pipeline in which data is processed by a series of modules. When one module completes its task, it alerts the system, and the next module begins work on the output of the previous module. The SOLR indexing is one such module.
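To make the module idea concrete, here is a minimal single-process sketch of the hand-off, in which each module consumes the output of the one before it. This is not our actual pipeline code: the class name, the two example modules, and the use of plain lists of strings as records are all assumptions for illustration, and in the real system each module would be a Hadoop job rather than a method call.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: a pipeline is an ordered list of modules, and each
// module receives the output of the module that ran before it.
final class PipelineSketch {

    // Runs the modules in order. The next module starts only after the
    // previous one has returned, mirroring the "alert, then hand off" step.
    static List<String> run(List<Function<List<String>, List<String>>> modules,
                            List<String> input) {
        List<String> data = input;
        for (Function<List<String>, List<String>> module : modules) {
            data = module.apply(data);
        }
        return data;
    }

    // Example module (assumed): trim whitespace and drop empty records.
    static List<String> cleanRecords(List<String> records) {
        return records.stream()
                .map(String::trim)
                .filter(r -> !r.isEmpty())
                .collect(Collectors.toList());
        }

    // Example module (assumed): stands in for the indexing step by
    // turning each cleaned record into a placeholder "document".
    static List<String> buildIndexDocs(List<String> records) {
        return records.stream()
                .map(r -> "doc:" + r)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Function<List<String>, List<String>>> modules =
                List.of(PipelineSketch::cleanRecords, PipelineSketch::buildIndexDocs);
        // prints [doc:alpha, doc:beta]
        System.out.println(run(modules, List.of(" alpha ", "", "beta")));
    }
}
```

Because every module has the same input and output shape, a failed module can in principle be rerun on the previous module's output instead of restarting the whole process, which was the main pain point with the old DIH approach.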



R cannot be resolved

January 2, 2010

In the Android Beginners Google group I frequently see messages asking what the error “R cannot be resolved” means in Eclipse.



OutOfMemoryError in Maven

August 19, 2008

I came across a situation today where Maven threw an OutOfMemoryError. I didn’t think the process would take that much memory, but it clearly did. I was trying to deploy. I tried again with the tests skipped. No good. It turned out that setting the MAVEN_OPTS environment variable to something like -Xmx512m was enough, and I was good to go.
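If you want to see what heap limit a JVM actually received from a flag like -Xmx512m, a small check along these lines can help. This is only a sketch: the class name HeapCheck and the helper toMegabytes are made up for illustration, and it inspects whichever JVM runs it (for example via java -Xmx512m HeapCheck), not the Maven process itself.

```java
// Hypothetical helper: reports the maximum heap the current JVM will use.
final class HeapCheck {

    // Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the upper bound the JVM will attempt to use;
        // the reported value can be somewhat below the -Xmx figure.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMegabytes(maxBytes) + " MB");
    }
}
```

Running it once without any flag and once with -Xmx512m shows whether the setting is taking effect on your machine.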