
Solr Tutorial Part 1 – Getting Started

In this Solr tutorial I will go through the steps for installing and running Solr 4.9 on an Ubuntu (Server) Linux system. The latest version of Solr (at the time of writing) is 4.9. In this example we will look at performing some simple indexing of blogging websites.

Solr plays a big part in returning relevant search results, and for full-text search it outperforms MySQL and other relational databases.

1. Getting started – Setting up the environment

Get a Linux web server

If you haven’t already got Ubuntu Linux up and running, make sure to install it. I will be using a local LAMP development setup, but any variation of a Linux web server should do. I’d recommend using at least Ubuntu 12.04 LTS, whether locally or online.

Get Java

If you haven’t already got Java installed, you’ll need to install Java 1.7 for Solr to run. To check which Java version you are using, simply run the ‘java -version’ command from the command shell.
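If Java isn’t there yet, the OpenJDK packages in Ubuntu’s repositories are the easiest route. A minimal sketch, assuming an apt-based system with OpenJDK 7 available (adjust the package name to your distribution):

    # Install the Java 7 runtime (OpenJDK) and confirm the version Solr will see
    sudo apt-get update
    sudo apt-get install -y openjdk-7-jre-headless
    java -version   # should report a 1.7.x version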

2. Install Solr

Now we have the correct Java version running, we need to install Solr. First, find the latest Solr release. Once you’ve decided on a download link, simply use the Linux command wget to download it, then extract it to a folder of your choice (I chose to put it in the folder above ‘public_html’):
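Something along these lines should do it – the download URL below points at the Apache archive for 4.9.0 and the target directory is just my choice, so substitute the link and path you picked:

    # Download and unpack Solr 4.9.0 (adjust URL and destination to your setup)
    cd /var/www
    wget http://archive.apache.org/dist/lucene/solr/4.9.0/solr-4.9.0.tgz
    tar -xzf solr-4.9.0.tgz
    cd solr-4.9.0
    ls   # shows the folders described below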

[Screenshot: Solr installation directory structure]

There are 5 folders in our Solr installation root directory; let’s take a closer look at their contents, as this will help us further down the track when we create searchable indexes.

  • contrib: Solr contrib modules. These are extensions to Solr. The final JAR file for each of these contrib modules actually lives in the dist/ folder – the files in contrib/ are mainly the dependent JAR files. Here are some brief descriptions of the contents of this folder, but note that we’ll visit these later in more detail.
    - analysis-extras: A few text analysis components that have large dependencies, including some multilingual support (for Chinese and Polish) among other things.
    - clustering: An engine for clustering search results.
    - dataimporthandler: The DataImportHandler (DIH) is a very powerful (and popular) contrib module that imports data into Solr from a database or other sources.
    - extraction: Integration with Apache Tika – a framework for extracting text from common file formats. This module is also called SolrCell, and Tika is also used by the DIH’s TikaEntityProcessor.
    - uima: Integration with Apache UIMA – a framework for extracting metadata out of text. For example, modules in here can identify proper names and the language being used, among other things. Very handy.
    - velocity: A simple search UI framework based on the Velocity templating language. We’ll get into this later.
  • dist: Solr’s WAR and contrib JAR files. The Solr WAR file is the main file for deploying Solr to a Java web server. It doesn’t contain any contrib JARs. The dist/ folder also contains the core of Solr as a JAR file (used for embedding Solr in another application) and Solr’s test framework (for testing Solr extensions).
  • docs: As the name suggests, Solr documentation. It contains a quick tutorial and Solr’s API reference (useful).
  • example: A complete Solr server, to be used as an example. It includes the Jetty servlet engine (a Java web server), Solr, some sample data and sample Solr configurations. The subfolders of concern to us are:
    - etc: Jetty’s configuration. Here you can, among other things, change the web port from 8983 to 80 (the HTTP default). We’ll leave it at 8983 for now though …
    - exampledocs: Sample documents to be indexed into the default Solr configuration, plus the post.jar runnable Java program for sending those documents to Solr.
    - solr: The default sample Solr configuration. This is a great starting point for all new Solr applications, and we’ll start here too.
    - webapps: Where Jetty expects to deploy Solr from. A copy of Solr’s WAR file is in here.

First things first – back up the ‘example’ folder

Before starting any new Solr application, we should always create a backup of the ‘example’ folder. That way we can make fresh copies when needed (or, when things go really haywire – touch wood they won’t! – we can always reset to the original ‘factory default’). For our case, we’ll also start by making a descriptive copy of the example folder to serve as our ‘playground’:
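As a rough sketch – the folder names here are simply my choice, with ‘example-blogsites’ being the working copy the rest of this tutorial assumes:

    # From the Solr installation root: keep a pristine backup, then create our working copy
    cp -r example example-backup
    cp -r example example-blogsites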

Test the Solr server with a ping

Change directory to your newly created example folder (in my case, ‘example-blogsites’) and start the Solr server as follows:
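Assuming the working copy made above, the commands look something like this (start.jar is the Jetty launcher that ships inside the example folder):

    cd example-blogsites
    java -jar start.jar &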

(The ampersand at the end runs the server process in the background, which lets us keep using the same command line for other operations while the server is running.) Now point your browser to your Solr server’s ping test service:
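With the stock example configuration, the ping handler lives under the default ‘collection1’ core, so the URL should look something like this (swap ‘localhost’ for your server’s hostname if you’re not browsing locally):

    http://localhost:8983/solr/collection1/admin/ping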
[Screenshot: Solr ping test XML response]
If the server is running properly, it will respond with an XML response, as shown above. If we get the “OK” status, it indicates we’ve successfully started the Solr server – hooray! By the way, you can also get a JSON response by appending ‘?indent=yes&wt=json’ to the above URL. Go ahead and try it.
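For instance, from the command line – again assuming the default core name and port:

    curl 'http://localhost:8983/solr/collection1/admin/ping?indent=yes&wt=json'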

The Solr admin panel is accessible from: http://twdev.localhost:8983/solr (replace ‘twdev.localhost’ with your own server’s hostname, of course).

Give yourself a pat on the back – you’ve successfully installed your new Solr search server! In part 2 of this Solr tutorial, we’ll look at the more interesting stuff – how to actually index data and retrieve it.


Comments

    • Gvanto
      #

      Hi devang,

      Sorry it’s taken so long. Still working on the next part of the tutorial. ElasticSearch has now taken up a big part of the market share in this space and I’ve been busy with that. Will try and get to this soon, thanks for your patience.

Submit a Comment