Getting Started with Apache Solr 7: Part 1

What is Solr?

In simple terms, Solr is a search engine. Like a relational database system (MySQL, for example), it can store textual, numeric, spatial or binary data and allows quick search and retrieval. Here’s how the core concepts of a database system map to Solr:

Database System    Solr
Table              Collection
Row                Document
Column             Field

Solr is well suited to searching, filtering and faceting across full-text fields and other field types, and it gives you flexible control over how retrieved results are ranked by relevance to the query. A major difference between Solr and database systems is that Solr is not well suited to join operations across multiple collections, whereas joins across tables are very common in database systems. Database practitioners recommend normalizing data, but in Solr it is recommended to keep your data as denormalized as possible. Solr offers tons of other features, such as searching across multiple fields at once, spell correction, highlighting, grouping, streaming functions and robust scaling features. We shall explore all of those in subsequent posts.
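To make the denormalization advice concrete, here is a minimal Python sketch. The `authors` and `books` structures and the `author_id` key are hypothetical stand-ins for normalized database tables; they are not part of Solr. The idea is to resolve the relationship once, at indexing time, so that each document is self-contained and no join is needed at query time.

```python
# Hypothetical normalized tables, as they might exist in a relational database.
authors = {
    10: {"name": "Douglas Adams"},
    11: {"name": "George Orwell"},
}
books = [
    {"id": 1, "title": "Hitchhikers Guide to the Galaxy", "author_id": 10},
    {"id": 3, "title": "1984", "author_id": 11},
]


def denormalize(book_rows, author_rows):
    """Flatten each book row into a self-contained document.

    The author name is copied into the document, so a search on the
    author field does not have to consult a second collection.
    """
    docs = []
    for row in book_rows:
        docs.append({
            "id": str(row["id"]),
            "title": row["title"],
            "author": author_rows[row["author_id"]]["name"],
        })
    return docs


# Each printed dict has the same shape as the documents indexed later in this post.
for doc in denormalize(books, authors):
    print(doc)
```

The trade-off is the usual one: the author name is duplicated across documents, so a change to it means re-indexing every affected book.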

Why this guide?

Since Solr 7, there are newer and cleaner ways to interact with Solr, which makes getting started a lot easier than it used to be. The new V2 APIs at the /api endpoint offer a cleaner way to access Solr. Also, JSON is now the default response format. Apart from this, collection creation is much simpler, as the user no longer needs to upload a configset before creating a collection.

Since the early days of Solr, there have been various ways to achieve the same objectives across different versions. As a result, earlier tutorials and guides use a variety of styles for accessing the APIs. The intention of this tutorial, and subsequent ones here, is to present an easy, modern and clean way to interact with Solr.

Running Solr 7

For this tutorial, let’s use Docker to start up Solr 7.1, create a collection, index a few documents and perform some search queries. This article assumes no prior knowledge of Docker; it only assumes that Docker is already installed. To install Docker, visit https://get.docker.com.

$ docker run -it -p 8983:8983 -p 9983:9983 solr:7.1 /opt/solr/bin/solr -c -f

Output:

2017-12-01 18:42:27.008 INFO (main) [ ] o.e.j.s.Server jetty-9.3.20.v20170531
2017-12-01 18:42:27.601 INFO (main) [ ] o.a.s.s.SolrDispatchFilter  ___      _       Welcome to Apache Solr™ version 7.1.0
2017-12-01 18:42:27.602 INFO (main) [ ] o.a.s.s.SolrDispatchFilter / __| ___| |_ _   Starting in cloud mode on port 8983
2017-12-01 18:42:27.602 INFO (main) [ ] o.a.s.s.SolrDispatchFilter \__ \/ _ \ | '_|  Install dir: /opt/solr, Default config dir: /opt/solr/server/solr/configsets/_default/conf
2017-12-01 18:42:27.627 INFO (main) [ ] o.a.s.s.SolrDispatchFilter |___/\___/_|_|    Start time: 2017-12-01T18:42:27.605Z
2017-12-01 18:42:27.657 INFO (main) [ ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: /opt/solr/server/solr
2017-12-01 18:42:27.669 INFO (main) [ ] o.a.s.c.SolrXmlConfig Loading container configuration from /opt/solr/server/solr/solr.xml
2017-12-01 18:42:27.849 INFO (main) [ ] o.a.s.c.SolrResourceLoader [null] Added 0 libs to classloader, from paths: []
2017-12-01 18:42:28.422 INFO (main) [ ] o.a.s.c.SolrZkServerProps Reading configuration from: /opt/solr/server/solr/zoo.cfg
2017-12-01 18:42:28.424 INFO (main) [ ] o.a.s.c.SolrZkServer STARTING EMBEDDED STANDALONE ZOOKEEPER SERVER at port 9983
2017-12-01 18:42:28.924 INFO (main) [ ] o.a.s.c.ZkContainer Zookeeper client=localhost:9983
2017-12-01 18:42:29.445 INFO (main) [ ] o.a.s.c.Overseer Overseer (id=null) closing
2017-12-01 18:42:29.449 INFO (main) [ ] o.a.s.c.OverseerElectionContext I am going to be the leader 172.17.0.2:8983_solr
2017-12-01 18:42:29.460 INFO (main) [ ] o.a.s.c.Overseer Overseer (id=99100508059860992-172.17.0.2:8983_solr-n_0000000000) starting
2017-12-01 18:42:29.566 INFO (main) [ ] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/172.17.0.2:8983_solr
2017-12-01 18:42:29.592 INFO (zkCallback-3-thread-1-processing-n:172.17.0.2:8983_solr) [ ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (0) -> (1)
2017-12-01 18:42:30.031 INFO (main) [ ] o.a.s.c.CorePropertiesLocator Found 0 core definitions underneath /opt/solr/server/solr
2017-12-01 18:42:30.096 INFO (main) [ ] o.e.j.s.Server Started @3835ms

At this point, Solr has started and is ready to go (point your browser to http://localhost:8983/solr to view Solr’s Admin UI). Before we proceed, though, let us understand the various parts of the Docker command used to start Solr. The base command is “run”, which instantiates a Docker container within which Solr will be running.

  1. The flags “-it” instruct Docker to run the container interactively (as opposed to running it in the background) and to allocate a pseudo-TTY for it.
  2. The flag “-p” publishes a port from within the container and maps it to a port on the host computer (where Docker is running). Since 8983 is the default Solr port, it is published this way so that we can interact with Solr. Port 9983 in this example is the ZooKeeper port, to which other Solr containers can connect later to form a cluster of multiple Solr nodes.
  3. “solr:7.1” refers to the application and version that needs to be started. In this case, Solr’s official Docker image will be pulled from the central Docker repository (called Docker Hub) and used to start Solr containers.
  4. “/opt/solr/bin/solr -c -f” is the command that starts Solr once the container is running. Inside the container, Solr is installed in the /opt/solr directory and the ./bin/solr script is used to start it. The “-c” parameter instructs the bin/solr script to start Solr in “cloud” mode, also called “SolrCloud” mode, which means that Solr starts up as part of a distributed cluster. In cloud mode, an embedded instance of ZooKeeper, used for cluster coordination, is started alongside the Solr process; other Solr nodes can join this SolrCloud cluster by connecting to this ZooKeeper instance. The “-f” parameter instructs the bin/solr script to start Solr in the foreground so that the Docker container continues to run and the logs are displayed.

Interacting with Solr: A books collection

From a separate terminal, issue the following commands:

Create a collection:

curl -X POST \
  http://localhost:8983/api/collections \
  -d '{
  "create": {
    "name": "books",
    "numShards": 1
  }
}'
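For readers who would rather script this step, the same request can be sketched in Python using only the standard library. The helper names `build_create_collection_request` and `send` are made up for this illustration, and the code assumes Solr is reachable at localhost:8983 as started above. The explicit `Content-Type` header is an addition here: the curl command above does not set it, but a non-curl client should state that the body is JSON.

```python
import json
import urllib.request

# Assumption: the Solr container started earlier is listening on localhost:8983.
SOLR_BASE = "http://localhost:8983"


def build_create_collection_request(name, num_shards=1):
    """Build (but do not send) the V2 API request that creates a collection."""
    payload = {"create": {"name": name, "numShards": num_shards}}
    return urllib.request.Request(
        url=SOLR_BASE + "/api/collections",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request to a running Solr and return the decoded JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Preview the request without touching the network.
request = build_create_collection_request("books")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With the container running, `send(build_create_collection_request("books"))` would issue the same call as the curl command.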

Indexing documents into the collection:

Indexing one document at a time:

curl -X POST \
  -d '{"id":"1", "title":"Hitchhikers Guide to the Galaxy", "author":"Douglas Adams"}' \
  "http://localhost:8983/api/collections/books/update?commit=true"

curl -X POST \
  -d '{"id":"2", "title":"My Family and Other Animals", "author":"Gerald Durrell"}' \
  "http://localhost:8983/api/collections/books/update?commit=true"

curl -X POST \
  -d '{"id":"3", "title":"1984", "author":"George Orwell"}' \
  "http://localhost:8983/api/collections/books/update?commit=true"

curl -X POST \
  -d '{"id":"4", "title":"Lucene in Action", "author":"Erik Hatcher"}' \
  "http://localhost:8983/api/collections/books/update?commit=true"

Or, batch indexing:

curl -X POST -d \
  '[
    {"id":"1", "title":"Hitchhikers Guide to the Galaxy", "author":"Douglas Adams"},
    {"id":"2", "title":"My Family and Other Animals", "author":"Gerald Durrell"},
    {"id":"3", "title":"1984", "author":"George Orwell"},
    {"id":"4", "title":"Lucene in Action", "author":"Erik Hatcher"}
  ]' \
  "http://localhost:8983/api/collections/books/update?commit=true"
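When there are more than a handful of documents, batching is usually done from a script. The following is a rough Python sketch, not an official client: `chunked`, `build_update_requests` and `send_all` are hypothetical helpers, and the batch size of 3 is arbitrary. It posts JSON arrays to the same update endpoint and, as a design choice of this sketch, asks for a commit only on the final batch, since a single commit makes all previously sent documents searchable.

```python
import json
import urllib.request

# Assumption: Solr is listening on localhost:8983 and the "books" collection exists.
UPDATE_URL = "http://localhost:8983/api/collections/books/update"

BOOKS = [
    {"id": "1", "title": "Hitchhikers Guide to the Galaxy", "author": "Douglas Adams"},
    {"id": "2", "title": "My Family and Other Animals", "author": "Gerald Durrell"},
    {"id": "3", "title": "1984", "author": "George Orwell"},
    {"id": "4", "title": "Lucene in Action", "author": "Erik Hatcher"},
]


def chunked(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def build_update_requests(docs, batch_size):
    """Build one POST per batch; only the final request asks Solr to commit."""
    batches = list(chunked(docs, batch_size))
    requests = []
    for index, batch in enumerate(batches):
        is_last = index == len(batches) - 1
        url = UPDATE_URL + ("?commit=true" if is_last else "")
        requests.append(urllib.request.Request(
            url=url,
            data=json.dumps(batch).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests


def send_all(requests):
    """Send the requests in order (needs a running Solr); returns the HTTP status codes."""
    statuses = []
    for request in requests:
        with urllib.request.urlopen(request) as response:
            statuses.append(response.status)
    return statuses


# Preview what would be sent, without touching the network.
for request in build_update_requests(BOOKS, batch_size=3):
    print(request.full_url, len(json.loads(request.data)), "docs")
```

With the container running, `send_all(build_update_requests(BOOKS, batch_size=3))` would index all four books in two requests.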

Search queries:

Get all Solr documents (books):

curl "http://localhost:8983/api/collections/books/select?q=*:*"

Search for books titled “lucene”:

curl "http://localhost:8983/api/collections/books/select?q=title:lucene"

Search for books by “orwell”:

curl "http://localhost:8983/api/collections/books/select?q=author:orwell"

Search for books by “douglas adams”:

curl 'http://localhost:8983/api/collections/books/select?q=author:"douglas+adams"'
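The last query above hand-encodes the space in the phrase as “+”. When queries are built in code, it is safer to let a library do the URL encoding. Here is a small Python sketch under the same assumptions as before (Solr on localhost:8983, a “books” collection). The helper names `phrase`, `build_select_url` and `search` are invented for this example, and `search` assumes the usual shape of Solr’s JSON reply, where matching documents sit under `response.docs`.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: Solr is listening on localhost:8983 and the "books" collection exists.
SELECT_URL = "http://localhost:8983/api/collections/books/select"


def phrase(field, text):
    """Build a quoted phrase query, escaping backslashes and double quotes in the text."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def build_select_url(query, **params):
    """Return the select URL with q and any extra parameters URL-encoded."""
    return SELECT_URL + "?" + urlencode({"q": query, **params})


def search(query, **params):
    """Run the query against a running Solr and return the list of matching documents."""
    with urllib.request.urlopen(build_select_url(query, **params)) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]["docs"]


# Preview the encoded URLs for the four queries above, without touching the network.
for q in ("*:*", "title:lucene", "author:orwell", phrase("author", "douglas adams")):
    print(build_select_url(q))
```

With documents indexed, `search(phrase("author", "douglas adams"))` would run the same phrase query as the final curl command.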

Conclusion

This was just a quick 5-minute introduction. There are various nuances associated with collection creation (such as sharding and replication), indexing (schema management etc.) and querying (different query parsers etc.). We shall explore all of those aspects in subsequent parts. Refer to the official reference guide for more details.

One comment

  1. Awesome!

    Suggestions:
    * you don’t need the “-f” since Solr’s docker image is smart enough to know to put it in foreground mode since that is the docker user expectation
    * you don’t need “-it” interactive mode since the server process is not going to need user input.
