A Look at Clusterpoint: Fulltext Search and Geospatial

This year Clusterpoint secured an investment from BaltCap for further development. Its co-founder and strategist, G. Ernestsons, is known for projects such as Lursoft and siets.lv. Clusterpoint positions its product as fast and scalable, with geospatial support, and as one of the few NoSQL databases with fulltext search. This article includes some PHP examples and observations.

This year Clusterpoint was back in the news after securing a significant investment from the venture capital firm BaltCap. The company's history, however, goes back much further.

The co-founder and strategist of Clusterpoint is Gints Ernestsons, a co-founder of Lursoft (1992) and the author of various projects such as siets.lv (2002). The Clusterpoint DBMS is also used by TvNet, ZL, Leta and others. Projects have also been delivered outside Latvia, for example in Israel :)

In 2006 Clusterpoint was established as a separate company and, as far as I can tell, its sole product and know-how is a NoSQL database.

Unlike classical (relational) databases, a NoSQL database of this kind stores objects (documents) rather than tables; the internal format is either XML or JSON. From a programming standpoint, we get one large object in which to store data. Just as pleasant is that the database structure can change during development, i.e. there is no need to define tables, fields or data types up front. Another significant aspect of NoSQL databases is that they coexist very well with virtualisation and scale out across many servers (clustering).
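
To make the "no schema" point concrete, here is a minimal sketch in plain PHP. It does not talk to Clusterpoint at all; the field names (id, tags and so on) are made up for illustration. It only shows that two documents destined for the same store can have different shapes without anything being declared beforehand.

```php
<?php

// Hypothetical documents: the field names are invented for this illustration.
$docs = Array(
    Array("id" => 1, "title" => "GasOil 1", "type" => "gasstation",
          "coordinates" => Array("lat" => 56.926518, "long" => 24.028816)),
    Array("id" => 2, "title" => "Lars von Trier rabbit botanica films",
          "tags" => Array("film", "culture")),
);

foreach ($docs as $doc) {
    // Each document carries its own structure; no shared schema is declared.
    echo json_encode($doc), "\n";
}
```

The first document has nested coordinates and the second has a list of tags, yet both serialise the same way.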

However, there are also drawbacks. The most significant is the lack of fulltext search, which practically no NoSQL database supports (Clusterpoint being an exception).

What follows is a brief look at my experiments.

Clusterpoint and FullText Search

Clusterpoint truly offers a very diverse range of search methods. A pleasant surprise is the availability of stemming for many languages, with Latvian finally (!) among them. Stemming makes it possible to match inflected forms, and it really does work. In the example below, the searched word "zaķis" (rabbit) is also found when it appears in the sentence as "zaķu" or "zaķa". Phrase searching is also possible.

However, there are also drawbacks that I could not resolve for the time being:

  1. When the searched word is given without diacritic marks, nothing is found. The built-in (alternative) spellchecker suggests "zaķis" for the word "zakis", but for the word "burkans", for example, it suggests nothing.
  2. Wildcards do not work together with stemming :(
  3. Stop words. For example, if the searched phrase is "filma par zaķi" (film about a rabbit) nothing will be found, but "filma zaķi" will find results.

<?php

error_reporting(E_ALL);

require_once('php_cps_api-111215/cps_simple.php');

$conn = new CPS_Connection("unix:///usr/local/cps2/storages/eriks/storage.sock",
                           "eriks", "root", "password");

$items = Array();

try {

   $cps = new CPS_Simple($conn);

   $items[] = Array("title" => "George How safe are card rabbit payments online with 3D Secure");
   $items[] = Array("title" => "Lars von Trier rabbit botanica films");
   $items[] = Array("title" => "Purchasable art rabbit with carrot but culture shapes film");
   $items[] = Array("title" => "Georges Earring shape – how it suits and what it says about you culturally");

   $cps->updateMultiple($items);


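   // search term

   $query = '$zaķis$';

   // ask the built-in spellchecker for alternative spellings

   $words = $cps->alternatives($query);
   echo 'Suggested query spelling: ' . $words;

   // retrieve matching documents, with "lv" passed as the stemming language

   $docs = $cps->search($query, null, null, null, null, DOC_TYPE_SIMPLEXML, "lv");
   print_r($docs);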


} catch (CPS_Exception $e) {

   echo '<pre>';
   print_r($e->errors());

}
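
For the diacritics and stop-word drawbacks listed above, one possible workaround is to preprocess text on the PHP side before it reaches the database. The sketch below is an assumption, not part of the Clusterpoint API: foldLatvianDiacritics() and removeStopWords() are hypothetical helpers, and the stop-word list is a guess. A folded copy of the title would have to be indexed in an extra field of your own choosing for the folded query to match, and I have not tested how stemming behaves on folded text.

```php
<?php

// Hypothetical helper: map Latvian letters with diacritics to plain ASCII.
function foldLatvianDiacritics($text) {
    static $map = Array(
        "ā" => "a", "č" => "c", "ē" => "e", "ģ" => "g", "ī" => "i", "ķ" => "k",
        "ļ" => "l", "ņ" => "n", "š" => "s", "ū" => "u", "ž" => "z",
        "Ā" => "A", "Č" => "C", "Ē" => "E", "Ģ" => "G", "Ī" => "I", "Ķ" => "K",
        "Ļ" => "L", "Ņ" => "N", "Š" => "S", "Ū" => "U", "Ž" => "Z",
    );
    // strtr with an array replaces each (multi-byte) key where it occurs.
    return strtr($text, $map);
}

// Hypothetical helper: drop stop words from a query before sending it.
// $stopWords must be given lowercase and without diacritics.
function removeStopWords($query, $stopWords) {
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    $kept = Array();
    foreach ($words as $word) {
        // Compare on a folded, lowercased copy; keep the original word.
        $key = strtolower(foldLatvianDiacritics($word));
        if (!in_array($key, $stopWords, true)) {
            $kept[] = $word;
        }
    }
    return implode(' ', $kept);
}

// The stop-word list is an illustrative guess, not Clusterpoint's own list.
$stopWords = Array("par", "un", "ar");

echo foldLatvianDiacritics("zaķis"), "\n";                // zakis
echo removeStopWords("filma par zaķi", $stopWords), "\n"; // filma zaķi
```

The cleaned query string can then be passed to $cps->search() in place of the raw user input.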

Geospatial, or Location Search by Lat/Long Coordinates

The second experiment involves location search. Although the documentation states in black and white that Clusterpoint works equally well in both "plane" and "gps" mode, it remained unclear to me how to search for locations by specifying a starting point and a radius. The documentation does contain an example in which results are sorted by distance from given coordinates (see below). It works exactly as intended, sorting: 1, 2 and 3. I did not check how (or whether) correction for the Earth's curvature takes place.

<?php

error_reporting(E_ALL);

require_once('php_cps_api-111215/cps_simple.php');

$conn = new CPS_Connection("unix:///usr/local/cps2/storages/eriks/storage.sock",
                           "eriks", "root", "password");

$items = Array();

try {

   $cps = new CPS_Simple($conn);

   $items[] = Array("title" => "GasOil 1", "type" => "gasstation",
                    "coordinates" => Array("lat" => 56.926518, "long" => 24.028816));
   $items[] = Array("title" => "GasOil 2", "type" => "gasstation",
                    "coordinates" => Array("lat" => 56.964345, "long" => 24.148293));
   $items[] = Array("title" => "GasOil 3", "type" => "gasstation",
                    "coordinates" => Array("lat" => 57.006994, "long" => 24.307594));

   $cps->updateMultiple($items);


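   // search: match documents with the term "gasstation" in the "type" field

   $query     = CPS_Term('gasstation', 'type');

   // reference point for the distance ordering

   $start     = Array('coordinates/lat' => 56.950592, 'coordinates/long' => 24.11911);

   // order the results by distance from $start, nearest first

   $ordering  = CPS_LatLonDistanceOrdering($start, 'ascending');

   $docs = $cps->search($query, null, null, null, $ordering);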

   print_r($docs);


} catch (CPS_Exception $e) {

   echo '<pre>';
   print_r($e->errors());

}
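
Since the radius search and the curvature question both stayed open, here is a client-side sketch for cross-checking results. It does not use the Clusterpoint API: haversineKm() and withinRadius() are hypothetical helpers, and the calculation assumes a spherical Earth with a mean radius of 6371 km. It takes the three stations and the start point from the example above and keeps those within a given radius, nearest first.

```php
<?php

// Great-circle distance in kilometres (haversine formula).
// Assumes a spherical Earth with a mean radius of 6371 km.
function haversineKm($lat1, $lon1, $lat2, $lon2) {
    $earthRadiusKm = 6371.0;
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = pow(sin($dLat / 2), 2)
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * pow(sin($dLon / 2), 2);
    // min() guards against rounding pushing the argument just above 1.
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// Hypothetical helper: keep items within $radiusKm of the start, nearest first.
function withinRadius($items, $startLat, $startLon, $radiusKm) {
    $hits = Array();
    foreach ($items as $item) {
        $c  = $item["coordinates"];
        $km = haversineKm($startLat, $startLon, $c["lat"], $c["long"]);
        if ($km <= $radiusKm) {
            $hits[] = Array("title" => $item["title"], "km" => $km);
        }
    }
    usort($hits, function ($a, $b) {
        if ($a["km"] == $b["km"]) {
            return 0;
        }
        return ($a["km"] < $b["km"]) ? -1 : 1;
    });
    return $hits;
}

// Same sample data and start point as in the example above.
$items = Array(
    Array("title" => "GasOil 1",
          "coordinates" => Array("lat" => 56.926518, "long" => 24.028816)),
    Array("title" => "GasOil 2",
          "coordinates" => Array("lat" => 56.964345, "long" => 24.148293)),
    Array("title" => "GasOil 3",
          "coordinates" => Array("lat" => 57.006994, "long" => 24.307594)),
);

$nearby = withinRadius($items, 56.950592, 24.11911, 10.0);

foreach ($nearby as $hit) {
    printf("%s: %.1f km\n", $hit["title"], $hit["km"]);
}
```

With a 10 km radius this keeps GasOil 2 (about 2.3 km) and GasOil 1 (about 6.1 km), in that order, and drops GasOil 3; it is worth comparing this against the order the server returns.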

Conclusions

From a licensing standpoint, it is nice that the single-server version is free. For a cluster, a 30-day trial version can be obtained, but the pricing policy is not disclosed and each project is evidently evaluated individually. Rumour has it that the commercial licence for lv.lv (which also appears to run on Clusterpoint) cost 16,000 Ls.

Although Clusterpoint has been operating since 2006, the information available on the web is quite sparse. Compared to competitors, usage examples are lacking, and the developer forum hosted on the company's website is also quiet.

It is possible that the company only recently began its public push, as suggested by the fact that the oldest entry in the wiki documentation dates only from May 2011. Around that time, Clusterpoint Server 2 was released along with the freely available version, and a Twitter account was created.

[1] http://www.clusterpoint.com

[2] Forums - http://www.clusterpoint.com/vanilla/
