SonarSQL: Differences from MySQL

SonarSQL is a full-fledged MySQL server, that translates MySQL queries to SonarW aggregation pipelines.

As those two languages are inherently incompatible, some differences exist between the behavior SonarSQL/SonarW and native MySQL.

We try to document here the most notable differences.

Schema

NULL for undefined values

In SonarW there is no schema for collections. This means that each document may possess different keys. So a key might be defined in one document, and not defined in another within the same collection.

There are no keys (columns) that are undefined in MySQL, and these values cannot be represented distinctively in results, or queried.

When a query is processed by SonarSQL, it is possible for a record (document) not to have values for certain columns (keys). In this case, they will be displayed as NULL.

Example

Consider a people collection with the following incomplete documents:

{ "name": "Joe", phone: "555-555-5555" }
{ "name": "Jane" }

Querying for name and phone will produce the following:

MySQL [test]> select name,phone from people;
+------+--------------+
| name | phone        |
+------+--------------+
| Joe  | 555-555-5555 |
| Jane | NULL         |
+------+--------------+

NULL masks real values

Another aspect of the schema-less nature of SonarW databases, is that the same keys might have different types for values. Since MySQL requires a specific type for each column in the result, some values may not be represented or displayed.

For example, if a key that is known to be mostly [1] a number has a string value for some records, we won’t be able to display them within MySQL result list.

Example

Consider an items collection with the following documents:

{ "sku": "6532118207", "price": 35 }
{ "sku": "6532118206", "price": 24 }
{ "sku": "6532118205", "price": "back ordered" }

This collection has mostly numbers for prices, but one of the records contains a string. In this case, the price column will be considered a number, so the string will show up as NULL:

MySQL [test]> select sku,price from items;
+------------+-------+
| sku        | price |
+------------+-------+
| 6532118207 |    35 |
| 6532118206 |    24 |
| 6532118205 |  NULL |
+------------+-------+

If it is easy to cast from one type to the other, the value will show up, but otherwise it will be shown as NULL.

Comparisons with NULL values

One of the differences between SonarW and SQL is how NULL is treated. In NoSQL world, NULL is a value and is equivalent to other NULLs.

This means that queries like:

SELECT * FROM table WHERE a = NULL; SELECT * FROM table WHERE a = b; SELECT * FROM table WHERE a <= b; SELECT * FROM table WHERE a >= b; SELECT * FROM table WHERE a != b;

Can all return results where a or b are NULL if they fill the condition.

If you want to avoid that behaviour you can specify that you do not want NULL:

SELECT * FROM table WHERE a != b AND a IS NOT NULL and b IS NOT NULL;

Example

Consider an storage collection with the following documents:

{ "item": "I1", "amount_wh1": 1,    "amount_wh2": 3    }
{ "item": "I2", "amount_wh1": 10,   "amount_wh2": 0    }
{ "item": "I3", "amount_wh1": null, "amount_wh2": null }

This collections represents how much of each item are in each warehouse. Items that are not carried anymore are detonated with NULL. When I query these Items this is what I get:

MySQL [test]> SELECT item FROM storage WHERE amount_wh1 >= amount_wh2;
+------+
| item |
+------+
| I2   |
| I3   |
+------+


MySQL [test]> SELECT item FROM storage WHERE amount_wh2 != 0;
+------+
| item |
+------+
| I1   |
| I3   |
+------+

If I want to remove Items from my results that have null I can run the following queries instead:

MySQL [test]> SELECT item FROM storage WHERE amount_wh1 >= amount_wh2 AND amount_wh1 IS NOT NULL;
+------+
| item |
+------+
| I2   |
+------+


MySQL [test]> SELECT item FROM storage WHERE amount_wh2 != 0 AND amount_wh2 IS NOT NULL ;
+------+
| item |
+------+
| I1   |
+------+

Handling Simple Arrays

Keys in SonarW can contain arrays of documents or values. In MySQL there is no such concept, so arrays are often unwound, so that each item gets its own record.

Unwind is a SonarW operation that takes a field name, and unwinds it so that each document contains only one item from the array.

For more information on unwind see https://docs.mongodb.org/manual/reference/operator/aggregation/unwind/

Unwind is an expensive operation, and thus is only applied when the field is explicitly selected for display. If the field is only used for filtering, unwind will not occur.

Example

Consider an apparel collection with the following documents:

{ "_id": 1, "sku": 6625, "type": "shirt", "sizes": ["S", "M", "L"] }
{ "_id": 2, "sku": 6626, "type": "shirt", "sizes": ["L"] }
{ "_id": 3, "sku": 6627, "type": "shirt", "sizes": ["XS", "S", "XL"] }
{ "_id": 4, "sku": 6628, "type": "shirt", "sizes": ["L", "XL"] }

The sizes field is shown as a varchar, but an array is specified in the Extra column of the DESCRIBE output:

MySQL [test]> describe apparel;
+------------+----------------+------+------+---------+-------+
| Field      | Type           | Null | Key  | Default | Extra |
+------------+----------------+------+------+---------+-------+
| _id        | varchar(24)    | YES  |      |         |       |
| lmrm_bson_ | blob           | YES  |      |         |       |
| sizes      | varchar(65535) | YES  |      |         | array |
| sku        | double         | YES  |      |         |       |
| type       | varchar(65535) | YES  |      |         |       |
+------------+----------------+------+------+---------+-------+

When the sizes field is not selected, no unwind happens:

MySQL [test]> select sku,type from apparel;
+------+-------+
| sku  | type  |
+------+-------+
| 6625 | shirt |
| 6626 | shirt |
| 6627 | shirt |
| 6628 | shirt |
+------+-------+

But when sizes is selected for display, it is being unwound:

MySQL [test]> select sku,type,sizes from apparel;
+------+-------+-------+
| sku  | type  | sizes |
+------+-------+-------+
| 6625 | shirt | S     |
| 6625 | shirt | M     |
| 6625 | shirt | L     |
| 6626 | shirt | L     |
| 6627 | shirt | XS    |
| 6627 | shirt | S     |
| 6627 | shirt | XL    |
| 6628 | shirt | L     |
| 6628 | shirt | XL    |
+------+-------+-------+

When the sizes field is used for filtering, but is not displayed, there is no need to unwind:

MySQL [test]> select sku from apparel where sizes="S";
+------+
| sku  |
+------+
| 6625 |
| 6627 |
+------+

A special case is when all fields are requested using the wild card *. In this case, if the array is a simple one, it is just shown in full as a string under one document:

MySQL [test]> select * from apparel;
+------+-------------------+------+-------+
| _id  | sizes             | sku  | type  |
+------+-------------------+------+-------+
| 1    | ["S", "M", "L"]   | 6625 | shirt |
| 2    | ["L"]             | 6626 | shirt |
| 3    | ["XS", "S", "XL"] | 6627 | shirt |
| 4    | ["L", "XL"]       | 6628 | shirt |
+------+-------------------+------+-------+

Handling Arrays of Documents

Complex arrays are handled in a similar fashion to simple arrays, but there are some subtle differences.

Example

Consider a results collection with the following documents:

{
        "_id" : 1,
        "name" : "Jim",
        "stats" : [
                {
                        "game" : "golf",
                        "wins" : 3
                },
                {
                        "game" : "tennis",
                        "wins" : 1
                },
                {
                        "game" : "table tennis",
                        "wins" : 7
                }
        ]
}
{
        "_id" : 2,
        "name" : "John",
        "stats" : [
                {
                        "game" : "golf",
                        "wins" : 0
                },
                {
                        "game" : "tennis",
                        "wins" : 8
                },
                {
                        "game" : "table tennis",
                        "wins" : 3
                }
        ]
}
{
        "_id" : 3,
        "name" : "James",
        "stats" : [
                {
                        "game" : "golf",
                        "wins" : 4
                },
                {
                        "game" : "tennis",
                        "wins" : 1
                },
                {
                        "game" : "table tennis",
                        "wins" : 2
                }
        ]
}

The stats key is an array of subdocuments. When issuing a DESCRIBE, the key is broken down based on the keys in the subdocument:

MySQL [test]> DESCRIBE results;
+------------+----------------+------+------+---------+-------+
| Field      | Type           | Null | Key  | Default | Extra |
+------------+----------------+------+------+---------+-------+
| _id        | double         | YES  |      |         |       |
| lmrm_bson_ | blob           | YES  |      |         |       |
| name       | varchar(65535) | YES  |      |         |       |
| stats.game | varchar(65535) | YES  |      |         |       |
| stats.wins | double         | YES  |      |         |       |
+------------+----------------+------+------+---------+-------+

As before, no unwind occurs if the stats field is not selected explicitly:

MySQL [test]> select name from results;
+-------+
| name  |
+-------+
| Jim   |
| John  |
| James |
+-------+
3 rows in set (0.00 sec)

MySQL [test]> select name from results where `stats.wins` >4;
+------+
| name |
+------+
| Jim  |
| John |
+------+
2 rows in set (0.00 sec)

Note

Qualifying keys in subdocuments requires the use of backticks (`). Without backtics, MySQL will try to refer to a different table named stats.

There is no concept of “array of subdocuments” in MySQL and thus the underlying values are not displayed when the wildcard is used * (remember, wildcard never unwinds):

MySQL [test]> select * from results;
+------+-------+------------+------------+
| _id  | name  | stats.game | stats.wins |
+------+-------+------------+------------+
|    1 | Jim   | NULL       |       NULL |
|    2 | John  | NULL       |       NULL |
|    3 | James | NULL       |       NULL |
+------+-------+------------+------------+

When filtering on more than one field in the array of subdocuments (in this example both game and wins), a match is only considered if all filters match the same document within the array.

For example, to find out who has more than two wins in golf, we run the following query:

MySQL [test]> select name from results where `stats.wins` > 2 and `stats.game`='golf';
+-------+
| name  |
+-------+
| Jim   |
| James |
+-------+
2 rows in set (0.01 sec)

John does have an entry for golf, and he does have an entry with more than two wins, but they are for the other games, and thus he is not included in the results.

Under the hood this is achieved using $elemMatch operator. For more information and examples see https://docs.mongodb.org/manual/reference/operator/query/elemMatch/.

You can use the EXPLAIN directive to learn how $elemMatch is used for your queries.

GROUP BY and result order

If no ORDER BY clause is specified, MySQL will sort results according to the columns in the GROUP BY clause. SonarSQL will not implicitly sort results in this situation, just as if ORDER BY NULL were specified.

UNION

ALL vs DISTINCT

In MySQL, unions are treated as DISTINCT by default, and DISTINCT overrides any ALL union to its left. In SonarSQL, DISTINCT remains the default behaviour; however, if there is a DISTINCT union, it will override any ALL union regardless of position.

_id field

UNION is handled by the underlying SonarW using a temporary collection. This collection will have its own _id fields. These fields won’t make much sense to the user, so they are hidden from all union queries.

The effect is that an _id field included in the UNION query will either be hidden, or shown as NULL in the result. For example:

MySQL [test]> select * from boys;
+------+-------+
| _id  | name  |
+------+-------+
| 1    | Billy |
| 2    | Bob   |
+------+-------+
2 rows in set (0.00 sec)

MySQL [test]> select * from girls;
+------+-------+
| _id  | name  |
+------+-------+
| 1    | Jenny |
| 2    | Jane  |
+------+-------+
2 rows in set (0.00 sec)

MySQL [test]> select * from boys union select * from girls;
+------+-------+
| _id  | name  |
+------+-------+
| NULL | Billy |
| NULL | Bob   |
| NULL | Jenny |
| NULL | Jane  |
+------+-------+
4 rows in set (0.02 sec)

Filtering By Dates

Dates are stored as actual time and date in Sonar. When the expression DATE('YYYY-MM-DD') is used in MySQL without time specification, it is translated to that date at midnight.

This might lead to surprising results when trying to filter documents by date. Since the expression purchased <= DATE('2015-09-22') will only be true for the records where purchased is before midnight on that date, essentially excluding all events from 2015-09-22. If you want to include all events from 2015-09-22, it is better to use purchased < DATE('2015-09-23') instead.

Using the MySQL BETWEEN construct will automatically overcome this restriction, so expression like BETWEEN DATE('2015-09-04') AND DATE('2015-09-08') will be inclusive to both dates.

Example

Consider a sales collection with the following documents:

{ "_id" : 1, "sales" : 63234, "date" : ISODate("2015-09-03T00:00:00Z") }
{ "_id" : 2, "sales" : 15120, "date" : ISODate("2015-09-04T18:00:00Z") }
{ "_id" : 3, "sales" : 75235, "date" : ISODate("2015-09-05T18:00:00Z") }
{ "_id" : 4, "sales" : 12231, "date" : ISODate("2015-09-06T18:00:00Z") }
{ "_id" : 5, "sales" : 155997, "date" : ISODate("2015-09-07T18:00:00Z") }
{ "_id" : 6, "sales" : 74214, "date" : ISODate("2015-09-08T18:00:00Z") }

Using BETWEEN, we get the expected results:

MySQL [test]> select * from sales where date between date('2015-09-06') and date('2015-09-08');
+------+---------------------+--------+
| _id  | date                | sales  |
+------+---------------------+--------+
| 4    | 2015-09-06 18:00:00 |  12231 |
| 5    | 2015-09-07 18:00:00 | 155997 |
| 6    | 2015-09-08 18:00:00 |  74214 |
+------+---------------------+--------+
3 rows in set (0.00 sec)

However, if you use the same dates with a >= AND <= form, the last day will be excluded:

MySQL [test]> select * from sales where date>=date('2015-09-06') and date<=date('2015-09-08');
+------+---------------------+--------+
| _id  | date                | sales  |
+------+---------------------+--------+
| 4    | 2015-09-06 18:00:00 |  12231 |
| 5    | 2015-09-07 18:00:00 | 155997 |
+------+---------------------+--------+
2 rows in set (0.00 sec)

Using the form recommended above will generate the expected result:

MySQL [test]> select * from sales where date>=date('2015-09-06') and date<date('2015-09-09');
+------+---------------------+--------+
| _id  | date                | sales  |
+------+---------------------+--------+
| 4    | 2015-09-06 18:00:00 |  12231 |
| 5    | 2015-09-07 18:00:00 | 155997 |
| 6    | 2015-09-08 18:00:00 |  74214 |
+------+---------------------+--------+
3 rows in set (0.00 sec)

Functions

Date Format Characters

A subset of the MySQL format specifier characters are implemented. Some specifiers that have not been implemented are converted to a related specifier.

Specifier Description
%a converted to %w; day of week, numeric (0=Sunday..6=Saturday)
%b converted to %m; month, numeric (00..12)
%c converted to %m; month, numeric (00..12)
%D converted to %d; day of month (00..31)
%d day of month (00..31)
%e converted to %d; day of month (00..31)
%f converted to returning milliseconds (000..999)
%H hour (00..23)
%h converted to %h; hour (00..23)
%I converted to %h; hour (00..23)
%i minutes (00..59)
%j day of year (001..366)
%k converted to %h; hour (00..23)
%l converted to %h; hour (00..23)
%M converted to %m; month, numeric (00..12)
%m month, numeric (00..12)
%p unsupported, empty string
%r converted to %T; time, 24-hour (hh:mm:ss)
%S seconds (00..59)
%s seconds (00..59)
%T time, 24-hour (hh:mm:ss)
%U week (00..53), where Sunday is the first day of the week
%u converted to %U; week (00..53), where Sunday is the first day of the week
%V converted to %U; week (00..53), where Sunday is the first day of the week
%v converted to %U; week (00..53), where Sunday is the first day of the week
%W converted to %w; day of week, numeric (0=Sunday..6=Saturday)
%w day of week, numeric (0=Sunday..6=Saturday)
%X unsupported, empty string
%x unsupported, empty string
%Y year, four digits
%y converted to %Y; year, four digits

Sum of NULL

In MySQL, the sum of NULL, is NULL as well. In SonarW, the sum of NULL is zero.

Example

Consider a collection prices with missing information:

{ "_id" : 1, "sku" : "6305543", "price" : null }
{ "_id" : 2, "sku" : "6305544", "price" : null }

Calculating the sum of all prices will result in 0:

MySQL [test]> select sum(price) from prices;
+------------+
| sum(price) |
+------------+
|          0 |
+------------+

Special User Variables

maxTimeMS

SonarW can limit the time used per query. In SonarSQL this limitation can be controlled using the @maxTimeMS user variable. Setting it will have effect for the remainder of the session. New sessions will have the default value (100,000. This can be modified in the configuration file under max-time-ms).

Note

This value is given in milliseconds.

Example

To make sure no query takes more than 15 seconds, issue:

MySQL [test]> set @maxTimeMS=15000;
Query OK, 0 rows affected (0.00 sec)

The value can be retrieved using a simple SELECT:

MySQL [test]> select @maxTimeMS;
+------------+
| @maxTimeMS |
+------------+
| 15000      |
+------------+

sonarwOptimizer

SonarW optimizer can be turned off if it gets in the way. Set this user variable value to zero and the optimizer will be turned off for all queries for the remainder of the session.

This is done by prepending a {$optimizer: 0} phase to each pipeline. The default value of this value is 1 (optimizer enabled).

Example

The default pipeline for a simple query does not indicate any special $optimizer value, and thus the optimizer is enabled:

MySQL [test]> explain select * from ghosts\G
*************************** 1. row ***************************
Collection: ghosts
  Pipeline: db.ghosts.aggregate([
{
    "$project" : {
        "*" : 1
    }
}])

MySQL [test]> select @sonarwOptimizer;
+------------------+
| @sonarwOptimizer |
+------------------+
| 1                |
+------------------+

Now disable the optimizer and see how it affects the pipeline:

MySQL [test]> set @sonarwOptimizer=0;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> explain select * from ghosts\G
*************************** 1. row ***************************
Collection: ghosts
  Pipeline: db.ghosts.aggregate([
{
    "$optimizer" : 0
},
{
    "$project" : {
        "*" : 1
    }
}])
1 row in set (0.01 sec)

explicitTableOrder

TBD - This might not be relevant in join v2 implementation.

Syntax

Letter Case

Field names in SonarW are case sensitive. Even though MySQL is mostly non-case sensitive, SonarSQL abides by the underlying database. When using SonarSQL, you must make sure your field names’ case match what is in the SonarW collection.

A simple way to find the correct case is to use DESCRIBE on the table.

Some tables do not really exist in SonarW (e.g. INFORMATION_SCHEMA.*). For columns in these tables, letter case is immaterial, but must be consistent within the query.

User Management

See the installation and configuration document for the definition of users. At this time, the SET PASSWORD and DROP USER commands are not supported. These operations need to be performed directly with SonarW.

System Tables

System collections in SonarW are visible in the table list (SHOW TABLES), but cannot be queried (table not found). System collections names begin with system.. In addition, the special metadata collection (lmrm__metadata2) cannot be queried.

Tested Clients

User tools

  • MariaDB client
  • Tableau
  • MySQL Workbench
  • Amazon QuickSight
  • Qlik

Libraries

  • C
    • libmysqlclient18, v5.5.44
  • C++
    • mysqlcppcon v1.1.3-5
  • Java 1.8
    • mysql connector 5.1.23
    • mariadb java client v1.1.7
  • PHP 5.5.9
    • mysql
    • mysqli
    • PDO
  • Python 2.7.6
    • mysqldb 1.2.3
    • pymysql 0.6.6
    • mysql-connector 1.1.6
    • SQLAlchemy 0.8.4

Footnotes

[1]Defined as the type that has a majority of the values for this key in the collection.