.. mysql: SonarSQL: Differences from MySQL ================================ .. contents:: :depth: 2 *SonarSQL* is a full-fledged *MySQL* server, that translates *MySQL* queries to *SonarW* aggregation pipelines. As those two languages are inherently incompatible, some differences exist between the behavior *SonarSQL*/*SonarW* and native *MySQL*. We try to document here the most notable differences. Schema ====== NULL for undefined values ------------------------- In *SonarW* there is no schema for collections. This means that each document may possess different keys. So a key might be defined in one document, and not defined in another within the same collection. There are no keys (columns) that are undefined in *MySQL*, and these values cannot be represented distinctively in results, or queried. When a query is processed by *SonarSQL*, it is possible for a record (document) not to have values for certain columns (keys). In this case, they will be displayed as NULL. Example ~~~~~~~ Consider a ``people`` collection with the following incomplete documents:: { "name": "Joe", phone: "555-555-5555" } { "name": "Jane" } Querying for ``name`` and ``phone`` will produce the following:: MySQL [test]> select name,phone from people; +------+--------------+ | name | phone | +------+--------------+ | Joe | 555-555-5555 | | Jane | NULL | +------+--------------+ NULL masks real values ---------------------- Another aspect of the schema-less nature of *SonarW* databases, is that the same keys might have different types for values. Since *MySQL* requires a specific type for each column in the result, some values may not be represented or displayed. For example, if a key that is known to be mostly [#mostly]_ a number has a string value for some records, we won't be able to display them within *MySQL* result list. Example ~~~~~~~ Consider an ``items`` collection with the following documents:: { "sku": "6532118207", "price": 35 } { "sku": "6532118206", "price": 24 } { "sku": "6532118205", "price": "back ordered" } This collection has mostly numbers for prices, but one of the records contains a string. In this case, the price column will be considered a number, so the string will show up as ``NULL``:: MySQL [test]> select sku,price from items; +------------+-------+ | sku | price | +------------+-------+ | 6532118207 | 35 | | 6532118206 | 24 | | 6532118205 | NULL | +------------+-------+ If it is easy to cast from one type to the other, the value will show up, but otherwise it will be shown as ``NULL``. .. _unwind: Comparisons with NULL values ---------------------------- One of the differences between SonarW and SQL is how NULL is treated. In NoSQL world, NULL is a value and is equivalent to other NULLs. This means that queries like: ``SELECT * FROM table WHERE a = NULL;`` ``SELECT * FROM table WHERE a = b;`` ``SELECT * FROM table WHERE a <= b;`` ``SELECT * FROM table WHERE a >= b;`` ``SELECT * FROM table WHERE a != b;`` Can all return results where a or b are NULL if they fill the condition. If you want to avoid that behaviour you can specify that you do not want NULL: ``SELECT * FROM table WHERE a != b AND a IS NOT NULL and b IS NOT NULL;`` Example ~~~~~~~ Consider an ``storage`` collection with the following documents:: { "item": "I1", "amount_wh1": 1, "amount_wh2": 3 } { "item": "I2", "amount_wh1": 10, "amount_wh2": 0 } { "item": "I3", "amount_wh1": null, "amount_wh2": null } This collections represents how much of each item are in each warehouse. Items that are not carried anymore are detonated with NULL. When I query these Items this is what I get:: MySQL [test]> SELECT item FROM storage WHERE amount_wh1 >= amount_wh2; +------+ | item | +------+ | I2 | | I3 | +------+ MySQL [test]> SELECT item FROM storage WHERE amount_wh2 != 0; +------+ | item | +------+ | I1 | | I3 | +------+ If I want to remove Items from my results that have null I can run the following queries instead:: MySQL [test]> SELECT item FROM storage WHERE amount_wh1 >= amount_wh2 AND amount_wh1 IS NOT NULL; +------+ | item | +------+ | I2 | +------+ MySQL [test]> SELECT item FROM storage WHERE amount_wh2 != 0 AND amount_wh2 IS NOT NULL ; +------+ | item | +------+ | I1 | +------+ Handling Simple Arrays ---------------------- Keys in *SonarW* can contain arrays of documents or values. In *MySQL* there is no such concept, so arrays are often unwound, so that each item gets its own record. Unwind is a *SonarW* operation that takes a field name, and unwinds it so that each document contains only one item from the array. For more information on unwind see https://docs.mongodb.org/manual/reference/operator/aggregation/unwind/ Unwind is an expensive operation, and thus is only applied when the field is explicitly selected for display. If the field is only used for filtering, unwind will not occur. Example ~~~~~~~ Consider an ``apparel`` collection with the following documents:: { "_id": 1, "sku": 6625, "type": "shirt", "sizes": ["S", "M", "L"] } { "_id": 2, "sku": 6626, "type": "shirt", "sizes": ["L"] } { "_id": 3, "sku": 6627, "type": "shirt", "sizes": ["XS", "S", "XL"] } { "_id": 4, "sku": 6628, "type": "shirt", "sizes": ["L", "XL"] } The ``sizes`` field is shown as a ``varchar``, but an ``array`` is specified in the ``Extra`` column of the ``DESCRIBE`` output:: MySQL [test]> describe apparel; +------------+----------------+------+------+---------+-------+ | Field | Type | Null | Key | Default | Extra | +------------+----------------+------+------+---------+-------+ | _id | varchar(24) | YES | | | | | lmrm_bson_ | blob | YES | | | | | sizes | varchar(65535) | YES | | | array | | sku | double | YES | | | | | type | varchar(65535) | YES | | | | +------------+----------------+------+------+---------+-------+ When the ``sizes`` field is not selected, no unwind happens:: MySQL [test]> select sku,type from apparel; +------+-------+ | sku | type | +------+-------+ | 6625 | shirt | | 6626 | shirt | | 6627 | shirt | | 6628 | shirt | +------+-------+ But when ``sizes`` is selected for display, it is being unwound:: MySQL [test]> select sku,type,sizes from apparel; +------+-------+-------+ | sku | type | sizes | +------+-------+-------+ | 6625 | shirt | S | | 6625 | shirt | M | | 6625 | shirt | L | | 6626 | shirt | L | | 6627 | shirt | XS | | 6627 | shirt | S | | 6627 | shirt | XL | | 6628 | shirt | L | | 6628 | shirt | XL | +------+-------+-------+ When the ``sizes`` field is used for filtering, but *is not* displayed, there is no need to unwind:: MySQL [test]> select sku from apparel where sizes="S"; +------+ | sku | +------+ | 6625 | | 6627 | +------+ A special case is when all fields are requested using the wild card ``*``. In this case, if the array is a simple one, it is just shown in full as a string under one document:: MySQL [test]> select * from apparel; +------+-------------------+------+-------+ | _id | sizes | sku | type | +------+-------------------+------+-------+ | 1 | ["S", "M", "L"] | 6625 | shirt | | 2 | ["L"] | 6626 | shirt | | 3 | ["XS", "S", "XL"] | 6627 | shirt | | 4 | ["L", "XL"] | 6628 | shirt | +------+-------------------+------+-------+ Handling Arrays of Documents ---------------------------- Complex arrays are handled in a similar fashion to simple arrays, but there are some subtle differences. Example ~~~~~~~ Consider a ``results`` collection with the following documents:: { "_id" : 1, "name" : "Jim", "stats" : [ { "game" : "golf", "wins" : 3 }, { "game" : "tennis", "wins" : 1 }, { "game" : "table tennis", "wins" : 7 } ] } { "_id" : 2, "name" : "John", "stats" : [ { "game" : "golf", "wins" : 0 }, { "game" : "tennis", "wins" : 8 }, { "game" : "table tennis", "wins" : 3 } ] } { "_id" : 3, "name" : "James", "stats" : [ { "game" : "golf", "wins" : 4 }, { "game" : "tennis", "wins" : 1 }, { "game" : "table tennis", "wins" : 2 } ] } The ``stats`` key is an array of subdocuments. When issuing a ``DESCRIBE``, the key is broken down based on the keys in the subdocument:: MySQL [test]> DESCRIBE results; +------------+----------------+------+------+---------+-------+ | Field | Type | Null | Key | Default | Extra | +------------+----------------+------+------+---------+-------+ | _id | double | YES | | | | | lmrm_bson_ | blob | YES | | | | | name | varchar(65535) | YES | | | | | stats.game | varchar(65535) | YES | | | | | stats.wins | double | YES | | | | +------------+----------------+------+------+---------+-------+ As before, no unwind occurs if the ``stats`` field is not selected explicitly:: MySQL [test]> select name from results; +-------+ | name | +-------+ | Jim | | John | | James | +-------+ 3 rows in set (0.00 sec) MySQL [test]> select name from results where `stats.wins` >4; +------+ | name | +------+ | Jim | | John | +------+ 2 rows in set (0.00 sec) .. Note:: Qualifying keys in subdocuments requires the use of backticks (\`). Without backtics, *MySQL* will try to refer to a different table named ``stats``. There is no concept of "array of subdocuments" in *MySQL* and thus the underlying values are not displayed when the wildcard is used ``*`` (remember, wildcard never unwinds):: MySQL [test]> select * from results; +------+-------+------------+------------+ | _id | name | stats.game | stats.wins | +------+-------+------------+------------+ | 1 | Jim | NULL | NULL | | 2 | John | NULL | NULL | | 3 | James | NULL | NULL | +------+-------+------------+------------+ When filtering on more than one field in the array of subdocuments (in this example both ``game`` and ``wins``), a match is only considered if all filters match the *same* document within the array. For example, to find out who has more than two wins in golf, we run the following query:: MySQL [test]> select name from results where `stats.wins` > 2 and `stats.game`='golf'; +-------+ | name | +-------+ | Jim | | James | +-------+ 2 rows in set (0.01 sec) John does have an entry for golf, and he does have an entry with more than two wins, but they are for the other games, and thus he is not included in the results. Under the hood this is achieved using ``$elemMatch`` operator. For more information and examples see https://docs.mongodb.org/manual/reference/operator/query/elemMatch/. You can use the ``EXPLAIN`` directive to learn how ``$elemMatch`` is used for your queries. GROUP BY and result order ------------------------- If no ``ORDER BY`` clause is specified, *MySQL* will sort results according to the columns in the ``GROUP BY`` clause. *SonarSQL* will not implicitly sort results in this situation, just as if ``ORDER BY NULL`` were specified. ``UNION`` --------- ALL vs DISTINCT ~~~~~~~~~~~~~~~ In *MySQL*, unions are treated as ``DISTINCT`` by default, and ``DISTINCT`` overrides any ``ALL`` union to its left. In *SonarSQL*, ``DISTINCT`` remains the default behaviour; however, if there is a ``DISTINCT`` union, it will override any ``ALL`` union regardless of position. ``_id`` field ------------- ``UNION`` is handled by the underlying *SonarW* using a temporary collection. This collection will have its own ``_id`` fields. These fields won't make much sense to the user, so they are hidden from all union queries. The effect is that an ``_id`` field included in the ``UNION`` query will either be hidden, or shown as ``NULL`` in the result. For example:: MySQL [test]> select * from boys; +------+-------+ | _id | name | +------+-------+ | 1 | Billy | | 2 | Bob | +------+-------+ 2 rows in set (0.00 sec) MySQL [test]> select * from girls; +------+-------+ | _id | name | +------+-------+ | 1 | Jenny | | 2 | Jane | +------+-------+ 2 rows in set (0.00 sec) MySQL [test]> select * from boys union select * from girls; +------+-------+ | _id | name | +------+-------+ | NULL | Billy | | NULL | Bob | | NULL | Jenny | | NULL | Jane | +------+-------+ 4 rows in set (0.02 sec) Filtering By Dates ------------------ Dates are stored as actual time and date in *Sonar*. When the expression ``DATE('YYYY-MM-DD')`` is used in *MySQL* without time specification, it is translated to that date at midnight. This might lead to surprising results when trying to filter documents by date. Since the expression ``purchased <= DATE('2015-09-22')`` will only be true for the records where ``purchased`` is before midnight on that date, essentially excluding all events from ``2015-09-22``. If you want to include all events from ``2015-09-22``, it is better to use ``purchased < DATE('2015-09-23')`` instead. Using the *MySQL* ``BETWEEN`` construct will automatically overcome this restriction, so expression like ``BETWEEN DATE('2015-09-04') AND DATE('2015-09-08')`` will be inclusive to both dates. Example ~~~~~~~ Consider a ``sales`` collection with the following documents:: { "_id" : 1, "sales" : 63234, "date" : ISODate("2015-09-03T00:00:00Z") } { "_id" : 2, "sales" : 15120, "date" : ISODate("2015-09-04T18:00:00Z") } { "_id" : 3, "sales" : 75235, "date" : ISODate("2015-09-05T18:00:00Z") } { "_id" : 4, "sales" : 12231, "date" : ISODate("2015-09-06T18:00:00Z") } { "_id" : 5, "sales" : 155997, "date" : ISODate("2015-09-07T18:00:00Z") } { "_id" : 6, "sales" : 74214, "date" : ISODate("2015-09-08T18:00:00Z") } Using ``BETWEEN``, we get the expected results:: MySQL [test]> select * from sales where date between date('2015-09-06') and date('2015-09-08'); +------+---------------------+--------+ | _id | date | sales | +------+---------------------+--------+ | 4 | 2015-09-06 18:00:00 | 12231 | | 5 | 2015-09-07 18:00:00 | 155997 | | 6 | 2015-09-08 18:00:00 | 74214 | +------+---------------------+--------+ 3 rows in set (0.00 sec) However, if you use the same dates with a ``>= AND <=`` form, the last day will be excluded:: MySQL [test]> select * from sales where date>=date('2015-09-06') and date<=date('2015-09-08'); +------+---------------------+--------+ | _id | date | sales | +------+---------------------+--------+ | 4 | 2015-09-06 18:00:00 | 12231 | | 5 | 2015-09-07 18:00:00 | 155997 | +------+---------------------+--------+ 2 rows in set (0.00 sec) Using the form recommended above will generate the expected result:: MySQL [test]> select * from sales where date>=date('2015-09-06') and date select sum(price) from prices; +------------+ | sum(price) | +------------+ | 0 | +------------+ Special User Variables ====================== ``maxTimeMS`` ------------- *SonarW* can limit the time used per query. In *SonarSQL* this limitation can be controlled using the ``@maxTimeMS`` user variable. Setting it will have effect for the remainder of the session. New sessions will have the default value (100,000. This can be modified in the configuration file under ``max-time-ms``). .. Note:: This value is given in milliseconds. Example ~~~~~~~ To make sure no query takes more than 15 seconds, issue:: MySQL [test]> set @maxTimeMS=15000; Query OK, 0 rows affected (0.00 sec) The value can be retrieved using a simple ``SELECT``:: MySQL [test]> select @maxTimeMS; +------------+ | @maxTimeMS | +------------+ | 15000 | +------------+ ``sonarwOptimizer`` ------------------- *SonarW* optimizer can be turned off if it gets in the way. Set this user variable value to zero and the optimizer will be turned off for all queries for the remainder of the session. This is done by prepending a ``{$optimizer: 0}`` phase to each pipeline. The default value of this value is 1 (optimizer enabled). Example ~~~~~~~ The default pipeline for a simple query does not indicate any special ``$optimizer`` value, and thus the optimizer is enabled:: MySQL [test]> explain select * from ghosts\G *************************** 1. row *************************** Collection: ghosts Pipeline: db.ghosts.aggregate([ { "$project" : { "*" : 1 } }]) MySQL [test]> select @sonarwOptimizer; +------------------+ | @sonarwOptimizer | +------------------+ | 1 | +------------------+ Now disable the optimizer and see how it affects the pipeline:: MySQL [test]> set @sonarwOptimizer=0; Query OK, 0 rows affected (0.00 sec) MySQL [test]> explain select * from ghosts\G *************************** 1. row *************************** Collection: ghosts Pipeline: db.ghosts.aggregate([ { "$optimizer" : 0 }, { "$project" : { "*" : 1 } }]) 1 row in set (0.01 sec) ``explicitTableOrder`` ---------------------- TBD - This might not be relevant in join v2 implementation. Syntax ====== Letter Case ----------- Field names in *SonarW* are case sensitive. Even though *MySQL* is mostly non-case sensitive, *SonarSQL* abides by the underlying database. When using *SonarSQL*, you must make sure your field names' case match what is in the *SonarW* collection. A simple way to find the correct case is to use ``DESCRIBE`` on the table. Some tables do not really exist in *SonarW* (e.g. ``INFORMATION_SCHEMA.*``). For columns in these tables, letter case is immaterial, but must be consistent within the query. User Management =============== See the installation and configuration document for the definition of users. At this time, the ``SET PASSWORD`` and ``DROP USER`` commands are not supported. These operations need to be performed directly with *SonarW*. System Tables ============= System collections in *SonarW* are visible in the table list (``SHOW TABLES``), but cannot be queried (``table not found``). System collections names begin with ``system.``. In addition, the special metadata collection (``lmrm__metadata2``) cannot be queried. Tested Clients ================= User tools ---------- * MariaDB client * Tableau * MySQL Workbench * Amazon QuickSight * Qlik Libraries --------- * C * libmysqlclient18, v5.5.44 * C++ * mysqlcppcon v1.1.3-5 * Java 1.8 * mysql connector 5.1.23 * mariadb java client v1.1.7 * PHP 5.5.9 * mysql * mysqli * PDO * Python 2.7.6 * mysqldb 1.2.3 * pymysql 0.6.6 * mysql-connector 1.1.6 * SQLAlchemy 0.8.4 .. rubric:: Footnotes .. [#mostly] Defined as the type that has a majority of the values for this key in the collection.