.. mysql:

SonarSQL: Differences from MySQL
================================

.. contents::
   :depth: 2

*SonarSQL* is a full-fledged *MySQL* server, that translates *MySQL* queries to
*SonarW* aggregation pipelines.

As those two languages are inherently incompatible, some differences exist
between the behavior *SonarSQL*/*SonarW* and native *MySQL*.

We try to document here the most notable differences.

Schema
======

NULL for undefined values
-------------------------

In *SonarW* there is no schema for collections. This means that each document
may possess different keys. So a key might be defined in one document, and not
defined in another within the same collection.

There are no keys (columns) that are undefined in *MySQL*, and these values
cannot be represented distinctively in results, or queried.

When a query is processed by *SonarSQL*, it is possible for a record
(document) not to have values for certain columns (keys). In this case, they
will be displayed as NULL.

Example
~~~~~~~

Consider a ``people`` collection with the following incomplete documents::

  { "name": "Joe", phone: "555-555-5555" }
  { "name": "Jane" }

Querying for ``name`` and ``phone`` will produce the following::

  MySQL [test]> select name,phone from people;
  +------+--------------+
  | name | phone        |
  +------+--------------+
  | Joe  | 555-555-5555 |
  | Jane | NULL         |
  +------+--------------+

NULL masks real values
----------------------

Another aspect of the schema-less nature of *SonarW* databases, is that the
same keys might have different types for values. Since *MySQL* requires a
specific type for each column in the result, some values may not be
represented or displayed.

For example, if a key that is known to be mostly [#mostly]_ a number has a
string value for some records, we won't be able to display them within *MySQL*
result list.

Example
~~~~~~~

Consider an ``items`` collection with the following documents::

  { "sku": "6532118207", "price": 35 }
  { "sku": "6532118206", "price": 24 }
  { "sku": "6532118205", "price": "back ordered" }

This collection has mostly numbers for prices, but one of the records contains
a string. In this case, the price column will be considered a number, so the
string will show up as ``NULL``::

  MySQL [test]> select sku,price from items;
  +------------+-------+
  | sku        | price |
  +------------+-------+
  | 6532118207 |    35 |
  | 6532118206 |    24 |
  | 6532118205 |  NULL |
  +------------+-------+

If it is easy to cast from one type to the other, the value will show up, but
otherwise it will be shown as ``NULL``.

.. _unwind:

Comparisons with NULL values
----------------------------

One of the differences between SonarW and SQL is how NULL is treated.
In NoSQL world, NULL is a value and is equivalent to other NULLs.

This means that queries like:

``SELECT * FROM table WHERE a = NULL;``
``SELECT * FROM table WHERE a = b;``
``SELECT * FROM table WHERE a <= b;``
``SELECT * FROM table WHERE a >= b;``
``SELECT * FROM table WHERE a != b;``

Can all return results where a or b are NULL if they fill the condition.

If you want to avoid that behaviour you can specify that you do not want NULL:

``SELECT * FROM table WHERE a != b AND a IS NOT NULL and b IS NOT NULL;``


Example
~~~~~~~

Consider an ``storage`` collection with the following documents::

  { "item": "I1", "amount_wh1": 1,    "amount_wh2": 3    }
  { "item": "I2", "amount_wh1": 10,   "amount_wh2": 0    }
  { "item": "I3", "amount_wh1": null, "amount_wh2": null }

This collections represents how much of each item are in each warehouse.
Items that are not carried anymore are detonated with NULL.
When I query these Items this is what I get::

  MySQL [test]> SELECT item FROM storage WHERE amount_wh1 >= amount_wh2;
  +------+
  | item |
  +------+
  | I2   |
  | I3   |
  +------+


  MySQL [test]> SELECT item FROM storage WHERE amount_wh2 != 0;
  +------+
  | item |
  +------+
  | I1   |
  | I3   |
  +------+


If I want to remove Items from my results that have null I can run the following queries instead::

  MySQL [test]> SELECT item FROM storage WHERE amount_wh1 >= amount_wh2 AND amount_wh1 IS NOT NULL;
  +------+
  | item |
  +------+
  | I2   |
  +------+


  MySQL [test]> SELECT item FROM storage WHERE amount_wh2 != 0 AND amount_wh2 IS NOT NULL ;
  +------+
  | item |
  +------+
  | I1   |
  +------+

Handling Simple Arrays
----------------------

Keys in *SonarW* can contain arrays of documents or values. In *MySQL* there
is no such concept, so arrays are often unwound, so that each item gets its
own record.

Unwind is a *SonarW* operation that takes a field name, and unwinds it so that
each document contains only one item from the array.

For more information on unwind see
https://docs.mongodb.org/manual/reference/operator/aggregation/unwind/

Unwind is an expensive operation, and thus is only applied when the field is
explicitly selected for display. If the field is only used for filtering,
unwind will not occur.


Example
~~~~~~~

Consider an ``apparel`` collection with the following documents::

  { "_id": 1, "sku": 6625, "type": "shirt", "sizes": ["S", "M", "L"] }
  { "_id": 2, "sku": 6626, "type": "shirt", "sizes": ["L"] }
  { "_id": 3, "sku": 6627, "type": "shirt", "sizes": ["XS", "S", "XL"] }
  { "_id": 4, "sku": 6628, "type": "shirt", "sizes": ["L", "XL"] }

The ``sizes`` field is shown as a ``varchar``, but an ``array`` is specified
in the ``Extra`` column of the ``DESCRIBE`` output::

  MySQL [test]> describe apparel;
  +------------+----------------+------+------+---------+-------+
  | Field      | Type           | Null | Key  | Default | Extra |
  +------------+----------------+------+------+---------+-------+
  | _id        | varchar(24)    | YES  |      |         |       |
  | lmrm_bson_ | blob           | YES  |      |         |       |
  | sizes      | varchar(65535) | YES  |      |         | array |
  | sku        | double         | YES  |      |         |       |
  | type       | varchar(65535) | YES  |      |         |       |
  +------------+----------------+------+------+---------+-------+

When the ``sizes`` field is not selected, no unwind happens::

  MySQL [test]> select sku,type from apparel;
  +------+-------+
  | sku  | type  |
  +------+-------+
  | 6625 | shirt |
  | 6626 | shirt |
  | 6627 | shirt |
  | 6628 | shirt |
  +------+-------+

But when ``sizes`` is selected for display, it is being unwound::

  MySQL [test]> select sku,type,sizes from apparel;
  +------+-------+-------+
  | sku  | type  | sizes |
  +------+-------+-------+
  | 6625 | shirt | S     |
  | 6625 | shirt | M     |
  | 6625 | shirt | L     |
  | 6626 | shirt | L     |
  | 6627 | shirt | XS    |
  | 6627 | shirt | S     |
  | 6627 | shirt | XL    |
  | 6628 | shirt | L     |
  | 6628 | shirt | XL    |
  +------+-------+-------+

When the ``sizes`` field is used for filtering, but *is not* displayed, there
is no need to unwind::

  MySQL [test]> select sku from apparel where sizes="S";
  +------+
  | sku  |
  +------+
  | 6625 |
  | 6627 |
  +------+

A special case is when all fields are requested using the wild card ``*``. In
this case, if the array is a simple one, it is just shown in full as a string
under one document::

  MySQL [test]> select * from apparel;
  +------+-------------------+------+-------+
  | _id  | sizes             | sku  | type  |
  +------+-------------------+------+-------+
  | 1    | ["S", "M", "L"]   | 6625 | shirt |
  | 2    | ["L"]             | 6626 | shirt |
  | 3    | ["XS", "S", "XL"] | 6627 | shirt |
  | 4    | ["L", "XL"]       | 6628 | shirt |
  +------+-------------------+------+-------+

Handling Arrays of Documents
----------------------------

Complex arrays are handled in a similar fashion to simple arrays, but there
are some subtle differences.

Example
~~~~~~~

Consider a ``results`` collection with the following documents::

  {
	  "_id" : 1,
	  "name" : "Jim",
	  "stats" : [
		  {
			  "game" : "golf",
			  "wins" : 3
		  },
		  {
			  "game" : "tennis",
			  "wins" : 1
		  },
		  {
			  "game" : "table tennis",
			  "wins" : 7
		  }
	  ]
  }
  {
	  "_id" : 2,
	  "name" : "John",
	  "stats" : [
		  {
			  "game" : "golf",
			  "wins" : 0
		  },
		  {
			  "game" : "tennis",
			  "wins" : 8
		  },
		  {
			  "game" : "table tennis",
			  "wins" : 3
		  }
	  ]
  }
  {
	  "_id" : 3,
	  "name" : "James",
	  "stats" : [
		  {
			  "game" : "golf",
			  "wins" : 4
		  },
		  {
			  "game" : "tennis",
			  "wins" : 1
		  },
		  {
			  "game" : "table tennis",
			  "wins" : 2
		  }
	  ]
  }


The ``stats`` key is an array of subdocuments. When issuing a ``DESCRIBE``,
the key is broken down based on the keys in the subdocument::

  MySQL [test]> DESCRIBE results;
  +------------+----------------+------+------+---------+-------+
  | Field      | Type           | Null | Key  | Default | Extra |
  +------------+----------------+------+------+---------+-------+
  | _id        | double         | YES  |      |         |       |
  | lmrm_bson_ | blob           | YES  |      |         |       |
  | name       | varchar(65535) | YES  |      |         |       |
  | stats.game | varchar(65535) | YES  |      |         |       |
  | stats.wins | double         | YES  |      |         |       |
  +------------+----------------+------+------+---------+-------+

As before, no unwind occurs if the ``stats`` field is not selected explicitly::

  MySQL [test]> select name from results;
  +-------+
  | name  |
  +-------+
  | Jim   |
  | John  |
  | James |
  +-------+
  3 rows in set (0.00 sec)

  MySQL [test]> select name from results where `stats.wins` >4;
  +------+
  | name |
  +------+
  | Jim  |
  | John |
  +------+
  2 rows in set (0.00 sec)

.. Note:: Qualifying keys in subdocuments requires the use of
          backticks (\`). Without backtics, *MySQL* will try to refer to a
          different table named ``stats``.

There is no concept of "array of subdocuments" in *MySQL* and thus the
underlying values are not displayed when the wildcard is used ``*`` (remember,
wildcard never unwinds)::

  MySQL [test]> select * from results;
  +------+-------+------------+------------+
  | _id  | name  | stats.game | stats.wins |
  +------+-------+------------+------------+
  |    1 | Jim   | NULL       |       NULL |
  |    2 | John  | NULL       |       NULL |
  |    3 | James | NULL       |       NULL |
  +------+-------+------------+------------+

When filtering on more than one field in the array of subdocuments (in this
example both ``game`` and ``wins``), a match is only considered if all filters
match the *same* document within the array.

For example, to find out who has more than two wins in golf, we run the
following query::

  MySQL [test]> select name from results where `stats.wins` > 2 and `stats.game`='golf';
  +-------+
  | name  |
  +-------+
  | Jim   |
  | James |
  +-------+
  2 rows in set (0.01 sec)

John does have an entry for golf, and he does have an entry with more than two
wins, but they are for the other games, and thus he is not included in the
results.

Under the hood this is achieved using ``$elemMatch`` operator. For more
information and examples see
https://docs.mongodb.org/manual/reference/operator/query/elemMatch/.

You can use the ``EXPLAIN`` directive to learn how ``$elemMatch`` is used for
your queries.

GROUP BY and result order
-------------------------

If no ``ORDER BY`` clause is specified, *MySQL* will sort results according to
the columns in the ``GROUP BY`` clause.  *SonarSQL* will not implicitly sort
results in this situation, just as if ``ORDER BY NULL`` were specified.

``UNION``
---------

ALL vs DISTINCT
~~~~~~~~~~~~~~~

In *MySQL*, unions are treated as ``DISTINCT`` by default, and ``DISTINCT``
overrides any ``ALL`` union to its left.  In *SonarSQL*, ``DISTINCT`` remains
the default behaviour; however, if there is a ``DISTINCT`` union, it will
override any ``ALL`` union regardless of position.

``_id`` field
-------------

``UNION`` is handled by the underlying *SonarW* using a temporary
collection. This collection will have its own ``_id`` fields. These fields
won't make much sense to the user, so they are hidden from all union
queries.

The effect is that an ``_id`` field included in the ``UNION`` query will
either be hidden, or shown as ``NULL`` in the result. For example::

  MySQL [test]> select * from boys;
  +------+-------+
  | _id  | name  |
  +------+-------+
  | 1    | Billy |
  | 2    | Bob   |
  +------+-------+
  2 rows in set (0.00 sec)

  MySQL [test]> select * from girls;
  +------+-------+
  | _id  | name  |
  +------+-------+
  | 1    | Jenny |
  | 2    | Jane  |
  +------+-------+
  2 rows in set (0.00 sec)

  MySQL [test]> select * from boys union select * from girls;
  +------+-------+
  | _id  | name  |
  +------+-------+
  | NULL | Billy |
  | NULL | Bob   |
  | NULL | Jenny |
  | NULL | Jane  |
  +------+-------+
  4 rows in set (0.02 sec)


Filtering By Dates
------------------

Dates are stored as actual time and date in *Sonar*. When the expression
``DATE('YYYY-MM-DD')`` is used in *MySQL* without time specification, it is
translated to that date at midnight.

This might lead to surprising results when trying to filter documents by
date. Since the expression ``purchased <= DATE('2015-09-22')`` will only be
true for the records where ``purchased`` is before midnight on that date,
essentially excluding all events from ``2015-09-22``. If you want to include
all events from ``2015-09-22``, it is better to use ``purchased <
DATE('2015-09-23')`` instead.

Using the *MySQL* ``BETWEEN`` construct will automatically overcome this
restriction, so expression like ``BETWEEN DATE('2015-09-04') AND
DATE('2015-09-08')`` will be inclusive to both dates.

Example
~~~~~~~

Consider a ``sales`` collection with the following documents::

  { "_id" : 1, "sales" : 63234, "date" : ISODate("2015-09-03T00:00:00Z") }
  { "_id" : 2, "sales" : 15120, "date" : ISODate("2015-09-04T18:00:00Z") }
  { "_id" : 3, "sales" : 75235, "date" : ISODate("2015-09-05T18:00:00Z") }
  { "_id" : 4, "sales" : 12231, "date" : ISODate("2015-09-06T18:00:00Z") }
  { "_id" : 5, "sales" : 155997, "date" : ISODate("2015-09-07T18:00:00Z") }
  { "_id" : 6, "sales" : 74214, "date" : ISODate("2015-09-08T18:00:00Z") }

Using ``BETWEEN``, we get the expected results::

  MySQL [test]> select * from sales where date between date('2015-09-06') and date('2015-09-08');
  +------+---------------------+--------+
  | _id  | date                | sales  |
  +------+---------------------+--------+
  | 4    | 2015-09-06 18:00:00 |  12231 |
  | 5    | 2015-09-07 18:00:00 | 155997 |
  | 6    | 2015-09-08 18:00:00 |  74214 |
  +------+---------------------+--------+
  3 rows in set (0.00 sec)


However, if you use the same dates with a ``>= AND <=`` form, the last day
will be excluded::

  MySQL [test]> select * from sales where date>=date('2015-09-06') and date<=date('2015-09-08');
  +------+---------------------+--------+
  | _id  | date                | sales  |
  +------+---------------------+--------+
  | 4    | 2015-09-06 18:00:00 |  12231 |
  | 5    | 2015-09-07 18:00:00 | 155997 |
  +------+---------------------+--------+
  2 rows in set (0.00 sec)

Using the form recommended above will generate the expected result::

  MySQL [test]> select * from sales where date>=date('2015-09-06') and date<date('2015-09-09');
  +------+---------------------+--------+
  | _id  | date                | sales  |
  +------+---------------------+--------+
  | 4    | 2015-09-06 18:00:00 |  12231 |
  | 5    | 2015-09-07 18:00:00 | 155997 |
  | 6    | 2015-09-08 18:00:00 |  74214 |
  +------+---------------------+--------+
  3 rows in set (0.00 sec)

Functions
=========

Date Format Characters
----------------------

A subset of the *MySQL* format specifier characters are implemented.  Some
specifiers that have not been implemented are converted to a related
specifier.

.. This table is duplicated in translated.rst; changes need to be duplicated

========= ===========
Specifier Description
========= ===========
%a        converted to %w; day of week, numeric (0=Sunday..6=Saturday)
%b        converted to %m; month, numeric (00..12)
%c        converted to %m; month, numeric (00..12)
%D        converted to %d; day of month (00..31)
%d        day of month (00..31)
%e        converted to %d; day of month (00..31)
%f        converted to returning milliseconds (000..999)
%H        hour (00..23)
%h        converted to %h; hour (00..23)
%I        converted to %h; hour (00..23)
%i        minutes (00..59)
%j        day of year (001..366)
%k        converted to %h; hour (00..23)
%l        converted to %h; hour (00..23)
%M        converted to %m; month, numeric (00..12)
%m        month, numeric (00..12)
%p        unsupported, empty string
%r        converted to %T; time, 24-hour (hh:mm:ss)
%S        seconds (00..59)
%s        seconds (00..59)
%T        time, 24-hour (hh:mm:ss)
%U        week (00..53), where Sunday is the first day of the week
%u        converted to %U; week (00..53), where Sunday is the first day of the week
%V        converted to %U; week (00..53), where Sunday is the first day of the week
%v        converted to %U; week (00..53), where Sunday is the first day of the week
%W        converted to %w; day of week, numeric (0=Sunday..6=Saturday)
%w        day of week, numeric (0=Sunday..6=Saturday)
%X        unsupported, empty string
%x        unsupported, empty string
%Y        year, four digits
%y        converted to %Y; year, four digits
========= ===========

Sum of NULL
-----------

In *MySQL*, the sum of ``NULL``, is ``NULL`` as well. In *SonarW*,
the sum of ``NULL`` is zero.

Example
~~~~~~~

Consider a collection ``prices`` with missing information::

  { "_id" : 1, "sku" : "6305543", "price" : null }
  { "_id" : 2, "sku" : "6305544", "price" : null }

Calculating the sum of all prices will result in 0::

  MySQL [test]> select sum(price) from prices;
  +------------+
  | sum(price) |
  +------------+
  |          0 |
  +------------+

Special User Variables
======================

``maxTimeMS``
-------------

*SonarW* can limit the time used per query. In *SonarSQL* this limitation can
be controlled using the ``@maxTimeMS`` user variable. Setting it will have
effect for the remainder of the session. New sessions will have the default
value (100,000. This can be modified in the configuration file under ``max-time-ms``).

.. Note:: This value is given in milliseconds.

Example
~~~~~~~

To make sure no query takes more than 15 seconds, issue::

  MySQL [test]> set @maxTimeMS=15000;
  Query OK, 0 rows affected (0.00 sec)

The value can be retrieved using a simple ``SELECT``::

  MySQL [test]> select @maxTimeMS;
  +------------+
  | @maxTimeMS |
  +------------+
  | 15000      |
  +------------+


``sonarwOptimizer``
-------------------

*SonarW* optimizer can be turned off if it gets in the way. Set this user
variable value to zero and the optimizer will be turned off for all queries
for the remainder of the session.

This is done by prepending a ``{$optimizer: 0}`` phase to each pipeline. The
default value of this value is 1 (optimizer enabled).

Example
~~~~~~~

The default pipeline for a simple query does not indicate any special
``$optimizer`` value, and thus the optimizer is enabled::

  MySQL [test]> explain select * from ghosts\G
  *************************** 1. row ***************************
  Collection: ghosts
    Pipeline: db.ghosts.aggregate([
  {
      "$project" : {
          "*" : 1
      }
  }])

  MySQL [test]> select @sonarwOptimizer;
  +------------------+
  | @sonarwOptimizer |
  +------------------+
  | 1                |
  +------------------+

Now disable the optimizer and see how it affects the pipeline::

  MySQL [test]> set @sonarwOptimizer=0;
  Query OK, 0 rows affected (0.00 sec)

  MySQL [test]> explain select * from ghosts\G
  *************************** 1. row ***************************
  Collection: ghosts
    Pipeline: db.ghosts.aggregate([
  {
      "$optimizer" : 0
  },
  {
      "$project" : {
	  "*" : 1
      }
  }])
  1 row in set (0.01 sec)

``explicitTableOrder``
----------------------

TBD - This might not be relevant in join v2 implementation.

Syntax
======

Letter Case
-----------

Field names in *SonarW* are case sensitive. Even though *MySQL* is mostly
non-case sensitive, *SonarSQL* abides by the underlying database. When using
*SonarSQL*, you must make sure your field names' case match what is in the
*SonarW* collection.

A simple way to find the correct case is to use ``DESCRIBE`` on the table.

Some tables do not really exist in *SonarW*
(e.g. ``INFORMATION_SCHEMA.*``). For columns in these tables, letter case is
immaterial, but must be consistent within the query.

User Management
===============

See the installation and configuration document for the definition of users. At this time, the ``SET PASSWORD`` and ``DROP USER`` commands are not supported. These operations need to be performed directly with *SonarW*.

System Tables
=============

System collections in *SonarW* are visible in the table list (``SHOW
TABLES``), but cannot be queried (``table not found``). System collections
names begin with ``system.``. In addition, the special metadata collection
(``lmrm__metadata2``) cannot be queried.

Tested Clients
=================

User tools
----------
* MariaDB client 
* Tableau 
* MySQL Workbench
* Amazon QuickSight
* Qlik

Libraries
---------
* C

  * libmysqlclient18, v5.5.44

* C++

  * mysqlcppcon v1.1.3-5

* Java 1.8

  * mysql connector 5.1.23
  * mariadb java client v1.1.7

* PHP 5.5.9

  * mysql
  * mysqli
  * PDO

* Python 2.7.6

  * mysqldb 1.2.3
  * pymysql 0.6.6
  * mysql-connector 1.1.6
  * SQLAlchemy 0.8.4

.. rubric:: Footnotes

.. [#mostly] Defined as the type that has a majority of the values for this
             key in the collection.