Data Server - basics
Let's make things really simple.
- A Data Server is a component that supplies streaming real-time data to the front end of your application.
- You define your application's Data Server in a Kotlin script file:
server/appname/src/main/genesis/scripts/appname-dataserver.kts - In this file, you define specific
query
codeblocks, each of which is designed to supply different sets of data. - Each
query
listens to a specified table or view; when data on that source changes, it publishes the changes. - A
query
can include a number of other subtleties, such asfilter
clauses or ranges, so that you can create code that matches your precise requirements. - If you use
AppGen
to build from your dictionary, then a basic kts file will be built automatically for you, covering all the tables and views in your data model. You can edit this file to add sophistication to the component. - Otherwise, you can build your kts by defining each
query
codeblock from scratch.
The simplest possible definition
Your application-name-dataserver.kts Kotlin script file contains all the queries you create. These are wrapped in a single dataServer
statement.
Our example below shows a file with a single query, which publishes changes to the table INSTRUMENT_DETAILS.
dataServer {
query(INSTRUMENT_DETAILS)
}
And here is a simple example that has two queries:
Note that each query is a separate subscription.
dataServer {
query(INSTRUMENT_DETAILS)
query(COUNTERPARTY)
}
Naming
You don't have to give each query
a name. If you do provide one, it will be used exactly as you specify. If you don't, then a name is allocated automatically when the data object is created. The syntax for this allocation is "ALL_{table/view name}S". So, in the example below:
- The first query is called
INSTRUMENT_DETAILS
, as this name has been specified. - The second query is called
ALL_COUNTERPARTYS
, because it has not had a name specified. The allocated name is based on the COUNTERPARTY table, which it queries.
dataserver {
// Name of the dataserver: INSTRUMENT_DETAILS
query("INSTRUMENT_DETAILS", INSTRUMENT_DETAILS)
// Name of the dataserver: ALL_COUNTERPARTYS
query(COUNTERPARTY)
}
Specifying fields
By default, all table or view fields in a query definition will be exposed. If you don't want them all to be available, you must define the fields that are required. In the example below, we specify eight fields:
dataServer {
query(INSTRUMENT_DETAILS) {
fields {
INSTRUMENT_CODE
INSTRUMENT_ID
INSTRUMENT_NAME
LAST_TRADED_PRICE
VWAP
SPREAD
TRADED_CURRENCY
EXCHANGE_ID
}
}
}
Derived fields
You can also define derived fields in a Data Server, to supplement the fields supplied by the table or view. The input for the derived field is the Data Server query row. All fields are available for use.
In the example below, we add a ninth field to our Data Server from the previous example. The new field is a derived field:
dataServer {
query(INSTRUMENT_DETAILS) {
fields {
INSTRUMENT_CODE
INSTRUMENT_ID
INSTRUMENT_NAME
LAST_TRADED_PRICE
VWAP
SPREAD
TRADED_CURRENCY
EXCHANGE_ID
derivedField("IS_USD", BOOLEAN) {
data.tradedCurrency == "USD"
}
}
}
}
Configuration settings
Before you define any queries, you can make configuration settings for the Data Server within the config
block. These control the overall behaviour of the Data Server.
Here is an example of some configuration settings:
dataServer {
config {
lmdbAllocateSize = 512.MEGA_BYTE() // top level only setting
// Items below can be overriden in individual query definitions
compression = true
chunkLargeMessages = true
defaultStringSize = 40
batchingPeriod = 500
linearScan = true
excludedEmitters = listOf("PurgeTables")
enableTypeAwareCriteriaEvaluator = true
serializationType = SerializationType.KRYO // Available from version 7.0.0 onwards
serializationBufferMode = SerializationBufferMode.ON_HEAP // Available from version 7.0.0 onwards
directBufferSize = 8096 // Available from version 7.0.0 onwards
}
query("SIMPLE_QUERY", SIMPLE_TABLE) {
config {
// Items below only available in query level config
defaultCriteria = "SIMPLE_PRICE > 0"
backwardsJoins = false
disableAuthUpdates = false
}
}
}
Below, we shall examine the settings that are available for your config
statement.
Global settings
Global settings can be applied at two levels:
- Top level
- Query level
Top-level global setting
lmdbAllocateSize
This sets the size of the memory-mapped file where the in-memory cache stores the Data Server query rows. This configuration setting can only be applied at the top level. It affects the whole Data Server.
By default, the size is defined in bytes. To use MB or GB, use the MEGA_BYTE
or GIGA_BYTE
functions. You can see these in the example below. The default is 2 GB.
lmdbAllocateSize = 2.GIGA_BYTE()
// or
lmdbAllocateSize = 512.MEGA_BYTE()
serializationType
(available from GSF version 7.0.0)
This sets the serialization approach used to convert query rows into bytes and vice versa. There are two options: SerializationType.KRYO and SerializationType.FURY
- The KRYO option is the default setting if
serializationType
is left undefined. It uses the Kryo library. - The FURY option uses the Fury library instead. Internal tests show that Fury can serialize and deserialize row objects 13-15 times more quickly than Kryo in our current Data Server implementation; this leads to faster LMDB read/write operations (up to twice as quick in some cases). This performance improvement has a great impact on the latency incurred when requesting rows from a Data Server query, whether it happens as part of the first subscription message (DATA_LOGON messages) or subsequent row request messages (MORE_ROWS messages). However, the serialized byte array size of a Fury object can be between 10 and 15 per cent larger than a Kryo object, so there is a small penalty to pay for using this option.
serializationBufferMode
(available from GSF version 7.0.0)
This option changes the buffer type used to read/write information from/to LMDB. There are two options: SerializationBufferMode.ON_HEAP and SerializationBufferMode.OFF_HEAP.
- The ON_HEAP option is the default setting if
serializationBufferMode
is left undefined. It uses a custom cache of Java HeapByteBuffer objects that exist within the JVM addressable space. - The OFF_HEAP option uses DirectByteBuffer objects instead, so memory can be addressed directly to access LMDB buffers natively in order to write and read data. OFF_HEAP buffers permit zero-copy serialization/deserialization to LMDB; buffers of this type are generally more efficient as they allow Java to exchange bytes directly with the operating system for I/O operations. To use OFF_HEAP, you must specify the buffer size in advance (see
directBufferSize
setting); consider this carefully if you do not know the size of the serialized query row object in advance.
directBufferSize
This option sets the buffer size used by the serializationBufferMode
option when the SerializationBufferMode.OFF_HEAP
setting is configured.
By default, the size is 8096 bytes.
Query-level global settings
The following settings can be applied at a query-level.
defaultCriteria
This represents the default criteria for the query. Defaults to null
.
disableAuthUpdates
This disables real-time auth updates in order to improve the overall data responsiveness and performance of the server. Defaults to false
.
backJoins
Seen in older versions of the platform, this has been replaced by backwardsJoins
. It is functionally the same.
backwardsJoins
This enables backwards joins on a view query. Backwards joins ensure real-time updates on the fields that have been joined. These need to be configured at the join level of the query in order to work correctly. Defaults to true
.
Shared settings
compression
If this is set to true
, it will compress the query row data before writing it to the in-memory cache. Defaults to false
.
chunkLargeMessages
If this is set to true, it will split large updates into smaller ones. Defaults to false
.
defaultStringSize
This is the size to be used for string storage in the Data Server in-memory cache. Higher values lead to higher memory use; lower values lead to lower memory use, which can lead to string overflow. See the onStringOverflow
setting for details of how the overflows are handled. Defaults to 40
.
batchingPeriod
This is the delay in milliseconds to wait before sending new data to Data Server clients. Defaults to 500ms
.
linearScan
This enables linear scan behaviour in the query definition. If false, it will reject criteria expressions that don't hit defined indexes. Defaults to true
.
excludedEmitters
This enables update filtering for a list of process names. Any database updates that originate from one of these processes will be ignored. Defaults to an empty list.
onStringOverflow
This controls how the system responds to a string overflow. A string overflow happens when the value of a String field in an index is larger than defaultStringSize
, or the size set on the field.
There are two options for handling string overflows:
IGNORE_ROW
- rows with string overflows will be ignored. This can lead to data missing from the Data Server.TRUNCATE_FIELD
- indices with string overflows will be truncated. The data with the overflow will still be returned in full, and will be searchable. However, if multiple rows are truncated to the same value, any subsequent rows will lead to duplicate index exceptions during the insert, so these rows will not be available to the Data Server.
enableTypeAwareCriteriaEvaluator
This enables the type-aware criteria evaluator. Defaults to false
.
The type-aware criteria evaluator can automatically convert criteria comparisons that don't match the original type of the Data Server field; these can still be useful to end users.
For example, you might want a front-end client to perform a criteria search on a TRADE_DATE
field like this: TRADE_DATE > '2015-03-01' && TRADE_DATE < '2015-03-02'
.
This search can be translated automatically to the right field types internally (even though TRADE_DATE
is a field of type DateTime
). The Genesis index search mechanism can also identify the appropriate search intervals in order to provide an optimised experience.
The type-aware evaluator can transform strings to integers, and any other sensible and possible conversion (e.g TRADE_ID == '1'
). As a side note, this type-aware evaluator is also available in DbMon
for operations like search
and qsearch
.
By contrast, the traditional criteria evaluator needs the field types to match the query fields in the Data Server. So the same comparison using the default criteria evaluator for TRADE_DATE
would be something like: TRADE_DATE > new DateTime(1425168000000) && TRADE_DATE < new DateTime(1425254400000)
. This approach is less intuitive and won't work with our automatic index selection mechanism. In this case, you should use our common date expressions (more at Advanced technical details) to handle date searches.
Backwards joins
Each query in a Data Server creates what is effectively an open connection between each requesting user and the Data Server. After the initial send of all the data, the Data Server sends only modifications, deletions and insertions in real time. Unchanged fields are not sent, which avoids unnecessary traffic.
By default, only updates to the root table are sent. So if you need to see updates to fields from a joined table, then you must specifically declare a backwards join. For queries with backwards joins enabled, the primary table and all joined tables and views are actively monitored. Any changes to these tables and views are sent automatically.
Monitoring of backwards joins can come at a cost, as it can cause significant extra processing. Do not use backwards joins unless there is a need for the data in question to be updated in real time. For example, counterparty data, if changed, can wait until overnight. The rule of thumb is that you should only use backwards joins where the underlying data is being updated intraday.
The backwards join flag is true by default. You must explicitly set backwardsJoins = false
if you wish to turn backwards joins off for your query.
Filter clauses
If you include a filter
clause in a query, the request is only processed if the criteria specified in the clause are met. For example, If you have an application that deals with derivatives that have parent and child trades, you can use a filter
clause to confine the query to parent trades only.
Also, you can use these clauses to focus on a specific set of fields or a single field.
Finally, note that filter
clauses can also be used in conjunction with permissioning. In the below example, users that have the TRADER or SUPPORT permission code are able to see all trades that meet the filter
condition.
dataServer {
query("ALL_TRADES_LARGE_POSITION", ENHANCED_TRADE_VIEW) {
permissioning {
permissionCodes = listOf("TRADER", "SUPPORT")
auth(mapName = "ENTITY_VISIBILITY") {
authKey {
key(data.counterpartyId)
}
}
}
filter {
(data.quantity * data.price) > 1000000
}
}
}
Indices
A query can optionally include an indices
block to define additional indexing at the query level. When an index is specified in the query, all rows returned by the query are ordered by the fields specified, in ascending order. The definition and the behaviour of an index in a query are exactly the same as when you define an index in a table.
Here is an example of a simple index:
dataServer {
query("SIMPLE_QUERY", SIMPLE_TABLE) {
indices {
unique("SIMPLE_QUERY_BY_QUANTITY") {
QUANTITY
SIMPLE_ID
}
}
}
}
There are two scenarios where an index is useful:
- Optimising query criteria search. in the example above, if the front-end request specifies criteria such as
QUANTITY > 1000 && QUANTITY < 5000
, then the Data Server will automatically select the best matching index. In our example, it would beSIMPLE_QUERY_BY_QUANTITY
. This means that the platform doesn't need to scan all the query rows stored in the Data Server memory-mapped file cache; instead, it performs a very efficient indexed search. - Where the front-end request specifies an index. You can specify one or more indices in your Data Server query. And the front end can specify which index to use by supplying
ORDER_BY
as part of theDATA_LOGON
process. The Data Server will use that index to query the data. The data will be returned to the client in ascending order, based on the index field definition.
Important: Index definitions are currently limited to unique indices. As quantity does not have a unique constraint in the example definition shown above, we need to add SIMPLE_ID
to the index definition to ensure we maintain uniqueness.
Auto-generated REST endpoints
As we have mentioned before, all queries created in the dataserver file are automatically exposed as a REST endpoints.
Below you see an example of accessing a custom endpoint called ALL_TRADES running locally. This endpoint was automatically generated by the Genesis platform when you defined the query ALL_TRADES in the application's -dataserver.kts file.
[POST] http://localhost:9064/ALL_TRADES/
You can see how to manipulate and work with auto-generated endpoints in our pages on REST endpoints.
Custom Log messages
Genesis provides a class called LOG
that can be used to insert custom log messages. This class provides you with
5 methods to be used accordingly: info
, warn
, error
,trade
,debug
. To use these methods, you need to provide, as an input,
a string that will be inserted into the Logs.
Here is an example where you can insert an info message into the log file:
LOG.info("This is an info message")
The LOG
instance is a Logger from SLF4J.
In order to see these messages, you must set the logging level accordingly. You can do this using the logLevel command.