GS SQL - LLM Agent Reference (Compact Core)
This is the compact CORE for AI agents — only the GS-specific things that prevent failures and that the agent wouldn't know to search for. For everything else (function syntax, full DDL, deep grammar, catalog/view syntax, table function details, edge cases, recipes), use DocsRetrieval / GrepDocs / ReadDoc on the developer HTML docs (gsqlgrammar.html, gsqltables.html, gsqlteq.html, gsqlddl.html, gsql_llm_ddl_reference.html, help_*.html).
0. Relationship to ANSI SQL and PostgreSQL
GS SQL is an ANSI SQL / PostgreSQL-flavored dialect with temporal extensions. Treat your existing PostgreSQL knowledge as the prior — most things that work in PostgreSQL work in GS SQL with the same semantics. Use this doc only for the GS-specific things that won't behave the way you expect.
Behaves like ANSI / PostgreSQL (no need to look up):
- Operators: || (string concat), ::type (cast), +, -, *, /, %, =, !=, <>, <, <=, >, >=, AND, OR, NOT, IS NULL, IS NOT NULL, BETWEEN, IN, EXISTS, LIKE (case-sensitive), ILIKE (case-insensitive), CAST(expr AS type)
- Keyword forms: EXTRACT(field FROM expr), CASE/WHEN/THEN/ELSE, COALESCE, NULLIF
- Type names: STRING, BOOLEAN, LONG, DOUBLE, BIGDECIMAL — plus PG aliases (int8/bigint/integer/int4/smallint/int2/text/varchar/bool/numeric/decimal/float8/float4/real/timestamp/timestamptz)
- Standard SQL functions (full standard library): character_length, length, position, strpos, instr, lower, upper, trim/ltrim/rtrim, substr, replace, concat, date_trunc, date_part, to_timestamp, to_date, regexp_match, regexp_replace, count, sum, avg, min, max, var, stddev, percentile, abs, round, floor, ceil, sqrt, power, log, ln, exp, pg_typeof, now(), current_date, current_time, current_timestamp
- Clauses: SELECT (DISTINCT), FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT/OFFSET, WITH (CTEs incl. RECURSIVE), JOIN (INNER, LEFT/RIGHT/FULL OUTER, CROSS), UNION/UNION ALL/INTERSECT/EXCEPT, INSERT, UPDATE, DELETE, subqueries (correlated and non-correlated), CASE expressions, aggregate DISTINCT (e.g. COUNT(DISTINCT x))
- DDL: CREATE/ALTER/DROP TABLE, VIEW, MATERIALIZED VIEW
- public. schema prefix on table refs is silently stripped
If you're writing standard SQL — any of the above — trust your training. Search the docs only for the GS-specific things below.
1. Two Query Modes + GS-Specific Feature Inventory
TEQ (Temporal Entity Query) — Component template IDs as table names; stream IDs as columns; components as rows. Default returns lastValue per stream. Supports FK JOINs. Recommended when the org has templates.
TDQ (Temporal Deep Query) — Query system.stream / system.active_event / system.user_notification / system.system_notification / system.job_notification directly. Use when no templates exist, when accessing system tables, or when modifying sample data at specific timestamps.
Cannot mix TEQ and TDQ tables in the same query. Pick one mode per statement.
GS-specific features beyond ANSI SQL — search the docs for syntax when you need them:
- Per-column temporal parameters — e.g. kwh(range(last=100)), temperature(cycleId='hour', stat='avg'). Each stream column can have independent time semantics. THE primary GS-specific feature; covered in detail in §6.
- Catalogs — multi-namespace layer (Snowflake/Databricks-style). Reference as catalog.table.column. DDL/syntax in gsql_llm_ddl_reference.html.
- VIEW / MATERIALIZED VIEW — saved queries used as tables. Internal (over org data) or external (over a JDBC connection). Materialized views cache results; non-materialized re-run on each reference. Syntax in gsql_llm_ddl_reference.html.
- EXTERNAL_QUERY(connectionId, remoteSql) — inline federated query against a JDBC database. Remote SQL is in the remote DB's native dialect, not GS SQL. Search gsqlgrammar.html for syntax.
- JSON_TABLE / CSV_TABLE — parse JSON / CSV into rows. Search gsqlgrammar.html for syntax.
- FK JOINs — streams configured as foreign keys can be JOINed in TEQ. See §3.
- Window temporal parameters — lag, lead, sliding window (slide + slidestat), cumulative (running) — operate directly on sample arrays. See §6e. Faster than SQL OVER().
- FK-resolved derivations — derived stream variables can resolve targets via SQL with automatic temporal segmentation across relationship-change boundaries. Configuration in gsql_llm_ddl_reference.html and help_fk_deps.html.
- User-defined session variables — SET @var = expr or SET @var = (subquery). Session-scoped, case-insensitive. Search docs for full rules.
- Virtual columns / virtual views — standard SQL inline expressions (SELECT col*1.1 AS x) and FROM-subqueries (FROM (SELECT ...) v). Both fully supported.
- Cost-based query planner — per-org table statistics drive index/join choices. ANALYZE STATS DDL refreshes synchronously; auto-refresh on JDBC import and on a 30-min cycle.
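A minimal sketch of the session-variable item above — assumes a meter template with a kwh stream; the variable names are illustrative:

```sql
-- Session-scoped, case-insensitive variables; the subquery form needs parentheses.
SET @threshold = 100;
SET @busiest = (SELECT _component_uid FROM meter ORDER BY kwh DESC LIMIT 1);
SELECT _component_name, kwh FROM meter WHERE kwh > @threshold;
```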
2. Conventions and Quirks
- Case-insensitive: table names, column names, function names, reserved words.
- Comments: -- single-line, /* */ multi-line.
- Datetimes are epoch-millisecond Longs, not PG timestamp values. EXTRACT(EPOCH FROM ts) returns milliseconds, NOT PG-style seconds (divide by 1000 for PG-compatible behavior).
- NOW keyword for current time.
- Missing optional values are '' (empty string), not NULL.
- System columns are underscore-prefixed snake_case: _component_name, _component_uid, _component_id, _component_created_date, _folder_path, _name, _uid, _id, _last_value, _start_date, _end_date, _stream_type, _template_uid, _sample, _time, _time_sd, _time_ed.
- LIKE / ILIKE: LIKE is case-sensitive (ANSI/PG), ILIKE is case-insensitive (PG). Both accept * and ? as wildcards in addition to % and _; GLOB behaves identically to ILIKE.
- REGEXP: Java java.util.regex.Pattern syntax. String literals process backslash escapes — use '\\d+' (double backslash) for \d+, or character classes like '[0-9]+'.
- Stream IDs that are SQL keywords must be quoted: "select", [order], "group".
- No transactions over the PG wire adapter (BEGIN/COMMIT/ROLLBACK are no-ops).
- No PG OVER() window functions, no IS DISTINCT FROM, no PG roles/grants. See §7.
- public. schema prefix is silently stripped on table refs.
- || precedence sits at the OR slot rather than between additive and IS — minor divergence; rarely matters.
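A few of these quirks in one place — a sketch assuming a meter template with a tag stream and a stream literally named order:

```sql
-- EXTRACT(EPOCH ...) returns milliseconds; divide by 1000 for PG-style seconds.
SELECT _component_name, EXTRACT(EPOCH FROM _component_created_date) / 1000 AS created_s
FROM meter;
-- Keyword stream IDs must be quoted; missing optional values are '' rather than NULL.
SELECT _component_name, "order" FROM meter WHERE tag != '';
-- Java regex syntax: double the backslash inside string literals.
SELECT regexp_replace(_component_name, '\\d+', '#') FROM meter;
```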
May 2026 changes you may not have in training:
- || repurposed from logical OR to ANSI/PG string concat. Use AND/OR keywords for logical ops.
- && no longer accepted as logical AND. Use AND.
- LIKE is now case-sensitive (ANSI/PG). ILIKE added for case-insensitive matching. Saved queries are auto-rewritten to ILIKE so existing semantics are preserved.
- :: postfix cast added (e.g. '42'::LONG, price::NUMERIC).
- Type aliases accepted everywhere a type name is: TEXT/VARCHAR=STRING; BOOL=BOOLEAN; INT8/BIGINT/INTEGER/INT4/SMALLINT/INT2/TIMESTAMP/TIMESTAMPTZ=LONG; FLOAT8/FLOAT4/REAL=DOUBLE; NUMERIC/DECIMAL=BIGDECIMAL.
- EXTRACT(field FROM expr) keyword form (in addition to function-style date_part).
- TEQ now supports full ANSI relational semantics over template tables (UNION/INTERSECT/EXCEPT, HAVING, subqueries, CASE, aggregate DISTINCT, all JOIN types).
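The May 2026 operator changes, illustrated in one hedged sketch (assumes a meter template):

```sql
-- || concatenates, :: casts, ILIKE matches case-insensitively.
SELECT _component_name || ' (' || _component_id || ')' AS label,
       '42'::LONG + 1 AS n
FROM meter
WHERE _component_name ILIKE 'meter%';
```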
3. TEQ - Template Table Queries
Template ID = table name. Stream IDs = columns. Each component = one row. Default returns lastValue per stream. Full ANSI relational semantics over template tables: JOINs, GROUP BY, HAVING, UNION/INTERSECT/EXCEPT, DISTINCT, subqueries (correlated and non-correlated), CASE, aggregate DISTINCT — all work normally.
System columns (always available): _component_name, _component_uid, _component_id, _component_created_date, _folder_path.
Stream column syntax:
temperature -- returns lastValue
s.temperature -- qualified with alias
temperature(range(last=100)) -- with temporal params (returns multiple rows per component)
s.temperature(range(sd=-1d)) -- alias + temporal params
Mixed temporal mode: Each stream column has independent temporal parameters. Columns without params return lastValue. Columns with params return time-series data. Mixing is supported and common — the engine LEFT JOINs the sampled subquery with a lastValue pivot subquery.
FK JOINs: Streams configured as foreign keys can JOIN template tables.
SELECT m._component_name, c.region FROM meter m JOIN customer c ON m.customerUid = c._component_uid
Examples:
SELECT _component_name, kwh FROM meter
SELECT _component_name, kwh(range(last=100)) FROM meter
SELECT _component_name, kwh(range(sd=-1d)), voltage FROM meter -- mixed temporal
SELECT _component_name, kwh(cycleId='day', stat='sum', range(currentCycle='year')) FROM meter
SELECT c.region, AVG(m.kwh) FROM meter m JOIN customer c ON m.customerUid = c._component_uid GROUP BY c.region
SELECT _component_name, kwh(running='sum', range(currentCycle='month')) FROM meter
-- Catalogs (multi-namespace)
SELECT _component_name, kwh FROM utility.meter
4. TDQ - System Tables
system.stream — primary system table. One row per stream, NOT one per component. Use DISTINCT _component_name when you want unique components.
Indexed columns (prefer in WHERE for performance): _component_name, _component_uid, _component_id, _uid, _id, _name, _folder_path, _template_uid. Composite index: (_component_name, _name).
Index operators: =, >, >=, <, <=, BETWEEN, LIKE 'prefix%', ILIKE 'prefix%', STARTSWITH('prefix'). String indexes are stored case-folded (lowercase), so case-sensitive LIKE / =, BETWEEN, etc. still use the index but read a small superset that the runtime filter narrows. ILIKE and STARTSWITH match the index encoding directly. Indexed string values are capped at 256 bytes for the index entry only — values longer than that still index and query correctly (the runtime always re-checks the full value); only their selectivity may be reduced.
Other columns: description, timeZoneId, unitUid, _stream_type (intvl_stream/rdm_stream/point_stream), _derivation_type, baseCycleUid, rollupCalendarUid, timeFilterUid, defaultRollupMethod, _sample_value_type, _start_date (Long), _end_date (Long), completedDate (Long), lastUpdated (Long), _last_value (varies), _last_value_type, _component_created_date.
Sample columns (apply temporal params here): _sample (varies), _time, _time_sd, _time_ed.
Other system tables (read-only): system.active_event, system.user_notification, system.system_notification, system.job_notification. Search gsqltables.html for column schemas.
DML on system.sample only. system.stream and notification tables are read-only.
Performance: The cost-based planner uses per-org statistics. After bulk imports or large component additions, run ANALYZE STATS to refresh synchronously.
Examples:
SELECT DISTINCT _component_name FROM system.stream
SELECT _component_name, _name, _last_value FROM system.stream WHERE _component_id = 'meter1'
SELECT _sample(range(sd=-1d)) FROM system.stream WHERE _component_id='gen1' AND _id='kw'
SELECT _sample(cycleId='month', stat='sum', range(currentCycle='year')) FROM system.stream WHERE _component_id='gen1' AND _id='kw'
5. DML Operations
Permission settings (org-level; defaults shown): gsqlAllowInsert=true, gsqlAllowUpdate=true, gsqlAllowDelete=false, gsqlAllowInsertEntity=true, gsqlAllowUpdateEntity=true, gsqlAllowDeleteEntity=false. DDL permissions documented in gsql_llm_ddl_reference.html.
5a. INSERT - system.sample (TDQ)
INSERT OR REPLACE INTO system.sample [WITH (options)] (columns) VALUES (...) | SELECT ...
OR REPLACE is mandatory. Required columns: (_COMPONENT_UID and _UID) or (_COMPONENT_ID and _ID), plus _TIME (or _TIME_SD/_TIME_ED for intervals), plus _SAMPLE.
WITH options for auto-creating components when using IDs (case-insensitive, accept underscore or camelCase, parentheses optional): COMP_TMPL_ID, FOLDER_PATH (must start with /Components), CNAME, DESCRIPTION, TIMEZONE_ID, DERIVE_MAX_STREAMS, DERIVE_TIMEOUT.
If a component or stream doesn't exist when IDs are used, it is auto-created.
INSERT OR REPLACE INTO system.sample (_COMPONENT_UID, _UID, _TIME, _SAMPLE) VALUES('a1db...', '2e75...', 1736488800000, 2.0);
INSERT OR REPLACE INTO system.sample WITH (COMP_TMPL_ID='accountTemplate') (_COMPONENT_ID, _ID, _TIME, _SAMPLE) VALUES('acct_100', 'balance', 1736488800000, 5000.00);
INSERT OR REPLACE INTO system.sample WITH (COMP_TMPL_ID='tmpl1') (...) SELECT ... FROM another_table WHERE ...;
5b. INSERT - Template tables (TEQ)
INSERT INTO template_table (columns) VALUES (...) | SELECT ...
NO "OR REPLACE" for template inserts. _component_id is mandatory. WITH clause supports CNAME, FOLDER_PATH, DESCRIPTION, TIMEZONE_ID, LATITUDE, LONGITUDE, ELEVATION.
INSERT INTO sensor (_component_id) VALUES ('s1');
INSERT INTO sensor (_component_id, temperature) VALUES ('s2', 42.0);
INSERT INTO sensor WITH (LATITUDE=40.7128, LONGITUDE='-74.0060') (_component_id) VALUES ('nyc_sensor');
INSERT INTO sensor (_component_id, temperature) SELECT name, reading FROM csv_table('...');
5c. UPDATE - system.sample (TDQ)
UPDATE system.sample SET _sample = value WHERE _uid = 'stream_uid' AND _time = epoch_ms
Modifies a sample at a specific timestamp (INSERT OR REPLACE semantics internally).
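A sketch, reusing the placeholder stream UID and timestamp from §5a:

```sql
-- Rewrites the sample at exactly that timestamp.
UPDATE system.sample SET _sample = 3.5
WHERE _uid = '2e75...' AND _time = 1736488800000;
```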
5d. UPDATE - Template tables (TEQ)
UPDATE template_table SET column = value [, ...] [WHERE ...]
Appends new stream values at NOW for matching components. Does NOT modify history. Only stream columns can be SET (not system columns). WHERE optional — omitting it updates ALL components.
UPDATE sensor SET temperature = 100.0 WHERE _component_name = 'Sensor_A';
UPDATE sensor SET status = (SELECT k.status FROM knowledge k WHERE k.category = sensor.category) WHERE sensor.category IN (SELECT category FROM knowledge);
UPDATE...FROM is NOT supported. Use correlated subqueries for SET-from-other-table.
5e. DELETE - system.sample (TDQ)
DELETE FROM system.sample WHERE _uid = '...' [AND _time >= start AND _time <= end]
Without time conditions, all samples for that stream are deleted. Stream metadata preserved.
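A sketch with hypothetical bounds (the UID placeholder follows §5a):

```sql
-- Remove one day of samples; drop the _time conditions to clear the whole stream.
DELETE FROM system.sample
WHERE _uid = '2e75...' AND _time >= 1736488800000 AND _time <= 1736575200000;
```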
5f. DELETE - Template tables (TEQ)
DELETE FROM template_table WHERE ...
Cascading delete: removes the component AND all its streams, samples, and index entries.
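A sketch against the sensor template from §5b — note this removes the component itself, not just its values:

```sql
-- Cascading: deletes the component plus all of its streams, samples, and index entries.
DELETE FROM sensor WHERE _component_id = 's2';
```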
5g. DDL
All DDL (CREATE/ALTER/DROP for templates, cycles, calendars, views, materialized views, dashboards, catalogs, entity diagrams, connectors, runnables, etc.) is documented in gsql_llm_ddl_reference.html.
6. Sample Temporal Parameters (SELECT result columns ONLY)
CRITICAL RULE: Temporal parameters can ONLY appear on result columns in the SELECT clause. NEVER in WHERE, GROUP BY, ORDER BY, HAVING, or any other clause.
The highest-leverage feature in GS SQL. Each stream column has independent time semantics, sample aggregation, gap filling, and window operations — applied directly on sample arrays. Faster than SQL OVER() / WINDOW / GROUP BY date_trunc(...) patterns. Use these for time-series queries. See §7 for composed examples.
Forms:
TDQ: SELECT _sample(params) FROM system.stream
TEQ: SELECT streamId(params) FROM template
Range — filters samples by time. Keywords: sd (start, inclusive), ed (end, exclusive), last=N (last N from ed), currentCycle='cycleId', previousCycle='cycleId'. Values: ISO8601 strings, epoch ms, now, streamSd, streamEd, or modifier expressions. Missing sd → streamSd; missing ed → streamEd; if sd uses a modifier, the modifier is applied relative to ed's datetime.
Datetime modifiers (case-sensitive): s=second, m=minute, h=hour, d=day, w=week, M=month, Y=year. Capital M for month vs. lowercase m for minute is the easy confusion.
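Sketches of the Range forms and modifiers above (assuming a meter template with a kwh stream; modifier spellings follow the table above):

```sql
SELECT kwh(range(sd='2026-01-01', ed=now)) FROM meter  -- ISO8601 start, 'now' end
SELECT kwh(range(sd=-1M)) FROM meter                   -- last month (capital M = month)
SELECT kwh(range(sd=-30m)) FROM meter                  -- last 30 minutes (lowercase m = minute)
SELECT kwh(range(last=10)) FROM meter                  -- last 10 samples before streamEd
```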
CycleId / Stat — aggregates samples into intervals matching the cycle. Stat options (case-insensitive): FIRST, LAST, MIN, MAX, AVG, SUM, MINOCCURRENCE, MAXOCCURRENCE, GAPCOUNT, NONGAPCOUNT, INTVLCOUNT, MILLISECCOUNT, NONGAPMILLISECCOUNT, STDDEVP, STDDEVA, SAMPLECOUNT, SUMSIN, SUMCOS. Most are pre-calculated during rollup for fast retrieval.
TimeFilterId — applies a GS Time Filter; filtered intervals stay as gaps when GapFill is also used.
GapFill — fills gaps in interval streams. Applied after temporal logic, before WHERE/aggregation. Subkeys: head (start gaps; NEXTVAL or literal), middle (middle gaps; PREVIOUSVAL, NEXTVAL, MIN, MAX, AVG, SPLINE, or literal), tail (end gaps; PREVIOUSVAL or literal). Process order: HEAD, TAIL, MIDDLE. String options quoted (head='NEXTVAL'); numeric values unquoted (head=0).
Window parameters — operate directly on sample arrays. Applied after GapFill, before WHERE/aggregation. Positional: lag=N, lead=N, lagdefault=val, leaddefault=val (work with any type). Aggregate: slide=N + slidestat='fn' (sliding), running='fn' (cumulative); supported fns are sum, avg, min, max, first, last, count; require numeric types. Rules: lag/lead are mutually exclusive; slide/running are mutually exclusive; cannot mix positional with aggregate on the same column. Window params respect both the stream's reference time filter and the query-time TimeFilterId — filtered samples are skipped (lag/lead count only non-filtered positions; slide/running exclude filtered samples; filtered positions produce NULL output).
Multi-sample merge — multiple _sample() result columns sharing range/cycle merge into one row per bucket per component (org setting gsqlMergeSampleCols; ON by default). When OFF, each _sample() emits a separate tuple set with NULLs in the other sample columns.
7. Temporal Parameters - Composed Examples
A few composed examples cover most usage patterns. Aggregation (cycleId + stat) and Window operations (lag/lead/slide/running) are mutually exclusive, so they're shown separately.
All-in-one aggregation — Range + CycleId + Stat + TimeFilterId + GapFill in one statement:
-- TDQ: hourly weekend-afternoon sum for the current year, with gap fill
SELECT _sample(Range(currentCycle='year'), cycleId='hour', stat='sum',
timeFilterId='weekendAfternoon',
GapFill(head='nextval', middle='avg', tail=0))
FROM system.stream WHERE _id='kw'
-- TEQ: same idea per component
SELECT _component_name,
kwh(Range(currentCycle='year'), cycleId='hour', stat='sum',
timeFilterId='weekendAfternoon',
GapFill(head='nextval', middle='avg', tail=0))
FROM meter
Side-by-side stats — multiple sample columns sharing range/cycle merge into one row per bucket:
SELECT _component_name, _time,
_sample(range(sd=-1d, ed=now), CycleId='hour', Stat='avg') AS hourly_avg,
_sample(range(sd=-1d, ed=now), CycleId='hour', Stat='max') AS hourly_peak
FROM system.stream WHERE _component_id='gen1' AND _id='kw'
Window operations — direct on sample arrays, in place of SQL OVER():
-- Cumulative sum for the current month (TEQ)
SELECT _component_name, kwh(running='sum', Range(currentCycle='month')) FROM meter
-- 5-sample moving average over the last day (TEQ)
SELECT _component_name, kwh(slide=5, slidestat='avg', Range(sd=-1d)) FROM meter
-- Delta from previous reading (TDQ)
SELECT _sample - _sample(lag=1, Range(last=100)) AS delta
FROM system.stream WHERE _component_id='gen1' AND _id='kw'
Mixed temporal mode (TEQ) — one column with temporal params, one without; engine LEFT JOINs sampled subquery against lastValue pivot:
SELECT _component_name, kwh(range(sd=-1d)), voltage FROM meter
-- kwh returns last-day samples (multiple rows); voltage returns lastValue (single)
8. Anti-Patterns (Common LLM Mistakes)
Common errors when generating GS SQL. Each shows the WRONG attempt (PG/ANSI habit, hallucinated syntax) and the CORRECT GS SQL form.
Window functions:
WRONG: SELECT kwh, ROW_NUMBER() OVER (ORDER BY _time) FROM meter
WRONG: SELECT kwh, AVG(kwh) OVER (PARTITION BY customerUid) FROM meter
CORRECT: GS SQL has no OVER() clause. Use temporal Window parameters:
SELECT kwh(running='sum', range(currentCycle='month')) FROM meter
SELECT kwh(lag=1, range(last=100)) FROM meter
SELECT kwh(slide=5, slidestat='avg', range(sd=-1d)) FROM meter
Temporal parameters in WHERE / GROUP BY / ORDER BY / HAVING:
WRONG: SELECT * FROM system.stream WHERE _sample(range(last=5)) > 0
WRONG: SELECT * FROM meter ORDER BY kwh(range(last=10))
CORRECT: Temporal parameters appear ONLY in SELECT result columns. To filter by time, use _time conditions on system.sample or wrap the temporal expression in a subquery.
IS DISTINCT FROM:
WRONG: SELECT * FROM meter WHERE category IS DISTINCT FROM 'A'
CORRECT: GS SQL doesn't support IS DISTINCT FROM. Use: WHERE (category != 'A' OR category IS NULL)
NULL vs empty string for missing values:
WRONG: SELECT * FROM meter WHERE optional_tag IS NULL
CORRECT: Missing optional stream values are '' (empty string), NOT NULL. Check both, since some columns can also be NULL: WHERE optional_tag = '' OR optional_tag IS NULL
Datetime literals:
WRONG: SELECT * FROM system.sample WHERE _time > '2026-01-01'
WRONG: SELECT * FROM system.sample WHERE _time > TIMESTAMP '2026-01-01'
CORRECT: Datetimes are epoch-millisecond Longs. Use toEpochMillis('2026-01-01') for direct comparison, or pass an ISO8601 string to a Range parameter: _sample(range(sd='2026-01-01', ed=now))
EXTRACT(EPOCH FROM ts):
WRONG: Assuming EXTRACT(EPOCH FROM ts) returns seconds (PG behavior).
CORRECT: In GS, EPOCH returns milliseconds. Divide by 1000 for PG-compatible seconds: EXTRACT(EPOCH FROM ts) / 1000
Transactions:
WRONG: BEGIN; INSERT...; COMMIT;
CORRECT: GS SQL has no transactions over the PG wire adapter. BEGIN/COMMIT/ROLLBACK are no-ops. Operate as if each statement is its own atomic unit.
Mixing TEQ and TDQ in one query:
WRONG: SELECT m.kwh, s._name FROM meter m JOIN system.stream s ON m._component_uid = s._component_uid
CORRECT: Cannot mix template tables with system tables in the same query. Pick one mode. To get stream metadata for components in a template, do two queries.
Forgetting underscore prefix on system columns:
WRONG: SELECT component_name FROM meter
WRONG: SELECT component_uid FROM system.stream
CORRECT: All system columns are underscore-prefixed snake_case — see §2.
system.stream row counts:
WRONG: SELECT _component_name FROM system.stream -- expects one row per component
CORRECT: system.stream has one row per STREAM, not per component. Use DISTINCT or filter by _id: SELECT DISTINCT _component_name FROM system.stream
Template inserts without _component_id:
WRONG: INSERT INTO sensor (temperature) VALUES (50)
CORRECT: Template inserts require _component_id: INSERT INTO sensor (_component_id, temperature) VALUES ('s1', 50)
Sample inserts without OR REPLACE:
WRONG: INSERT INTO system.sample (_component_uid, _uid, _time, _sample) VALUES (...)
CORRECT: INSERT OR REPLACE is mandatory on system.sample.
Per-user RBAC is automatic — do not add filters for it:
Per-user permissions are enforced inside the query engine across every access path (UI, GS SQL, ODBC/JDBC, OData, AI assistant). Unauthorized data is absent from results — not just hidden. Do NOT add WHERE clauses to filter by the calling user's permissions; the engine has already done it.