Impressive! An entire article about semantic layers, artfully avoids ever defining what a semantic layer is.
Let me take a swipe at it: a semantic layer helps express queries and their results in terms the end-consumers will care about / prefer to reason in, instead of whatever extremely correct and efficient atrocities the database nerds came up with.
Sounds good to me! Semantic layers help expose a more user-friendly view of the data, so it is easier to ask business questions and get accurate results. More technically, it brings modularity and reusability to SQL. Things like joins, aggregate functions, and dimensional expressions are encapsulated as new fields/objects. Typically this logic is rendered at query time rather than pre-computed and materialized. The advantage of that is more flexible iteration and composability. In essence they are highly glorified SQL templating engines.
I love a semantic layer as much as the next guy...
Pivoting a decent sized BI shop toward using one instead of splashing the same SQL all over the place is *tough*. It's one of those: "the analyst could have been building important report for director and you want them to create re-usable logic??? we'll do that later, get report done now. Just copy/paste that SQL over here"
This is how you end up with the the 1000 model, "the numbers don't match up", hot mess situations that gain momentum and are hard to slow down.
That tracks. The semantic layer is like a capital investment that pays off over time. So it can be hard to justify the initial investment if people don't grok the payoff.
The flip side is, you often don’t know what needs to be reusable until you’ve had some iterations. Wrong abstractions can be way worse, and also gain their own momentum.
The problem is that often these quick or maybe not reusable are written in such a haste that there's no breadcrumbs left to do the right thing whenever you are done getting that urgent thing out (most likely never because "everything is urgent" :( )
Yeah, I think it's great that there are ARD formats and you can access bytes via low level s3 like protocol. This enables interesting tools like DuckDB which can abstract away some stuff, and be fastish and "serverless". However, clearly there is also some kind of marketing hype train and jargon built around it, and it seems like a concerted movement to displace some other "boring" and "uncool" products and technologies. I actually think it's great to displace proprietary services with open formats and protocols. I hope it takes out "data lakes" and co, but I'd love to keep MVC and not invent completely new terms, APIs and ORMs, for things that have been working fine, for a long time.
Semantic Layer is about decomposing views into dimensions and aggregates, then letting downstream apps/users compose their own views on top without having to redefine/re-calculate business level metrics.
This makes data analyis more flexible than sql views which are hardcoded on particular groupings.
It's a lot more. A SQL VIEW is just a saved query, where a semantic layer defines the shared meaning of the data, and helps enforce consistent metrics, joins, and logic across tools. You'd be surprised at how many ways "active customer" can be represented as SQL.
We built a transformation library[0] (think a simpler, more performant dbt) for duckdb and I'd really like to create a semantic layer as an extension for it at some point.
Limiting support to only duckdb would make some really useful features trivial to implement. e.g. duckdb has a `json_serialize_sql` function that would handle a lot of the tedious parts of building a semantic layer.
Nothing to do with linear, meaningful projections on embedding spaces, and everything to do with efficient maintenance of legacy data reporting systems.
I think Common Logic ( https://en.m.wikipedia.org/wiki/Common_Logic - ISO/IEC 24707:2007) would be a good addition to any effort trying to add a semantic layer to any database.
This is a good write up that doesn’t require DuckDB as it isn’t specific to a particular database.
Impressive! An entire article about semantic layers, artfully avoids ever defining what a semantic layer is.
Let me take a swipe at it: a semantic layer helps express queries and their results in terms the end-consumers will care about / prefer to reason in, instead of whatever extremely correct and efficient atrocities the database nerds came up with.
Did I get that right?
Sounds good to me! Semantic layers help expose a more user-friendly view of the data, so it is easier to ask business questions and get accurate results. More technically, it brings modularity and reusability to SQL. Things like joins, aggregate functions, and dimensional expressions are encapsulated as new fields/objects. Typically this logic is rendered at query time rather than pre-computed and materialized. The advantage of that is more flexible iteration and composability. In essence they are highly glorified SQL templating engines.
I love a semantic layer as much as the next guy...
Pivoting a decent sized BI shop toward using one instead of splashing the same SQL all over the place is *tough*. It's one of those: "the analyst could have been building important report for director and you want them to create re-usable logic??? we'll do that later, get report done now. Just copy/paste that SQL over here"
This is how you end up with the the 1000 model, "the numbers don't match up", hot mess situations that gain momentum and are hard to slow down.
That tracks. The semantic layer is like a capital investment that pays off over time. So it can be hard to justify the initial investment if people don't grok the payoff.
this is why I liked Looker. The only way to build reports was from the semantic layer which was easy to use and built into the BI.
we took the same approach when we started https://www.definite.app/.
The flip side is, you often don’t know what needs to be reusable until you’ve had some iterations. Wrong abstractions can be way worse, and also gain their own momentum.
The problem is that often these quick or maybe not reusable are written in such a haste that there's no breadcrumbs left to do the right thing whenever you are done getting that urgent thing out (most likely never because "everything is urgent" :( )
Yeah, I think it's great that there are ARD formats and you can access bytes via low level s3 like protocol. This enables interesting tools like DuckDB which can abstract away some stuff, and be fastish and "serverless". However, clearly there is also some kind of marketing hype train and jargon built around it, and it seems like a concerted movement to displace some other "boring" and "uncool" products and technologies. I actually think it's great to displace proprietary services with open formats and protocols. I hope it takes out "data lakes" and co, but I'd love to keep MVC and not invent completely new terms, APIs and ORMs, for things that have been working fine, for a long time.
Is a "semantic layer" nothing more than a fancy name for a SQL VIEW in a NoSQL?
No, it's more than that.
Semantic Layer is about decomposing views into dimensions and aggregates, then letting downstream apps/users compose their own views on top without having to redefine/re-calculate business level metrics.
This makes data analyis more flexible than sql views which are hardcoded on particular groupings.
Kind of annoying the article writes "What is [a semantic layer] anyway?" twice but never defines it directly.
It's a lot more. A SQL VIEW is just a saved query, where a semantic layer defines the shared meaning of the data, and helps enforce consistent metrics, joins, and logic across tools. You'd be surprised at how many ways "active customer" can be represented as SQL.
We built a transformation library[0] (think a simpler, more performant dbt) for duckdb and I'd really like to create a semantic layer as an extension for it at some point.
Limiting support to only duckdb would make some really useful features trivial to implement. e.g. duckdb has a `json_serialize_sql` function that would handle a lot of the tedious parts of building a semantic layer.
0 - https://github.com/definite-app/crabwalk
How do you compare your library to SQLMesh?
if you're looking for something like SQLMesh, then I'd stay away from crabwalk.
it's purely meant to run SQL transformations in DuckDB in a reliable way with data lineage.
Nothing to do with linear, meaningful projections on embedding spaces, and everything to do with efficient maintenance of legacy data reporting systems.
I like the idea of a semantic Layer but don't think defining it in yaml is the right way to go about it.
Semantic Layer needs proper language and tooling support which Malloy provides.
Anything but a markup language / JSON.
I think Common Logic ( https://en.m.wikipedia.org/wiki/Common_Logic - ISO/IEC 24707:2007) would be a good addition to any effort trying to add a semantic layer to any database.
This is a good write up that doesn’t require DuckDB as it isn’t specific to a particular database.
OT, but I really like the design of their site.