Data + Model + View

July 6, 2010

K/Kdb — scalable array processing language and database

Filed under: commercial products,in-database SML,languages,talks — Daisy @ 12:04 am

Arthur Whitney gave a talk a while back (in March) at the Parlab Seminar about his array processing language, the K programming language, and about kdb, the database system that supports K and its higher-level variant, the Q programming language.

According to Arthur, Q is similar to SQL but simpler and more expressive. Q implements all relational capabilities as well as time-series analysis, and it efficiently supports atoms, lists, maps, tables and functions. Tables are conceptually lists of associative arrays, but they are stored column by column. K and Q were designed with financial applications in mind, and Kx Systems has built the kdb/tick and kdb/taq financial applications on top of kdb.
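To make the distinction between the logical view (a list of records, each an associative array) and the physical layout (one list per column) concrete, here is a minimal Python sketch. It is not Q or kdb code. The class name `ColumnTable` and the sample trade data are hypothetical, chosen only to illustrate column-wise storage.

```python
from typing import Any, Callable, Dict, Iterator, List


class ColumnTable:
    """Hypothetical column-oriented table: one Python list per column."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        if len({len(v) for v in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns

    def __len__(self) -> int:
        return len(next(iter(self.columns.values()), []))

    def row(self, i: int) -> Dict[str, Any]:
        # Logical view: row i as an associative array (column name -> value).
        return {name: col[i] for name, col in self.columns.items()}

    def rows(self) -> Iterator[Dict[str, Any]]:
        return (self.row(i) for i in range(len(self)))

    def select(self, column: str, pred: Callable[[Any], bool]) -> "ColumnTable":
        # Scan only the predicate column, then gather matching positions
        # from every column to build the result table.
        idx = [i for i, v in enumerate(self.columns[column]) if pred(v)]
        return ColumnTable({n: [c[i] for i in idx] for n, c in self.columns.items()})


trades = ColumnTable({
    "sym": ["IBM", "MSFT", "IBM", "AAPL"],
    "price": [120.5, 25.1, 121.0, 250.3],
    "size": [100, 200, 150, 50],
})

ibm = trades.select("sym", lambda s: s == "IBM")
print(list(ibm.rows()))
# The aggregate reads only the contiguous 'price' column.
avg_price = sum(ibm.columns["price"]) / len(ibm)
print(avg_price)
```

The point of the column layout is that an aggregate such as the average price touches a single contiguous list rather than every field of every record, which is the property column stores rely on for analytic workloads.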

The main features of the k programming language are its conciseness and its built-in primitives for array operations and parallelism. For parallel computation, k partitions the data through the "f each x" construct, which applies a function f to each item of x. The user can specify the attribute on which the data is partitioned.
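The idea of partitioning data on a user-chosen attribute and then applying a function to each partition in parallel can be sketched in Python as follows. This is only an analogy under stated assumptions, not how k implements "f each x": the helper names `partition` and `each_parallel`, the `vwap` function and the sample trades are all hypothetical. A thread pool is used for simplicity; for CPU-bound work in CPython, a `ProcessPoolExecutor` would be needed for real parallel speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, TypeVar

R = TypeVar("R")
Row = Dict[str, Any]


def partition(rows: List[Row], key: str) -> Dict[Any, List[Row]]:
    """Group rows by the user-chosen partition attribute (hypothetical helper)."""
    parts: Dict[Any, List[Row]] = defaultdict(list)
    for r in rows:
        parts[r[key]].append(r)
    return dict(parts)


def each_parallel(f: Callable[[List[Row]], R],
                  parts: Dict[Any, List[Row]],
                  workers: int = 4) -> Dict[Any, R]:
    """Apply f to each partition concurrently, loosely like a parallel 'f each x'."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {k: pool.submit(f, v) for k, v in parts.items()}
        return {k: fut.result() for k, fut in futures.items()}


def vwap(rows: List[Row]) -> float:
    """Volume-weighted average price of one partition (example function)."""
    notional = sum(r["price"] * r["size"] for r in rows)
    volume = sum(r["size"] for r in rows)
    return notional / volume


trades = [
    {"sym": "IBM", "price": 120.0, "size": 100},
    {"sym": "MSFT", "price": 25.0, "size": 200},
    {"sym": "IBM", "price": 122.0, "size": 100},
]

result = each_parallel(vwap, partition(trades, "sym"))
print(result)
```

Because each partition is processed independently, the choice of partition attribute (here `sym`) determines how evenly the work spreads across workers.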

The main selling point of kdb is its scalability and its efficiency in handling millions of records per second of real-time data, as well as billions of records of historical data, for analytic workloads in financial applications. kdb is one of the early in-memory, column-based database systems, predating the more recent C-Store work.

I think kdb competes head-to-head with streaming databases in the financial sector. It might be more efficient because it gets rid of much of the usual database overhead, although no comparisons or benchmarks were presented. R can be used on top of kdb to do more sophisticated analytics, while kdb itself supports only simple functions, data selection and the "f each x" clause.

One interesting note is that Kx Systems started with one customer and built itself up without any venture capital funding; it now counts many large financial institutions among its customers. Ras brought up the overhead of parallelization (for example, loading data onto a GPU or multicore processor), which means it is better to chain many operations together on the same data.
