diff --git a/README.md b/README.md index f10a122..1914ff6 100644 --- a/README.md +++ b/README.md @@ -39,18 +39,14 @@ For more information about Kuzu itself, see its $ nimble install kuzu +The current version of this library is built for Kuzu v0.8.2. + ## Usage +See the [Usage documentation](USAGE.md). -> [!TODO]- Human readable usage docs! -> -> ... The nim generated source isn't great when pulling in -> the C wrapper auto-gen stuff. -> -> If you're here and reading this before I have proper docs written, see the -> tests/ for some working examples. - +You can also find a bunch of working examples in the tests. ## Contributing diff --git a/USAGE.md b/USAGE.md new file mode 100644 index 0000000..12e7c95 --- /dev/null +++ b/USAGE.md @@ -0,0 +1,501 @@ + +# Usage + +This document is a quick guide for how to use this library. If you've cloned +this repository, you can: + +> % nimble docs + +To auto-generate API docs -- with the C wrappers, it's a lot and it's hard to +know where to start. + + +## Prior Reading + +If you're just starting with Kuzu or graph databases, it's probably a good idea +to familiarize yourself with the [Kuzu Documentation](https://docs.kuzudb.com/) +and the [Cypher Language](https://docs.kuzudb.com/tutorials/cypher/). This +library won't do much for you by itself without a basic understanding of Kuzu usage. + + +## Checking Compatibility + +This is a wrapper (with some additional niceties) for the system-installed Kuzu +shared library. As such, the version of this library might not match with what +you currently have installed. + +Check the [README](README.md), the [History](History.md), and the following +table to ensure you're using the correct version for your Kuzu +installation. I'll make a modest effort for backwards compatibility, and other +versions might work. Don't count too heavily on it. 
:-) + +| Kuzu Library Version | Nim Kuzu Version | +| -------------------- | ---------------- | +| v0.8.2 | v0.1.0 | + +You can use the `kuzuVersionCompatible()` function (along with the +`KUZU_VERSION` and `KUZU_LIBVERSION` constants) to quickly check if things are +looking right. + +```nim +import kuzu + +echo KUZU_VERSION #=> "0.1.0" +echo KUZU_LIBVERSION #=> "0.8.2" +echo kuzuVersionCompatible() #=> true +``` + + +## Connecting to a Database + +Just call `newKuzuDatabase()`. Without an argument (or with an empty string), +the database is in-memory. Any other argument is considered a filesystem path +-- it will create an empty database if the path is currently non-existent, or +open an existing database otherwise. + +```nim +# "db" is in-memory and will evaporate when the process ends. +var db = newKuzuDatabase() +``` + +```nim +# "db" is persistent, stored in the directory "data". +var db = newKuzuDatabase("data") +``` +The database path is retained, and can be recalled via `db.path`. + +```nim +db.path #=> "data" +``` + +### Database Configuration + +The database uses default options unless told otherwise. You can see them +via: + +```nim +echo $db.config +#=> (buffer_pool_size: 23371415552, max_num_threads: 16, ... + +# Is compression enabled? +if db.config.enable_compression: + echo "Yes!" +``` + +You can alter configuration options when connecting by passing a `kuzuConfig` +object as the second argument to `newKuzuDatabase()`: + +```nim +# Open a readonly handle. +var db = newKuzuDatabase( "data", kuzuConfig( read_only=true ) ) +``` + +### The Connection + +All interaction with the database is performed via a connection object. There +are limitations to database handles and connection objects -- see the +[Kuzu Concurrency](https://docs.kuzudb.com/concurrency/) docs for details! 
+ +Call `connect` on an open database handle to create a new connection: + +```nim +var conn = db.connect +``` + +You can set a maximum query lifetime, and interrupt any running queries (thread +shutdown, ctrl-c, etc): + +```nim +# Set a maximum ceiling on how long a query can run, in milliseconds. +conn.queryTimeout( 10 * 1000 ) # 10 seconds + +# Cancel a running query. +conn.queryInterrupt() +``` + +## Performing Queries + +You can perform a basic query via the appropriately named `query()` function on +the connection. Via this method, queries are run immediately. A +`KuzuQueryResult` is returned - this is the object you'll be interacting with to +see results. + +A `KuzuQueryResult` can be turned into a string to quickly see the column +headers and all tuple results: + +```nim +var res = conn.query( """RETURN "Hello world", 1234, [1,2,3]""" ) + +echo $res #=> +# Hello world|1234|LIST_CREATION(1,2,3) +# Hello world|1234|[1,2,3] +``` + +Additionally, various query metadata is available for introspection: + +```nim +var res = conn.query( """ +RETURN + "Hello world" AS hi, + 1234 AS pin, + [1,2,3] AS list +""" ) + +echo res.num_columns #=> 3 +echo res.num_tuples #=> 1 +echo res.compile_time #=> 14.028 +echo res.execution_time #=> 1.624 + +# Return the column names as a sequence. +echo res.column_names #=> @["hi", "pin", "list"] + +# Return the column data types as a sequence. +echo res.column_types #=> @[KUZU_STRING, KUZU_INT64, KUZU_LIST] +``` + +### Prepared Statements + +If you're supplying an argument to a query, or you're running a query +repeatedly, it's safer and faster to create a prepared statement via `prepare()` +on the connection. These statements are only compiled once, and execution is +deferred until you call `execute()`. + +```nim +var stmt = conn.prepare( """ +RETURN + "Hello world" AS hi, + 1234 AS pin, + [1,2,3] AS list +""" ) + +# This returns a KuzuQueryResult, just like `conn.query()`. 
+var res = stmt.execute() +``` + +Arguments are labeled variables (prefixed with `$`) within the query. +Parameters are matched by providing a Nim tuple argument to `execute()` - a +simple round trip example: + +```nim +var stmt = conn.prepare( """ +RETURN + $message AS message, + $digits AS digits, + LIST_CREATION($list) AS list +""" ) + +var res = stmt.execute( (message: "Hello", digits: 1234, list: "1,2,3") ) + +echo $res #=> +# message|digits|list +# Hello|1234|[1,2,3] +``` + +#### Type Conversion + +When binding variables to a prepared statement, most Nim types are automatically +converted to their respective Kuzu types. + +```nim +var stmt = conn.prepare( """RETURN $num AS num""" ) +var res = stmt.execute( (num: 12) ) + +echo res.column_types[0] #=> KUZU_INT32 +``` + +This might not necessarily be what you want - sometimes you'd rather be strict +with typing, and you might be inserting into a column that has a different type +than the default. + +You can use [integer type suffixes](https://nim-lang.org/docs/manual.html#lexical-analysis-numeric-literals), or an explicit type conversion, as usual: + +```nim +var stmt = conn.prepare( """RETURN $num AS num""" ) +var res: KuzuQueryResult + +res = stmt.execute( (num: 12'u64) ) +echo res.column_types[0] #=> KUZU_UINT64 + +res = stmt.execute( (num: 12.float) ) +echo res.column_types[0] #=> KUZU_DOUBLE +``` + +#### Kuzu Specific Types + +In the example above, you may have noticed the `LIST_CREATION($list)` in the +prepared query, and that we passed a string `1,2,3` as the `$list` parameter. + +This is a useful way to easily use most Kuzu types without needing corresponding +Nim ones -- if you're inserting into a table that is using a custom type, you +can cast it using the query itself during insertion! + +This has the additional advantage of letting Kuzu error check the validity of +the content, and it works with the majority of types. 
+ +An extended example: + +```nim +import std/sequtils +import kuzu + +var db = newKuzuDatabase() +var conn = db.connect + +var res: KuzuQueryResult + +# Create a node table. +# +res = conn.query """ +CREATE NODE TABLE Example ( + id SERIAL, + num UINT8, + done BOOL, + comment STRING, + karma DOUBLE, + thing UUID, + created DATE, + activity TIMESTAMP, + PRIMARY KEY(id) +) +""" + +# Prepare a statement for adding a node. +# +var stmt = conn.prepare """ +CREATE (e:Example { + num: $num, + done: $done, + comment: $comment, + karma: $karma, + thing: UUID($thing), + created: DATE($created), + activity: TIMESTAMP($activity) +}) +""" + +# Add a node row that contains specific Kuzu types. +# +res = stmt.execute(( + num: 2, + done: true, + comment: "Types!", + karma: 16.7, + thing: "e0e7232e-bec9-4625-9822-9d1a31ea6f93", + created: "2025-03-29", + activity: "2025-03-29" +)) + +# Show the current contents. +res = conn.query( """MATCH (e:Example) RETURN e.*""" ) +echo $res #=> +# e.id|e.num|e.done|e.comment|e.karma|e.thing|e.created|e.activity +# 0|2|True|Types!|16.700000|e0e7232e-bec9-4625-9822-9d1a31ea6f93|2025-03-29|2025-03-29 00:00:00 + +# Show column names and their Kuzu types. +for pair in res.column_names.zip( res.column_types ): + echo pair #=> + # ("e.id", KUZU_SERIAL) + # ("e.num", KUZU_UINT8) + # ("e.done", KUZU_BOOL) + # ("e.comment", KUZU_STRING) + # ("e.karma", KUZU_DOUBLE) + # ("e.thing", KUZU_UUID) + # ("e.created", KUZU_DATE) + # ("e.activity", KUZU_TIMESTAMP) +``` + +## Reading Result Sets + +So far we've just been showing values by converting the entire `KuzuQueryResult` +to a string. Convenient for quick examples and debugging, but not much else. + +A `KuzuQueryResult` is an iterator. You can use regular Nim functions that yield +each `KuzuFlatTuple` -- essentially, each row that was returned in the set. 
+ +```nim +var res = conn.query """ + UNWIND [1,2,3] AS items + UNWIND ["thing"] AS thing + RETURN items, thing +""" + +# KuzuFlatTuple can be stringified just like the result set. +for row in res: + echo row #=> + # 1|thing + # 2|thing + # 3|thing +``` + +Once iteration has reached the end, it is automatically rewound for reuse. + +You can manually get the next `KuzuFlatTuple` via `getNext()`. Calling +`getNext()` after the last row results in an error. Use `hasNext()` to check +before calling. + +```nim +var res = conn.query """ + UNWIND [1,2,3] AS items + RETURN items +""" + +# Get the first row. +if res.hasNext: + var row = res.getNext + echo row #=> 1 + +echo res.getNext #=> 2 +echo res.getNext #=> 3 +echo res.getNext #=> KuzuIndexError exception! +``` + +Manually rewind the `KuzuQueryResult` via `rewind()`. + + +## Working with Values + +A `KuzuFlatTuple` contains the entire row. You can index a value at its column +position, returning a `KuzuValue`. + +```nim +var res = conn.query """ +RETURN + 1 AS num, + true AS done, + "A comment" AS comment, + 12.84 AS karma, + UUID("b41deae0-dddf-430b-981d-3fb93823e495") AS thing, + DATE("2025-03-29") AS created +""" + +var row = res.getNext + +for idx in ( 0 .. res.num_columns-1 ): + var value = row[idx] + echo res.column_names[idx], ": ", value, " (", value.kind, ")" #=> + # num: 1 (KUZU_INT64) + # done: True (KUZU_BOOL) + # comment: A comment (KUZU_STRING) + # karma: 12.840000 (KUZU_DOUBLE) + # thing: b41deae0-dddf-430b-981d-3fb93823e495 (KUZU_UUID) + # created: 2025-03-29 (KUZU_DATE) +``` + +### Types + +A `KuzuValue` can always be stringified, irrespective of its Kuzu type. You can +check what type it is via the 'kind' property. + +```nim +var res = conn.query """RETURN "hello"""" +var value = res.getNext[0] + +echo value.kind #=> KUZU_STRING +``` + +A `KuzuValue` has conversion methods for Nim base types. 
You'll likely want to +convert it for regular Nim usage: + +```nim +var res = conn.query( "RETURN 2560" ) +var value = res.getNext[0] + +echo value + 1 #=> Type error! + +echo $value #=> "2560" +echo value.toInt64 + 1 #=> 2561 +``` + + +### Lists + +A `KuzuValue` of type `KUZU_LIST` can be converted to a Nim sequence of +`KuzuValues` with the `toList()` function: + +```nim +import std/sequtils +import kuzu + +var res = conn.query """ +RETURN [10, 20, 30] +""" + +var value = res.getNext[0] + +var list = value.toList +echo list #=> @[10,20,30] + +echo list.map( func(v:KuzuValue): int = v.toInt64 * 10 ) #=> @[100,200,300] +``` + + +### Struct-like Objects + +Various Kuzu types can act like a struct - this includes `KUZU_NODE`, +`KUZU_REL`, and of course explicit `KUZU_STRUCT` itself, among others. + +Convert a `KuzuValue` to a `KuzuStructValue` with `toStruct()`. For +convenience, this is also aliased to `toNode()` and `toRel()`. + +Once converted, you can access struct values by passing the key name to `[]`: + +```nim +var res = conn.query """ +RETURN {movie: "The Fifth Element", year: 1997} +""" + +var value = res.getNext[0] + +var struct = value.toStruct +echo struct["movie"], " was released in ", struct["year"], "." #=> +# "The Fifth Element was released in 1997." 
+``` + +Here's a much more complicated example, following node paths: + +```nim +import + std/sequtils, + std/strformat +import kuzu + +var db = newKuzuDatabase() +var conn = db.connect + +var res = conn.query """ + CREATE NODE TABLE Person ( + id SERIAL, + name STRING, PRIMARY KEY (id) + ); + CREATE REL TABLE Knows ( + FROM Person TO Person, + since INT + ); + + CREATE (p:Person {name: "Bob"}); + CREATE (p:Person {name: "Alice"}); + CREATE (p:Person {name: "Bruce"}); + CREATE (p:Person {name: "Tom"}); + + CREATE (a:Person {name: "Bruce"})-[r:Knows {since: 1997}]->(b:Person {name: "Tom"}); + CREATE (a:Person {name: "Bob"})-[r:Knows {since: 2009}]->(b:Person {name: "Alice"}); + CREATE (a:Person {name: "Alice"})-[r:Knows {since: 2010}]->(b:Person {name: "Bob"}); + CREATE (a:Person {name: "Bob"})-[r:Knows {since: 2003}]->(b:Person {name: "Bruce"}); +""" + +res = conn.query """ + MATCH path = (a:Person)-[r:Knows]->(b:Person) + WHERE r.since > 2000 + RETURN r.since as Since, nodes(path) as People + ORDER BY r.since +""" + +# Who knows who since when? 
+# +for row in res: + var since = row[0] + var people = row[1].toList.map( proc(p:KuzuValue):KuzuStructValue = p.toNode ) + echo &"""{people[0]["name"]} has known {people[1]["name"]} since {since}.""" + +``` + diff --git a/experiments/imdb/Makefile b/experiments/imdb/Makefile new file mode 100644 index 0000000..f96c207 --- /dev/null +++ b/experiments/imdb/Makefile @@ -0,0 +1,8 @@ +build: + nim c -d:release imdb_import.nim + nim c -d:release imdb_find_actor_path.nim + +clean: + rm -f *.csv + rm -f *.tsv.gz + diff --git a/experiments/imdb/imdb-results.pdf b/experiments/imdb/imdb-results.pdf new file mode 100644 index 0000000..5a092e5 Binary files /dev/null and b/experiments/imdb/imdb-results.pdf differ diff --git a/experiments/imdb/imdb_find_actor_path.nim b/experiments/imdb/imdb_find_actor_path.nim new file mode 100644 index 0000000..991c00e --- /dev/null +++ b/experiments/imdb/imdb_find_actor_path.nim @@ -0,0 +1,159 @@ +# vim: set et sta sw=4 ts=4 : +# +# Find links between two actors, via other actors +# they've worked with across movies. +# +# Outputs to screen, and generates a dot file with +# results for graphviz visualization. +# +# Compile: +# % nim c -d:release imdb_find_actor_path.nim +# +# Run the "imdb_import" utility before using this! +# + +import + std/math, + std/os, + std/sequtils, + std/strutils, + std/strformat +import kuzu + +const DB = "imdb" +const DOT = "imdb-results.dot" + +if not DB.dirExists: + echo """Cowardly refusing to run without an imdb database. + (see: imdb_import in this directory.)""" + quit 1 + +if paramCount() < 2: + echo "I require 2 actor names, in quotes." 
+ quit 1 + +var stmt: KuzuPreparedStatement +var res: KuzuQueryResult + +var fromActor = paramStr(1) +var toActor = paramStr(2) + +var db = newKuzuDatabase( "imdb" ) +var conn = db.connect + +echo "Database opened: ", db.path + +stmt = conn.prepare( "MATCH (a:Actor {name:$actor}) RETURN count(a)" ) +res = stmt.execute( (actor: fromActor) ) +if res.getNext[0].toInt64 < 1: + echo "Couldn't find actor ", &"\"{fromActor}\"." + quit 1 + +stmt = conn.prepare( "MATCH (a:Actor {name:$actor}) RETURN count(a)" ) +res = stmt.execute( (actor: toActor) ) +if res.getNext[0].toInt64 < 1: + echo "Couldn't find actor ", &"\"{toActor}\"." + quit 1 + +stmt = conn.prepare """ + MATCH (a:Actor {name:$fromActor}) + MATCH (b:Actor {name:$toActor}) + MATCH path = (a)-[r:ActedIn* ALL SHORTEST]-(b) + RETURN DISTINCT nodes(path) AS nodes, length(path) AS hops + ORDER BY hops +""" +stdout.write &"Finding paths from {fromActor} to {toActor}... " +stdout.flushFile +res = stmt.execute( (fromActor: fromActor, toActor: toActor) ) + +if res.num_tuples == 0: + echo "Unable to find any paths!" 
+ quit 1 + + +let dotFile = DOT.open( fmWrite ) +dotFile.write "strict digraph {\n" +dotfile.write &""" +graph[ + rankdir=LR, + size="8.5,11", + margin=0.4, + label="Tracing from \"{fromActor}\" to \"{toActor}\"" +]; +node [ + fontname=Arial +];""", "\n\n" + +var fastestPath = res.getNext[1].toInt64 +echo &"{round(res.execution_time / 1000, 2)} seconds, ", + "fastest path in ", fastestPath, " hop(s).\n" +res.rewind + +# 1st pass, get all nodes for per-node styles +# +var nodes: seq[ tuple[kind: char, label: string] ] = @[] +for row in res.items: + for rawNode in row[0].toList: + var node = rawNode.toNode + var kind = $node["_LABEL"] + case kind: + of "Actor": + nodes.add( (kind: 'a', label: $node["name"]) ) + of "Movie": + nodes.add( (kind: 'm', label: $node["title"]) ) + else: + discard + +# Pre-define unique node styles +# +for node in nodes.deDuplicate(): + case node.kind: + of 'a': + dotFile.write( &"\"{node.label}\" " ) + if node.label == fromActor or node.label == toActor: + dotFile.write """[shape=box, style=rounded, penwidth=3.0];""", "\n" + else: + dotFile.write """[shape=box, style=rounded];""", "\n" + of 'm': + dotFile.write( &"\"{node.label}\" " ) + dotFile.write """[shape=note];""", "\n" + else: + discard + +# 2nd pass: emit relations to stdout and create dot links +# +dotFile.write "\n\n" +if res.num_tuples > 1: + echo res.num_tuples, " tied ", fastestPath, " hop paths:" +for row in res.items: + var output: string + var pathLen = row[1].toInt64 + var pathStep = 0 + + for rawNode in row[0].toList: + pathStep += 1 + var node = rawNode.toNode + if $node["_LABEL"] == "Actor": + output.add $node["name"] + dotFile.write &""""{$node["name"]}"""" + if pathStep == 1: + output.add " was in " + dotFile.write " -> " + elif pathStep < pathLen: + output.add " who was in " + dotFile.write " -> " + dotFile.write "\n" + else: + dotFile.write ";\n" + elif $node["_LABEL"] == "Movie": + output.add &""""{$node["title"]}"""" + output.add " with " + dotFile.write 
&""""{$node["title"]}"""" + dotFile.write " -> \n" + echo &"{output}." + +dotFile.write "}\n" +dotFile.close +echo "\n\nYou can run 'dot -Tpdf < imdb-results.dot > imdb-results.pdf' if you have graphviz installed." + + diff --git a/experiments/imdb/imdbimport.nim b/experiments/imdb/imdb_import.nim similarity index 95% rename from experiments/imdb/imdbimport.nim rename to experiments/imdb/imdb_import.nim index 788873b..13a9166 100644 --- a/experiments/imdb/imdbimport.nim +++ b/experiments/imdb/imdb_import.nim @@ -7,7 +7,7 @@ # directors, and TV shows are intentionally omitted. # # Compile: -# % nim c -d:release imdbdata.nim +# % nim c -d:release imdb_import.nim # # Sourced from: https://datasets.imdbws.com/ # See: https://developer.imdb.com/non-commercial-datasets/ @@ -124,8 +124,8 @@ if not DB.dirExists: """CREATE NODE TABLE Movie (movieId INT64, title STRING, year UINT16, durationMins INT, PRIMARY KEY (movieId))""", """CREATE REL TABLE ActedIn (FROM Actor TO Movie)""" ]: - var result = conn.query( schema ) - duration += result.execution_time.int + var q = conn.query( schema ) + duration += q.execution_time.int echo &"Created database schema in {duration}ms." duration = 0 @@ -136,8 +136,8 @@ if not DB.dirExists: """COPY ActedIn FROM "./title.principals.csv" (header=true, ignore_errors=true)""" ]: echo dataload - var result = conn.query( dataload ) - duration += result.execution_time.int + var q = conn.query( dataload ) + duration += q.execution_time.int echo &"Imported data in {duration / 1000}s." echo "Done!" 
diff --git a/kuzu.nimble b/kuzu.nimble index e06a162..744746c 100644 --- a/kuzu.nimble +++ b/kuzu.nimble @@ -23,6 +23,8 @@ task clean, "Remove all non-repository artifacts.": exec "fossil clean -x" task docs, "Generate automated documentation.": - exec "nim doc --project --outdir:docs src/kuzu.nim" exec "nim md2html --project --outdir:docs README.md" + exec "nim md2html --project --outdir:docs History.md" + exec "nim md2html --project --outdir:docs USAGE.md" + exec "nim doc --project --outdir:docs src/kuzu.nim" diff --git a/src/kuzu/queries.nim b/src/kuzu/queries.nim index 2aac566..76d32a3 100644 --- a/src/kuzu/queries.nim +++ b/src/kuzu/queries.nim @@ -166,4 +166,5 @@ iterator items*( query: KuzuQueryResult ): KuzuFlatTuple = ## Iterate available tuples, yielding to the block. while query.hasNext: yield query.getNext + query.rewind diff --git a/src/kuzu/tuple.nim b/src/kuzu/tuple.nim index ac8fb61..54ad857 100644 --- a/src/kuzu/tuple.nim +++ b/src/kuzu/tuple.nim @@ -15,7 +15,7 @@ func `$`*( tpl: KuzuFlatTuple ): string = result.removeSuffix( "\n" ) -func `[]`*( tpl: KuzuFlatTuple, idx: int ): KuzuValue = +func `[]`*( tpl: KuzuFlatTuple, idx: int|uint64 ): KuzuValue = ## Returns a KuzuValue at the given *idx*. result = new KuzuValue diff --git a/tests/queries/t_auto_rewinds_the_iterator.nim b/tests/queries/t_auto_rewinds_the_iterator.nim new file mode 100644 index 0000000..6bca303 --- /dev/null +++ b/tests/queries/t_auto_rewinds_the_iterator.nim @@ -0,0 +1,16 @@ +# vim: set et sta sw=4 ts=4 : + +import kuzu + +let db = newKuzuDatabase() +let conn = db.connect + +var q = conn.query """ +UNWIND [1,2,3] AS items +RETURN items +""" + +for row in q: discard + +assert q.hasNext == true +