qri is a global dataset version control system (GDVCS) built on the distributed web
Breaking that down:
- Global, so that if anyone, anywhere has published work with the same or similar datasets, you can discover it.
- Specific to datasets, because data deserves purpose-built tools.
- Version control, to keep data in sync, attributing all changes to authors.
- On the distributed web, to make all of the data published on qri simultaneously available, letting peers work on data together.
If you’re unfamiliar with version control, particularly the distributed kind, consider that you’re probably viewing this document on GitHub, a version control service intended for code. Its underlying technology, Git, popularized some magic sauce that has inspired a generation of programmers and brought concepts at the heart of the distributed web into everyday use. Qri applies that family of concepts to four common data problems:
- Discovery: Can I find the data I’m looking for?
- Trust: Can I trust what I’ve found?
- Friction: Can I make this work with my other stuff?
- Sync: How do I handle changes in data?
Because qri is global and content-addressed, adding data to qri also checks the entire network to see if someone has added it before. Since qri is focused solely on datasets, it can provide meaningful search results. Every change on qri is associated with a peer, creating an auditable trail you can use to quickly see what has changed and who changed it. All datasets on qri are automatically described at the time of ingest using a flexible schema that lets data interoperate naturally. Qri comes with tools to turn all datasets on the network into a JSON API with a single command. Finally, all changes in qri are tracked and synced.
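To make the content-addressing idea concrete, here is a minimal sketch in Go. It is not qri's implementation: the `Store` type, the `Address` function, the in-memory map, and the choice of hex-encoded SHA-256 are all assumptions made for illustration. It only shows why identical bytes always map to the same address, which is what makes a "has someone added this before?" check possible.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a hypothetical in-memory content-addressed store.
// A real network would spread these blocks across many peers.
type Store struct {
	blocks map[string][]byte
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{blocks: make(map[string][]byte)}
}

// Address derives an identifier from the bytes themselves, so the same
// content always yields the same address (SHA-256 is an assumption here).
func Address(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Add stores data under its content address and reports whether
// identical bytes had already been added.
func (s *Store) Add(data []byte) (addr string, existed bool) {
	addr = Address(data)
	if _, ok := s.blocks[addr]; ok {
		return addr, true
	}
	// Copy the bytes so later changes by the caller cannot alter the block.
	s.blocks[addr] = append([]byte(nil), data...)
	return addr, false
}

func main() {
	s := NewStore()
	csv := []byte("city,population\nToronto,2731571\n")

	first, dupFirst := s.Add(csv)
	second, dupSecond := s.Add(csv)

	// Same bytes, same address; the second add is detected as a duplicate.
	fmt.Println(first == second, dupFirst, dupSecond) // prints: true false true
}
```

Changing even one byte of the data produces a different address, so under this scheme a new version of a dataset never overwrites an old one.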
Qri is composed of many specialized packages. Below you will find a summary of each package.
| Package | Go Docs | Go Report Card | Description |
|---|---|---|---|
| | | | functions that call to the repo to carry out tasks |
| | | | user accessible layer, primarily made for communication with our frontend webapp |
| | | | our command line interface |
| | | | user configuration details, includes peer’s profile |
| | | | takes arguments from the cmd and api layer and forms proper requests to call to the action layer |
| | | | the peer to peer communication layer of qri |
| | | | the repository: saving, removing, and storing datasets, profiles, and the config |
| | | | the blueprint for a dataset, the atoms that make up qri |
| | | | the blueprint for a registry: the service that allows profiles to be unique and datasets to be searchable |
| | | | brings starlark into qri to be used in transforms, adds qri specific functionality |
| | | | the starlark standard library available for qri transform scripts |
| | | | “qri file system” is Qri’s file system abstraction for getting & storing data from different sources |
| | | | package to handle in, out, and error streams: gives us better control of where we send output and errors |
| | | | the dataset diffing package |
| | | | used to describe the structure of a dataset, so we can validate datasets and determine dataset interop |
The following packages are not under Qri, but are important dependencies, so we display their latest versions for convenience.
Building From Source
To build qri you’ll need the Go programming language installed on your machine.
    $ go get github.com/qri-io/qri
    $ cd $GOPATH/src/github.com/qri-io/qri
    $ make build
    $ go install
If you are building from source by cloning the repo, make sure to clone it into your Go path, at $GOPATH/src/github.com/qri-io/qri.
The make build command will produce a lot of output. That’s good! It means it’s working 🙂
It’ll take a minute, but once everything’s finished a new binary, qri, will appear in the $GOPATH/bin directory. You should then be able to run the qri binary and see help output.