Ingesting Data
RDWATCH_MODEL_RUN_API_KEY
Check the ./dev/.env.docker-compose environment file or the .env file in the root of the repository for the presence of the RDWATCH_MODEL_RUN_API_KEY variable. This is a special key that allows services outside of the standard Django login to push data into the system.

The key is used in the scripts and in the request headers when pushing data into the system. Copy the value from that file when running the scripts below against a local instance. When running against a production deployment, you'll need to acquire an API key for that instance and use it instead.
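For reference, the variable appears as a single line in the environment file. The value shown here is only a placeholder, not a real key:

```
# ./dev/.env.docker-compose (or .env in the repository root)
# Placeholder value; copy the actual key from your instance's file.
RDWATCH_MODEL_RUN_API_KEY=<your-api-key>
```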
Loading Ground Truth Data
Within the scripts directory is a Python script named loadGroundTruth.py. This file can be used in conjunction with the ground truth annotations located in the Annotation Repo.
Running a command like:
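The original command is not preserved here; the sketch below only illustrates the general shape of an invocation against a local instance, and the argument names (the annotation path and the API key flag) are assumptions rather than the script's documented interface:

```bash
# Hypothetical example: load ground truth site models from a local checkout
# of the Annotation Repo into a locally running RGD instance.
# Argument names are assumptions; check `python scripts/loadGroundTruth.py --help`.
python scripts/loadGroundTruth.py /path/to/annotations/site_models \
    --rgd-api-key "$RDWATCH_MODEL_RUN_API_KEY"
```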
will load all of the ground truth annotations along with the regions. Using --skip_regions will skip loading the region geometry.
Loading Single Model Runs
Within the scripts directory is a Python script named loadModelRuns.py. This can be used to load a folder filled with GeoJSON data into the system by using a command like:
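The original command is not reproduced here; the sketch below shows the general shape, and the option names other than --rgd-endpoint are assumptions:

```bash
# Hypothetical example: load a folder of GeoJSON site models as a model run
# titled 'Test_Eval_12'. Option names other than --rgd-endpoint are
# assumptions; check `python scripts/loadModelRuns.py --help`.
python scripts/loadModelRuns.py "ingestion/KR_R001/*.geojson" \
    --title Test_Eval_12 \
    --eval_num 12 --eval_run_num 0 \
    --rgd-endpoint http://localhost:8000
```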
By default, this command uploads to the RGD server hosted at http://localhost:8000, but that can be changed by passing an optional --rgd-endpoint argument to the command.

Be sure that the system is up and running before running the commands.
The above command will load the data that matches the provided glob expression and give it the title 'Test_Eval_12'. The eval_num and eval_run_num aren't required unless the scoring database is going to be connected to the system.
Scoring
The Metrics and Test Framework can be used in conjunction with RGD to display scores from results.
In development mode, a scoring database is automatically initialized at the URI postgresql+psycopg2://scoring:secretkey@localhost:5433/scoring.
To score data:
- Clone the Metrics and Test Framework repo.
- In the Metrics and Test Framework repo:
    - Copy the alembic_example.ini to alembic.ini and set sqlalchemy.url = postgresql+psycopg2://scoring:secretkey@localhost:5433/scoring
    - Run pip install -e . to install the metrics-and-test-framework package
    - Run alembic upgrade head to initialize the scoring database schema
- Execute the scoring code from inside the metrics and test framework:
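The original multi-line command is not preserved; the following is only a sketch of the kind of invocation this step describes. The module path and the flag names beyond rm_path and sm_dir (which the surrounding text references) are assumptions, so consult the Metrics and Test Framework documentation for the actual interface:

```bash
# Hypothetical sketch of a scoring run. Module path and flag names are
# assumptions; consult the Metrics and Test Framework docs.
python -m iarpa_smart_metrics.run_evaluation \
    --roi KR_R001 \
    --gt_dir ../annotations/site_models/ \
    --rm_path ../annotations/region_models/KR_R001.geojson \
    --sm_dir ../my_test_annotations/KR_R001/ \
    --output_dir ../KR_R001_scores/ \
    --eval_num 12 \
    --eval_run_num 0 \
    --performer kit \
    --db_conn_str postgresql+psycopg2://scoring:secretkey@localhost:5433/scoring
```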
rm_path and sm_dir should be your test annotations.
- Ground truth annotations can be retrieved from the Annotation Repo
- Be sure to set the eval_num and eval_run_num and remember them when ingesting data into RGD. The region, eval_num, eval_run_num, and performer are used to connect data loaded in RGD to the scoring data.
- For scoring data with points, execute the following command:
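The original command for point-based scoring is likewise not preserved. It is presumably the same kind of invocation with the point annotations supplied alongside or in place of the polygon ground truth; every flag name below, including the point annotation argument, is an assumption:

```bash
# Hypothetical sketch of a point-based scoring run. All flag names,
# including the point annotation argument, are assumptions.
python -m iarpa_smart_metrics.run_evaluation \
    --roi KR_R001 \
    --gt_points_file ../annotations/point_based_annotations.geojson \
    --rm_path ../annotations/region_models/KR_R001.geojson \
    --sm_dir ../my_test_annotations/KR_R001/ \
    --output_dir ../KR_R001_point_scores/ \
    --eval_num 12 \
    --eval_run_num 0 \
    --performer kit \
    --db_conn_str postgresql+psycopg2://scoring:secretkey@localhost:5433/scoring
```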
Manually Loading
Create a model-run
A model-run is a grouping of site evaluations. This grouping can contain outputs of a machine learning model, ground truth outputs, etc. In order to ingest data, it must be associated with a model-run.

You can view and create model-runs on the /api/model-runs endpoint.
- GET /api/model-runs: list all
- GET /api/model-runs/{id}: retrieve a single instance
- POST /api/model-runs: create an instance
Prior to creating a model run, you may have to create a performer to associate it with. RD-WATCH comes pre-configured with some performers by default; you can send a request to the /api/performers/ endpoint to check the available performers:
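The original one-line command is not preserved; a plain GET with curl against a local instance would look something like this (the localhost URL assumes the default development setup):

```bash
# List the performers currently registered in a local RD-WATCH instance.
curl http://localhost:8000/api/performers/
```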
To create a new performer, you can make a separate POST request to the API. The following JSON is an example of data to be used to create a performer:
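The original example payload is not preserved; the field names shown below (team_name and short_code) are assumptions about the performer schema rather than values confirmed by this document:

```json
{
  "team_name": "Example Team",
  "short_code": "EXT"
}
```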
To create this performer:
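A sketch of the request, assuming the payload above is saved as performer.json and the API key is passed in a request header; the header name used here is an assumption:

```bash
# Hypothetical example: create the performer defined in performer.json.
# The API key header name is an assumption; check your instance's API docs.
curl -X POST http://localhost:8000/api/performers/ \
    -H "Content-Type: application/json" \
    -H "X-RDWATCH-API-KEY: $RDWATCH_MODEL_RUN_API_KEY" \
    -d @performer.json
```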
Once you've ensured the desired performer exists, you can create a model run.
The following JSON is an example of data to be used to create a model-run:
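The original example payload is not preserved; the sketch below only suggests its shape, and every field name in it is an assumption about the model-run schema:

```json
{
  "performer": "EXT",
  "title": "Test_Eval_12",
  "region": "KR_R001",
  "parameters": {},
  "expiration_time": null,
  "evaluation": 12,
  "evaluation_run": 0
}
```

The values shown (region, evaluation numbers, performer short code) are illustrative only.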
To create this model-run:
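Again a sketch, assuming the payload above is saved as model_run.json; the API key header name is the same assumption as before:

```bash
# Hypothetical example: create the model-run defined in model_run.json.
# The API key header name is an assumption.
curl -X POST http://localhost:8000/api/model-runs \
    -H "Content-Type: application/json" \
    -H "X-RDWATCH-API-KEY: $RDWATCH_MODEL_RUN_API_KEY" \
    -d @model_run.json
```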
You'll get the newly created model-run as a response:
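The original response body is not preserved. A response would typically echo the submitted fields along with server-assigned values such as an id and a creation timestamp; the sketch below is purely illustrative and every field in it is an assumption:

```json
{
  "id": 1,
  "title": "Test_Eval_12",
  "region": "KR_R001",
  "performer": {
    "id": 1,
    "team_name": "Example Team",
    "short_code": "EXT"
  },
  "parameters": {},
  "created": "2024-01-01T00:00:00Z",
  "expiration_time": null,
  "evaluation": 12,
  "evaluation_run": 0
}
```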
Add data to a model-run
You can POST a Site Model Specification JSON to the endpoint /api/model-runs/{id}/site-model/ or a Region Model Specification JSON to the endpoint /api/model-runs/{id}/region-model/.
Following the above example, let's POST a Site Model Specification JSON file in the current working directory named "site.json" to the newly created model-run:
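A sketch of the request, substituting the id returned when the model-run was created; the API key header name remains an assumption:

```bash
# Hypothetical example: add site.json to model-run 1. Replace the id in the
# URL with the one returned when the model-run was created.
# The API key header name is an assumption.
curl -X POST http://localhost:8000/api/model-runs/1/site-model/ \
    -H "Content-Type: application/json" \
    -H "X-RDWATCH-API-KEY: $RDWATCH_MODEL_RUN_API_KEY" \
    -d @site.json
```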
Ensure the JSON correctly validates against the Site Model Specification. While many validation errors are reported, a malformed JSON will not produce helpful errors. For example, the specification mandates that each 'Observation' feature include a current_phase string, but some data in the wild is not compliant and instead includes "current_phase": null. Such a JSON is considered malformed and will not be able to be parsed.