PreciDatos specifications
The proposed solution outlined below attempts to economically disincentivise bad-faith actors and economically incentivise good-faith actors. It also aims to support a wide variety of use cases, as well as a varying number of actors.
PreciDatos is a data reporting system to ensure validity (i.e., data is complete and in a good format) and truthfulness (i.e., data was not corrupted by the reporter) of the data. In itself, PreciDatos does not incentivise commitments, nor does it check for commitment fulfillment. Its expected output is to guarantee that uploaded data is valid and truthful, with an auditable trail of verifications.
Datasets may be of any type, e.g., emissions, energy production, cleanups etc. We discuss specific use cases here.
An actor submits a report by committing to the following pieces:
- Data resources (any format)
- Report metadata (type of report, level of detail)
- Stake
- Committee members
The actor commits to the report by digitally signing it on the blockchain.
Ideally, the stake should broadly correspond to the amount and granularity of the data: larger reports take longer to process and to check for flaws, and in the worst case, where a data report is fraudulent, investigations are more costly.
To incentivise the use of PreciDatos by actors, we advocate for the payment of some form of interest on the stake, given out to honest reporters and validators. The modalities of interest payment, however, are not part of the core specifications of PreciDatos, so we expand on them in a separate note.
- The stake of honest, committed data reporters uploading valid and truthful data is returned, potentially with interest.
- The stake of honest, uncommitted data reporters uploading invalid but truthful data is used to incentivise standardisation by validators. Whatever is left is returned.
- The stake of dishonest data reporters uploading falsified data is slashed and used to pay for whistleblowers and investigations into the challenging claims.
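The three outcomes above can be sketched as a small settlement function. This is only an illustration under stated assumptions: the names `Outcome` and `settle_reporter_stake`, as well as the `interest_rate` and `standardisation_cost` parameters, are hypothetical and not part of the specification.

```python
from enum import Enum


class Outcome(Enum):
    VALID_AND_TRUTHFUL = "valid_and_truthful"      # honest, committed reporter
    INVALID_BUT_TRUTHFUL = "invalid_but_truthful"  # honest, uncommitted reporter
    FALSIFIED = "falsified"                        # dishonest reporter


def settle_reporter_stake(stake, outcome, interest_rate=0.0, standardisation_cost=0.0):
    """Split a reporter's stake according to the outcome of validation.

    Returns a tuple:
    (returned to reporter, paid to validators, slashed for whistleblowers
    and investigations). All parameters are illustrative assumptions.
    """
    if outcome is Outcome.VALID_AND_TRUTHFUL:
        # Stake is returned, potentially with interest.
        return stake * (1 + interest_rate), 0.0, 0.0
    if outcome is Outcome.INVALID_BUT_TRUTHFUL:
        # Part of the stake pays validators for standardisation; the rest is returned.
        paid = min(stake, standardisation_cost)
        return stake - paid, paid, 0.0
    # Falsified data: the whole stake is slashed.
    return 0.0, 0.0, stake
```

For instance, a reporter with a stake of 1000 whose data needs 150 worth of standardisation work would get 850 back under these assumptions.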
We define an ideal standard data format for each report type. Note that companies may decide not to follow the standard data format, using their staked funds or potential reward to pay for standardisation.
This data schema should have a sub-schema within it that can differ for various use cases. For example, a plastics cleanup report would have a different sub-schema from a solar power generation report.
The approximate structure (pseudocode style) of the schema and its fields would look like this:
- Date
- Digital signature
- Staked amount
- Report
  - Sub-schema
    - List of claims
      - Note: this would be simplest if only emission reduction commitments were considered, as opposed to other commitment types (e.g., renewable energy installation, energy efficiency), but the system could start with emission reductions and then be scaled to other commitment types.
      - Emission reduction commitment (percent reduction, base year, target year, and boundary, e.g., location of emission reductions, specific sectors that are included or not included)
      - Progress on emission reduction commitment (inventory year emissions disaggregated by Scope 1, 2, and 3; the inventory year must be at least 1 year later than the base year)
    - Proofs of claims
      - Electricity bills, emission factors being used, CDP disclosure report or any disclosure report sent to one of the reporting networks that currently collect data and in which the actor currently participates, inventory audit
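One way to make this outline concrete is with typed records. The sketch below uses Python dataclasses for an emissions report; every class name, field name, and unit (for example `scope1_tco2e`) is an illustrative assumption rather than a format defined by PreciDatos.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class EmissionReductionCommitment:
    percent_reduction: float
    base_year: int
    target_year: int
    boundary: str  # e.g. locations or sectors included / not included


@dataclass
class EmissionProgress:
    inventory_year: int
    scope1_tco2e: float  # unit is an assumption
    scope2_tco2e: float
    scope3_tco2e: float


@dataclass
class EmissionsSubSchema:
    commitment: EmissionReductionCommitment
    progress: EmissionProgress
    # References to supporting documents (bills, audits, disclosure reports).
    proofs_of_claims: List[str] = field(default_factory=list)

    def check(self) -> List[str]:
        """Return a list of rule violations (empty if none)."""
        errors = []
        if self.progress.inventory_year <= self.commitment.base_year:
            errors.append("inventory year must be at least 1 year after the base year")
        if not self.proofs_of_claims:
            errors.append("at least one proof of claims is required")
        return errors


@dataclass
class Report:
    report_date: date
    digital_signature: bytes
    staked_amount: int
    sub_schema: EmissionsSubSchema
```

A different use case, such as a solar power generation report, would swap `EmissionsSubSchema` for its own sub-schema class while keeping the outer `Report` fields.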
The report and invoice data may make use of homomorphic encryption to allow for secrecy while still allowing aggregate data to be computed.
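To illustrate how aggregates can be computed over encrypted values, here is a toy sketch of the Paillier cryptosystem, which is additively homomorphic. The specification does not name a particular scheme, so the choice of Paillier is an assumption, and the tiny fixed primes below are for demonstration only and offer no security.

```python
import math
import secrets

# Toy key material: two small primes (assumption, insecure by design).
P, Q = 10007, 10009
N = P * Q                      # public key
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1), private
MU = pow(LAMBDA, -1, N)        # private; valid because the generator is N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the public key N."""
    if not 0 <= m < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1   # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N_SQ) % N_SQ


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return c1 * c2 % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext using the private values LAMBDA and MU."""
    x = pow(c, LAMBDA, N_SQ)
    return (x - 1) // N * MU % N
```

With this sketch, an aggregator could combine the encrypted figures of several reporters using `add_encrypted` without seeing any individual value; only the holder of the private key can decrypt the total.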
The data may be challenged on the grounds that it is invalid, in the case of an honest but uncommitted actor.
All parties participating in this consortium, including reporters and validators, need to first validate a submission. This includes the following:
1. Digital signature verification
   1. The signature was produced by the reporter
   2. The signature is valid for the data submitted
2. Schema
   1. That the format of the submission is valid according to the schema
   2. That the format of select parts of the submission is valid for the specific sub-schema for the use case used by members of this consortium
3. Application of business rules
   1. Various parts of the submitted data need to be verified according to business rules specific to the use case used by this consortium
      - For example, the calculations used to work out how much is payable per unit of work
   2. Various parts of the submitted data need to tally with the invoiced amount
   3. Any automatic verification of proofs of claims
      - For example, comparing to known sources of data such as weather reports, satellite data, and any custom sensors that have been installed
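The validation steps above can be sketched as a short pipeline. This is a minimal illustration, not the actual implementation: the field names (`units_of_work`, `rate_per_unit`, `invoiced_amount`), the function names, and the idea of passing signature verification in as a callable are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical required fields for the outer schema and one sub-schema.
REQUIRED_FIELDS = {"date": str, "digital_signature": str,
                   "staked_amount": int, "report": dict}
REQUIRED_REPORT_FIELDS = {"units_of_work": int, "rate_per_unit": int,
                          "invoiced_amount": int}


def check_fields(data: Dict, required: Dict[str, type]) -> List[str]:
    """Return one error per field that is missing or has the wrong type."""
    return [f"missing or mistyped field: {name}"
            for name, kind in required.items()
            if not isinstance(data.get(name), kind)]


def validate_submission(submission: Dict,
                        verify_signature: Callable[[Dict], bool]) -> List[str]:
    """Run signature, schema, and business-rule checks; return a list of errors."""
    # 1. Digital signature verification (delegated to the caller's verifier).
    if not verify_signature(submission):
        return ["signature is not valid for this reporter and data"]
    # 2. Schema and sub-schema format checks.
    errors = check_fields(submission, REQUIRED_FIELDS)
    if errors:
        return errors
    report = submission["report"]
    errors = check_fields(report, REQUIRED_REPORT_FIELDS)
    if errors:
        return errors
    # 3. Business rule: the computed payable amount must tally with the invoice.
    expected = report["units_of_work"] * report["rate_per_unit"]
    if expected != report["invoiced_amount"]:
        errors.append(f"invoiced amount {report['invoiced_amount']} "
                      f"does not tally with computed amount {expected}")
    return errors
```

An empty list means the submission passed all automatic checks; a non-empty list gives a party the grounds on which it might decide to issue a challenge.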
Any party participating in the consortium is expected, in most cases, to find upon completing validation that a submission is valid. In some cases, however, it may discover that the submission is in fact invalid. In this latter scenario, that party may decide to issue a challenge.
A party that has performed validation, found a submission to be invalid, and decided to challenge it, will signal its challenge to all members of the consortium. The intended effect of signalling this challenge is to state that the submission cannot be accepted by the consortium, and thus to trigger an investigation into the claims.
The approximate structure (pseudocode style) of the challenge signal's schema and its fields would look as follows:
- Date
- Zero-knowledge proof of membership
- Staked amount
- Report
  - Sub-schema
    - Proofs of contradictions of claims
In order to avoid unnecessary or frivolous investigations - since these are assumed to require significant time or resource investments - a staking mechanism is introduced. The party submitting the challenge also deposits an amount as part of the challenge signal. This deposit amount from the challenging party, along with the amount staked by the submitting party, is locked up in an escrow mechanism.
As members of the consortium do not necessarily want each other to know who has issued the challenge signals, the challenge signal will contain a zero-knowledge proof that verifies that the submitter of the challenge is a member of the set of parties in the consortium, without the need to reveal their own identity.
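A full zero-knowledge membership proof is beyond the scope of a short example. The sketch below shows only one possible building block: committing to the set of consortium members with a Merkle tree and checking that a leaf belongs to it. The use of a Merkle tree is an assumption, not something this specification prescribes, and this check on its own is not zero-knowledge, because the verifier sees the leaf. A zero-knowledge scheme would instead prove knowledge of such a path without revealing the leaf or the path.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _padded(level: List[bytes]) -> List[bytes]:
    # Duplicate the last node when a level has an odd number of nodes.
    return level + [level[-1]] if len(level) % 2 == 1 else level


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Build all tree levels from hashed leaves up to the root (non-empty input)."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        level = _padded(levels[-1])
        levels.append([h(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Return the path for leaf `index` as (sibling hash, node_is_left) pairs."""
    path = []
    for level in levels[:-1]:
        level = _padded(level)
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path


def verify(root: bytes, leaf: bytes, path: List[Tuple[bytes, bool]]) -> bool:
    """Recompute the root from the leaf and path, and compare."""
    node = h(leaf)
    for sibling, node_is_left in path:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node == root
```

Here the root `build_levels(members)[-1][0]` would be the public commitment to the membership set that all parties agree on.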
The investigation is completed using real-world (non-computer) techniques. When the investigation is completed, the result of that investigation is input back into the computer system. This input’s primary purpose is to release the amounts locked up in escrow pertaining to this investigation. The input apportions the escrowed amount between the challenging party and the submitting party. If the submission was deemed to be fraudulent, the challengers get all of the deposited amount. If the submission was deemed to be correct, the submitter gets all, or some, of the deposited amount. As with other escrow systems, a portion of the deposited amount will go to neither the submitter nor the challenger, and this is used to pay for the investigation.
The exact fraction of the deposit which will be paid out, and to whom, can be programmatically configured to reflect an optimal game-theoretic equilibrium.
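A minimal sketch of such a configurable payout follows. It assumes that "the deposited amount" means the whole escrow (submitter stake plus challenger deposit), that the investigation fee is taken off the top, and that any part not awarded to a vindicated submitter goes back to the challenger. The function name, parameter names, and default fractions are hypothetical.

```python
from fractions import Fraction


def settle_escrow(submitter_stake: int,
                  challenger_deposit: int,
                  fraudulent: bool,
                  investigation_fee: Fraction = Fraction(1, 10),
                  submitter_share_if_correct: Fraction = Fraction(1)) -> dict:
    """Apportion the escrowed funds once an investigation result is known."""
    for fraction in (investigation_fee, submitter_share_if_correct):
        if not 0 <= fraction <= 1:
            raise ValueError("fractions must lie between 0 and 1")
    pot = submitter_stake + challenger_deposit
    fee = pot * investigation_fee          # pays for the investigation
    remainder = pot - fee
    if fraudulent:
        # Submission deemed fraudulent: challengers receive the remainder.
        return {"challenger": remainder, "submitter": 0, "investigation": fee}
    # Submission deemed correct: submitter receives all or some of the remainder.
    to_submitter = remainder * submitter_share_if_correct
    return {"challenger": remainder - to_submitter,
            "submitter": to_submitter,
            "investigation": fee}
```

Exact fractions are used instead of floats so that the three payouts always sum to the escrowed total.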
A whistleblower challenge differs from a validator challenge in the following ways:
- The challenger is an individual, rather than an organisation
- The challenger is able to provide proofs of contradictory evidence
When a submission contains a proof for a particular claim, and the whistleblower has evidence that this proof is fraudulent in some way, that person may decide to signal a challenge containing details of said proofs, using the same procedure that is available to the organisations. Just as with a validator challenge, the whistleblower also deposits some amount as part of the challenge.
As a whistleblower is likely to be an individual within an organisation (that is itself a member of the consortium), and quite likely to be an employee or a contractor, they have a strong incentive not to whistleblow out of fear of retribution, such as the cancellation of employment or contract. In order to protect against this, the signal submitted by the whistleblower will contain a zero-knowledge proof that verifies that the whistleblower is associated with an organisation that is a member of the consortium, without the need to reveal their own identity.
A limitation of this system that we should acknowledge here is that it creates a perverse incentive for organisations to minimise the count of individuals who are designated as members of the organisation. Perhaps even more cynically, it may also influence organisations to select only certain categories of individuals to be designated as members of the organisation.
To address this limitation, we propose an alternative path for the implementation of the whistleblowing process. We continue to allow people who can prove that they belong to an organisation that is a member of the consortium to whistleblow without revealing their identity. In addition, we also allow a signal by any person, without the need to prove their membership, for consideration by the members of the consortium. However, unlike a signal that carries a proof of membership, such a signal must be “relayed” by members of the consortium in order to trigger an investigation. Should an insufficient number of consortium members relay it, the signal is effectively ignored.
On the flip side, if the public is educated about the whistleblowing mechanism, organisations with a high number of potential whistleblowers but a relatively low number of reports are likely to be viewed in a good light. If corporations value such social standing, then it would be in their interest to have as many of their members be registered in the system and also to report data truthfully.
Each consortium will pre-agree upon the quorum required to trigger an investigation. It may also pre-agree upon different quora required for different levels or tiers of investigation. In this scenario, it may even pre-agree upon different minimum required deposits for each tier of investigation. Further to this, the consortium may also agree upon different quorum requirements for organisation-submitted challenges and whistleblower-submitted challenges.
These pre-agreed quora values start off as part of the initial consortium configuration, and may change beyond the initial configuration upon consensus by all parties to the consortium.
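Such a configuration could be represented as a lookup keyed by challenger type and investigation tier. The sketch below is an assumption-laden illustration: the tier names, the numbers in `EXAMPLE_CONFIG`, and the function `triggers_investigation` are invented for the example and are not values proposed by this specification.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class TierRule:
    quorum: int        # number of distinct consortium members required
    min_deposit: int   # minimum deposit required for this tier


# Keyed by (challenger type, investigation tier); values are illustrative only.
EXAMPLE_CONFIG: Dict[Tuple[str, str], TierRule] = {
    ("organisation", "basic"): TierRule(quorum=1, min_deposit=100),
    ("organisation", "full"): TierRule(quorum=2, min_deposit=500),
    ("whistleblower", "basic"): TierRule(quorum=3, min_deposit=50),
}


def triggers_investigation(config: Dict[Tuple[str, str], TierRule],
                           challenger_type: str,
                           tier: str,
                           supporting_members: Iterable[str],
                           deposit: int) -> bool:
    """Check whether a challenge meets the pre-agreed quorum and deposit."""
    rule = config.get((challenger_type, tier))
    if rule is None:
        return False  # no rule agreed for this combination
    distinct_supporters = len(set(supporting_members))
    return distinct_supporters >= rule.quorum and deposit >= rule.min_deposit
```

Counting distinct supporters means that a single member backing a challenge several times does not help it reach quorum.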
All submissions should contain proofs. Identity is in the clear for submissions, as well as the proof data.
All challenge signals should also contain proofs. Identity is not in the clear for challenges, but rather in the form of zero-knowledge proofs that show membership of a set. The proof data is in the clear.
When a challenge results in an investigation, that investigation should be carried out by an independent auditor. That auditor, or the auditor selection process, should be pre-agreed upon by the parties in the consortium. Ideally, these auditors should be randomly selected from a pool of qualified auditors.
Proofs are key in minimising fraudulent data, and thus are central in any implementation of this system. As proofs are costly to acquire, they also need to be adequately compensated, and this comes in the form of incentives for honest (correct) submissions, and disincentives for dishonest (incorrect) submissions.
- The schemas used in the data formats above are highly non-specific. An actual implementation would need:
  - Schema formats that are automatically verifiable.
    - For example, XML using XSD or JSON using JSON Schema.
  - Data that validates according to the schema should be readily processable within the limitations of the execution environment.
    - For example, if the execution environment is the Ethereum Virtual Machine, it is likely that some pre-processing will be necessary.
- A key challenge and limitation of this system is that of real-world data crossing the boundary into a computer system.
  - If that point of entry is corruptible, it introduces a potential source of failure for the entire system.
  - Care must be taken to ensure proper digital security, for example through sound key management practices.
  - Likewise, care must be taken to physically secure the input devices and sensors used to collect data. The best digital security cannot overcome lapses in meatspace security, as garbage input results in garbage output.
- Agnostic implementation.
  - The technology described here uses terminology frequently associated with cryptocurrency networks.
  - The demonstration implementation also makes use of a cryptocurrency network.
  - However, it is worth noting that the ideas and processes described here do not have any intrinsic need to be implemented using a cryptocurrency network. They could also be implemented without smart contracts or cryptocurrency payments, by using a database, an application server, and an alternative payment network.