跳到主要内容
版本:v0.20

Domain Layer Schema

Summary

This document describes Apache DevLake's domain layer schema.

Referring to DevLake's architecture, the data in the domain layer is transformed from the data in the tool layer. The tool layer schema is based on the data from specific tools such as Jira, GitHub, GitLab, Jenkins, etc. The domain layer schema can be regarded as an abstraction of tool-layer schemas.

DevLake Dataflow

Use Cases

  1. All metrics from pre-built dashboards are based on this data schema.
  2. As a user, you can create your own customized dashboards based on this data schema.
  3. As a contributor, you can refer to this data schema while working on the ETL logic when adding/updating data source plugins.

Data Models

This is the up-to-date domain layer schema for DevLake. Tables (entities) are categorized into 5 domains.

  1. Issue tracking: Jira issues, GitHub issues, GitLab issues, etc.
  2. Source code management: Git/GitHub/GitLab commits and refs(tags and branches), etc.
  3. Code review: GitHub PRs, GitLab MRs, etc.
  4. CI/CD: Jenkins jobs & builds, etc.
  5. Code Quality: SonarQube issues, hotspots, file metrics, etc.
  6. Cross-domain: entities that map entities from different domains to break data isolation.

Schema Diagram

Domain Layer Schema

When reading the schema, you'll notice that many tables' primary key is called id. Unlike auto-increment id or UUID, id is a string composed of several parts to uniquely identify similar entities (e.g. repo) from different platforms (e.g. GitHub/GitLab) and allow them to co-exist in a single table.

Tables that end with WIP are still under development.

Naming Conventions

  1. The name of a table is in plural form. E.g. boards, issues, etc.
  2. The name of a table which describe the relation between 2 entities is in the form of [BigEntity in singular form]_[SmallEntity in plural form]. E.g. board_issues, sprint_issues, pull_request_comments, etc.
  3. Value of the field in enum type are in capital letters. E.g. table.issues.type has 3 values, REQUIREMENT, BUG, INCIDENT. Values that are phrases, such as 'IN_PROGRESS' of table.issues.status, are separated with underscore '_'.

How to Customize Data Models

Apache DevLake provides 2 plugins:

  • customize: to create/delete columns in the domain layer schema with the data extracted from raw layer tables
  • dbt: to transform data based on the domain layer schema and generate new tables

DWD Entities - (Data Warehouse Detail)

Domain 1 - Issue Tracking

issues

An issue is the abstraction of GitHub/GitLab/BitBucket/Jira/TAPD/Zentao... issues.

fieldtypelengthdescriptionkey
idvarchar255An issue's id is composed of < plugin >:< Entity >:< PK0 >[:PK1]..."
  • For Github issues, a Github issue's id is like "github:GithubIssues:< GithubIssueId >". E.g. 'github:GithubIssues:1049355647'
  • For Jira issues, a Github repo's id is like "jira:JiraIssues:< JiraSourceId >:< JiraIssueId >". E.g. 'jira:JiraIssues:1:10063'. < JiraSourceId > is used to identify which jira source the issue came from, since DevLake users can import data from several different Jira instances at the same time.
PK
issue_keyvarchar255The key of this issue. For example, the key of this Github issue is 1145.
urlvarchar255The url of the issue. It's a web address in most cases.
titlevarchar255The title of an issue
descriptionlongtextThe detailed description/summary of an issue
typevarchar100The standard type of this issue. There are 3 standard types:
  • REQUIREMENT: this issue is a feature
  • BUG: this issue is a bug found during test
  • INCIDENT: this issue is a bug found after release
The 3 standard types are transformed from the original types of an issue. The transformation rule is set in the '.env' file or 'config-ui' before data collection. For issues with an original type that has not mapped to a standard type, the value of type will be the issue's original type.
original_typevarchar100The original type of an issue.
statusvarchar100The standard statuses of this issue. There are 3 standard statuses:
  • TODO: this issue is in backlog or to-do list
  • IN_PROGRESS: this issue is in progress
  • DONE: this issue is resolved or closed
The 3 standard statuses are transformed from the original statuses of an issue. The transformation rule:
  • For Jira issue status: transformed from the Jira issue's statusCategory. Jira issue has 3 default status categories: 'To Do', 'In Progress', 'Done'.
  • For Github issue status:
    • open -> TODO
    • closed -> DONE
original_statusvarchar100The original status of an issue.
story_pointdoubleThe story point of this issue. Only certain types(e.g. story) of Jira or TAPD issues has story points
priorityvarchar255The priority of the issue
urgencyvarchar255The urgency of the issue
componentvarchar255The component a bug-issue affects. This field only supports Github plugin for now. The value is transformed from Github issue labels by the rules set according to the user's configuration of .env by end users during DevLake installation.
severityvarchar255The severity level of a bug-issue. This field only supports Github plugin for now. The value is transformed from Github issue labels by the rules set according to the user's configuration of .env by end users during DevLake installation.
parent_issue_idvarchar255The id of its parent issue
epic_keyvarchar255The key of the epic this issue belongs to. For tools with no epic-type issues such as Github and GitLab, this field is default to an empty string
original_estimate_minutesintThe original estimation of the time allocated for this issue
time_spent_minutesintThe original estimation of the time allocated for this issue
time_remaining_minutesintThe remaining time to resolve the issue
creator_idvarchar255The id of issue creator
creator_namevarchar255The name of the creator
assignee_idvarchar255The id of issue assignee.
  • For Github issues: this is the last assignee of an issue if the issue has multiple assignees
  • For Jira issues: this is the assignee of the issue at the time of collection
assignee_namevarchar255The name of the assignee
created_datedatetime3The time issue created
updated_datedatetime3The last time issue gets updated
resolution_datedatetime3The time the issue changes to 'DONE'.
lead_time_minutesintDescribes the cycle time from issue creation to issue resolution.
  • For issues whose type = 'REQUIREMENT' and status = 'DONE', lead_time_minutes = resolution_date - created_date. The unit is minute.
  • For issues whose type != 'REQUIREMENT' or status != 'DONE', lead_time_minutes is null
original_projectvarchar255The name of the original project this issue belongs to. Transformed from a Jira project's name, a TAPD workspace's name, etc.
icon_urlvarchar255The url of the issue icon.
x_custom_field_1It depends on the type of the converted field-The value of the custom field. This field is available when utilizing the customize plugin to convert Jira's raw layer fields to the domain layer fields.

issue_assignees

This table shows the assignee(s) of issues. Multiple entries can exist per issue, as a GitHub/TAPD issue may have multiple assignees at the same time. This table can be used to get the detailed information of all issue assignees.

fieldtypelengthdescriptionkey
issue_idvarchar255Issue IDFK_issues.id
assignee_idvarchar255Assignee IDFK_accounts.id
assignee_namevarchar255Assignee Name

issue_labels

This table shows the labels of issues. Multiple entries can exist per issue. This table can be used to filter issues by label name.

fieldtypelengthdescriptionkey
label_namevarchar255Label name. Collect from GitHub issue labels or Jira issue labels
issue_idvarchar255Issue IDFK_issues.id

issue_comments

This table shows the comments of issues. Only GitHub and TAPD issue comments are collected. Issues with multiple comments are shown as multiple records. This table can be used to calculate metric - issue response time.

fieldtypelengthdescriptionkey
idvarchar255The unique id of a commentPK
issue_idvarchar255Issue IDFK_issues.id
account_idvarchar255The id of the account who made the commentFK_accounts.id
bodylongtextThe body/detail of the comment
created_datedatetime3The creation date of the comment
updated_datedatetime3The update date of the comment

issue_changelogs

This table shows the changelogs of issues. Only Jira issue changelogs are collected for now. Issues with multiple changelogs are shown as multiple records. This is transformed from Jira or TAPD changelogs.

fieldtypelengthdescriptionkey
idvarchar255The unique id of an issue changelogPK
issue_idvarchar255Issue IDFK_issues.id
author_idvarchar255The id of the user who made the changeFK_accounts.id
author_namevarchar255The id of the user who made the change
field_idvarchar255The id of changed field
field_namevarchar255The id of changed field
original_from_valuelongtextThe original value of the changed field
original_to_valuelongtextThe new value of the changed field
from_valuelongtextThe transformed/standardized original value of the changed field
to_valuelongtextThe transformed/standardized new value of the changed field
created_datedatetime3The creation date of the changelog

issue_worklogs

This table shows the work logged under issues. Only Jira issue worklogs are collected for now. Usually, an issue has multiple worklogs logged by different developers.

fieldtypelengthdescriptionkey
idvarchar255The id of the worklog.PK
author_idvarchar255The id of the author who logged the workFK_accounts.id
commentlongtext255The comment made while logging the work.
time_spent_minutesintThe time logged. The unit of value is normalized to minute. E.g. 1d =) 480, 4h30m =) 270
logged_datedatetime3The time of this logging action
started_datedatetime3Start time of the worklog
issue_idvarchar255Issue IDFK_issues.id

issue_relationships

This table shows the metadata of information about relationships between issues.

fieldtypelengthdescriptionkey
idvarchar255Issue IDPK
source_issue_idintID of the source issue in the relationship
target_issue_idintID of the target issue in the relationship
original_typevarchar255Type of relationship between the source and target issues

issue_repo_commits

This table shows the metadata of commits made to a code repository associated with specific issues.

fieldtypelengthdescriptionkey
issue_idvarchar255Issue IDPK
repo_urlvarchar255The URL of the code repositoryPK
commit_shavarchar255The SHA of the commit.PK
hostvarchar255The hostname
namespacevarchar255The namespace of the code repository
repo_namevarchar255The name of the code repository.

issue_custom_array_fields

The table below presents the custom fields of issues in an 'array' type. This table is available when utilizing the customize plugin to convert Jira's raw layer fields to the domain layer fields. It is important to note that custom fields of other types will be displayed as 'x_custom_field_1' in the issues table.

fieldtypelengthdescriptionkey
issue_idvarchar255Issue IDPK
field_idvarchar255The ID of the array field of the issue. It starts with 'x_', e.g. x_product_lines.
field_valuevarchar255The value of the array field. E.g. DevLake, DevSea, DevPond.

You can refer to the following SQL to use this table.

-- query issue count by product lines
select
count(*) as issue_count, field_value
from issues
join issue_custom_array_fields on issues.id = issue_custom_array_fields.issue_id
where field_id = 'x_product_lines'
group by field_value;

boards

A board is an issue list or a collection of issues. It's the abstraction of a Jira board, a Jira or TAPD project, a GitHub repo's issue list or a GitLab repo's issue list. This table can be used to filter issues by the boards they belong to.

fieldtypelengthdescriptionkey
idvarchar255A board's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
  • For a Github repo's issue list, the board id is like "< github >:< GithubRepos >:< ConnectionId >:< GithubRepoId >".
    E.g. "github:GithubRepo:384111310"
  • For a Jira Board, the id is like "< jira >:< JiraSourceId >< JiraBoards >:< ConnectionId >:< JiraBoardsId >".
    E.g. "jira:1:JiraBoards:1:12"
PK
namevarchar255The name of the board. Note: the board name of a Github repo 'apache/incubator-devlake' is 'apache/incubator-devlake', representing the default issue list.
descriptionvarchar255The description of the board.
urlvarchar255The url of the board. E.g. https://github.com/apache/incubator-devlake
created_datedatetime3Board creation time
typevarchar255Identify scrum and non-scrum board

board_issues

This table shows the relation between boards and issues. This table can be used to filter issues by board.

fieldtypelengthdescriptionkey
board_idvarchar255Board idFK_boards.id
issue_idvarchar255Issue idFK_issues.id

sprints

A sprint is the abstraction of Jira sprints, TAPD iterations and GitHub milestones. A sprint contains a list of issues.

fieldtypelengthdescriptionkey
idvarchar255A sprint's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
  • A sprint in a Github repo is a milestone, the sprint id is like "< github >:< GithubRepos >:< GithubRepoId >:< milestoneNumber >".
    Eg. The id for this sprint is "github:GithubRepo:384111310:5"
  • For a Jira Board, the id is like "< jira >:< JiraSourceId >< JiraBoards >:< JiraBoardsId >".
    Eg. "jira:1:JiraBoards:12"
PK
namevarchar255The name of sprint.
For Github projects, the sprint name is the milestone name. For instance, 'v0.10.0 - Introduce Temporal to DevLake' is the name of this sprint.
urlvarchar255The url of sprint.
statusvarchar255There're 3 statuses of a sprint:
  • CLOSED: a completed sprint
  • ACTIVE: a sprint started but not completed
  • FUTURE: a sprint that has not started
  • SUSPENDED: a sprint that has been suspended
started_datedatetime3The start time of a sprint
ended_datedatetime3The planned/estimated end time of a sprint. It's usually set when planning a sprint.
completed_datedatetime3The actual time to complete a sprint.
original_board_iddatetime3The id of board where the sprint first created. This field is not null only when this entity is transformed from Jira sprints.
In Jira, sprint and board entities have 2 types of relation:
  • A sprint is created based on a specific board. In this case, board(1):(n)sprint. This field original_board_id is used to show the relation.
  • A sprint can be mapped to multiple boards, a board can also show multiple sprints. In this case, board(n):(n)sprint. This relation is shown in table.board_sprints
FK_boards.id

sprint_issues

This table shows the relation between sprints and issues that have been added to sprints. This table can be used to show metrics such as 'ratio of unplanned issues', 'completion rate of sprint issues', etc

fieldtypelengthdescriptionkey
sprint_idvarchar255Sprint idFK_sprints.id
issue_idvarchar255Issue idFK_issues.id

board_sprints

fieldtypelengthdescriptionkey
board_idvarchar255Board idFK_boards.id
sprint_idvarchar255Sprint idFK_sprints.id

Domain 2 - Source Code Management

repos

GitHub, GitLab or BitBucket repositories.

fieldtypelengthdescriptionkey
idvarchar255A repo's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github repo's id is like "< github >:< GithubRepos >:< ConnectionId >:< GithubRepoId >". E.g. 'github:GithubRepos:1:384111310'
PK
namelongtextThe name of repo. For DevLake, it's 'apache/incubator-devlake'
descriptionlongtextThe description of repo.
urllongtextThe url of repo. E.g. https://github.com/apache/incubator-devlake
owner_idvarchar255The id of the owner of repoFK_accounts.id
languagevarchar255The major language of repo. E.g. The language for apache/incubator-devlake is 'Go'
forked_fromlongtextEmpty unless the repo is a fork in which case it contains the id of the repo the repo is forked from.
deletedtinyint10: repo is active 1: repo has been deleted
created_datedatetime3Repo creation date
updated_datedatetime3Last full update was done for this repo

repo_commits

The commits belong to the history of a repository. More than one repo can share the same commits if one is a fork of the other.

fieldtypelengthdescriptionkey
repo_idvarchar255Repo idFK_repos.id
commit_shachar40Commit shaFK_commits.sha

refs

A ref is the abstraction of a branch or tag.

fieldtypelengthdescriptionkey
idvarchar255A ref's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github ref is composed of "github:GithubRepos:< GithubRepoId >:< RefUrl >". E.g. The id of release v5.3.0 of PingCAP/TiDB project is 'github:GithubRepos:384111310:refs/tags/v5.3.0' A repo's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
PK
namevarchar255The name of the ref. E.g. 'refs/tags/v0.9.3' or 'origin/main'
repo_idvarchar255The id of repo this ref belongs toFK_repos.id
commit_shachar40The commit this ref points to at the time of collection
is_defaulttinyint1
  • 0: not the default branch
  • 1: the ref is the default branch. By the definition of Github, the default branch is the base branch for pull requests and code commits.
ref_typevarchar64There are 2 typical types:
  • BRANCH
  • TAG

commits_diffs

This table shows the commits added in a new commit compared to an old commit. This table can be used to support tag-based and deploy-based metrics.

The records of this table are computed by RefDiff plugin. The computation should be manually triggered after using GitRepoExtractor to collect commits and refs. The algorithm behind is similar to this.

fieldtypelengthdescriptionkey
new_commit_shachar40The commit new ref/deployment points to at the time of collectionPK
old_commit_shachar40The commit old ref/deployment points to at the time of collectionPK
commit_shachar40One of the added commits in the new ref compared to the old ref/deploymentPK
sorting_indexbigintAn index for debugging, please skip it

ref_commits

fieldtypelengthdescriptionkey
new_ref_idvarchar255The new ref's id for comparisonPK
old_ref_idvarchar255The old ref's id for comparisonPK
new_commit_shachar40The commit new ref points to at the time of collection
old_commit_shachar40The commit old ref points to at the time of collection

commits

fieldtypelengthdescriptionkey
shachar40One of the added commits in the new ref compared to the old refFK_commits.sha
messagevarchar255Commit message
author_namevarchar255The value is set with command git config user.name xxxxx commit
author_emailvarchar255The value is set with command git config user.email xxxxx author
authored_datedatetime3The date when this commit was originally made
author_idvarchar255The id of commit authorFK_accounts.id
committer_namevarchar255The name of committer
committer_emailvarchar255The email of committer
committed_datedatetime3The last time the commit gets modified.
For example, when rebasing the branch where the commit is in on another branch, the committed_date changes.
committer_idvarchar255The id of committerFK_accounts.id
additionsbigintAdded lines of code
deletionsbigintDeleted lines of code
dev_eqintA metric that quantifies the amount of code contribution. The data can be retrieved from AE plugin.

commit_files

The files that have been changed by commits.

fieldtypelengthdescriptionkey
idvarchar255The id is composed of "< Commit_sha >:< file_path >"FK_commits.sha
commit_shachar40Commit shaFK_commits.sha
file_pathvarchar255Path of a changed file in a commit
additionsbigintThe added lines of code in this file by the commit
deletionsbigintThe deleted lines of code in this file by the commit

components

The components of files extracted from the file paths. This can be used to analyze Git metrics by component.

fieldtypelengthdescriptionkey
repo_idvarchar255The repo idFK_repos.id
namevarchar255The name of component
path_regexvarchar255The regex to extract components from this repo's paths

commit_file_components

The relationship between commit_file and component_name.

fieldtypelengthdescriptionkey
commit_file_idvarchar255The id of commit fileFK_commit_files.id
component_namevarchar255The component name of a file

commit_parents

The parent commit(s) for each commit, as specified by Git.

fieldtypelengthdescriptionkey
commit_shachar40commit shaFK_commits.sha
parent_commit_shachar40Parent commit shaFK_commits.sha

Domain 3 - Code Review

pull_requests

Pull requests are the abstraction of GitHub pull requests, GitLab merge requests, BitBucket pull requests, etc.

fieldtypelengthdescriptionkey
idvarchar255A pull request's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..." E.g. For 'github:GithubPullRequests:1347'
titlelongtextThe title of pull request
descriptionlongtextThe body/description of pull request
statusvarchar100The PR/MR statuses are standardized to 'OPEN', 'MERGED' and 'CLOSED'. Learn how each plugin's PR statuses are standardized.
original_statusvarchar100The original status of pull requests.
parent_pr_idvarchar255The id of the parent PR
pull_request_keyvarchar255The key of PR. E.g. 1536 is the key of this PR
base_repo_idvarchar255The repo that will be updated.
head_repo_idvarchar255The repo containing the changes that will be added to the base. If the head repository is NULL, this means that the corresponding project had been deleted when DevLake processed the pull request.
author_namevarchar100The author's name of the pull request
author_idvarchar100The author's id of the pull request
urlvarchar255the web link of the pull request
typevarchar255The work-type of a pull request.For example: feature-development, bug-fix, docs,etc.
componentvarchar255The component this PR affects.
The value is transformed from Github/GitLab pull request labels by configuring GITHUB_PR_COMPONENT in .env file during installation.
created_datedatetime3The time PR created.
merged_datedatetime3The time PR gets merged. Null when the PR is not merged.
closed_datedatetime3The time PR closed. Null when the PR is not closed.
merge_commit_shachar40the merge commit of this PR. By the definition of Github, when you click the default Merge pull request option on a pull request on Github, all commits from the feature branch are added to the base branch in a merge commit.FK_commits.sha
base_refvarchar255The branch name in the base repo that will be updated
head_refvarchar255The branch name in the head repo that contains the changes that will be added to the base
base_commit_shachar40The base commit of this PR.
head_commit_shachar40The head commit of this PR.

pull_request_labels

This table shows the labels of pull request. Multiple entries can exist per pull request. This table can be used to filter pull requests by label name.

fieldtypelengthdescriptionkey
label_namevarchar255Label name
pull_request_idvarchar255Pull request IDFK_pull_requests.id

pull_request_commits

A commit associated with a pull request.

The list is additive. This means if a rebase with commit squashing takes place after the commits of a pull request have been processed, the old commits will not be deleted.

fieldtypelengthdescriptionkey
pull_request_idvarchar255Pull request idFK_pull_requests.id
commit_shachar40Commit shaFK_commits.sha
commit_author_namevarchar255The name of the person who authored the commit
commit_author_emailvarchar255The email address of the person who authored the commit.
commit_authored_datevarchar255The date and time when the commit was authored.

pull_request_comments

Normal comments, review bodies, reviews' inline comments of GitHub's pull requests or GitLab's merge requests.

fieldtypelengthdescriptionkey
idvarchar255Comment idPK
pull_request_idvarchar255Pull request idFK_pull_requests.id
bodylongtextThe body of the comments
account_idvarchar255The account who made the commentFK_accounts.id
created_datedatetime3Comment creation time
positionintDeprecated
typevarchar255- For normal comments: NORMAL
- For review comments, ie. diff/inline comments: DIFF
- For reviews' body (exist in GitHub but not GitLab): REVIEW
review_idvarchar255Review_id of the comment if the type is REVIEW or DIFF
statusvarchar255Status of the comment

Domain 4 - CI/CD

cicd_scopes

The entity to filter or group 'cicd_pipelines'.

  • For GitHub: a GitHub repo is converted to a cicd_scope
  • For GitLab: a GitLab project is converted to a cicd_scope
  • For Jenkins: a Jenkins job is converted to a cicd_scope
  • For Bamboo CI: a Bamboo plan is converted to a cicd_scope
fieldtypelengthdescriptionkey
idvarchar255A cicd_scope's id is composed of "< plugin >:< Entity >:< PluginConnectionId >[:PK1]..."
For example, a GitHub cicd_scope's id is "github:GithubRepos:< GithubConnectionId >:< GithubRepoId >", a Bamboo cicd_scope's id is 'bamboo:BambooPlan:< BambooConnectionId >:< BambooPlanKey >'
PK
namevarchar255The name of cicd_scope.
descriptionlongtextThe description of cicd_scope.
urlvarchar255The url of cicd_scope. E.g. https://github.com/apache/incubator-devlake or https://jenkins.xxx.cn/view/PROD/job/OPS_releasev2/
created_datedatetime3Creation date of the cicd_scope, nullable
updated_datedatetime3Updation date of the cicd_scope, nullable

cicd_pipelines

A cicd_pipeline is the abstraction of a top-level CI/CD execution, e.g. a GitHub workflow run, a GitLab pipeline, a BitBucket pipeline, a Jenkins build, a Bamboo plan build, etc. A cicd_pipeline contains one or more of cicd_tasks.

fieldtypelengthdescriptionkey
idvarchar255This key is generated based on details from the original pluginPK
namevarchar255For gitlab, as there is no name for pipeline, so we use projectId, others have their own name
resultvarchar100The result of the pipeline. It will be standardized to 'SUCCESS', 'FAILURE' in DevLake based on each plugin's possible results.
statusvarchar100The status of the pipeline. It will be standardized to 'DONE', 'IN_PROGRESS' in DevLake based on each plugin's possible statues.
typevarchar100The value will be set to 'DEPLOYMENT' if it matched the regex configured in the Scope Config, otherwise it is an empty string.
environmentvarchar255The value will be set to 'PRODUCTION' if it matched the regex configured in the Scope Config, otherwise it is an empty string.
duration_secbigint unsignedhow long does this pipeline take
created_datedatetime3When this pipeline created/started
finished_datedatetime3When this pipeline finished
cicd_scope_idlongtextThe id of cicd_scope this pipeline belongs toFK_cicd_scopes.id

cicd_pipeline_commits

fieldtypelengthdescriptionkey
pipeline_idvarchar255This key is generated based on details from the original pluginPK
commit_shavarchar255The commit that triggers this pipelinePK
branchvarchar255The branch that triggers this pipeline
repovarchar255
repo_idvarchar255The repo that this pipeline belongs to

cicd_tasks

A cicd_task is the abstraction of the bottom-level CI/CD execution.

  • For GitHub: a cicd_task is a GitHub job run in a GitHub workflow run.
  • For GitLab: a cicd_task is a GitLab job run of a GitLab pipeline run.
  • For Jenkins: a cicd_task is a subtask of a Jenkins build. If a build does not have subtask(s), then the build will also be saved as a cicd_task in this table.
  • For Bamboo CI: a cicd_task is a Bamboo job build in a Bamboo plan build.
fieldtypelengthdescriptionkey
idvarchar255This key is generated based on details from the original pluginPK
namevarchar255
pipeline_idvarchar255The id of the cicd_pipeline it belongs to
resultvarchar100The result of the task. It will be standardized to 'SUCCESS', 'FAILURE' in DevLake based on each plugin's possible.
statusvarchar100The status of the task. It will be standardized to 'DONE', 'IN_PROGRESS' in DevLake based on each plugin's possible statues.
typevarchar100The value will be set to 'DEPLOYMENT' if it matched the regex configured in the Scope Config, otherwise it is an empty string.
environmentvarchar255The value will be set to 'PRODUCTION' if it matched the regex configured in the Scope Config, otherwise it is an empty string.
duration_secbigint unsignedHow long does this task take
started_datedatetime3When this task started
finished_datedatetime3When this task finished
cicd_scope_idlongtextThe id of cicd_scope this task belongs toFK_cicd_scopes.id

cicd_deployments

A cicd_deployment refers to a deployment at the project level. In the case where a pipeline run or build deploys across three distinct repositories, it will be categorized as ONE cicd_deployment while being recorded as THREE separate cicd_deployment_commits. It may come from several sources:

  • Domain layer cicd_pipelines, such as GitHub workflow runs, GitLab pipelines, Jenkins builds and BitBucket pipelines, etc. Deployments from cicd_pipelines will be transformed according to the regex configuration set in the Blueprint transformation before adding to this table.
  • Tool layer deployments: in v0.20, only the BitBucket\Bamboo\GitLab and GitHub(Use GraphQL APIs) plugins collect the independent deployment entity which you can find in table._tool_bitbucket_deployments and _tool_bamboo_deploy_builds, there will be more in the future.
  • Deployments pushed directly from webhooks

Additional Notes

  • CICD deployments are recorded at the project level, not the repository level.
  • CICD deployment commits are individual commits that represent a single deployment across multiple repositories.
fieldtypelengthdescriptionkey
idvarchar255The deployment_id of this deployment. The value will be set with id when it comes from webhooks.PK
cicd_scope_idvarchar255The id of cicd_scope this deployment belongs toFK_cicd_scopes.id
namevarchar255The name of the deployment
resultvarchar100The result of the deployment, e.g. SUCCESS, FAILURE
statusvarchar100The status of this deployment, e.g. IN_PROGRESS, DONE
environmentvarchar255The environment to deploy, only 'PRODUCTION' deployment will appear in v0.17
created_datedatetime3The created time of the deployment. Deprecated.
started_datedatetime3The started time of the deployment.
finished_datedatetime3The finished time of the deployment
duration_secbigintThe time this deployment takes

cicd_deployment_commits

A cicd_deployment_commit is a deployment in a specific repo. A deployment may come from several sources:

  • Domain layer cicd_pipelines, such as GitHub workflow runs, GitLab pipelines, Jenkins builds and BitBucket pipelines, etc. Deployments from cicd_pipelines will be transformed according to the regex configuration set in the Blueprint transformation before adding to this table.
  • Tool layer deployments: in v0.18, only the BitBucket and Bamboo plugins collect the independent deployment entity which you can find in table._tool_bitbucket_deployments and _tool_bamboo_deploy_builds, but there will be more in the future.
  • Deployments pushed directly from webhooks

You can query deployments from this table by SELECT DISTINCT cicd_deployment_id FROM cicd_deployments_commits.

Normally, one deployment only deploy to one repo. But in some cases, one deployment may deploy in multiple repos with different commits. In these cases, there will be multiple pairs of deployment-commit-repo, appeared in multiple entries in this table.

fieldtypelengthdescriptionkey
idvarchar255This key is the combination of the deployment's id and repo_url, e.g.
- from a GitHub workflow run: github:GithubRun:1:384111310:3521097091:https://github.com/apache/incubator-devlake
- from a Jenkins build, jenkins:JenkinsBuild:1:deploy#7:https://github.com/apache/incubator-devlake
- from a webhook, webhook:1:90489d3951711d72:e6bde456807818c5c78d7b265964d6d48b653af6
PK
cicd_scope_idvarchar255The id of cicd_scope this deployment_commit belongs toFK_cicd_scopes.id
cicd_deployment_idvarchar255The deployment_id of this deployment_commit. The value will be set with id when it comes from webhooks.
namevarchar255The name of the deployment
resultvarchar100The result of the deployment, e.g. SUCCESS, FAILURE
statusvarchar100The status of this deployment, e.g. IN_PROGRESS, DONE
environmentvarchar255The environment to deploy
created_datedatetime3The created time of the deployment. Deprecated.
started_datedatetime3The started time of the deployment.
finished_datedatetime3The finished time of the deployment
duration_secbigintThe time this deployment takes
commit_shachar40The commit sha that triggers the deployment
ref_namevarchar255The ref (branch/tag) name of the commit
repo_idvarchar255-
repo_urlvarchar191The url of the repo
prev_success_deployment_commit_idvarchar255The last successful deployment_commit_id before this one in the same cicd_scope, repo and environment, which is used to calculate the new commits deployed by this deployment, thereby measuring DORA - change lead time.

Domain 5 - Code Quality

The names of tables in the 'Code Quality' domain will start with a prefix cq_

cq_projects

fieldtypelengthdescriptionkey
idvarchar255This key is generated based on details from the original pluginPK
namevarchar255The name of the project in SonarQube
qualifiervarchar255The type of project. Examples include "TRK" for regular projects and "VW" for views
visibilityvarchar64The visibility of the project. Examples include "public" and "private"
last_analysis_datedatatime3The date and time of the most recent analysis of the project
commit_shavarchar128It represents the version number or code version identifier of a project

cq_issues

fieldtypelengthdescriptionkey
idvarchar255This key is generated based on details from the original pluginPK
rulevarchar255The key of the rule that the issue is violating
severityvarchar255The severity level of the issue
componentvarchar255The name of the component where the issue was found
project_keyvarchar255The key of the project that the issue belongs to
linebigintThe line number where the issue was found
statusvarchar255The status of the issue
messagelongtextThe message associated with the issue
debtbigintThe estimated time required to fix the issue
effortbigintThe effort required to fix the issue
commit_author_emailvarchar255The email address of the author of the commit that introduced the issue
assigneevarchar255The person assigned to fix the issue
hashvarchar255A hash code for the issue
tagsvarchar255Any tags associated with the issue
typevarchar255The type of the issue
scopevarchar128The scope of the issue
start_linebigint255The starting line of the issue
end_linebigint255The ending line of the issue
start_offsetbigint255The starting offset of the issue
end_offsetbigint255The ending offset of the issue
vulnerability_probabilityvarchar100The probability of the issue being a vulnerability
security_categoryvarchar100The security category of the issue
created_datedatetime3The time when the issue was created
updated_datedatetime3The time when the issue was last updated

cq_issue_code_blocks

fieldtypelengthdescriptionkey
idvarchar255This key is generated based on details from the original pluginPK
issue_keyvarchar255A string that stores the key of the issue that the code block is associated with
componentvarchar255A string that stores the name of the component that the code block is associated with
start_linebigint255An integer that stores the line number where the code block starts
end_linebigint255An integer that stores the line number where the code block ends
start_offsetbigint255An integer that stores the offset where the code block starts
end_offsetbigint255An integer that stores the offset where the code block ends
msglongtextA long text field that stores the message associated with the code block

cq_file_metrics

fieldtypelengthdescriptionkey
idvarchar255This key is generated based on details from the original pluginPK
project_keyvarchar255The key of the project that the issue belongs toPK
file_namelongtextlongtext fields that store the name of the file
file_pathlongtextlongtext fields that store the path of the file
file_languagelongtextlongtext fields that store the language of the file
code_smellsbigintCode smells of this file
sqale_indexbigintSqale index of the file
sqale_ratingdoubleSqale rating of the file
bugsbigintBugs rating of the file
reliability_ratinglongtextReliability rating of the file
vulnerabilitiesbigintVulnerabilities of the file
security_ratinglongtextSecurity rating of the file
security_hotspotsbigintSecurity hotspots of the file
security_hotspots_revieweddoubleSecurity hotspots reviewed of the file
security_review_ratinglongtextSecurity review rating of the file
nclocbigintNcloc of the file
coveragedoubleNcoverage of the file
lines_to_coverbigintLines to cover of the file
duplicated_lines_densitydoubleDuplicated lines density of the file
duplicated_blocksbigintDuplicated blocks of the file
duplicated_filesbigintDuplicated files of the file
duplicated_linesbigintDuplicated lines of the file
effort_to_reach_maintainability_rating_abigintEffort to reach maintainability rating a of the file
complexitybigintComplexity of the file
cognitive_complexitybigintCognitive complexity of the file
num_of_linesbigintNum of lines of the file

Domain 6 - Cross-Domain Entities

These entities are used to map entities between different domains. They are the key players to break data isolation.

There are low-level entities such as issue_commits, users, and higher-level cross domain entities such as board_repos

issue_commits

A low-level mapping between "issue tracking" and "source code management" domain by mapping issues and commits. Issue(n): Commit(n).

The original connection between these two entities lies in either issue tracking tools like Jira or source code management tools like GitLab. You have to use tools to accomplish this.

For example, a common method to connect Jira issue and GitLab commit is a GitLab plugin Jira Integration. With this plugin, the Jira issue key in the commit message written by the committers will be parsed. Then, the plugin will add the commit urls under this jira issue. Hence, DevLake's Jira plugin can get the related commits (including repo, commit_id, url) of an issue.

fieldtypelengthdescriptionkey
issue_idvarchar255Issue idFK_issues.id
commit_shachar40Commit shaFK_commits.sha

pull_request_issues

This table shows the issues closed by pull requests. It's a medium-level mapping between "issue tracking" and "source code management" domain by mapping issues and commits. Issue(n): Commit(n).

The data is extracted from the body of pull requests conforming to certain regular expression. The regular expression can be defined in GITHUB_PR_BODY_CLOSE_PATTERN in the .env file

fieldtypelengthdescriptionkey
pull_request_idvarchar255Pull request idFK_pull_requests.id
issue_idvarchar255Issue idFK_issues.id
pull_request_numbervarchar255Pull request key
issue_numbervarchar255Issue key

board_repos (Deprecated)

A way to link "issue tracking" and "source code management" domain by mapping boards and repos. Board(n): Repo(n).

fieldtypelengthdescriptionkey
board_idvarchar255Board idFK_boards.id
repo_idvarchar255Repo idFK_repos.id

accounts

This table stores of user accounts across different tools such as GitHub, Jira, GitLab, etc. This table can be joined to get the metadata of all accounts. metrics, such as 'No. of Issue closed by contributor', 'No. of commits by contributor',

fieldtypelengthdescriptionkey
idvarchar255An account's id is the identifier of the account of a specific tool. It is composed of "< Plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github account's id is composed of "< github >:< GithubAccounts >:< GithubUserId >)". E.g. 'github:GithubUsers:14050754'
PK
emailvarchar255Email of the account
full_namevarchar255Full name
user_namevarchar255Username, nickname or Github login of an account
avatar_urlvarchar255
organizationvarchar255User's organization(s)
created_datedatetime3User creation time
statusint0: default, the user is active. 1: the user is not active

users

fieldtypelengthdescriptionkey
idvarchar255id of a personPK
emailvarchar255the primary email of a person
namevarchar255name of a person

user_accounts

fieldtypelengthdescriptionkey
user_idvarchar255users.idComposite PK, FK
account_idvarchar255accounts.idComposite PK, FK

teams

fieldtypelengthdescriptionkey
idvarchar255id from the data sources, decided by DevLake usersPK
namevarchar255name of the team. E.g. team A, team B, etc.
aliasvarchar255alias or abbreviation of a team
parent_idvarchar255teams.id, default to nullFK
sorting_indexint255the field to sort team

team_users

fieldtypelengthdescriptionkey
team_idvarchar255Full name of the team. E.g. team A, team B, etc.Composite PK, FK
user_idvarchar255users.idComposite PK, FK

project

fieldtypelengthdescriptionkey
namevarchar255name for projectPK
descriptionlongtextdescription of the project
created_atdatetime3created time of project
updated_atdatetime3last updated time of project

project_metric_settings

fieldtypelengthdescriptionkey
project_namevarchar255name for projectPK
plugin_namevarchar255name for pluginPK
plugin_optionlongtextcheck if metric plugins have been enabled by the project
enabletinyint1if the metric plugins is enabled

project_mapping

fieldtypelengthdescriptionkey
project_namevarchar255name for projectPK
tablevarchar255the table name of ScopePK
row_idvarchar255the row_id in the Scope tablePK

project_pr_metrics

fieldtypelengthdescriptionkey
idvarchar255Id of PRPK
project_namevarchar100The project that this PR belongs toPK
first_review_idlongtextThe id of the first review on this pr
first_commit_shalongtextThe sha of the first commit
pr_coding_timebigintThe time it takes from the first commit until a PR is issued
pr_pickup_timebigintThe time it takes from when a PR is issued until the first comment is added to that PR
pr_review_timebigintThe time it takes to complete a code review of a PR before it gets merged
deployment_idlongtextThe id of cicd_task which deploy the commits of this PR
pr_deploy_timebigintThe time it takes from when a PR is merged to when it is deployed
pr_cycle_timebigintThe total time from the first commit to when the PR is deployed

project_issue_metrics

fieldtypelengthdescriptionkey
idvarchar255Id of IssuePK
project_namevarchar100The project that this Issue belongs toPK
deployment_idlongtextThe id of cicd_task which cause an incident

refs_issues_diffs

This table shows the issues fixed by commits added in a new ref compared to an old one. The data is computed from table.commits_diffs, table.pull_requests, table.pull_request_commits, and table.pull_request_issues.

This table can support tag-based analysis, for instance, 'No. of bugs closed in a tag'.

fieldtypelengthdescriptionkey
new_ref_idvarchar255The new ref's id for comparisonFK_refs.id
old_ref_idvarchar255The old ref's id for comparisonFK_refs.id
new_ref_commit_shachar40The commit new ref points to at the time of collection
old_ref_commit_shachar40The commit old ref points to at the time of collection
issue_numbervarchar255Issue number
issue_idvarchar255Issue idFK_issues.id

Get Domain Layer Models in Developer Mode

When developing a new plugin, you need to refer to domain layer models, as all raw data should be transformed to domain layer data to provide standardized metrics across tools. Please use the following method to access the domain data models.

import "github.com/apache/incubator-devlake/models/domainlayer/domaininfo"

domaininfo := domaininfo.GetDomainTablesInfo()
for _, table := range domaininfo {
// do something
}

If you want to learn more about plugin models, please visit PluginImplementation