User:Dimawik/Conventional Commits Specification

Conventional Commits Specification (CCS) is a specification that formalizes how commits in version control systems are categorized. It distinguishes code changes by their purpose, such as new features, bug fixes, or documentation updates, to support automated processes like changelog generation and semantic versioning.[note 1]
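As a rough illustration of the changelog generation mentioned above, the Python sketch below groups commit headers by type. It is not how any particular tool works: the function name build_changelog and the choice of which types get a section (and their titles) are hypothetical. Real generators such as git-cliff apply far more configurable rules.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical choice of which types get a changelog section, and their titles;
# the specification itself does not prescribe a changelog layout.
SECTION_TITLES = {
    "feat": "Features",
    "fix": "Bug Fixes",
    "perf": "Performance Improvements",
}


def build_changelog(headers: List[str]) -> str:
    """Group '<type>[(scope)][!]: <description>' headers into changelog sections."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for header in headers:
        prefix, sep, description = header.partition(": ")
        if not sep:
            continue  # not a conventional header, so it is left out
        # Strip an optional "(scope)" and the optional "!" breaking-change marker.
        commit_type = prefix.split("(", 1)[0].rstrip("!")
        if commit_type in SECTION_TITLES:
            sections[commit_type].append(description)
    lines: List[str] = []
    for commit_type, title in SECTION_TITLES.items():  # fixed section order
        if sections[commit_type]:
            lines.append(f"## {title}")
            lines.extend(f"- {entry}" for entry in sections[commit_type])
    return "\n".join(lines)
```

Types without a section (here docs, test, ci and so on) are simply omitted, which mirrors the common practice of keeping user-facing changes in the changelog.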

Background

Tracking the history of a codebase relies on the messages attached to commits, and standardizing these messages makes that history easier to follow. E. B. Swanson's 1976 taxonomy divided software maintenance into adaptive, corrective, and perfective activities, but industry practice has since shifted towards finer-grained commit categories.[note 2]

CCS prescribes a strict format for commit messages:

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]
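The format above can be parsed with a regular expression for the header line plus a split on blank lines for the rest. The Python sketch below is a simplified, hypothetical parser (parse_commit, ConventionalCommit and both regular expressions are illustrative names, not part of any official tool). It assumes lowercase types and treats the last paragraph as footers only when every line looks like "Token: value", "Token #value" or "BREAKING CHANGE: ...".

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Header: "<type>[(scope)][!]: <description>". This sketch accepts only lowercase
# types, although the specification treats types case-insensitively.
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\r\n]+)\))?(?P<bang>!)?: (?P<description>.+)$"
)
# Simplified footer line: "Token: value" or "Token #value"; only BREAKING CHANGE may contain a space.
FOOTER_RE = re.compile(r"^(?:BREAKING CHANGE|[A-Za-z][A-Za-z-]*)(?:: | #)")


@dataclass
class ConventionalCommit:
    type: str
    scope: Optional[str]
    description: str
    body: str = ""
    footers: List[str] = field(default_factory=list)
    breaking: bool = False


def parse_commit(message: str) -> ConventionalCommit:
    """Split a commit message into header fields, body and footer lines."""
    header, _, remainder = message.strip().partition("\n")
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a conventional commit header: {header!r}")
    # Body and footers are separated from the header and from each other by blank lines.
    paragraphs = [p.strip() for p in remainder.split("\n\n") if p.strip()]
    footers: List[str] = []
    # Simplification: the last paragraph is the footer block only if every line looks like a footer.
    if paragraphs and all(FOOTER_RE.match(line) for line in paragraphs[-1].splitlines()):
        footers = paragraphs.pop().splitlines()
    breaking = match["bang"] is not None or any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:")) for line in footers
    )
    return ConventionalCommit(
        type=match["type"],
        scope=match["scope"],
        description=match["description"],
        body="\n\n".join(paragraphs),
        footers=footers,
        breaking=breaking,
    )
```

For a message whose header is "feat(api)!: drop support for Node 14", the parser yields type "feat", scope "api" and breaking=True. The "!" after the type or scope is the specification's shorthand for a breaking change, alongside the BREAKING CHANGE footer.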

The required <type> element assigns each commit to one of ten categories, so that the history can be processed by automated tools.[note 3] An optional footer is commonly used to flag backwards-incompatible changes with a BREAKING CHANGE token, which tools and developers rely on to decide when a major version bump is needed under semantic versioning.[note 4]
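The link to semantic versioning can be sketched as follows, using the mapping stated in the Conventional Commits specification: fix corresponds to a PATCH release, feat to a MINOR release, and a breaking change (footer or "!") to a MAJOR release. The function names bump_level and next_version are hypothetical, and the sketch assumes a plain "MAJOR.MINOR.PATCH" version string.

```python
import re
from typing import Iterable, Optional

HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^()\r\n]+\))?(?P<bang>!)?: ")
LEVELS = (None, "patch", "minor", "major")  # index = bump strength


def bump_level(messages: Iterable[str]) -> Optional[str]:
    """Return the largest bump implied by a set of commit messages, or None."""
    level = 0
    for message in messages:
        match = HEADER_RE.match(message)
        if match is None:
            continue  # non-conforming commits do not affect the version in this sketch
        breaking = match["bang"] is not None or any(
            line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:"))
            for line in message.splitlines()[1:]
        )
        if breaking:
            level = max(level, 3)
        elif match["type"] == "feat":
            level = max(level, 2)
        elif match["type"] == "fix":
            level = max(level, 1)
    return LEVELS[level]


def next_version(version: str, messages: Iterable[str]) -> str:
    """Apply the bump to a plain "MAJOR.MINOR.PATCH" version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    level = bump_level(messages)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

Release tools layer further rules on top of this core mapping (some also treat perf as a patch-level change, and 0.x versions are often handled differently), so the sketch only shows the principle.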

Classification types

The ten type definitions were originally taken from the Angular project's commit guidelines, but stricter definitions intended to reduce overlap between categories were later proposed:[note 5]

  • Feature (feat): Code additions that implement new capabilities, whether intended for end-users or internal development.
  • Fix (fix): Patches designed to resolve defects and errors in the software.
  • Performance (perf): Adjustments made to optimize resource usage, like speeding up execution or lowering memory footprint, without altering the program's primary function.
  • Style (style): Aesthetic or formatting alterations (such as correcting indentation, fixing linting warnings, or renaming variables) that do not influence execution.
  • Refactor (refactor): Modifications that reorganize existing code to boost overall maintainability, strictly excluding any changes that qualify merely as stylistic or performance-based.
  • Documentation (docs): Changes to documentation, including comments within the code, README files, and typo corrections in these texts.
  • Test (test): The creation or revision of automated testing scripts.
  • Continuous Integration (ci): Adjustments to continuous integration and deployment pipelines or scripts (e.g., GitHub Actions setups).
  • Build (build): Alterations to external dependencies, compilation configurations, and project build tools.
  • Chore (chore): Any remaining development tasks that do not align with the specific labels above.

This scheme classifies changes along two dimensions: purpose (the intent behind a change) and object (the file or artifact being changed). When a change fits both dimensions, its purpose takes priority. For instance, a commit that refactors test code is classified as "refactor" rather than "test".[note 6]
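The precedence rule above can be expressed as a small lookup. The sketch assumes, for illustration only, that feat, fix, perf, style and refactor are the purpose types and docs, test, ci and build the object types, with chore as the residual fallback; the function name resolve_type and the tie-break order within each group are hypothetical.

```python
from typing import AbstractSet

# Illustrative assumption about which types express a purpose and which name
# the object being changed. Tuple order is an arbitrary tie-break within a group.
PURPOSE_TYPES = ("feat", "fix", "perf", "style", "refactor")
OBJECT_TYPES = ("docs", "test", "ci", "build")


def resolve_type(applicable: AbstractSet[str]) -> str:
    """Choose one type when several apply: purpose beats object, 'chore' is the fallback."""
    for candidate in PURPOSE_TYPES + OBJECT_TYPES:  # purpose types are checked first
        if candidate in applicable:
            return candidate
    return "chore"
```

For a commit that refactors test code, resolve_type({"test", "refactor"}) returns "refactor", matching the example in the text.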

Adoption and usage

The specification has gained adoption across open-source ecosystems. A 2025 study of 3,058 popular open-source projects found that, in 2023, 116 of them had officially announced their adoption of CCS.[note 7] Adoption generally takes one of two forms:

  • Document declaration: Explicitly mandating the rules within contributor documentation.
  • Integrated automation: Enforcing or building on the convention with automation tools such as GitHub Actions, the commitlint linter, or the git-cliff changelog generator.[note 8]
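A minimal form of automated enforcement is a Git commit-msg hook: Git runs it with the path of the file holding the proposed message as its only argument, and a non-zero exit status aborts the commit. The Python sketch below is a hypothetical stand-in for a linter such as commitlint, not a description of its behavior; it checks only the header shape and the ten types listed in this article. It would be saved as .git/hooks/commit-msg and made executable.

```python
#!/usr/bin/env python3
"""Sketch of a Git commit-msg hook that rejects non-conventional commit headers."""
import re
import sys
from typing import List

# The ten types listed in this article; real projects configure their own list.
ALLOWED_TYPES = ("feat", "fix", "perf", "style", "refactor",
                 "docs", "test", "ci", "build", "chore")
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^()\r\n]+\))?!?: \S")


def check_header(message: str) -> List[str]:
    """Return the problems found in the first non-comment line of a message."""
    # Skip '#' comment lines, which may still be present in the message file.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    header = lines[0] if lines else ""
    match = HEADER_RE.match(header)
    if match is None:
        return [f"header does not match '<type>[(scope)][!]: <description>': {header!r}"]
    if match["type"] not in ALLOWED_TYPES:
        return [f"unknown type {match['type']!r}; expected one of: {', '.join(ALLOWED_TYPES)}"]
    return []


def main(argv: List[str]) -> int:
    # Git calls the hook with the path of the file that holds the proposed message.
    with open(argv[1], encoding="utf-8") as message_file:
        problems = check_header(message_file.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0  # a non-zero exit status makes Git abort the commit


# Only act when invoked with a message-file path, as Git does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Running the same check in a CI pipeline instead of a local hook catches commits from contributors who have not installed the hook.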

The convention is especially common in the npm ecosystem. A study of 381 popular npm packages found that 360 of them (nearly 95%) contained commits conforming to CCS, and that 198 (over half) had more than 80% of their commits in that format, suggesting widespread use in the JavaScript ecosystem.[note 9] The convention is also followed informally: in projects that had not adopted CCS, roughly one in ten commits submitted in 2023 still conformed to it.[note 10]
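Compliance figures like these come down to counting conforming commits per repository. The sketch below assumes a header-only regular expression as the conformance test; the studies' actual matching criteria may differ, and the names compliance_rate and summarize are hypothetical. The 80% threshold is treated as strict ("more than"), following the wording of the study.

```python
import re
from typing import Dict, List, Tuple

# Assumed conformance test: the header alone must match the CCS shape.
HEADER_RE = re.compile(r"^[a-z]+(?:\([^()\r\n]+\))?!?: \S")


def compliance_rate(headers: List[str]) -> float:
    """Fraction of a repository's commit headers that follow the CCS header format."""
    if not headers:
        return 0.0
    conforming = sum(1 for header in headers if HEADER_RE.match(header))
    return conforming / len(headers)


def summarize(repos: Dict[str, List[str]], threshold: float = 0.8) -> Tuple[int, int]:
    """Count repositories with any conforming commit, and those above the threshold."""
    rates = [compliance_rate(headers) for headers in repos.values()]
    any_conforming = sum(1 for rate in rates if rate > 0)
    above_threshold = sum(1 for rate in rates if rate > threshold)
    return any_conforming, above_threshold
```

Applied to the reported counts, 360/381 is about 94.5% and 198/381 about 52.0%.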

Challenges

Developers report recurring difficulties when categorizing their work manually. An analysis of 194 GitHub issues and 100 Stack Overflow questions identified four categories of challenges:[note 11]

  • Type confusion: Accounting for the majority of reported problems, users frequently struggle to select the correct label. Ambiguities typically arise in telling structural improvements (refactor), optimizations (perf), and aesthetic changes (style) apart, or in deciding whether a change is a new capability (feat) or a maintenance task (chore).
  • Type aliases: Contributors sometimes ask to replace official type names with their preferred terms (for instance, using "patch" instead of "fix").
  • Changing types: Users occasionally propose expanding the vocabulary with new labels like "security" or eliminating existing ones.
  • Lack of definitions: Because the specification does not define the types itself but refers users to the Angular project's definitions, developers frequently ask for a definitive glossary of types.[note 12]
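One way a project might accommodate the alias requests above without changing the specification is to normalize aliases to the canonical types before validation. The alias table and the function name normalize_header in this sketch are hypothetical project-local choices; CCS itself defines no aliases.

```python
import re

CANONICAL_TYPES = {"feat", "fix", "perf", "style", "refactor",
                   "docs", "test", "ci", "build", "chore"}
# Hypothetical project-local aliases; none of these are defined by CCS.
ALIASES = {"feature": "feat", "patch": "fix", "bugfix": "fix", "doc": "docs", "tests": "test"}

# Captures the type and everything after it: optional "(scope)", optional "!", ": description".
TYPE_RE = re.compile(r"^(?P<type>[A-Za-z]+)(?P<rest>(?:\([^()\r\n]+\))?!?: .*)$")


def normalize_header(header: str) -> str:
    """Rewrite an aliased type to its canonical CCS label; reject unknown types."""
    match = TYPE_RE.match(header)
    if match is None:
        raise ValueError(f"not a conventional header: {header!r}")
    raw = match["type"].lower()
    canonical = ALIASES.get(raw, raw)
    if canonical not in CANONICAL_TYPES:
        raise ValueError(f"unknown type: {match['type']!r}")
    return canonical + match["rest"]
```

With this table, "patch(auth): handle expired tokens" becomes "fix(auth): handle expired tokens", while a proposed new type such as "security" is still rejected.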

Automated classification

To reduce the burden of manual categorization, researchers have investigated machine learning approaches. BERT-based models achieve state-of-the-art results when classifying commits into Swanson's three maintenance categories, but because they were not designed for CCS, they may struggle when applied directly to its ten types.[note 13]

By contrast, fine-tuned code-specific large language models such as CodeLlama show significant promise: Zeng et al. report that their approach outperformed all baselines in both accuracy and macro F1, with a gain of nearly 7% over GPT-4.[note 14] However, the "refactor" and "chore" types remain difficult for such models, possibly because both are defined residually: "chore" covers miscellaneous changes, and "refactor" covers restructuring that is neither "perf" nor "style".[note 15]
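Macro F1, one of the metrics cited above, averages per-class F1 scores without weighting by class size, so weak classes such as "refactor" and "chore" pull it down even when overall accuracy looks acceptable. The sketch below uses the standard definition F1 = 2TP / (2TP + FP + FN) over the labels seen in either list; the labels in the example are made up, and this is not a reproduction of the study's evaluation protocol.

```python
from collections import Counter
from typing import List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Share of predictions that match the true label (inputs must be non-empty)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over labels appearing in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[t] += 1  # true label gets a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With y_true = ["feat", "fix", "chore", "refactor", "fix", "chore"] and y_pred = ["feat", "fix", "refactor", "chore", "fix", "chore"], accuracy is 4/6 (about 0.67) but macro F1 is 0.625, because the swapped "refactor"/"chore" predictions score 0.0 and 0.5 for those classes. scikit-learn's f1_score with average="macro" computes the same quantity.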

Notes

  1. "<type> prefix ... facilitates automated processes like release note generation ... and semantic version bump", "CCS utilizes ... <type> prefix" (p. 2277)[1]
  2. "the industry has shifted towards a more fine-grained commit category", "In 1976, Swanson ... categorized commit activities into three fundamental types: perfective, adaptive, and corrective" (p. 2277)[1]
  3. "CCS ... offers a uniform commit convention ... enhancing the readability for ... automated tools", "CCS utilizes ... <type> prefix to categorize commits into ten distinct types" (p. 2277)[1]
  4. "According to Conventional Commits, BREAKING CHANGE tokens are documented in commit messages to indicate that the corresponding commits contain BCs" (p. 111:2)[2]
  5. "we provide more detailed and less overlapping type definitions ... [using] the Angular project's definitions as a foundation" (p. 2283)[3]
  6. "CCS types bifurcate commits based on two dimensions: purpose ... and object", "in cases of overlap, the commit type is determined by its purpose. For example, a commit that refactors test code would be categorized as a 'refactor' commit." (pp. 2283–2284)[4]
  7. "we analyzed a total of 3,058 state-of-the-art open-source projects", "in 2023, 116 out of 3,058 most popular projects officially announced their adoption of CCS" (pp. 2280–2281)[5]
  8. "categorized two modes of CCS usage ... Document Declaration and Integrated with Automatic Tools", "projects utilize various automation tools ... Common used tools include GitHub Actions, commit lint ... and git-cliff" (p. 2281)[6]
  9. "360 out of 381 repositories contain commits that conform to Conventional Commits and 198 projects have over 80% commits that follow Conventional Commits" (p. 111:6)[7]
  10. "In projects that have not adopted CCS, approximately 10% of the commits submitted by developers in 2023 adhered to CCS." (p. 2281)[6]
  11. "analyzing 194 issues from GitHub and 100 questions from Stack Overflow", "identified four distinct categories of challenges developers have encountered" (p. 2282)[8]
  12. "CCS does not provide a definitive list of type definitions but refers users to the Angular project's definitions ... This reliance leads to debates and confusion" (pp. 2282–2283)[9]
  13. "The BERT model ... is widely used in prior studies ... achieving state-of-the-art performance ... classifying commit into Swanson's proposed categories", "methods from previous works are not specifically designed for CCS, which could pose challenges when directly applied" (p. 2284)[10]
  14. "our approach surpasses all baselines in both macro metrics and accuracy", "demonstrates a near 7% improvement in accuracy and macro F1 over ChatGPT4" (p. 2285)[11]
  15. "the 'chore' and 'refactor' categories demonstrate weaker performances", "may be attributed to our definitions, wherein 'chore' encompasses miscellaneous types ... 'refactor' is defined as refactoring excluding 'perf' and 'style'" (p. 2285)[11]

References

  1. Zeng et al. 2025, p. 2277.
  2. Kong et al. 2025, p. 111:2.
  3. Zeng et al. 2025, p. 2283.
  4. Zeng et al. 2025, pp. 2283–2284.
  5. Zeng et al. 2025, pp. 2280–2281.
  6. Zeng et al. 2025, p. 2281.
  7. Kong et al. 2025, p. 111:6.
  8. Zeng et al. 2025, p. 2282.
  9. Zeng et al. 2025, pp. 2282–2283.
  10. Zeng et al. 2025, p. 2284.
  11. Zeng et al. 2025, p. 2285.

Sources

  • Zeng, Qunhong; Zhang, Yuxia; Qiu, Zhiqing; Liu, Hui (2025). "A First Look at Conventional Commits Classification". 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). Ottawa, ON, Canada: IEEE. pp. 2277–2289. doi:10.1109/ICSE55347.2025.00011.
  • Kong, Dezhen; Liu, Jiakun; Bao, Lingfeng; Lo, David (2025). "Toward Better Comprehension of Breaking Changes in the NPM Ecosystem". ACM Transactions on Software Engineering and Methodology. 34 (4). New York, NY, USA: Association for Computing Machinery: 111:1–111:23. doi:10.1145/3702991.

Content Disclaimer

This information is summarized from Wikipedia and presented here for educational purposes. The content is available under the CC BY-SA 3.0 license. We are not responsible for any inaccuracies in data sourced from these public contributions.

  1. The information displayed on this website is sourced in part or in whole from Wikipedia and has been adapted for the purpose of restating it. We strive to provide accurate and relevant information, however:
  2. There is no guarantee of absolute accuracy. Wikipedia is an open, collaborative project that can be edited by anyone, so information is subject to change.
  3. It is not intended to constitute professional advice. The content displayed is for informational and educational purposes only. For important decisions (e.g., medical, legal, or financial), please consult a professional.
  4. Content copyright. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike License (CC BY-SA). This means that content may be reused with appropriate attribution and shared under a similar license.
  5. Responsible use. Any risk arising from the use of information from this website is entirely the responsibility of the user.