Duplication

Qlty detects duplicated code (also known as copy/pasted code) across your entire repository in all of our supported languages. Code duplication occurs when identical or similar code blocks appear in multiple places within a codebase. While it might seem harmless at first glance, duplication can lead to several significant issues in software development:

  1. Slower Comprehension: Duplication can make the codebase more complex and harder to understand. Developers need to grasp multiple copies of the same logic, which can obscure the overall structure and flow of the application.
  2. Bug Propagation: If a bug exists in a duplicated code block, it will likely be present in all copies. This can lead to widespread issues and make debugging more challenging.
  3. Reduced Reusability: Code duplication reduces the modularity and reusability of code. Instead of having a single, reusable function or module, duplicated code fragments limit opportunities for efficient reuse and refactoring.

By identifying and addressing duplicated code, developers can enhance the maintainability, readability, and quality of their codebase, leading to more robust and efficient software development practices.

Duplication detection is provided by Qlty CLI and Qlty Cloud.

How Duplication Detection Works

Qlty implements its own duplication engine in Rust, based on the Tree-sitter parser. At its heart, the engine uses a fairly simple algorithm to decide which parts of your code are duplicated.

  1. First, source files are parsed into abstract syntax trees (ASTs)
  2. Nodes are recursively “fingerprinted” in a way that retains their structural information (node type and descendants) but discards their literal information (like variable names)
  3. We build an index of all nodes organized by their fingerprints
  4. Any fingerprint that matches multiple nodes identifies a duplication
  5. We remove duplication matches nested within other duplicates to output the largest matches
  6. Finally, we apply a filter to exclude duplication that covers only small sections of code
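
The steps above can be illustrated with a small sketch. This is not Qlty's implementation: it is written in Python rather than Rust, it uses hand-built trees in place of Tree-sitter output, and every name in it (`Node`, `fingerprint`, `find_duplicates`, `make_function`, `min_mass`) is hypothetical. The use of SHA-1 and of a node-count threshold for "small sections" are also assumptions made only for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                  # node type, e.g. "function" or "identifier"
    text: str = ""             # literal text (names, constants); never hashed
    children: list = field(default_factory=list)


def fingerprint(node, index):
    """Return (digest, mass) for `node` and record every visited node in `index`.

    The digest depends only on the node type and the digests of its children,
    so two subtrees that differ only in literal text get the same digest.
    Mass is the number of nodes in the subtree.
    """
    results = [fingerprint(child, index) for child in node.children]
    payload = node.kind + "(" + ",".join(digest for digest, _ in results) + ")"
    digest = hashlib.sha1(payload.encode()).hexdigest()
    mass = 1 + sum(m for _, m in results)
    index[digest].append((node, mass))
    return digest, mass


def descendants(node):
    """Yield every node strictly below `node`."""
    for child in node.children:
        yield child
        yield from descendants(child)


def find_duplicates(roots, min_mass=4):
    """Return groups of structurally identical subtrees (assumed threshold: min_mass)."""
    # Steps 2 and 3: fingerprint all nodes and index them by fingerprint.
    index = defaultdict(list)
    for root in roots:
        fingerprint(root, index)

    # Step 4: a fingerprint shared by several nodes marks a duplication.
    groups = [entries for entries in index.values() if len(entries) > 1]

    # Step 5: collect nodes that sit inside some other duplicated node.
    nested = set()
    for entries in groups:
        for node, _ in entries:
            nested.update(id(n) for n in descendants(node))

    result = []
    for entries in groups:
        nodes = [node for node, _ in entries]
        mass = entries[0][1]          # equal fingerprints imply equal mass
        if all(id(n) in nested for n in nodes):
            continue                  # covered by a larger match
        if mass < min_mass:
            continue                  # step 6: ignore small sections
        result.append(nodes)
    return result


def make_function(name, arg):
    """Build a toy tree for a function that returns `arg + 1` (illustrative only)."""
    return Node("function", name, [
        Node("parameters", "", [Node("identifier", arg)]),
        Node("return", "", [
            Node("add", "", [Node("identifier", arg), Node("number", "1")]),
        ]),
    ])


roots = [
    make_function("inc", "x"),
    make_function("bump", "y"),      # same structure, different names
    Node("function", "noop", [Node("parameters")]),
]
for group in find_duplicates(roots):
    print([node.text for node in group])   # prints ['inc', 'bump']
```

In this sketch, the two functions match even though their names and parameter names differ, because only node types and tree shape feed the fingerprint. The smaller shared subtrees (such as the `return` statement) are also duplicated, but they are dropped because they are nested inside the larger function-level match.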

See Also