cmd: rd /s for recursive directory removal fails intermittently and is asynchronous #456

Open
opened 2026-01-30 21:52:47 +00:00 by claunia · 0 comments
Owner

Originally created by @mklement0 on GitHub (Nov 20, 2018).

Note: .NET Core's System.IO.Directory.Delete() and PowerShell's Remove-Item are equally affected: see here and here.

The Windows API functions DeleteFile() and RemoveDirectory() functions are inherently asynchronous (emphasis added):

The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED."

The RemoveDirectory function marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed."

rd /s fails to account for this asynchronous behavior, which has two implications:

  • Problem (a): Trying to delete a nonempty directory (which invariably requires recursive deletion of its content first) can fail - infrequently, but it does happen, and the failure is not reflected in the exit code (although a message is issued to stderr).

  • Problem (b): Trying to recreate a successfully deleted directory or a file immediately afterwards may fail - intermittently (easier to provoke than the other problem, but a more exotic use case overall).

    • Note: You could argue that it's not rd's responsibility to create a synchronous experience in this case, given that the underlying system API is asynchronous; however, it certainly makes for more robust, predictable programs if synchronicity is ensured - as is the case on Unix-like platforms, where the system APIs for file removal are synchronous.

Problem (a) is due to using depth-first recursion without accounting for the asynchronous deletion behavior; the problem, along with the solution, is described in this YouTube video (starts at 7:35).

Important:

  • For convenience, Windows PowerShell is used to reproduce the problem below.
  • The tests create temp. dir. $HOME/tmpDir - remove it manually afterwards, if still present.

Steps to reproduce

Setup:

  • Open a PowerShell console.
  • Paste and submit the code with the two function definitions below at the prompt to define the functions used in the tests.

Problem (a):

Assert-ReliableDirRemoval cmd

Problem (b):

Assert-SyncDirRemoval cmd

Expected behavior

The functions should loop indefinitely (terminate them with Ctrl+C), emitting a . in each iteration.

Actual behavior

Eventually - and that there is no predictable time frame is indicative of the problem - an error will occur, on the order of minutes or even longer.

Problem (a):

Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using cmd for removal.
.................................tmpDir\sub - The directory is not empty.
Deletion failed quietly or with exit code 0, tmpDir still has content: sub

That is, recursive removal of the target dir's content failed due to async timing issues; while a stderr message was issued, the exit code did not reflect failure (was still 0), and both the target directory and remnants inside it lingered.

Problem (b):

Synchronous dir-removal test: Repeatedly creating and removing a directory, using cmd for removal.
...................New-Item : Access is denied
At line:25 char:15
+       $null = New-Item -Type Directory tmpDir
+               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : PermissionDenied: (C:\Users\jdoe\tmpDir:String) [New-Item], UnauthorizedAccessException
    + FullyQualifiedErrorId : ItemExistsUnauthorizedAccessError,Microsoft.PowerShell.Commands.NewItemCommand

That is, recreating the target dir failed, because the prior removal hadn't yet completed.

Aux. function definitions:

Functions Assert-ReliableDirRemoval and Assert-SyncDirRemoval

You can paste the entire code at the prompt in order to define these functions.

function Assert-ReliableDirRemoval {
  param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method)
  Write-Host "Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using $method for removal."
  # Treat all errors as fatal.
  $ErrorActionPreference = 'Stop'
  Set-Location $HOME  # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not.
  # Remove a preexisting directory first, if any.
  Remove-Item -EA Ignore -Recurse tmpDir
  While ($true) {
    Write-Host -NoNewline .
    # Create a subdir. tree.
    $null = New-Item -Type Directory tmpDir, tmpDir/sub, tmpDir/sub/sub, tmpDir/sub/sub/sub
    # Fill each subdir. with 1000 empty test files.
    "tmpDir", "tmpDir/sub", "tmpDir/sub/sub", "tmpDir/sub/sub/sub"| ForEach-Object {
      $dir = $_
      1..1e3 | ForEach-Object { $null > "$dir/$_" }
    }
    # Now remove the entire dir., which invariably involves deleting its contents
    # recursively first.
    switch ($method) {
      'ps'  { Remove-Item -Recurse tmpDir }
      '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) }
      'cmd'  { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } }
      # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but 
      # !! does NOT report a nonzero exit code. We detect this case below.
    }
    # Does the dir. unexpectedly still exist?
    # This can happen for two reasons:
    #  - The removal of the top-level directory itself has been requested, but isn't complete yet.
    #  - Removing the content of the top-level directory failed quietly.
    while (Test-Path tmpDir) {
      if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content?
        Throw "Deletion failed quietly or with exit code 0, tmpDir still has content: $output"
      } else {
        # Top-level directory removal isn't complete yet.
        # Wait for removal to complete, so we can safely recreate the directory in the next iteration.
        # This loop should exit fairly quickly.
        Write-Host -NoNewline !; Start-Sleep -Milliseconds 100
      }      
    }    
  }
}

function Assert-SyncDirRemoval { 
  param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method)
  Write-Host "Synchronous dir-removal test: Repeatedly creating and removing a directory, using $method for removal."
  # Treat all errors as fatal.
  $ErrorActionPreference = 'Stop'
  Set-Location $HOME  # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not.
  While ($true) {
    Write-Host -NoNewline .
    # Remove the test dir., which invariably involves deleting its contents recursively first.
    # Note: This could itself fail intermittently, but with just 10 files and no subdirs. is unlikely to.
    if (Test-Path tmpDir) {
      switch ($method) {
        'ps' { Remove-Item -Recurse tmpDir }
        '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) }
         'cmd'  { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } }
         # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but 
         # !! does NOT report a nonzero exit code. We detect this case below.
      }
    }
    # (Re-)create the dir. with 10 empty test files.
    # If the previous removal hasn't fully completed yet, this will fail.
    # Note: [System.IO.Directory]::Delete() could have quietly failed to remove the contents of the dir.
    #       due to async timing accidents, but, again, this is unlikely with 1- files and 
    try {
      $null = New-Item -Type Directory tmpDir
    } catch {
      # Handle the case where removal failed
      # quietly due to async vagaries while removing the dir's *content*, which
      # can happen with [System.IO.Directory]::Delete().
      # Note that if removal succeeded but is pending - the scenario we're trying to
      # get to occur - Get-ChildItem reports an access-denied error, which we ignore
      # in favor of reporting the original error from the dir. re-creation attempt.
      if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content?
        Write-Warning "Ignoring failed content removal, retrying..."
        # Simply try again
        Start-Sleep -Milliseconds 500; Remove-item tmpDir -Recurse
        continue
      }
      # Re-creation failed due to async removal.
      Throw $_
    }
    1..10 | ForEach-Object { $null > tmpDir/$_ }
  }
}

Environment data

cmd /c ver output:

Microsoft Windows [Version 10.0.17134.407]
Originally created by @mklement0 on GitHub (Nov 20, 2018). <sup>Note: .NET Core's `System.IO.Directory.Delete()` and PowerShell's `Remove-Item` are equally affected: see [here](https://github.com/dotnet/corefx/issues/33603) and [here](https://github.com/PowerShell/PowerShell/issues/8211).</sup> **The Windows API functions `DeleteFile()` and `RemoveDirectory()` functions are _inherently asynchronous_** (emphasis added): > The DeleteFile function **marks a file for deletion** on close. Therefore, the file **deletion does not occur until the last handle to the file is closed**. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED." > The RemoveDirectory function **marks a directory for deletion on close**. Therefore, the directory is **not removed until the last handle to the directory is closed.**" `rd /s` fails to account for this asynchronous behavior, which has two implications: * Problem (a): **Trying to delete a nonempty directory (which invariably requires recursive deletion of its _content_ first) can _fail_** - infrequently, but it does happen, and the failure is _not_ reflected in the _exit code_ (although a message is issued to stderr). * Problem (b): **Trying to _recreate_ a successfully deleted directory or a file _immediately afterwards_ may fail** - intermittently (easier to provoke than the other problem, but a more exotic use case overall). * Note: You could argue that it's not `rd`'s responsibility to create a synchronous experience in this case, given that the underlying system API is asynchronous; however, it certainly makes for more robust, predictable programs if synchronicity is ensured - as is the case on _Unix_-like platforms, where the system APIs for file removal are synchronous. Problem (a) is due to using depth-first recursion without accounting for the asynchronous deletion behavior; the problem, along with the solution, is described in [this YouTube video](https://youtu.be/uhRWMGBjlO8?t=455) (starts at 7:35). Important: * For convenience, Windows PowerShell is used to reproduce the problem below. * The tests create temp. dir. `$HOME/tmpDir` - remove it manually afterwards, if still present. Steps to reproduce ------------------ Setup: * Open a PowerShell console. * Paste and submit the code with the two function definitions below at the prompt to define the functions used in the tests. Problem (a): ```powershell Assert-ReliableDirRemoval cmd ``` Problem (b): ```powershell Assert-SyncDirRemoval cmd ``` Expected behavior ----------------- The functions should loop _indefinitely_ (terminate them with `Ctrl+C`), emitting a `.` in each iteration. Actual behavior --------------- **_Eventually_** - and that there is no _predictable_ time frame is indicative of the problem - an error will occur, on the **order of _minutes_ or even longer**. Problem (a): ```none Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using cmd for removal. .................................tmpDir\sub - The directory is not empty. Deletion failed quietly or with exit code 0, tmpDir still has content: sub ``` That is, recursive removal of the target dir's _content_ failed due to async timing issues; while a stderr message was issued, the exit code did _not_ reflect failure (was still `0`), and both the target directory and remnants inside it lingered. Problem (b): ```none Synchronous dir-removal test: Repeatedly creating and removing a directory, using cmd for removal. ...................New-Item : Access is denied At line:25 char:15 + $null = New-Item -Type Directory tmpDir + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : PermissionDenied: (C:\Users\jdoe\tmpDir:String) [New-Item], UnauthorizedAccessException + FullyQualifiedErrorId : ItemExistsUnauthorizedAccessError,Microsoft.PowerShell.Commands.NewItemCommand ``` That is, recreating the target dir failed, because the prior removal hadn't yet completed. Aux. function definitions: ----------------------- ### Functions `Assert-ReliableDirRemoval` and `Assert-SyncDirRemoval` You can paste the entire code at the prompt in order to define these functions. ```powershell function Assert-ReliableDirRemoval { param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method) Write-Host "Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using $method for removal." # Treat all errors as fatal. $ErrorActionPreference = 'Stop' Set-Location $HOME # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not. # Remove a preexisting directory first, if any. Remove-Item -EA Ignore -Recurse tmpDir While ($true) { Write-Host -NoNewline . # Create a subdir. tree. $null = New-Item -Type Directory tmpDir, tmpDir/sub, tmpDir/sub/sub, tmpDir/sub/sub/sub # Fill each subdir. with 1000 empty test files. "tmpDir", "tmpDir/sub", "tmpDir/sub/sub", "tmpDir/sub/sub/sub"| ForEach-Object { $dir = $_ 1..1e3 | ForEach-Object { $null > "$dir/$_" } } # Now remove the entire dir., which invariably involves deleting its contents # recursively first. switch ($method) { 'ps' { Remove-Item -Recurse tmpDir } '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) } 'cmd' { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } } # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but # !! does NOT report a nonzero exit code. We detect this case below. } # Does the dir. unexpectedly still exist? # This can happen for two reasons: # - The removal of the top-level directory itself has been requested, but isn't complete yet. # - Removing the content of the top-level directory failed quietly. while (Test-Path tmpDir) { if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content? Throw "Deletion failed quietly or with exit code 0, tmpDir still has content: $output" } else { # Top-level directory removal isn't complete yet. # Wait for removal to complete, so we can safely recreate the directory in the next iteration. # This loop should exit fairly quickly. Write-Host -NoNewline !; Start-Sleep -Milliseconds 100 } } } } function Assert-SyncDirRemoval { param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method) Write-Host "Synchronous dir-removal test: Repeatedly creating and removing a directory, using $method for removal." # Treat all errors as fatal. $ErrorActionPreference = 'Stop' Set-Location $HOME # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not. While ($true) { Write-Host -NoNewline . # Remove the test dir., which invariably involves deleting its contents recursively first. # Note: This could itself fail intermittently, but with just 10 files and no subdirs. is unlikely to. if (Test-Path tmpDir) { switch ($method) { 'ps' { Remove-Item -Recurse tmpDir } '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) } 'cmd' { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } } # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but # !! does NOT report a nonzero exit code. We detect this case below. } } # (Re-)create the dir. with 10 empty test files. # If the previous removal hasn't fully completed yet, this will fail. # Note: [System.IO.Directory]::Delete() could have quietly failed to remove the contents of the dir. # due to async timing accidents, but, again, this is unlikely with 1- files and try { $null = New-Item -Type Directory tmpDir } catch { # Handle the case where removal failed # quietly due to async vagaries while removing the dir's *content*, which # can happen with [System.IO.Directory]::Delete(). # Note that if removal succeeded but is pending - the scenario we're trying to # get to occur - Get-ChildItem reports an access-denied error, which we ignore # in favor of reporting the original error from the dir. re-creation attempt. if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content? Write-Warning "Ignoring failed content removal, retrying..." # Simply try again Start-Sleep -Milliseconds 500; Remove-item tmpDir -Recurse continue } # Re-creation failed due to async removal. Throw $_ } 1..10 | ForEach-Object { $null > tmpDir/$_ } } } ``` Environment data ---------------- `cmd /c ver` output: ```none Microsoft Windows [Version 10.0.17134.407] ```
claunia added the Issue-BugArea-InteropResolution-Won't-FixProduct-Cmd.exe labels 2026-01-30 21:52:47 +00:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#456