cmd: rd /s for recursive directory removal fails intermittently and is asynchronous #458

Closed
opened 2026-01-30 21:52:52 +00:00 by claunia · 2 comments
Owner

Originally created by @mklement0 on GitHub (Nov 20, 2018).

Note: .NET Core's System.IO.Directory.Delete() and PowerShell's Remove-Item are equally affected: see here and here.

The Windows API functions DeleteFile() and RemoveDirectory() functions are inherently asynchronous (emphasis added):

The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED."

The RemoveDirectory function marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed."

rd /s fails to account for this asynchronous behavior, which has two implications:

  • Problem (a): Trying to delete a nonempty directory (which invariably requires recursive deletion of its content first) can fail - infrequently, but it does happen, and the failure is not reflected in the exit code (although a message is issued to stderr).

  • Problem (b): Trying to recreate a successfully deleted directory or a file immediately afterwards may fail - intermittently (easier to provoke than the other problem, but a more exotic use case overall).

    • Note: You could argue that it's not rd's responsibility to create a synchronous experience in this case, given that the underlying system API is asynchronous; however, it certainly makes for more robust, predictable programs if synchronicity is ensured - as is the case on Unix-like platforms, where the system APIs for file removal are synchronous.

Problem (a) is due to using depth-first recursion without accounting for the asynchronous deletion behavior; the problem, along with the solution, is described in this YouTube video (starts at 7:35).

Important:

  • For convenience, Windows PowerShell is used to reproduce the problem below.
  • The tests create temp. dir. $HOME/tmpDir - remove it manually afterwards, if still present.

Steps to reproduce

Setup:

  • Open a PowerShell console.
  • Paste and submit the code with the two function definitions below at the prompt to define the functions used in the tests.

Problem (a):

Assert-ReliableDirRemoval cmd

Problem (b):

Assert-SyncDirRemoval cmd

Expected behavior

The functions should loop indefinitely (terminate them with Ctrl+C), emitting a . in each iteration.

Actual behavior

Eventually - and that there is no predictable time frame is indicative of the problem - an error will occur, on the order of minutes or even longer.

Problem (a):

Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using cmd for removal.
.................................tmpDir\sub - The directory is not empty.
Deletion failed quietly or with exit code 0, tmpDir still has content: sub

That is, recursive removal of the target dir's content failed due to async timing issues; while a stderr message was issued, the exit code did not reflect failure (was still 0), and both the target directory and remnants inside it lingered.

Problem (b):

Synchronous dir-removal test: Repeatedly creating and removing a directory, using cmd for removal.
...................New-Item : Access is denied
At line:25 char:15
+       $null = New-Item -Type Directory tmpDir
+               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : PermissionDenied: (C:\Users\jdoe\tmpDir:String) [New-Item], UnauthorizedAccessException
    + FullyQualifiedErrorId : ItemExistsUnauthorizedAccessError,Microsoft.PowerShell.Commands.NewItemCommand

That is, recreating the target dir failed, because the prior removal hadn't yet completed.

Aux. function definitions:

Functions Assert-ReliableDirRemoval and Assert-SyncDirRemoval

You can paste the entire code at the prompt in order to define these functions.

function Assert-ReliableDirRemoval {
  param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method)
  Write-Host "Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using $method for removal."
  # Treat all errors as fatal.
  $ErrorActionPreference = 'Stop'
  Set-Location $HOME  # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not.
  # Remove a preexisting directory first, if any.
  Remove-Item -EA Ignore -Recurse tmpDir
  While ($true) {
    Write-Host -NoNewline .
    # Create a subdir. tree.
    $null = New-Item -Type Directory tmpDir, tmpDir/sub, tmpDir/sub/sub, tmpDir/sub/sub/sub
    # Fill each subdir. with 1000 empty test files.
    "tmpDir", "tmpDir/sub", "tmpDir/sub/sub", "tmpDir/sub/sub/sub"| ForEach-Object {
      $dir = $_
      1..1e3 | ForEach-Object { $null > "$dir/$_" }
    }
    # Now remove the entire dir., which invariably involves deleting its contents
    # recursively first.
    switch ($method) {
      'ps'  { Remove-Item -Recurse tmpDir }
      '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) }
      'cmd'  { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } }
      # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but 
      # !! does NOT report a nonzero exit code. We detect this case below.
    }
    # Does the dir. unexpectedly still exist?
    # This can happen for two reasons:
    #  - The removal of the top-level directory itself has been requested, but isn't complete yet.
    #  - Removing the content of the top-level directory failed quietly.
    while (Test-Path tmpDir) {
      if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content?
        Throw "Deletion failed quietly or with exit code 0, tmpDir still has content: $output"
      } else {
        # Top-level directory removal isn't complete yet.
        # Wait for removal to complete, so we can safely recreate the directory in the next iteration.
        # This loop should exit fairly quickly.
        Write-Host -NoNewline !; Start-Sleep -Milliseconds 100
      }      
    }    
  }
}

function Assert-SyncDirRemoval { 
  param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method)
  Write-Host "Synchronous dir-removal test: Repeatedly creating and removing a directory, using $method for removal."
  # Treat all errors as fatal.
  $ErrorActionPreference = 'Stop'
  Set-Location $HOME  # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not.
  While ($true) {
    Write-Host -NoNewline .
    # Remove the test dir., which invariably involves deleting its contents recursively first.
    # Note: This could itself fail intermittently, but with just 10 files and no subdirs. is unlikely to.
    if (Test-Path tmpDir) {
      switch ($method) {
        'ps' { Remove-Item -Recurse tmpDir }
        '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) }
         'cmd'  { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } }
         # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but 
         # !! does NOT report a nonzero exit code. We detect this case below.
      }
    }
    # (Re-)create the dir. with 10 empty test files.
    # If the previous removal hasn't fully completed yet, this will fail.
    # Note: [System.IO.Directory]::Delete() could have quietly failed to remove the contents of the dir.
    #       due to async timing accidents, but, again, this is unlikely with 1- files and 
    try {
      $null = New-Item -Type Directory tmpDir
    } catch {
      # Handle the case where removal failed
      # quietly due to async vagaries while removing the dir's *content*, which
      # can happen with [System.IO.Directory]::Delete().
      # Note that if removal succeeded but is pending - the scenario we're trying to
      # get to occur - Get-ChildItem reports an access-denied error, which we ignore
      # in favor of reporting the original error from the dir. re-creation attempt.
      if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content?
        Write-Warning "Ignoring failed content removal, retrying..."
        # Simply try again
        Start-Sleep -Milliseconds 500; Remove-item tmpDir -Recurse
        continue
      }
      # Re-creation failed due to async removal.
      Throw $_
    }
    1..10 | ForEach-Object { $null > tmpDir/$_ }
  }
}

Environment data

cmd /c ver output:

Microsoft Windows [Version 10.0.17134.407]
Originally created by @mklement0 on GitHub (Nov 20, 2018). <sup>Note: .NET Core's `System.IO.Directory.Delete()` and PowerShell's `Remove-Item` are equally affected: see [here](https://github.com/dotnet/corefx/issues/33603) and [here](https://github.com/PowerShell/PowerShell/issues/8211).</sup> **The Windows API functions `DeleteFile()` and `RemoveDirectory()` functions are _inherently asynchronous_** (emphasis added): > The DeleteFile function **marks a file for deletion** on close. Therefore, the file **deletion does not occur until the last handle to the file is closed**. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED." > The RemoveDirectory function **marks a directory for deletion on close**. Therefore, the directory is **not removed until the last handle to the directory is closed.**" `rd /s` fails to account for this asynchronous behavior, which has two implications: * Problem (a): **Trying to delete a nonempty directory (which invariably requires recursive deletion of its _content_ first) can _fail_** - infrequently, but it does happen, and the failure is _not_ reflected in the _exit code_ (although a message is issued to stderr). * Problem (b): **Trying to _recreate_ a successfully deleted directory or a file _immediately afterwards_ may fail** - intermittently (easier to provoke than the other problem, but a more exotic use case overall). * Note: You could argue that it's not `rd`'s responsibility to create a synchronous experience in this case, given that the underlying system API is asynchronous; however, it certainly makes for more robust, predictable programs if synchronicity is ensured - as is the case on _Unix_-like platforms, where the system APIs for file removal are synchronous. Problem (a) is due to using depth-first recursion without accounting for the asynchronous deletion behavior; the problem, along with the solution, is described in [this YouTube video](https://youtu.be/uhRWMGBjlO8?t=455) (starts at 7:35). Important: * For convenience, Windows PowerShell is used to reproduce the problem below. * The tests create temp. dir. `$HOME/tmpDir` - remove it manually afterwards, if still present. Steps to reproduce ------------------ Setup: * Open a PowerShell console. * Paste and submit the code with the two function definitions below at the prompt to define the functions used in the tests. Problem (a): ```powershell Assert-ReliableDirRemoval cmd ``` Problem (b): ```powershell Assert-SyncDirRemoval cmd ``` Expected behavior ----------------- The functions should loop _indefinitely_ (terminate them with `Ctrl+C`), emitting a `.` in each iteration. Actual behavior --------------- **_Eventually_** - and that there is no _predictable_ time frame is indicative of the problem - an error will occur, on the **order of _minutes_ or even longer**. Problem (a): ```none Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using cmd for removal. .................................tmpDir\sub - The directory is not empty. Deletion failed quietly or with exit code 0, tmpDir still has content: sub ``` That is, recursive removal of the target dir's _content_ failed due to async timing issues; while a stderr message was issued, the exit code did _not_ reflect failure (was still `0`), and both the target directory and remnants inside it lingered. Problem (b): ```none Synchronous dir-removal test: Repeatedly creating and removing a directory, using cmd for removal. ...................New-Item : Access is denied At line:25 char:15 + $null = New-Item -Type Directory tmpDir + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : PermissionDenied: (C:\Users\jdoe\tmpDir:String) [New-Item], UnauthorizedAccessException + FullyQualifiedErrorId : ItemExistsUnauthorizedAccessError,Microsoft.PowerShell.Commands.NewItemCommand ``` That is, recreating the target dir failed, because the prior removal hadn't yet completed. Aux. function definitions: ----------------------- ### Functions `Assert-ReliableDirRemoval` and `Assert-SyncDirRemoval` You can paste the entire code at the prompt in order to define these functions. ```powershell function Assert-ReliableDirRemoval { param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method) Write-Host "Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using $method for removal." # Treat all errors as fatal. $ErrorActionPreference = 'Stop' Set-Location $HOME # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not. # Remove a preexisting directory first, if any. Remove-Item -EA Ignore -Recurse tmpDir While ($true) { Write-Host -NoNewline . # Create a subdir. tree. $null = New-Item -Type Directory tmpDir, tmpDir/sub, tmpDir/sub/sub, tmpDir/sub/sub/sub # Fill each subdir. with 1000 empty test files. "tmpDir", "tmpDir/sub", "tmpDir/sub/sub", "tmpDir/sub/sub/sub"| ForEach-Object { $dir = $_ 1..1e3 | ForEach-Object { $null > "$dir/$_" } } # Now remove the entire dir., which invariably involves deleting its contents # recursively first. switch ($method) { 'ps' { Remove-Item -Recurse tmpDir } '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) } 'cmd' { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } } # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but # !! does NOT report a nonzero exit code. We detect this case below. } # Does the dir. unexpectedly still exist? # This can happen for two reasons: # - The removal of the top-level directory itself has been requested, but isn't complete yet. # - Removing the content of the top-level directory failed quietly. while (Test-Path tmpDir) { if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content? Throw "Deletion failed quietly or with exit code 0, tmpDir still has content: $output" } else { # Top-level directory removal isn't complete yet. # Wait for removal to complete, so we can safely recreate the directory in the next iteration. # This loop should exit fairly quickly. Write-Host -NoNewline !; Start-Sleep -Milliseconds 100 } } } } function Assert-SyncDirRemoval { param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method) Write-Host "Synchronous dir-removal test: Repeatedly creating and removing a directory, using $method for removal." # Treat all errors as fatal. $ErrorActionPreference = 'Stop' Set-Location $HOME # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not. While ($true) { Write-Host -NoNewline . # Remove the test dir., which invariably involves deleting its contents recursively first. # Note: This could itself fail intermittently, but with just 10 files and no subdirs. is unlikely to. if (Test-Path tmpDir) { switch ($method) { 'ps' { Remove-Item -Recurse tmpDir } '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) } 'cmd' { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } } # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but # !! does NOT report a nonzero exit code. We detect this case below. } } # (Re-)create the dir. with 10 empty test files. # If the previous removal hasn't fully completed yet, this will fail. # Note: [System.IO.Directory]::Delete() could have quietly failed to remove the contents of the dir. # due to async timing accidents, but, again, this is unlikely with 1- files and try { $null = New-Item -Type Directory tmpDir } catch { # Handle the case where removal failed # quietly due to async vagaries while removing the dir's *content*, which # can happen with [System.IO.Directory]::Delete(). # Note that if removal succeeded but is pending - the scenario we're trying to # get to occur - Get-ChildItem reports an access-denied error, which we ignore # in favor of reporting the original error from the dir. re-creation attempt. if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content? Write-Warning "Ignoring failed content removal, retrying..." # Simply try again Start-Sleep -Milliseconds 500; Remove-item tmpDir -Recurse continue } # Re-creation failed due to async removal. Throw $_ } 1..10 | ForEach-Object { $null > tmpDir/$_ } } } ``` Environment data ---------------- `cmd /c ver` output: ```none Microsoft Windows [Version 10.0.17134.407] ```
claunia added the Issue-BugArea-InteropResolution-Won't-FixProduct-Cmd.exe labels 2026-01-30 21:52:52 +00:00
Author
Owner

@mklement0 commented on GitHub (Nov 6, 2020):

Note:

  • The following update is based on running the Assert-ReliableDirRemoval .net and Assert-ReliableDirRemoval ps tests above successfully in Windows 10 version 20H2 and the information gleaned from this Stack Overflow post.

  • As of this writing, the documentation of the underlying DeleteFile Windows API makes no mention of the change in behavior.


Starting with Windows 10 version 1909, (at least) build 18363.657 (I don't know that Windows Server version and build that corresponds to; run winver.exe to check your version and build), the DeleteFile Windows API function now exhibits synchronous behavior at the Windows API level, which implicitly solves the problems with PowerShell's Remove-Item and .NET's System.IO.File.Delete / System.IO.Directory.Delete.

However, even on such Windows 10 version rd /s still fails intermittently (that is, Assert-ReliableDirRemoval cmd still fails after a while), suggesting that something other than DeleteFile is used (perhaps it is SetFileInformationByHandle, to which FILE_DISPOSITION_POSIX_SEMANTICS would have to be passed).

This means that rd /s now could be made synchronous on recent enough Windows 10 versions, but - unlike in PowerShell and .NET - that would require additional work.

@mklement0 commented on GitHub (Nov 6, 2020): Note: * The following update is based on running the `Assert-ReliableDirRemoval .net` and `Assert-ReliableDirRemoval ps` tests above successfully in Windows 10 version 20H2 and the information gleaned from [this Stack Overflow post](https://stackoverflow.com/a/60512798/45375). * As of this writing, the documentation of the underlying [`DeleteFile`](https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-deletefilea) Windows API makes no mention of the change in behavior. --- **Starting with Windows 10 version `1909`, (at least) build `18363.657`** (I don't know that Windows _Server_ version and build that corresponds to; run `winver.exe` to check your version and build), **the `DeleteFile` Windows API function now exhibits _synchronous_ behavior at the Windows API level**, which implicitly solves the problems with PowerShell's `Remove-Item` and .NET's `System.IO.File.Delete` / `System.IO.Directory.Delete`. **However, even on such Windows 10 version `rd /s` _still_ fails intermittently** (that is, `Assert-ReliableDirRemoval cmd` still fails after a while), suggesting that something other than `DeleteFile` is used (perhaps it is `SetFileInformationByHandle`, to which `FILE_DISPOSITION_POSIX_SEMANTICS` would have to be passed). **This means that `rd /s` now _could_ be made synchronous on recent enough Windows 10 versions**, but - unlike in PowerShell and .NET - that **would require additional work.**
Author
Owner

@zadjii-msft commented on GitHub (Dec 7, 2021):

As mentioned in https://github.com/dotnet/runtime/issues/27958 and https://github.com/PowerShell/PowerShell/issues/8211

Starting with Windows 10 version 1909, (at least) build 18363.657 (I don't know that Windows Server version and build that corresponds to; run winver.exe to check your version and build), the DeleteFile Windows API function now exhibits synchronous behavior at the Windows API level, which implicitly solves the problems with PowerShell's Remove-Item and .NET's System.IO.File.Delete / System.IO.Directory.Delete (but, curiously, not with cmd.exe's rd /s).

Unfortunately, we won't be updating cmd.exe anytime soon. If you want a modern, updated shell on Windows, you should try out PowerShell 7.

Sorry to let this one linger on the backlog so long. It was filed long before we really got our github muscles flexed, and got a little lost.

@zadjii-msft commented on GitHub (Dec 7, 2021): As mentioned in https://github.com/dotnet/runtime/issues/27958 and https://github.com/PowerShell/PowerShell/issues/8211 > Starting with Windows 10 version `1909`, (at least) build `18363.657` (I don't know that Windows Server version and build that corresponds to; run `winver.exe` to check your version and build), the `DeleteFile` Windows API function now exhibits synchronous behavior at the Windows API level, which implicitly solves the problems with PowerShell's `Remove-Item` and .NET's `System.IO.File.Delete` / `System.IO.Directory.Delete` (but, curiously, not with `cmd.exe's rd /s`). Unfortunately, [we won't be updating `cmd.exe` anytime soon](https://github.com/microsoft/terminal/blob/main/doc/Niksa.md#cmd). If you want a modern, updated shell on Windows, you should try out [PowerShell 7](https://github.com/PowerShell/PowerShell). Sorry to let this one linger on the backlog so long. It was filed long before we really got our github muscles flexed, and got a little lost.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#458