
Monday, March 25, 2024

Mastering Problem-Solving and Cultivating a Research Mindset in the ChatGPT Era (and why you still need to RTFM)

 

In this post I'll present a technical problem I had with a VR app (some would say it's more of a bug than a feature). I'll cover how I researched its root cause, how I relied too much on chatGPT's answers and almost gave up because I was misled into thinking a fix was impossible, and how I eventually found the right answer and fixed the problem. Most importantly, I'll explain why you, me and everyone else still need to RTFM, even in the era of AI, LLMs and chatGPT (and other buzzwords).

*This post isn't focused on cyber security this time, but it does touch on related subjects, like a bit of OS internals, networking and especially the research mindset required to tackle a problem, which is the same one I use in my daily work.


OK let's start


Not so long ago, I bought a new PC and needed to migrate all my storage (several old-school HDDs) to my new super-fast SSDs 🤘 However, I didn't want to just copy the files normally, since I also wanted to preserve all of the original file metadata, especially the time-related fields (file created date, last written date, etc.).

You see, when I'm looking for a file in some folder, I rely heavily on those fields (sometimes I don't remember a file's name, but I do remember roughly when I last opened, saved or downloaded it. Weird habit, I know).

For this task, I relied on robocopy.exe to copy all the files from my old PC to my new one, using the /COPY:DAT flags (Data, Attributes and Timestamps), which should also copy the file metadata fields, including timestamps:
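For reference, here's a minimal sketch of a similar copy driven from Python. It's an assumption-laden illustration, not the exact command I ran: the paths are placeholders, build_robocopy_command and run_robocopy are hypothetical helper names, and /E (recurse, including empty folders) and /DCOPY:T (also keep folder timestamps) are additions on top of /COPY:DAT, which is robocopy's default copy mode anyway.

```python
import subprocess
import sys


def build_robocopy_command(source, destination):
    """Build a robocopy command line that copies a whole tree with its metadata.

    /E         copy subdirectories, including empty ones
    /COPY:DAT  copy file Data, Attributes and Timestamps
    /DCOPY:T   also copy directory timestamps
    """
    return ["robocopy", source, destination, "/E", "/COPY:DAT", "/DCOPY:T"]


def run_robocopy(source, destination):
    """Run robocopy (Windows only) and report whether it succeeded.

    robocopy's exit codes 0-7 mean success (with varying detail);
    8 and above mean at least one failure.
    """
    if sys.platform != "win32":
        raise OSError("robocopy is only available on Windows")
    result = subprocess.run(build_robocopy_command(source, destination))
    return result.returncode < 8
```

Note the `< 8` check: unlike most tools, robocopy returns non-zero exit codes on success (1, for example, means files were copied), so treating any non-zero code as an error would be wrong.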



I copied all my files this way a while ago, and it seemed to work perfectly: I could see the original timestamps on every file copied to the new PC.
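A quick way to double-check a copy like this is to compare last-write times between the two trees. Below is a minimal sketch; find_timestamp_mismatches is a hypothetical helper, and the 2-second tolerance is an assumption meant to absorb coarse timestamp resolution on some file systems. Note that it only looks at st_mtime, so, just like Explorer, it would have reported everything as fine in my case.

```python
import os


def find_timestamp_mismatches(source_root, destination_root, tolerance=2.0):
    """Return relative paths whose last-write time differs between two copies of a tree.

    Files that exist under source_root but are missing under
    destination_root are reported too. tolerance is in seconds.
    """
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(destination_root, rel)
            if not os.path.exists(dst):
                mismatches.append(rel)
            elif abs(os.stat(src).st_mtime - os.stat(dst).st_mtime) > tolerance:
                mismatches.append(rel)
    return mismatches
```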

Fast forward to last week. I own a Meta Quest 3 VR headset and use it for research and for fun. One cool thing you can do with a VR headset, for example, is watch 3D SBS movies (because no one owns a 3D TV anymore, I guess). For this purpose I have several video players on my headset, but I mostly use a player app called Skybox VR.

The player connects to my PC through a Windows shared drive over SMB and can stream media from it. Based on what I saw in Windows Explorer, I expected that sorting the shared drive's files by date in Skybox would work the same way it does on Windows. My expectation was wrong.

Skybox has an option to sort a directory by 'Date', but it doesn't say which date it actually uses to sort the files (file created date? file last modified date? something else?), and for some weird reason, ALL the files restored from my old PC to my new one showed the same date: the date I copied them to the new PC.

So, just to be clear, on Windows this is what a file's metadata looked like:


ALL GOOD: it kept the original dates of the file copied from the old PC. But in Skybox, the date was set to the day I copied the files with robocopy.

WTF?

OK, I thought to myself, this should be super easy: just find out which file date field the app actually uses for sorting, write a small Python script that goes over all the restored files and sets that (still unknown) date field to each file's 'created' or 'modified' time, let Skybox rescan the SMB share, and that's it. The files should then be sorted by their actual creation date.


Yeah, Not So Fast

First of all, I couldn't find the date field Skybox was showing, neither in the Windows Explorer 'Properties' window nor with any of the well-known built-in Windows tools like 'dir' or PowerShell's Get-Item.

I also tried some 3rd party tools like exiftool - still nothing. 
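For comparison, Python's os.stat() exposes the same familiar set of timestamps. A minimal sketch follows (visible_timestamps is a hypothetical helper). Per the Python docs, st_ctime is platform-dependent: on Windows it has meant creation time (deprecated there since Python 3.12 in favour of st_birthtime), while on Linux and macOS it's the inode change time. So, as far as I know, none of these fields is NTFS's change time on Windows.

```python
import os
from datetime import datetime, timezone


def _to_utc(ts):
    # os.stat() returns POSIX timestamps (float seconds since 1970-01-01 UTC)
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def visible_timestamps(path):
    """Return the timestamps os.stat() exposes for a file, as UTC datetimes."""
    st = os.stat(path)
    return {
        "last_access": _to_utc(st.st_atime),
        "last_write": _to_utc(st.st_mtime),
        # Platform-dependent meaning: creation time on Windows,
        # inode (metadata) change time on Linux/macOS.
        "ctime": _to_utc(st.st_ctime),
    }
```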

So I tried another direction: chatGPT. I asked how to get ALL the metadata associated with a file on an NTFS file system, but it kept referring to the same 3 date fields I was already aware of and could see with all of the methods mentioned above: creation time, last access time and last modified time (which turned out to actually be 'last write time' internally in Windows). Looking back at the answers now, it did give a hint that I didn't notice at first: after I mentioned that the app uses SMB to receive the files, but that they're served from an NTFS file system, it told me to look at the MFT (Master File Table) documentation.

Forget it, let's try something else!

I could use Wireshark to capture the local SMB traffic, but I suspected the app would use encrypted SMB3 streams, which felt like too much of a hassle to decrypt just to find a field that should be pretty straightforward to find.

I also thought that maybe Skybox keeps track of the date it first saw a shared file in the app's own storage, and that this obscure date attribute might not be related to SMB at all (this turned out to be wrong).

But wait a sec, Skybox is just a VR app that runs on several VR platforms, including Meta Quest (which uses an Android-based OS). I already knew that a lot of VR apps are built with the Unity engine, so I looked on GitHub for Unity apps with SMB client features, to see what date attributes a file object returned from an SMB file request might have. It didn't take long to find some, but I couldn't easily figure out what those Unity functions use behind the scenes to communicate with the SMB server and get the shared drive's files and their metadata.

OK so what's next?

chatGPT did give me an idea: whether or not this app uses Unity, it ultimately has to speak some version of the SMB protocol to get the files from my Windows shared drive, so why not just mimic the app's behavior? I could write my own basic SMB client, see what file metadata it retrieves from the SMB server on the Windows machine, and check whether this date attribute comes from the SMB server or not!

The code example it shared for this actually worked on the first try!

(JK, I had to fix some issues, but most of them were minor :)



from smb.SMBConnection import SMBConnection

# SMB server details
server_name = 'SERVER_NAME'
server_ip = 'SERVER_IP'
username = 'USERNAME'
password = 'PASSWORD'
share_name = 'SHARE_NAME'
file_path = 'FILE_PATH'

# Establish SMB connection. Port 445 is "direct TCP" SMB, which pysmb
# only speaks when is_direct_tcp=True (otherwise it expects NetBIOS on port 139)
conn = SMBConnection(username, password, 'pysmb', server_name,
                     use_ntlm_v2=True, is_direct_tcp=True)
conn.connect(server_ip, 445)

try:
    # Retrieve file metadata (an smb.base.SharedFile instance)
    file_attributes = conn.getAttributes(share_name, file_path)

    # Extract and print metadata (times are float seconds since the Unix epoch)
    metadata = {
        'File Size': file_attributes.file_size,
        'Creation Time': file_attributes.create_time,
        'Last Access Time': file_attributes.last_access_time,
        'Last Modification Time': file_attributes.last_write_time,
        # Add more metadata attributes as needed
    }

    print("File Metadata:")
    for key, value in metadata.items():
        print(f"{key}: {value}")

finally:
    # Close the SMB connection
    conn.close()

 


This Python example uses a library called pysmb, which can be easily installed with pip (pip install pysmb).

However, based on this chatGPT output alone, we still don't see any date attributes SMB might use beyond the ones we already see on Windows (reminder: creation time, last access time and last modified time).

Running this code under a debugger, I set a breakpoint to inspect the value pysmb returned from

conn.getAttributes(share_name, file_path)

Going down the rabbit hole with the debugger, I reached the 'SharedFile' class defined in pysmb's base.py file, where, besides those 3 date values, there was an additional one chatGPT didn't mention:

 

It's called 'last_attr_change_time' in pysmb. 

Could this be the date attribute value Skybox uses to sort the shared SMB files?

This was an easy check: I just needed to compare a specific file's 'Date' as shown in the Skybox app with the 'last_attr_change_time' value I got from running the above Python program on the same file.

And what do you know, it's the exact same date value!

Cool, I found the 'lost' date attribute and made some progress!
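To print all four pysmb timestamps in a readable form, a small helper is enough. A minimal sketch: shared_file_times is a hypothetical name, and it assumes, as pysmb documents, that SharedFile stores its times as float seconds since the Unix epoch. It works on any object that has those four attributes.

```python
from datetime import datetime, timezone

# The four timestamp attributes of pysmb's SharedFile (smb/base.py)
SHARED_FILE_TIME_FIELDS = (
    "create_time",
    "last_access_time",
    "last_write_time",
    "last_attr_change_time",  # the one Skybox turned out to sort by
)


def shared_file_times(shared_file):
    """Convert the timestamps of a pysmb SharedFile (or look-alike) to UTC datetimes."""
    return {
        field: datetime.fromtimestamp(getattr(shared_file, field), tz=timezone.utc)
        for field in SHARED_FILE_TIME_FIELDS
    }
```

With the connection from the script above, usage would look like `shared_file_times(conn.getAttributes(share_name, file_path))`.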

However, I still needed to find out how to set this 'last attribute change time' to the creation or last write time, and where this time attribute was even coming from.

After some further research, I found that those pysmb class fields are parsed from the SMB2 CREATE Response packet.

Microsoft's [MS-SMB2] protocol specification (on Microsoft Learn) documents its full layout:

 

Trying to match the header's field names to those found in pysmb, I figured out 'last_attr_change_time' is actually ChangeTime in the header above, and this is the name of that time attribute I was looking for all this time.
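To make the mapping concrete, here's a sketch that pulls the four timestamps out of an SMB2 CREATE Response body with Python's struct module. The layout follows the fixed 88-byte part of the response as I read the [MS-SMB2] SMB2 CREATE Response section (little-endian, following the 64-byte SMB2 header), and the timestamps are FILETIMEs: 100-nanosecond intervals since 1601-01-01 UTC. parse_create_response_times and filetime_to_datetime are hypothetical helpers, not pysmb APIs.

```python
import struct
from datetime import datetime, timedelta, timezone

# Fixed part of the SMB2 CREATE Response body (88 bytes, no padding):
# StructureSize, OplockLevel, Flags, CreateAction,
# CreationTime, LastAccessTime, LastWriteTime, ChangeTime,
# AllocationSize, EndofFile, FileAttributes, Reserved2,
# FileId (16 bytes), CreateContextsOffset, CreateContextsLength
CREATE_RESPONSE = struct.Struct("<HBBI QQQQ QQ II 16s II")

_FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a FILETIME (100-ns intervals since 1601-01-01 UTC) to a datetime."""
    return _FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_create_response_times(body):
    """Extract the four timestamps from an SMB2 CREATE Response body."""
    fields = CREATE_RESPONSE.unpack_from(body)
    if fields[0] != 89:  # StructureSize is always 89 for this response
        raise ValueError("not an SMB2 CREATE Response body")
    creation, last_access, last_write, change = fields[4:8]
    return {
        "CreationTime": filetime_to_datetime(creation),
        "LastAccessTime": filetime_to_datetime(last_access),
        "LastWriteTime": filetime_to_datetime(last_write),
        "ChangeTime": filetime_to_datetime(change),  # pysmb's last_attr_change_time
    }
```

ChangeTime sits right after LastWriteTime in the packet, which is why pysmb can expose it even though Windows Explorer doesn't.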

After some more research and a few cups of coffee, it turned out that this field is indeed set for every file on NTFS: it's stored in the file's MFT record (in the $STANDARD_INFORMATION attribute, alongside the other three timestamps), and it's updated every time the file's metadata changes. However, it isn't shown by the 3rd party tools I checked, and for some unclear reason, not by the built-in Windows tools or Windows Explorer either 🤷.

This field and the other time fields appear in various Windows structures in the ntifs.h header, for example FILE_BASIC_INFORMATION:
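For reference, here's the FILE_BASIC_INFORMATION layout expressed as a ctypes structure (yes, ctypes; it's just a compact way to show the fields). It mirrors FILE_BASIC_INFO from winbase.h, which, as far as I know, has the same fields as FILE_BASIC_INFORMATION in ntifs.h: four LARGE_INTEGER timestamps (100-nanosecond intervals since 1601-01-01 UTC) followed by the file attributes.

```python
import ctypes


class FILE_BASIC_INFO(ctypes.Structure):
    """Field layout of FILE_BASIC_INFO / FILE_BASIC_INFORMATION."""

    _fields_ = [
        ("CreationTime", ctypes.c_int64),
        ("LastAccessTime", ctypes.c_int64),
        ("LastWriteTime", ctypes.c_int64),
        ("ChangeTime", ctypes.c_int64),      # the timestamp Explorer doesn't show
        ("FileAttributes", ctypes.c_uint32),
    ]
```

ChangeTime sits at byte offset 24, right after LastWriteTime; on Windows the whole structure is 40 bytes (36 bytes of fields plus padding).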

Great, now what?

I had 2 options:

1. Try to contact the Skybox devs and hope they'll understand why, of the 4 file timestamps in total, ChangeTime is probably not what most people want when they 'sort by date'. CreationTime or LastWriteTime would probably be better candidates here. Also, this assumes they chose this timestamp for sorting on purpose, and that it isn't just the default of whatever library or framework they use for their SMB feature...

Either way, this could take a really long time, assuming they even agree with me (I do plan to let them know, though), and I wanted an immediate solution.

2. Solve this on my own by doing what I set out to do from the beginning, and the reason I used robocopy to keep the original file timestamps in the first place: set the ChangeTime value to one of the other timestamps. I went with LastWriteTime, because that's what Windows Explorer uses when you sort a directory by 'Date modified', and it's the field I usually sort by on Windows.

Next, I needed a way to read a file's full metadata, including 'ChangeTime', by enumerating the files on my shared drive locally, without relying on SMB functions.

Fast forward again: chatGPT helped me find the GetFileInformationByHandleEx WinAPI function, which takes a handle to a file and the FileBasicInfo information class, and fills a FILE_BASIC_INFO structure (the Win32 counterpart of FILE_BASIC_INFORMATION) with that file's basic info, including all 4 timestamps stored in its metadata.

I hadn't set up my Visual Studio IDE and environment yet, so I couldn't write a native program. Instead, I used the good old pywin32 Python library, which magically wraps WinAPI functions and structures and lets you work with them directly from Python, without horrible creatures like ctypes 😟.

I wrote a small Python program using pywin32 that opened a file, passed its handle to GetFileInformationByHandleEx and got back all of the file's timestamps, including the elusive ChangeTime.

Final step: now I just need to set ChangeTime to the LastWriteTime value in Python, and find a way to write that field back to the NTFS file's metadata itself.

Only one problem (which almost made me give up, super frustrated and angry that I'd spent so much time for nothing): across several differently worded questions, chatGPT insisted that the only way to set a file's time attributes is the SetFileTime() WinAPI, but SetFileTime() doesn't have a ChangeTime parameter, only the other 3 again!

 

This is the moment when I ALMOST gave up 

But then I remembered the one thing I'm actually really good at: being incredibly persistent when I'm determined to solve a problem 😎.

So I went back to RTFM on Microsoft Learn (i.e. the WinAPI documentation) and figured: well, if there's a 'GetFileInformationByHandle', there's gotta also be a 'SetFileInformationByHandle', 'cause why not?

YES! I'm back in the game

It's almost too obvious that this function would exist, and a great reminder that, at the moment, we still can't rely on chatGPT to give us correct answers 100% of the time. I found this especially true for technical subjects that are less documented or less popular, which makes a lot of sense.

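SetFileInformationByHandle takes exactly the same parameters as GetFileInformationByHandleEx (a file handle, an information class, a buffer and its size), but since I'm using pywin32, the call looks a bit different: you pass the information class and a dict of fields instead of a raw buffer.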

After testing this on one file, setting ChangeTime to the LastWriteTime value and then rescanning the shared drive from Skybox, I saw the date change to the LastWriteTime! This finally let me sort my files by a timestamp that actually helps!

This is the final code, in case someone in the future, for some weird reason, encounters the same issue I had:

(I don't take any responsibility for any damage this might do to your files. Back them up first! And if you're not sure what you're doing, don't.)





import win32file


time_format = "%d-%m-%Y %H:%M:%S"


def set_change_time_to_last_written_time(file_path):
    # Open the file for reading and writing so its basic info
    # (timestamps + attributes) can be both queried and updated.
    # pywin32 raises pywintypes.error if the file can't be opened.
    try:
        hFile = win32file.CreateFileW(
            file_path,
            win32file.FILE_GENERIC_READ | win32file.FILE_GENERIC_WRITE,
            win32file.FILE_SHARE_READ | win32file.FILE_SHARE_WRITE,
            None,
            win32file.OPEN_EXISTING,
            win32file.FILE_ATTRIBUTE_NORMAL,
            None
        )
    except Exception as e:
        print(f"exception in CreateFileW for file: {file_path} Exception: {e}")
        return

    try:
        # Returns a dict with CreationTime, LastAccessTime, LastWriteTime,
        # ChangeTime and FileAttributes (the FILE_BASIC_INFO fields)
        fileBasicInfo = win32file.GetFileInformationByHandleEx(
            hFile,
            win32file.FileBasicInfo
        )

        # creation_time = fileBasicInfo['CreationTime'].strftime(time_format)
        # last_access_time = fileBasicInfo['LastAccessTime'].strftime(time_format)
        # change_time = fileBasicInfo['ChangeTime'].strftime(time_format)
        # last_write_time = fileBasicInfo['LastWriteTime'].strftime(time_format)

        fileBasicInfo['ChangeTime'] = fileBasicInfo['LastWriteTime']  # set ChangeTime value to LastWriteTime value

        # Write the updated info back to the file's metadata
        # (the return value isn't needed, so it isn't assigned)
        win32file.SetFileInformationByHandle(
            hFile,
            win32file.FileBasicInfo,
            fileBasicInfo
        )
    except Exception as e:
        print(f"exception while getting/setting file info for file: {file_path} Exception: {e}")
    finally:
        # Always release the handle, even if one of the calls above failed
        win32file.CloseHandle(hFile)


if __name__ == "__main__":
    file_path = ""  # full path of the file to update
    set_change_time_to_last_written_time(file_path)
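The function above handles a single file. To run it over a whole shared folder, a small directory walker is enough. A minimal sketch: apply_to_tree is a hypothetical helper, and the optional extension filter (for example, only video files) is an assumption about how you might want to limit the scope.

```python
import os


def apply_to_tree(root, func, extensions=None):
    """Call func(path) for every file under root and return the visited paths.

    If extensions is given (e.g. {".mp4", ".mkv"}), only files with a
    matching extension are processed; the comparison is case-insensitive.
    """
    wanted = {ext.lower() for ext in extensions} if extensions else None
    visited = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if wanted and os.path.splitext(name)[1].lower() not in wanted:
                continue
            path = os.path.join(dirpath, name)
            func(path)
            visited.append(path)
    return visited
```

Usage would look like `apply_to_tree(r"D:\Videos", set_change_time_to_last_written_time, extensions={".mp4", ".mkv"})`, where the path is a placeholder. As with the single-file version: back up first, and try it on a small test folder before pointing it at everything.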


Until next time.

 






