Have you ever needed to format a new hard drive or USB drive and been asked to choose between options with abbreviations such as FAT, FAT32, or NTFS? Or have you tried to connect an external device that your operating system could not detect? Are you sometimes frustrated by how long it takes your operating system to find a file?
If you have encountered any of the above, or have even just clicked around to find a file or application on your computer, then you already have first-hand experience with what a file system is.
Many people may not follow any explicit methodology when organizing their personal files on a PC (think "article what is a file_system.docx"). However, any device with persistent storage needs a systematic way to organize files and directories and to read, write, copy, and delete data. In the operating system, this task usually falls to the file system.
Persistent Data: Files and Directories
Modern operating systems are increasingly complex and must manage hardware resources, process scheduling, memory virtualization, and many other tasks. When it comes to data, much hardware, such as caches and RAM, has been designed to speed up access times and keep frequently used data "near" the processor. However, once the computer is powered off, only the information stored on persistent devices, such as hard disk drives (HDDs) or solid-state drives (SSDs), remains. The OS must therefore take special care of these devices and the data on them, since this is where users keep the data they need.
The two most important abstractions developed over time for storage are the file and the directory. A file is a linear array of bytes, each of which can be read or written. While in user space we can come up with clever names for our files, under the hood files are usually tracked by numeric identifiers. Historically, the data structure behind this low-level name is called the inode (index node). Interestingly, the OS itself doesn't know much about the internal structure of a file (i.e., whether it is an image, a video, or a text file); in fact, all it needs to know is how to write the file's bytes to persistent storage and make sure it can retrieve them later when asked.
The second main abstraction is the directory. Under the hood, a directory is really just a file, but it contains a very specific kind of data: a list of human-readable names mapped to low-level names. In practice, this means it lists other directories and files, which together form a directory tree in which all files and directories are stored.
Such an organization is quite expressive and scalable. All you need is a pointer to the root of the directory tree (this will be the first inode in the system), and from there you can reach any other file on the disk partition. This scheme also lets you create files with the same name, as long as they do not share the same path (i.e., they sit in different places in the file system tree).
Technically, you can name a file whatever you like, but it is customary to indicate the file type with a dot-separated extension (for example, .jpg in picture.jpg), although this is not required. Some operating systems, such as Windows, strongly encourage these conventions in order to open files in the appropriate application, but the contents of a file do not depend on its extension. The extension is merely a hint to the operating system about how to interpret the bytes the file contains.
Once you have files and directories, you need to be able to work with them. In the context of a file system, this means being able to read and write data, manage files (delete, move, copy, etc.), and manage file permissions (who may perform all of the above operations?). How are modern file systems implemented so that all these operations are fast and scalable?
File system organization
When thinking about a file system, there are usually two aspects to consider. The first is the file system's data structures: what on-disk structures does the file system use to organize its data and metadata? The second is access methods: how does a process open, read, or write those structures?
Let's start by describing the overall on-disk organization of a rudimentary file system.
The first thing to do is divide the disk into blocks. A commonly used block size is 4 KB. Suppose we have a very small disk with 256 KB of storage. The first step is to divide this space evenly into blocks of that size and assign each block a number (in our case, blocks 0 through 63):
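The arithmetic behind this division is straightforward; here is a quick sketch of it for our toy disk:

```python
# Carving a tiny 256 KB disk into 4 KB blocks (the toy example above).
BLOCK_SIZE = 4 * 1024    # 4 KB per block
DISK_SIZE = 256 * 1024   # 256 KB disk

num_blocks = DISK_SIZE // BLOCK_SIZE
print(num_blocks)        # 64 blocks, numbered 0 through 63
```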
Now let's split these blocks into different regions. We'll set aside most of the blocks for user data and call this the data area. In this example, let's reserve blocks 8-63 as our data area:
As you may have noticed, we placed the data area in the last part of the disk, leaving the first few blocks for the file system to use for other purposes. In particular, we want to use them to track information about files, such as where each file lives in the data area, its size, its owner and access rights, and other details. This information is a key part of the file system and is called metadata.
To store this metadata, we will use a special data structure called an inode (index node). In the current example, let's set aside 5 blocks for inodes and call this region of the disk the inode table:
Inodes are usually not very large, for example 256 bytes. A 4 KB block can therefore hold about 16 inodes, and our simple file system above holds just 80 inodes in total. This number matters: it means the maximum number of files in our file system is 80. On a larger disk you could certainly allocate more inodes, which directly translates into more files in your file system.
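The capacity figures above fall out of two divisions, which can be checked directly:

```python
# Capacity of the inode table in the toy file system above.
BLOCK_SIZE = 4 * 1024
INODE_SIZE = 256           # bytes per inode
INODE_TABLE_BLOCKS = 5     # blocks 3-7 in our layout

inodes_per_block = BLOCK_SIZE // INODE_SIZE         # 16 inodes fit in a block
max_files = inodes_per_block * INODE_TABLE_BLOCKS   # 80 files at most
print(inodes_per_block, max_files)                  # 16 80
```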
There are still a few things left to complete our file system. We also need to track whether each inode and data block is free or allocated. This allocation structure can be implemented as two separate bit arrays (bitmaps): one for the inodes and one for the data area.
A bit array is a very simple data structure: each bit records whether the corresponding object/block is free (0) or in use (1). We can give the inode bit array and the data-area bit array a block each. This is overkill (a single 4 KB block holds 32,768 bits and could track up to 32K objects, while we have only 80 inodes and 56 data blocks), but it is a convenient and simple way to organize our file system.
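A minimal sketch of such a bitmap (not any real file system's on-disk format) might look like this:

```python
# A bitmap: bit i records whether object i is free (0) or in use (1).
class Bitmap:
    def __init__(self, n_objects):
        # One bit per object, rounded up to whole bytes.
        self.bits = bytearray((n_objects + 7) // 8)

    def set(self, i):    # mark object i as used
        self.bits[i // 8] |= 1 << (i % 8)

    def clear(self, i):  # mark object i as free
        self.bits[i // 8] &= ~(1 << (i % 8))

    def is_used(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))

inode_bitmap = Bitmap(80)  # one bit for each of our 80 inodes
inode_bitmap.set(0)        # e.g. reserve inode 0 for the root directory
```

On disk this would simply be the raw bytes of the array; the class is just a convenient in-memory view for the example.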
Finally, the last remaining block (which happens to be the first block on our disk) is reserved for the superblock. The superblock is metadata about the metadata: in it we can store information about the file system as a whole, such as how many inodes it has (80), where the inode table begins (block 3), and so on. We can also put an identifier for the file system type in the superblock, so that it is clear how to interpret the nuances and details of this particular file system (for example, it might indicate a Unix-style file system such as ext4, or perhaps NTFS). When the operating system reads the superblock, it obtains a blueprint for how to interpret and access the rest of the data on the disk.
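As a rough illustration (the field names here are invented for this example, not any real on-disk format), the superblock of our toy file system might record:

```python
# Hypothetical superblock contents for the 64-block toy file system.
superblock = {
    "magic": 0xF5F5,          # identifier telling the OS which file system this is
    "num_inodes": 80,         # total inodes available
    "num_data_blocks": 56,    # blocks 8-63
    "inode_table_start": 3,   # first block of the inode table
    "data_area_start": 8,     # first block of the data area
}
```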
The Inode
So far we have mentioned the inode data structure, but have not yet explained why it is such an important component. Inode is short for index node; the name is historical, going back to UNIX and even earlier file systems. Almost all modern systems use some notion of an inode, though it may go by a different name (for example, dnode, fnode, etc.).
In essence, an inode is an indexable data structure: given an index, you can jump to a specific location and know how to interpret the bits you find there.
A specific inode is referred to by a number (its i-number), and this is the low-level name of the file. Given an i-number, you can look up the inode's information by jumping straight to its location. For example, from the superblock we know that the inode table starts at byte address 12 KB.
Since the disk is not byte-addressable, we need to know which block to read in order to find our inode. With some fairly simple math, we can compute the block number from the i-number, the size of each inode, and the block size. From there, we can locate where the inode begins inside that block and read the desired information.
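That math can be written down in a few lines, using the toy layout above (inode table starting at 12 KB, 256-byte inodes, 4 KB blocks):

```python
# Locating an inode on disk from its i-number (toy layout from above).
BLOCK_SIZE = 4 * 1024
INODE_SIZE = 256
INODE_TABLE_START = 12 * 1024   # byte address where the inode table begins (block 3)

def inode_location(i_number):
    """Return (block number, byte offset within that block) for an inode."""
    byte_addr = INODE_TABLE_START + i_number * INODE_SIZE
    block = byte_addr // BLOCK_SIZE
    offset_in_block = byte_addr % BLOCK_SIZE
    return block, offset_in_block

# Inode 32 lives at 12 KB + 32 * 256 B = 20 KB: the start of block 5.
print(inode_location(32))   # (5, 0)
```

The file system reads the whole 4 KB block, then uses the offset to pick out the 256-byte inode inside it.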
The inode holds almost all the information we need about a file. Is it a regular file or a directory? What is its size? How many blocks are allocated to it? What permissions govern access to the file (i.e., who owns it, and who may read or write it)? When was the file created, and when was it last accessed? And many other flags and pieces of metadata about the file.
One of the most important pieces of information stored in an inode is a pointer (or list of pointers) to where the file's data lives in the data area. These are known as direct pointers. The concept works well, but for very large files you can run out of pointers in the small inode data structure. Therefore, many modern systems add indirect pointers: instead of pointing directly at file data, an indirect pointer refers to a block in the data area that is itself full of direct pointers, increasing the number of data blocks a file can reference. This lets files grow far beyond the limited set of direct pointers available in the inode itself.
Unsurprisingly, the same approach can support even larger files through double or triple indirect pointers. A file system built this way is said to have a multi-level index, and it can support large files, in the range of gigabytes or more. Common file systems such as ext2 and ext3 use multi-level indexing; newer file systems such as ext4 use extents, a somewhat more sophisticated pointer scheme.
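A back-of-the-envelope calculation shows why a multi-level index scales so well. The pointer size and the count of 12 direct pointers below are assumptions (ext2-like defaults), not values from the toy file system above:

```python
# Maximum file size under a multi-level index (ext2-like assumptions).
BLOCK_SIZE = 4 * 1024
POINTER_SIZE = 4                              # bytes per block pointer (assumption)
DIRECT_POINTERS = 12                          # direct pointers in the inode (assumption)
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE   # 1024 pointers fit in one block

direct = DIRECT_POINTERS * BLOCK_SIZE         # 48 KB via direct pointers
single = PTRS_PER_BLOCK * BLOCK_SIZE          # + 4 MB via a single indirect block
double = PTRS_PER_BLOCK ** 2 * BLOCK_SIZE     # + 4 GB via a double indirect block
triple = PTRS_PER_BLOCK ** 3 * BLOCK_SIZE     # + 4 TB via a triple indirect block

max_file_size = direct + single + double + triple
```

Each extra level of indirection multiplies the reachable data by another factor of 1024, which is how a 256-byte inode ends up addressing terabytes.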
Although the inode data structure is very popular thanks to its scalability, much research has gone into understanding its effectiveness and how far multi-level indexes are really needed. One study reported some interesting measurements of file systems, including:
* Most files are actually very small (2 KB is the most common size)
* The average file size is growing (approaching 200 KB on average)
* Most bytes are stored in large files (a few big files occupy most of the space)
* File systems contain many files (almost 100K on average)
* File systems are about half full (even as disks grow, file systems remain ~50% full)
* Directories are typically small (many have few entries: 20 or fewer)
All this speaks to the versatility and scalability of the inode data structure and to how well it supports most modern systems. Many optimizations have been implemented for speed and efficiency, but the basic structure has changed little in recent years.