Object storage¶

1. Definition and terms¶

Object storage, also known as object-based storage, is a computer data storage architecture designed to handle large amounts of unstructured data. Unlike other architectures, it designates data as distinct units, bundled with metadata and a unique identifier that can be used to locate and access each data unit.

Two important terms in object storage:

- Buckets: A bucket is a logical container used for storing objects (e.g. files, images, videos, etc.). Buckets act as top-level folders or repositories within an object storage system, where data can be stored and organized.
- Objects: An object is a piece of data that consists of a key (its unique identifier), the data itself, and metadata.

Important properties of a bucket¶

Below are some important properties of a bucket:

- Unique Names: Each bucket in an object storage system must have a unique name within that service. This uniqueness is enforced across the entire storage service to ensure that each bucket can be easily identified.

- Access Control: Bucket-level access control allows users to define who can access the objects within a specific bucket and what level of access they have (read, write, delete, etc.).

- Storage Policies: Storage policies or settings can be applied at the bucket level, defining properties like data redundancy, access permissions, encryption, and lifecycle management (e.g., defining rules for object expiration or moving to less expensive storage tiers).

1.1 Object storage vs. file storage vs. block storage¶

File storage (file system)¶

File storage stores data as files and organizes them into a hierarchy of folders. It is one of the most common and traditional forms of storing data on computers and other storage systems, such as Direct-Attached Storage (DAS), Network-Attached Storage (NAS), and Storage Area Networks (SAN).

Direct-Attached Storage (DAS): storage devices connect directly to a computer system (or server) without going through a network. The physical storage devices, such as hard disk drives (HDDs) and solid-state drives (SSDs), are attached to the computer via interfaces like SATA, USB, or PCIe. File systems commonly used on DAS include EXT4, FAT32, and NTFS.
Network-Attached Storage (NAS): NAS is a storage device or server connected to a network that provides storage and file system access to multiple clients and servers. It allows multiple users and devices to access shared files simultaneously over a network. NFS is a typical network file system used with NAS.
Storage Area Network (SAN): SAN is a dedicated high-speed network or subnetwork that connects storage devices to servers. It provides block-level storage accessible to multiple servers and offers features like high performance, scalability, and centralized storage management.

Block storage¶

Block storage improves on the performance of file storage by breaking files into separate blocks and storing them separately. A block-storage system assigns a unique identifier to each block of raw data, which is then used to reassemble the blocks into the complete file when you need to access it. Block storage doesn’t require a single path to data, so you can store it wherever is most convenient and still retrieve it quickly when needed.
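
The split-and-reassemble idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4-byte block size, the UUID-based identifiers, and the function names `split_into_blocks` and `reassemble` are all hypothetical, and a real block device works with fixed-size sectors and addresses rather than a Python dict.

```python
import uuid

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks, each stored under its own identifier.

    Returns (block_ids, store): the ordered list of identifiers needed to
    rebuild the data, and a dict mapping identifier -> raw block.
    """
    store = {}
    block_ids = []
    for offset in range(0, len(data), block_size):
        block_id = uuid.uuid4().hex  # hypothetical identifier scheme
        store[block_id] = data[offset:offset + block_size]
        block_ids.append(block_id)
    return block_ids, store


def reassemble(block_ids, store) -> bytes:
    """Rebuild the original data by reading blocks in their recorded order."""
    return b"".join(store[block_id] for block_id in block_ids)


payload = b"hello block storage"
ids, blocks = split_into_blocks(payload)
restored = reassemble(ids, blocks)
print(len(ids), restored == payload)
```

Note that the blocks themselves carry no description of what they contain; only the ordered list of identifiers makes them meaningful, which is why block storage depends on a layer above it (typically a file system) to keep track of them.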

Block storage works well for organizations that work with large amounts of transactional data or mission-critical applications that need minimal delay and consistent performance. However, it can be expensive, offers no metadata capabilities, and requires an operating system to access blocks.

1.2 How does object storage work?¶

With object storage, the data blocks of a file are kept together as a single object, along with its relevant metadata and a unique identifier, and placed in a flat data environment known as a storage pool.

Data Structure: Object storage organizes data as objects. Each object consists of the data itself, metadata, and a unique identifier. The data and metadata are stored together, making it self-contained. You can also customize metadata, allowing you to add more context that is useful for other purposes, such as retrieval for data analytics.
Access Method: Objects are accessed through RESTful APIs over HTTP or HTTPS, using their unique identifiers (or metadata). These APIs allow users to perform CRUD (Create, Read, Update, Delete) operations on the stored objects. Object storage doesn’t require a file hierarchy; instead, it uses a flat structure.
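
To make the data structure and access method above concrete, here is a minimal in-memory sketch of a bucket. It is not how any real object store is implemented: the class names `StoredObject` and `InMemoryBucket`, the method names, and the sample keys and metadata are assumptions chosen for illustration, and a real service exposes these operations over HTTP rather than as Python method calls.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                       # unique identifier in the bucket
    data: bytes                                    # the payload itself
    metadata: dict = field(default_factory=dict)   # custom key-value metadata


class InMemoryBucket:
    """A flat key -> object mapping; there is no directory tree."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data, metadata=None):
        # Create, or overwrite (update) if the key already exists.
        self._objects[key] = StoredObject(key, data, dict(metadata or {}))

    def get(self, key):
        # Read: the key alone is enough to locate the object.
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def find_by_metadata(self, name, value):
        # Return the keys of all objects whose metadata matches name == value.
        return sorted(
            key for key, obj in self._objects.items()
            if obj.metadata.get(name) == value
        )


bucket = InMemoryBucket("casd")
bucket.put("data_science/crime.csv", b"id,city\n1,Paris\n",
           {"content-type": "text/csv", "team": "analytics"})
bucket.put("images/logo.png", b"\x89PNG", {"content-type": "image/png"})

csv_object = bucket.get("data_science/crime.csv")
csv_keys = bucket.find_by_metadata("content-type", "text/csv")
bucket.delete("images/logo.png")
```

The key `data_science/crime.csv` looks like a path, but the slash is just a character in the identifier: the bucket stores every object at the same level.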

There are many object storage solutions. Below are some of the most popular:

- Azure Blob Storage
- Amazon S3
- Google Cloud Storage
- MinIO

1.3 What are the benefits of object storage?¶

Compared to other types of storage, object storage has the following advantages:

Scalability: The flat environment enables you to scale quickly, even for petabyte or exabyte loads. Storage pools can be spread across multiple object storage devices and geographical locations, so capacity is limited mainly by the hardware you add. You simply add more storage devices to the pool as your data grows.
Metadata: Object storage systems can store extensive metadata (key-value pairs) for each object, providing rich information about the stored data.
Reduced complexity: Object storage has no folders or directories, removing much of the complexity that comes with hierarchical systems. The lack of complex trees or partitions makes retrieving files easier, as you only need the object's identifier rather than its exact location.
Redundancy and Durability: Object storage systems often provide redundancy and data durability by replicating objects across multiple locations or using erasure coding techniques.
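
Regarding the "reduced complexity" point above: although the namespace is flat, S3-compatible services usually let clients list keys by prefix and delimiter, which gives a folder-like view without any real directories. The sketch below imitates that behaviour in plain Python. The function name `list_with_prefix` and the sample keys are hypothetical, and the code only approximates what a listing API returns.

```python
def list_with_prefix(keys, prefix="", delimiter="/"):
    """Group flat keys into direct objects and folder-like common prefixes."""
    objects = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter acts like a sub-folder name.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = [
    "data_science/crime.csv",
    "data_science/models/v1.bin",
    "logs/2024/app.log",
    "readme.txt",
]

top_objects, top_prefixes = list_with_prefix(keys)
ds_objects, ds_prefixes = list_with_prefix(keys, prefix="data_science/")
```

Here `top_prefixes` plays the role of the top-level "folders" and `ds_objects` the files directly under `data_science/`, yet nothing was ever created except four flat keys.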

2 What is MinIO?¶

MinIO is an easy-to-deploy, open-source object storage solution. It is a Kubernetes-native, high-performance object store with an S3-compatible API. Onyxia uses it as the official S3 object storage backend.

There are two main ways to install MinIO:

- Basic Linux installation (see the official doc)
- Kubernetes operator (Helm chart) installation (see the official doc)

2.1 Main features¶

2.2 Erasure sets for high availability and resiliency¶

The docs below explain how MinIO uses erasure coding to ensure high availability and resiliency:

https://min.io/docs/minio/linux/operations/concepts/erasure-coding.html#minio-erasure-coding

https://github.com/minio/minio/blob/master/docs/erasure/README.md
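
To give an intuition for what erasure coding achieves, the sketch below splits data into shards, adds one parity shard, and rebuilds a lost shard from the survivors. It is deliberately not MinIO's algorithm: MinIO's documentation describes Reed-Solomon coding with a configurable number of parity shards, whereas this example uses a single XOR parity shard, the simplest possible erasure code, which can repair only one loss. The function names and the choice of 4 data shards are assumptions made for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, data_shards: int):
    """Split data into equal-size shards and append one XOR parity shard."""
    shard_len = -(-len(data) // data_shards)  # ceiling division
    padded = data.ljust(shard_len * data_shards, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len]
              for i in range(data_shards)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards):
    """Rebuild the single missing shard (marked None) from the survivors."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) != 1:
        raise ValueError("single-parity XOR can repair exactly one lost shard")
    survivors = [shard for shard in shards if shard is not None]
    repaired = list(shards)
    # XOR of all surviving shards (data + parity) equals the missing one.
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


original = b"object storage with parity"
shards = encode(original, data_shards=4)

damaged = list(shards)
damaged[2] = None                      # simulate losing one drive
repaired = recover(damaged)

# Drop the parity shard and the padding to get the original bytes back.
restored = b"".join(repaired[:4])[:len(original)]
```

Real erasure codes generalise this idea so that several shards can be lost at once, at the cost of storing more parity.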

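MinIO's command-line client, mc, can also run SQL queries directly against stored objects with the mc sql command. The general form is:

mc sql --recursive --query "select * from S3Object" ALIAS/PATH

For example, to query the file data_science/crime.csv in the bucket casd of the deployment aliased as minio:

mc sql --recursive --query "select * from S3Object" minio/casd/data_science/crime.csv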

3 Install a MinIO cluster¶

The MinIO installation guide for bare metal can be found here