Kertish-dos is a simple and highly scalable distributed object storage system built to store and serve billions of files. It is developed to cover mass file storage requirements in isolated networks. It does not have a security implementation.
Kertish-dos covers traditional file storage requirements in a scalable way. The software virtually keeps the same path-tree structure for files but is not stuck within the hard-drive limits of a single machine, which makes it possible to create a fail-safe, highly available, fast and scalable file storage.
It targets systems that need huge storage space to keep all their files logically together. For example: common cloud storage for microservices, video storage for streaming services, and the like.
Kertish-dos does not have any security implementation. For this reason, it is best used in an isolated network. It also does not have file/folder permission integration, so user-based limitations are not available either.
Kertish-dos is suitable to use as a backing service behind front services. In other words, it is better not to let users manipulate files directly.
Kertish-dos has 3 vital parts for a working farm: Manager-Node, Head-Node and Data-Node.
The Manager-Node and Data-Node must not accept requests coming directly from outside the Kertish-dos farm. For file/folder manipulation, the only access point should be the Head-Node.
Here is a sample scenario:
Kertish-dos allows you to create data farms in distributed locations. Every data farm consists of clusters, and clusters are data banks. Every cluster has one or more data nodes; one data node always works as the master, and every data node added to the cluster afterwards is used as a backup of the master.
Data is always written to and deleted from the master data node in the cluster. Reading requests, however, are balanced across the slave data nodes. So in read-sensitive environments, every new data node added to a cluster helps improve response times.
The farm is managed using the Admin and File Storage command-line tools.
Kertish-dos nodes have different hardware requirements to work flawlessly.
Manager-Node opens TCP connections to Redis, MongoDB and Locking-Center. In addition, it serves REST end-points for the head-node and for data-node management feedback requests. For these purposes, a powerful network connection is a must. Providing a minimum of 4 CPU cores will significantly drop the response time of node-to-node communication. Depending on your Kertish farm setup, you may need 2GB or more memory. If you are serving many small files between 1KB and 16MB, it is better to keep memory at no less than 8GB for a setup of 4 clusters with 8 data-nodes working in master-slave logic and a total disk space between 350GB and 600GB. This memory is required to handle synchronization and repair operations; otherwise, the node can fall back to swap space, which causes slow operation, and if no swap space is configured, it will lead the service to crash. NOTE: Always remember that the Manager-Node is not scalable right now; it works as a single instance only. I'm working to make it scalable.
Head-Node opens TCP connections to MongoDB and Locking-Center. It also serves REST end-points for file storage manipulation, so it is a good idea to have a 200mbit or better network connection. The Head-Node caches an uploaded file before processing it. So, if you are uploading a raw 32GB file to Kertish-dos, you should have enough memory and swap space to hold the whole file in memory; for this reason, a fast SSD with a large swap space configured is recommended. NOTE: this logic will be extended in future releases to cover real-time uploading without caching. If you are configuring the Kertish-dos farm to serve only small files between 1KB and 2GB, 4GB of RAM with 8GB of swap space will be more than enough. On the read side, the Head-Node does not cache anything; it transfers the data from the data-node to the client, so memory is essential only for file uploads. Remember that the Head-Node is scalable: you can put as many Head-Nodes as you want behind a load balancer for file manipulation. CPU is not a big consideration; a minimum of 2 CPU cores is sufficient, as the Head-Node does not do any serious calculation.
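Since the swap space mentioned above is what keeps a large upload from crashing the head node, here is a minimal configuration sketch for creating an 8GB swap file on Linux (the 8GB size is taken from the small-file example above; adjust it to your workload):

```shell script
# Create and enable an 8GB swap file (Linux).
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Keep it enabled across reboots:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```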
Data-Node serves content over custom backend TCP ports and makes REST requests to the Manager-Node. For this reason, each node should have a 100mbit or better network connection to serve files. The Data-Node has an optional caching feature to serve the most requested files quickly. For a data-node with 500GB disk space and caching disabled, 4GB of memory is enough to operate; more memory will not change the response time or performance. If you consider using caching, any memory on top of 4GB improves response time. In other words, on a machine with 16GB of memory, 16GB - 4GB = 12GB can be used for caching. The hard disk is a key point here: it is better to use an SSD for fast access and serving. An HDD is also okay if you are storing huge files, because they will not produce many small file chunks on the disk. However, small files create many chunks, which affects the seek time of the disk head and leads to a slow data-node. On the CPU side, a minimum of 2 CPU cores is sufficient to serve files. On the other hand, slave nodes periodically synchronize content with the master, and CPU usage can rise during that operation, so faster and more CPU cores make synchronization finish quicker.
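The cache-memory rule of thumb above can be sketched as a quick calculation (the 4GB baseline is the figure from this section, not an official constant):

```shell script
#!/bin/sh
# Estimate how much memory a data node can dedicate to caching:
# total machine memory minus the ~4GB operating baseline mentioned above.
TOTAL_GB=16
BASELINE_GB=4
CACHE_GB=$((TOTAL_GB - BASELINE_GB))
echo "${CACHE_GB}GB usable for caching"  # prints "12GB usable for caching"
```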
A Kertish-dos farm consists of, at minimum, one of each of the following nodes.
**Head Node** is for file storage interaction. When an application wants to access data, it should make its requests to this node. It works as a REST service. The File Storage command-line tool communicates directly with the head node. The head node is scalable. Check the head-node folder for details.
**Manager Node** is for orchestrating the cluster(s). It is used when the farm is set up for the first time or when clusters/nodes are added or removed. The Admin command-line tool communicates directly with the manager node. The manager node is NOT scalable for now. Check the manager-node folder for details.
**Data Node** keeps the data blocks. All file data particles are distributed over the data nodes in different clusters.
I'll set up a farm consisting of one Manager Node, one Head Node and four Data Nodes.
The whole setup will be done on the same machine; this is just for testing purposes. In the real world, you would need 6 servers for this setup. It is okay to keep MongoDB, Redis DSS, Locking-Center Server and the Manager Node on the same machine if it covers the DB and DSS expectations.
You will not find the MongoDB, Redis DSS and Locking-Center Server setup here. Please follow the instructions on their websites.
The docker hub page is <https://hub.docker.com/r/freakmaxi/kertish-dos>
You can use the sample docker-compose file to kickstart a Kertish-dos farm in docker containers with 6 Data-Nodes working in 3 clusters as Master/Slave:
<https://github.com/freakmaxi/kertish-dos/blob/master/docker-compose.yml>
`docker-compose up` will make everything ready for you.
- Download the setup script from <https://github.com/freakmaxi/kertish-dos/blob/master/kertish-docker-setup.sh>
- Give execution permission to the file `sudo chmod +x kertish-docker-setup.sh`
- Run the script, answer the prompt with `y` and press enter.
Your Kertish-dos farm is ready to go.
Put any file using the `krtfs` file storage tool. Ex:
`./krtfs cp local:~/Downloads/demo.mov /demo.mov`
Just change the path and file after `local:` according to a file on your system. Try to choose a file larger than 70MB
to see file chunk distribution between clusters. If the file size is smaller than 32MB, it will be placed in a single cluster.
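Based on the 32MB figure mentioned above, you can estimate how many chunks a file will be split into (a sketch using ceiling division; the exact block size may differ in your build):

```shell script
#!/bin/sh
# Estimate the chunk count for an 87701KB file (the demo.mov size used in
# this guide), assuming 32MB chunks as described above.
FILE_SIZE=$((87701 * 1024))        # file size in bytes
CHUNK_SIZE=$((32 * 1024 * 1024))   # 32MB in bytes
CHUNKS=$(( (FILE_SIZE + CHUNK_SIZE - 1) / CHUNK_SIZE ))  # ceiling division
echo "$CHUNKS chunks"  # prints "3 chunks"
```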
./krtfs ls -l
will give you an output similar to the one below
processing... ok.
total 1
- 87701kb 2020 Jun 22 22:07 demo.mov
You can compile the source code using the `create_release.sh` shell script located under the `-build-` folder.
---
##### Setting Up Manager Node
You can take a look at the [Manager-Node](https://github.com/freakmaxi/kertish-dos/blob/master/manager-node) page to understand how it works
- Copy the `kertish-manager` executable to the `/usr/local/bin` folder on the system.
- Give execution permission to the file `sudo chmod +x /usr/local/bin/kertish-manager`
- Create an empty file in your user path, copy-paste the following and save the file
```shell script
#!/bin/sh
export MONGO_CONN="mongodb://root:pass@127.0.0.1:27017" # Modify the values according to your setup
export REDIS_CONN="127.0.0.1:6379" # Modify the values according to your setup
export LOCKING_CENTER="127.0.0.1:22119" # Modify the values according to your setup
/usr/local/bin/kertish-manager
```
- Give execution permission to the file `sudo chmod +x [Saved File Location]`
- Execute the saved file.
---
##### Setting Up Head Node
You can take a look at the [Head-Node](https://github.com/freakmaxi/kertish-dos/blob/master/head-node) page to understand how it works
- Copy `kertish-head` executable to `/usr/local/bin` folder on the system.
- Give execution permission to the file `sudo chmod +x /usr/local/bin/kertish-head`
- Create an empty file in your user path, copy-paste the following and save the file
```shell script
#!/bin/sh
export MANAGER_ADDRESS="http://127.0.0.1:9400"
export MONGO_CONN="mongodb://root:pass@127.0.0.1:27017" # Modify the values according to your setup
export LOCKING_CENTER="127.0.0.1:22119" # Modify the values according to your setup
/usr/local/bin/kertish-head
```
- Give execution permission to the file `sudo chmod +x [Saved File Location]`
- Execute the saved file.
---
##### Setting Up Data Node
You can take a look at the [Data-Node](https://github.com/freakmaxi/kertish-dos/blob/master/data-node) page to understand how it works
- Copy the `kertish-data` executable to the `/usr/local/bin` folder on the system.
- Give execution permission to the file `sudo chmod +x /usr/local/bin/kertish-data`
- Create an empty file on your user path named `dn-c1n1.sh`, copy-paste the following and save the file
```shell script
#!/bin/sh
export BIND_ADDRESS="127.0.0.1:9430"
export MANAGER_ADDRESS="http://127.0.0.1:9400"
export SIZE="1073741824" # 1gb
export ROOT_PATH="/opt/c1n1"
/usr/local/bin/kertish-data
```
- Give execution permission to the file `sudo chmod +x [Saved File Location]`
- Execute the saved file.
---
- Create an empty file on your user path named `dn-c1n2.sh`, copy-paste the following and save the file
```shell script
#!/bin/sh
export BIND_ADDRESS="127.0.0.1:9431"
export MANAGER_ADDRESS="http://127.0.0.1:9400"
export SIZE="1073741824" # 1gb
export ROOT_PATH="/opt/c1n2"
/usr/local/bin/kertish-data
```
- Give execution permission to the file `sudo chmod +x [Saved File Location]`
- Execute the saved file.
---
- Create an empty file on your user path named `dn-c2n1.sh`, copy-paste the following and save the file
```shell script
#!/bin/sh
export BIND_ADDRESS="127.0.0.1:9432"
export MANAGER_ADDRESS="http://127.0.0.1:9400"
export SIZE="1073741824" # 1gb
export ROOT_PATH="/opt/c2n1"
/usr/local/bin/kertish-data
```
- Give execution permission to the file `sudo chmod +x [Saved File Location]`
- Execute the saved file.
---
- Create an empty file on your user path named `dn-c2n2.sh`, copy-paste the following and save the file
```shell script
#!/bin/sh
export BIND_ADDRESS="127.0.0.1:9433"
export MANAGER_ADDRESS="http://127.0.0.1:9400"
export SIZE="1073741824" # 1gb
export ROOT_PATH="/opt/c2n2"
/usr/local/bin/kertish-data
```
- Give execution permission to the file `sudo chmod +x [Saved File Location]`
- Execute the saved file.
---
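The four data-node start scripts above differ only in port and root path, so they can also be generated in one go. A sketch (ports 9430-9433 and paths `/opt/c1n1` through `/opt/c2n2` match the manual steps above):

```shell script
#!/bin/sh
# Generate the dn-*.sh start scripts for the four data nodes in this guide.
i=0
for node in c1n1 c1n2 c2n1 c2n2; do
  port=$((9430 + i))
  # Write the start script; $port and $node expand inside the heredoc.
  cat > "dn-$node.sh" <<EOF
#!/bin/sh
export BIND_ADDRESS="127.0.0.1:$port"
export MANAGER_ADDRESS="http://127.0.0.1:9400"
export SIZE="1073741824" # 1gb
export ROOT_PATH="/opt/$node"
/usr/local/bin/kertish-data
EOF
  chmod +x "dn-$node.sh"
  i=$((i + 1))
done
```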
IMPORTANT: Data node sizes in the SAME CLUSTER have to be the same. You may have servers with different-sized
hard drives. You should use the `SIZE` environment variable to align the storage spaces to
the server that has the smallest hard drive.
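Since `SIZE` is given in bytes, converting a drive capacity in GB is easy to get wrong; a small sketch (the 1GB value matches the example scripts above):

```shell script
#!/bin/sh
# Compute the SIZE environment variable (bytes) from a capacity in GB.
# Use the smallest drive in the cluster so all nodes advertise the same size.
SMALLEST_GB=1
SIZE=$((SMALLEST_GB * 1024 * 1024 * 1024))
echo "export SIZE=\"$SIZE\""  # prints: export SIZE="1073741824"
```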
##### Creating Clusters
- Copy the `krtadm` executable to the `/usr/local/bin` folder on the system.
- Give execution permission to the file `sudo chmod +x /usr/local/bin/krtadm`
- Enter the following command to create the first cluster
`krtadm -create-cluster 127.0.0.1:9430,127.0.0.1:9431`
ok.
- Enter the following command to create the second cluster
`krtadm -create-cluster 127.0.0.1:9432,127.0.0.1:9433`
- If everything went right, you should see something like this
Cluster Details: 8f0e2bc02811f346d6cbb542c92d118d
    Data Node: 127.0.0.1:9432 (MASTER) -> 7a758a149e4453b20a40b35f83f3a0e4
    Data Node: 127.0.0.1:9433 (SLAVE) -> 6776201a0bb7daafb46c9e3931f0807e
ok.
---
##### Manipulating File Storage
- Copy `krtfs` executable to `/usr/local/bin` folder on the system.
- Give execution permission to the file `sudo chmod +x /usr/local/bin/krtfs`
- Enter the following command
`krtfs ls -l`
output:
processing… ok.
total 0
- Put a file from your local drive to dos
`krtfs cp local:/usr/local/bin/krtfs /krtfs`
output:
processing… ok.
- Enter the following command
`krtfs ls -l`
output:
processing… ok.
total 1
If you get the same or similar outputs as shown here, congratulations! You have successfully set up your Kertish-dos farm.
Once a cluster starts taking data blocks, consider that cluster permanent. Deleting
the cluster will cause data inconsistency and data loss. For this reason, when you are creating the structure of
your farm, pay the most attention to your cluster setup. If you want to remove a cluster from the farm, first
move its data from one point to another using the `krtadm` client tool.