
January 6, 2016

Do You Speak SoftLayer Object Storage?

So you’ve made the decision to utilize object storage at SoftLayer. Great! But are you and your applications fluent in object storage? Do you know how to transfer data to SoftLayer object storage as well as modify and delete objects? How about when to use APIs and when to use storage gateways? If not, you’re not alone.

We’ve found that most IT professionals understand the difference between “traditional” (i.e., file and block) storage and object storage. They have difficulty, however, navigating the methods for interacting with SoftLayer’s object storage service, which is based on OpenStack Swift. This is understandable because traditional storage systems expose volumes and/or shares that can be mounted and consumed via the iSCSI, NFS, or SMB protocols.

That’s not the case with object storage, including the object storage service offered by SoftLayer. Data is accessed only through REST APIs and language bindings, third-party applications that support SFTP, the SoftLayer customer portal, or storage gateways.

These access methods are outlined below, including guidance on when to use each one. Figure 1 provides a high-level overview of the available options and their purpose.



Figure 1: Object storage data access methods

REST APIs and Language Bindings
The first and possibly most flexible method to access SoftLayer object storage is via REST APIs and language bindings. These APIs and bindings give you the ability to interact with SoftLayer object storage from the command line or programmatically. As a result, you can create scripts to upload a file, download certain objects, and modify the metadata related to an object. Additionally, the current support for PHP, Java, Ruby, and Python bindings gives application developers the flexibility to support SoftLayer object storage in their applications.

While this method is flexible in terms of capabilities, it does assume the user has knowledge and experience writing scripts, programs, and applications. REST APIs and language bindings aren’t the best methods for IT organizations that want to integrate existing environment backup, archive, and disaster recovery solutions. These solutions typically require traditional storage mount points, which REST APIs and language bindings don’t provide.
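
To make the scripting option a little more concrete, here is a minimal Python sketch of the three operations mentioned above (upload, download, and metadata update) expressed as OpenStack Swift-style HTTP requests. It only describes the requests and does not send them. The `SwiftRequest` class, the helper names, and the example storage URL and token are assumptions made for illustration; they are not part of SoftLayer's bindings.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class SwiftRequest:
    """A plain description of one HTTP request; send it with any HTTP client."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)


def object_url(storage_url, container, name):
    # Swift addresses an object as <storage URL>/<container>/<object name>.
    # quote() percent-encodes unsafe characters but keeps "/" in object names.
    return f"{storage_url.rstrip('/')}/{quote(container)}/{quote(name)}"


def upload_request(storage_url, token, container, name):
    # PUT creates or replaces the object; the file bytes go in the request body.
    return SwiftRequest("PUT", object_url(storage_url, container, name),
                        {"X-Auth-Token": token})


def download_request(storage_url, token, container, name):
    # GET returns the object data in the response body.
    return SwiftRequest("GET", object_url(storage_url, container, name),
                        {"X-Auth-Token": token})


def set_metadata_request(storage_url, token, container, name, metadata):
    # POST updates object metadata; each key is sent as an
    # X-Object-Meta-<key> header.
    headers = {"X-Auth-Token": token}
    for key, value in metadata.items():
        headers[f"X-Object-Meta-{key}"] = str(value)
    return SwiftRequest("POST", object_url(storage_url, container, name), headers)


if __name__ == "__main__":
    # Hypothetical storage URL and token, as returned by an authentication call.
    url = "https://example.invalid/v1/AUTH_demo"
    print(upload_request(url, "my-token", "backups", "2016/db dump.sql"))
    print(set_metadata_request(url, "my-token", "backups", "2016/db dump.sql",
                               {"Owner": "ops"}))
```

In a real script, the storage URL and token would come from the authentication step, and each `SwiftRequest` would be handed to whichever HTTP library or language binding you already use.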

Third-Party Applications
The second method is to use third-party applications that support SFTP. This method abstracts the use of REST APIs and gives users the ability to upload, download, and delete objects via a GUI. However, you won’t have the ability to modify metadata when using an SFTP client. Additionally, third-party applications are subject to the 5GB upload limit that SoftLayer and OpenStack Swift place on each object. If an object larger than 5GB needs to be uploaded, you have to follow the OpenStack method of creating large objects on object storage to ensure a successful and efficient upload. Unless you’re comfortable with this methodology, it’s strongly recommended that you use either the REST APIs or storage gateway solutions to access files over 5GB.
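
For readers unfamiliar with the large object approach, the general idea in OpenStack Swift is to split the file into segments that each fit under the per-object limit, upload the segments, and then create a small manifest object that ties them together. The Python sketch below only plans the segments and builds the manifest header used by Swift's dynamic large objects; it does not upload anything. The function names, the 1GB segment size, and the container names are assumptions chosen for illustration.

```python
def plan_segments(total_size, segment_size, prefix):
    """Return (segment_name, start_byte, length) tuples covering total_size bytes."""
    if total_size < 0 or segment_size <= 0:
        raise ValueError("total_size must be >= 0 and segment_size must be > 0")
    segments = []
    start = 0
    index = 0
    while start < total_size:
        length = min(segment_size, total_size - start)
        # Zero-padded names keep the segments in order when sorted by name.
        segments.append((f"{prefix}/{index:08d}", start, length))
        start += length
        index += 1
    return segments


def manifest_headers(segment_container, prefix):
    # A dynamic large object is an empty object whose X-Object-Manifest header
    # points at the container and name prefix shared by its segments.
    return {"X-Object-Manifest": f"{segment_container}/{prefix}/"}


if __name__ == "__main__":
    GB = 1024 ** 3
    # Hypothetical 12GB file split into 1GB segments.
    plan = plan_segments(12 * GB, 1 * GB, "vm-image.img")
    print(len(plan), "segments; first:", plan[0])
    print(manifest_headers("backups_segments", "vm-image.img"))
```

Each planned segment would be uploaded as an ordinary object, followed by an empty object that carries the manifest header.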

SoftLayer Customer Portal
The third method to access SoftLayer object storage is to simply use the SoftLayer customer portal. By using the portal, you have the ability to add containers, add files to containers, delete files from containers, modify metadata, and enable CDN capabilities. As with the SFTP method, there is no limit on the number of files you can upload, but each file uploaded through the portal must not exceed 20MB in size. Also, there is no bulk upload option within the customer portal; users must select and upload files one at a time. While using the portal is simple, it does impose some limitations and is best for users who only want to upload a few files of 20MB or less each.

Storage Gateways
The last method to access and utilize SoftLayer object storage is storage gateways. Unlike the other methods, storage gateways expose traditional storage protocols like iSCSI, NFS, CIFS, and SMB and translate the read/write/modify commands into REST API calls against the object storage service. As a result, these devices offer an easier path to consume SoftLayer object storage for businesses looking to integrate their on-premises environment with the cloud. Some storage gateways also have the ability to compress, deduplicate, and encrypt data in flight and at rest. Storage gateways work best for organizations looking to integrate existing applications that require traditional storage access methods (like backup software) with object storage, or to securely transfer data to and store it in cloud object storage.

Summary
While there are many methods to access SoftLayer object storage, it’s important that you select the option that best meets your requirements for data access, security, and integration. For example, if you’re writing an application that requires object storage, you would most likely choose to interact with object storage via REST APIs or language bindings. Or, if you simply need to integrate existing applications in your environment with cloud object storage, a storage gateway would be the best option. In all cases, make sure the method you choose can meet your requirements.

Table 1 lists sample requirements and shows whether each option meets them. Use it to help with your decision-making process:



Table 1: Decision-making tool

Click here for more information about SoftLayer’s object storage service and click here for FAQs on object storage.

Click here for information about SoftLayer’s REST APIs and language bindings.

-Daniel De Araujo & Naeem Altaf

August 31, 2015

Data Ingestion and Access Using Object Storage

The massive growth in unstructured data (documents, images, videos, and so on) is one of the greatest problems facing today’s IT personnel. The challenge is storing all the data so that it and its storage solution can grow exponentially. Object storage is an ideal, cost-effective, scale-out solution for storing extensive amounts of unstructured data.

SoftLayer offers object storage based on the OpenStack Swift platform. Object storage provides a fully distributed, scalable, API-accessible storage platform that can be integrated directly into applications. It can be used for storing static data, such as virtual machine (VM) images, photos, emails, and so on. Click here for more information on object storage.

There are two important use cases when working with object storage: data ingestion and data access.

Data ingestion use case
A large medical research company needs to upload a large amount of data into their SoftLayer compute instance. The requirement is for a multi-hundred terabyte image repository that contains hundreds of millions of images. Researchers will then upload code to run on bare metal servers with GPUs to process the images in the repository. The images range from 512KB CT images to 30MB to 50MB mammograms and are logically grouped into 12 million “studies.” The client wants to onboard the data as quickly as possible.

Recommendations

  • Evenly distribute the objects into approximately 1,000 containers for the initial upload. For the number of objects the client needs to store, our tests have shown that having a much larger number of containers, or too few objects per container, would incur significant performance penalties. The proposed 1,000 containers strike a good balance between parallelism in object creation and manageable container sizes.
  • Concurrently add new objects to all containers using 400 worker threads for small objects (e.g., 512KB CT images) and 40 worker threads for large objects (e.g., 30MB to 50MB mammograms). The ideal number of worker threads depends on the workload size. Using a small number of threads results in better response times but lower throughput. Using significantly more threads may degrade both latency and throughput because the threads start competing for resources.
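
The two recommendations above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the tooling used in our tests: the hash-based container assignment, the container naming scheme, the 1MB cutoff between "small" and "large" objects, and the `upload_one` callback are all hypothetical. Only the 1,000 containers and the 400 and 40 worker thread counts come from the recommendations.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_CONTAINERS = 1000  # from the recommendation above


def container_for(object_name, num_containers=NUM_CONTAINERS, prefix="images"):
    # Hash the object name so objects spread evenly and the same name always
    # maps to the same container. (The naming scheme is an assumption.)
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % num_containers
    return f"{prefix}-{shard:04d}"


def workers_for(object_size_bytes, small_limit=1024 * 1024):
    # 400 threads for small objects, 40 for large ones; the 1MB cutoff is an
    # assumption made for this sketch.
    return 400 if object_size_bytes <= small_limit else 40


def upload_all(object_names, upload_one, workers):
    # upload_one(container, name) is a hypothetical callback that performs the
    # actual upload, for example through a REST call or a language binding.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda name: upload_one(container_for(name), name),
                             object_names))


if __name__ == "__main__":
    names = [f"study-{i}/ct-{i}.dcm" for i in range(5)]
    # A stand-in uploader that only reports where each object would go.
    for container, name in upload_all(names, lambda c, n: (c, n),
                                      workers=workers_for(512 * 1024)):
        print(container, name)
```

Because the container is derived from the object name, readers of the data can locate an object later without keeping a separate index.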

Data access use case
A large technology company has a mix of GET, PUT, and DELETE operations for which it needs object storage capable of holding billions of small objects (15KB or less). They also want consistent latencies for their operation mix (GET 54%, PUT 33%, and DELETE 13%), which requires optimal tuning for consistent performance. The client’s benchmarking calls for 1,400 operations per second.

Recommendations

  • Use multiple containers (at least 40) to improve the latency of PUT and DELETE operations. As long as the objects were distributed over at least 40 containers with a sufficient number of worker threads, the average latencies for PUT and DELETE operations were well below 100ms in our tests. There may be occasional latency spikes, which are not surprising on shared storage systems, but overall, the latencies should be relatively consistent.
    • The read latency for a GET is very fast—less than 20ms on average for small objects.
  • Use multiple containers if very high throughput is needed. In our tests, we could drive more than 6,000 transactions per second on the production cluster with at least 40 containers.
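
To put the client's numbers in perspective, the short Python sketch below breaks the 1,400 operations per second target down by operation type using the stated mix, and generates a random operation sequence with the same proportions, as a benchmark driver might. The function names and the fixed random seed are assumptions for illustration; this is not the benchmark harness used in our tests.

```python
import random

# Percentages and target rate taken from the use case above.
OPERATION_MIX = {"GET": 54, "PUT": 33, "DELETE": 13}
TARGET_OPS_PER_SECOND = 1400


def per_operation_rates(mix, total_ops_per_second):
    # Integer arithmetic; rounds down if a share is not a whole number.
    return {op: total_ops_per_second * pct // 100 for op, pct in mix.items()}


def sample_operations(mix, count, seed=0):
    # Draw a reproducible sequence of operations weighted by the mix.
    rng = random.Random(seed)
    operations = list(mix)
    return rng.choices(operations, weights=[mix[op] for op in operations], k=count)


if __name__ == "__main__":
    print(per_operation_rates(OPERATION_MIX, TARGET_OPS_PER_SECOND))
    # {'GET': 756, 'PUT': 462, 'DELETE': 182}
    print(sample_operations(OPERATION_MIX, 10))
```

In other words, the target works out to 756 GETs, 462 PUTs, and 182 DELETEs per second.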

-Naeem Altaf & Khoa Huynh
