Posts Tagged 'Big Data'

August 18, 2016

Apache Hadoop and Big Data on IBM Cloud

Companies are producing massive amounts of data—otherwise known as big data. There are many options available to manage big data and the analytics associated with it. One of the more popular options is Apache Hadoop, an open source software framework designed to scale up and down quickly with a high degree of fault tolerance. Hadoop lets organizations gather and examine large amounts of structured and unstructured data.

In the past, high CAPEX and deployment costs made large big data or Hadoop clusters cost prohibitive. Cloud providers, like IBM Cloud, have made it possible to break through that cost barrier. The cloud model, with its utility-style billing and usage charges, makes it possible to build big data clusters, use them for a specific project, and then tear them down. IBM Cloud is a great solution for this type of scenario and makes sense for organizations that need short-term or project-based Hadoop clusters. Hadoop on IBM Cloud allows organizations to respond faster to changing business needs and requirements without the upfront CAPEX.

What makes Hadoop on IBM Cloud so compelling are the components available in the IBM Cloud offering. Customers can choose and use the same types of components and standards that they would use in their own data centers, including bare metal servers, unmetered private networks, and enterprise-grade block and object storage. IBM Cloud also offers GPUs for the most processor-intensive big data workloads. Customers don’t have to settle for less when deploying their Hadoop clusters in IBM Cloud.

Hadoop on IBM Cloud supports multiple data centers in different regions across the globe. The diagram below shows the layout of Hadoop clusters across multiple IBM Cloud data centers.

[Diagram: Hadoop clusters across multiple IBM Cloud data centers]

For more information, contact your SoftLayer sales representative.

-Kevin McDade

June 23, 2016

Meet the Integrated IBM Cloud Platform: SoftLayer and Bluemix

Did you know that you can complement your SoftLayer infrastructure with IBM Bluemix platform-as-a-service? (Read on—then put these ideas into practice with a special offer at the end.)

When you pair Bluemix with SoftLayer, you can build, access, and manage scalable production environments and applications by using the infrastructure and application services together.

Whether you need insight into the effectiveness of a multimedia campaign, need to process vast amounts of data in real time, or want to deploy websites and web content for millions of users, you can create a better experience for your customers by combining the power of your SoftLayer infrastructure with Bluemix.

Bluemix solutions and services allow you to:

  • Optimize campaigns in real time based on customer reaction using Watson Personality Insights and Insights for Twitter.
  • Run scalable analytics using Streaming Analytics to retrieve results in seconds.
  • Improve outcomes with the Watson AlchemyAPI and Retrieve and Rank services paired with high-performance bare metal servers.
  • Automate hundreds of daily web deployments using the SoftLayer and Bluemix APIs (see the sketch after this list).
  • Securely store, analyze, and process big data using the Cloudant database service with Apache Spark.
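
To make the deployment-automation point concrete, here is a minimal sketch using the SoftLayer Python client (pip install SoftLayer). The hostnames, domain, data center, and image are placeholders, and error handling is omitted:

    import SoftLayer

    # Authenticate with your portal username and API key (placeholders here).
    client = SoftLayer.create_client_from_env(username='apiuser', api_key='API_KEY')
    vs_manager = SoftLayer.VSManager(client)

    # Provision a small fleet of hourly virtual servers in one loop; the same
    # pattern scales to hundreds of daily deployments when driven by CI tooling.
    for i in range(1, 4):
        instance = vs_manager.create_instance(
            hostname='web%02d' % i,
            domain='example.com',    # placeholder domain
            datacenter='dal09',      # any SoftLayer data center code
            cpus=2,
            memory=4096,             # RAM in MB
            hourly=True,
            os_code='UBUNTU_LATEST',
        )
        print(instance['id'])        # ID of the newly ordered guest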

You can see the value of an integrated SoftLayer/Bluemix experience by looking at insights and cognitive, big data and analytics, and web applications.

Insights and Cognitive

Forty-four percent of organizations say customer experience will be the primary way they seek to differentiate from competitors.

The scenario: Marketing organizations and advertising agencies want to release a large, worldwide marketing campaign, complete with embedded ads. With the explosive growth of mobile, social, and video, those ads are often image- and video-intensive. Not only are these enterprises worried about how to run such a high-performing workload where customer data needs to stay in-country, but they have no idea how effective their campaign will be—and whether those receiving it are the users they’re trying to target—until it’s too late.

The solution: A media-rich campaign workload can run on high-performing bare metal servers in SoftLayer data centers. Cognitive services are added to understand, in real time, the impact of the campaign and whether it is reaching target customers, whose personal data is stored in proximity to the user.

  • SoftLayer bare metal servers run media-rich (video, image) campaign workloads.
  • Bluemix’s Insights for Twitter service is used to understand the impact of the campaign in real time.
  • Watson’s Personality Insights service shows, based on 40 calculated attributes, whether the users viewing ads match the target customers (see the sketch below).
  • Globally diverse block storage enables data storage across the world.

[Figure: Personality portrait]
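
For a sense of what the Personality Insights piece involves, a single REST call is enough to score a writing sample. This is a minimal sketch: the endpoint and version date reflect the 2016-era service and should be verified against current documentation, and the credentials and input file are placeholders:

    import requests

    # Hypothetical Bluemix service credentials from your service instance.
    USERNAME = 'service-username'
    PASSWORD = 'service-password'
    URL = 'https://gateway.watsonplatform.net/personality-insights/api/v3/profile'

    # The service needs a sizable writing sample to score traits reliably.
    with open('campaign_responses.txt') as f:
        text = f.read()

    resp = requests.post(
        URL,
        params={'version': '2016-10-20'},
        auth=(USERNAME, PASSWORD),
        headers={'Content-Type': 'text/plain'},
        data=text.encode('utf-8'),
    )
    resp.raise_for_status()

    # Print the Big Five dimensions and their percentile scores.
    for trait in resp.json().get('personality', []):
        print(trait['name'], round(trait['percentile'], 3))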

Big Data and Analytics

The value of data decreases over time. On average, it takes two weeks to analyze social data.

The scenario: Customers need to harness vast amounts of data in real time. The problem is that many data streams arrive too fast to store in a database for later analysis. Further, the analysis needs to be done NOW. From social media, consumer video, and audio to security cameras, businesses could win or lose by being the first to discover essential patterns in these real-time feeds and act upon them.

The solution: Customers can use Streaming Analytics to get results in seconds, not hours. The AlchemyAPI and Retrieve and Rank services can improve decisions and outcomes, all running on bare metal servers with scalable IBM Containers.

  • Streaming Analytics can run scalable analytics solutions and return results in seconds, not hours (see the sketch after this list).
  • Patterns that are found can be stored with the associated stream content in object storage and transferred around the world using CDN to be co-located with customers.
  • Watson’s Retrieve and Rank service can improve decisions and outcomes.
  • Services run on high-performing, low-latency bare metal servers that can scale as activity swells using IBM Containers.
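
Streaming Analytics itself is programmed with IBM Streams, but the core idea (analyzing events as they arrive instead of storing them first) can be illustrated in a few lines of plain Python. This toy sketch watches a simulated feed of social mention rates and flags spikes against a sliding-window baseline; it is not the Streaming Analytics API:

    import random
    import time
    from collections import deque
    from statistics import mean

    def mentions_per_second():
        # Simulated live feed; a real deployment would consume Twitter,
        # video, audio, or sensor streams.
        while True:
            time.sleep(0.1)
            yield random.randint(0, 50)

    window = deque(maxlen=50)  # sliding window of the most recent readings
    for rate in mentions_per_second():
        window.append(rate)
        baseline = mean(window)
        if len(window) == window.maxlen and rate > 3 * baseline:
            print('Spike detected: %d/s vs. baseline %.1f/s' % (rate, baseline))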

[Diagram: Hadoop, data warehouse, and NoSQL]

Web Application

It can take several weeks for a DBMS instance to be provisioned for a new development project, which limits innovation and agility.

The scenario: Customers deploying websites and web content for millions of users need fast infrastructure and services so they can focus on their users, not spend their time managing servers and infrastructure. This is especially true for commerce sites that need to be constantly available for orders. These sites also need a reliable database to securely store the data. The problem is that these customers do not want to manage their database, and they need an infrastructure provider that is worldwide, reliable, and screaming fast.

The solution: Customers with a broad range of needs can host web applications on virtual servers and bare metal, including sites that require deep data analysis. Apache Spark can be used to spin up in-memory computing to analyze Cloudant data and return results to the user up to 100x faster.

  • Automate hundreds of web deployments using SoftLayer APIs.
  • The Cloudant database service offloads database management, reallocating budget from administrators to application developers.
  • Apache Spark analyzes Cloudant data up to 100 times faster using an in-memory computing cluster (see the sketch after this list).
  • Bare metal servers provide a high-performing environment for the most stringent requirements.
  • Load balancers manage traffic, helping to ensure uptime.
  • Virtual servers with the Auto Scale service grow and shrink the environment to consistently meet the application’s needs without unnecessary expenditures.
  • Object storage open APIs speed worldwide delivery via CDN.
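
Here is a minimal PySpark sketch of the Cloudant-plus-Spark pattern, assuming the 2016-era spark-cloudant connector is on the classpath (for example, spark-submit --packages cloudant-labs:spark-cloudant:1.6.4-s_2.10). The host, credentials, database name, and the status field are placeholders; verify the connector coordinates and option names against the version you deploy:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName='cloudant-orders')
    sqlContext = SQLContext(sc)

    # Load a Cloudant database as a Spark DataFrame via the connector.
    orders = (sqlContext.read
              .format('com.cloudant.spark')
              .option('cloudant.host', 'ACCOUNT.cloudant.com')  # placeholder
              .option('cloudant.username', 'USERNAME')          # placeholder
              .option('cloudant.password', 'PASSWORD')          # placeholder
              .load('orders'))                                  # database name

    # Cache in memory, then aggregate; this is where the in-memory speedup
    # over repeated round trips to the database comes from.
    orders.cache()
    orders.groupBy('status').count().show()   # 'status' is a hypothetical field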

[Diagram: Cloudant]

Exciting Offer

Put these ideas into practice by trying Bluemix today. When you link your SoftLayer and Bluemix billing accounts, you receive a $200 credit toward Bluemix usage. The credit must be used within 30 days of linking the accounts.

Follow these easy instructions to get started:  

  • Visit the SoftLayer customer portal and log into your account.
  • Open a ticket to request the ability to link your Bluemix account.
  • Once activated, the “Link a Bluemix Account” button will appear at the top of the SoftLayer customer portal page.
  • Click on the “Link a Bluemix Account” button. 
  • Follow the on-screen instructions to link your SoftLayer account to a Bluemix account.

This offer expires on November 30, 2016.

Learn More

Bluemix Intro Demo

Watson Personality Insights

Real Time Streaming Analysis

Hybrid Data Warehouse

-Thomas Recchia

June 16, 2016

Larger Virtual Servers Now Available

You asked. We listened. We’re excited to announce that our clients can now provision virtual servers with more cores and more RAM.

Starting today, you’re empowered to run compute- and memory-intensive workloads on a public or private cloud with the same quick deployment and flexibility you’ve come to enjoy from SoftLayer. After all, you shouldn’t have to choose between flexibility and power.

Oh, and did we mention it’s all on demand? Deploy these new, larger sizes rapidly and start innovating—right now.

Whether you require a real-time analytics platform for healthcare, financial services, or retail, these larger virtual servers provide the capabilities you need to harness and maximize analytics-driven solutions.

Popular use cases for larger virtual servers include real-time big data analytics solutions requiring millisecond execution, as needed by organizations processing massive amounts of data, such as weather companies. Given the immense number of meteorological inputs required for any location, at any time, at millisecond speed, larger virtual server sizes power weather forecast responses in real time.

With SoftLayer virtual servers, you can segment your data across public, private, and management networks for better reliability and speed. You get unmetered bandwidth across our private and management networks at no additional charge, and unmetered inbound bandwidth on our public network. As real-time data-intensive workloads are developed, SoftLayer ensures that our best-in-class network infrastructure can retrieve and move data with speed.

New Sizes

Drum roll, please! Our newest offerings include:

[Table: Public virtual server configurations]

[Table: Private virtual server configurations]

Public virtual servers will be customizable, but will have limitations on various core/RAM ratios. Private nodes will provide complete customization.

With the introduction of larger virtual servers, SoftLayer will also reconfigure socket/core ratios. The number of cores per socket is reflected below for newly deployed virtual servers:

[Table: Core-to-socket ratios for newly deployed virtual servers]

For clients using third-party software on virtual servers, we recommend working with your software vendor to ensure that any socket-based licensing remains compliant.

Data Center Availability

Larger public and private virtual servers are currently available only in select data centers, with more coming online in the near future. The following locations offer public and private virtual server combinations configured with more than 16 cores or more than 64 GB RAM:

[Table: Locations of larger public and private virtual servers]

For more information on virtual servers and for pricing, read here.

We are always interested to see how you are flying in the cloud and how these larger virtual servers help drive value for your business. Please connect with us on Twitter: @milan3patel and @conradjjohnson.

-Milan Patel

March 24, 2016

future.ready(): 7 Things to Check Off Your Big Data Development List

Frank Ketelaars, Big Data Technical Leader for Europe at IBM, offers a checklist that every developer should have pinned to their board when starting a big data project.

Editor’s Note: Does your brain switch off when you hear industry-speak words like “innovation,” “transformation,” “leading edge,” “disruptive,” and “paradigm shift”? Go on, go ahead and admit it. Ours do, too. That’s why we’re launching the future.ready() series—consisting of blogs, podcasts, webinars, and Twitter chats—with content created by developers, for developers. Nothing fluffy, nothing buzzy. With the future.ready() series, we aim to equip you with tools and knowledge that you can use—not just talk and tweet about.

For the first edition, I’ve invited Frank Ketelaars, an expert in high volume data space, to walk us through seven things to check off when starting a big data development project.

-Michalina Kiera, SoftLayer EMEA senior marketing manager

This year, big data moves from a water cooler discussion to the to-do list. Gartner estimates that more than 75 percent of companies are investing or planning to invest in big data in the next two years.

I have worked on multiple high volume projects in industries that include banking, telecommunications, manufacturing, life sciences, and government, and in roles including architect, big data developer, and streaming analytics specialist. Based on my experience, here’s a checklist I put together that should give developers a good start. Did I miss anything? Join me on the Twitter chat or webinar to share your experience, ask questions, and discuss further. (See details below.)     

1. Team up with a person who has a budget and a problem you can solve.

For a successful big data project, you need to solve a business problem that’s keeping somebody awake at night. If there isn’t a business problem and a business owner—ideally one with a budget—your project won’t get implemented. Experimentation is important when learning any new technology. But before you invest a lot of time in your big data platform, find your sponsor. To do so, you’ll need to talk to everyone, including IT, business users, and management. Remember that the technical advantages of analytics at scale might not immediately translate into business value.

2. Get your systems ready to collect the data.

With additional data sources, such as devices, vehicles, and sensors connected to networks and generating data, the variety of information and transportation mechanisms has grown dramatically, posing new challenges for the collection and interpretation of data.

Big data often comes from sources outside the business. External data comes at you in a variety of formats (including XML, JSON, and binary) and through a variety of APIs. In 2016, you might think that everyone is on REST and JSON, but think again: SOAP still exists! The variety of the data is the primary technical driver behind big data investments, according to a survey of 402 business and IT professionals by management consultancy NewVantage Partners. From one day to the next, an API might change or a source might become unavailable.

Maybe one day we’ll see more standardization, but it won’t happen any time soon. For now, developers must plan to spend time checking for changes in APIs and data formats, be ready to respond quickly to avoid service interruptions, and expect the unexpected. A defensive ingestion layer like the sketch below is a good start.
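
As a minimal sketch of that defensive posture, the Python function below detects whether an external payload is JSON or XML rather than assuming a format, and fails loudly when neither parser succeeds. The function name and feed shape are illustrative, not from any particular API:

    import json
    import xml.etree.ElementTree as ET

    def parse_feed(payload):
        """Best-effort parse of an external feed that may be JSON or XML."""
        text = payload.decode('utf-8', errors='replace').lstrip()
        if text.startswith('{') or text.startswith('['):
            return {'format': 'json', 'data': json.loads(text)}
        if text.startswith('<'):
            return {'format': 'xml', 'data': ET.fromstring(text)}
        # Neither parser matched: the upstream API may have changed.
        raise ValueError('Unrecognized feed format')

    print(parse_feed(b'{"user": "jdoe", "mentions": 42}'))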

3. Make sure you have the right to use that data.

Governance is a business challenge, but it’s going to touch developers more than ever before—from the very start of the project. Much of the data they will be handling is unstructured, such as text records from a call center. That makes it hard to work out what’s confidential, what needs to be masked, and what can be shared freely with external developers. Data will need to be structured before it can be analyzed, but part of that process includes working out where the sensitive data is, and putting measures in place to ensure it is adequately protected throughout its lifecycle.

Developers need to work closely with the business to ensure that they can keep data safe, and provide end users with a guarantee that the right data is being analyzed and that its provenance can be trusted. Part of that process will be about finding somebody who will take ownership of the data and attest to its quality.
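
Masking sensitive values in unstructured text is one of the more mechanical parts of this work, and even a crude pass shows the shape of it. The patterns below are illustrative only; real governance requires a vetted catalog of rules and a review of where masked data flows:

    import re

    # Hypothetical masking rules for call-center text.
    PATTERNS = {
        'EMAIL': re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+'),
        'CARD': re.compile(r'\b(?:\d[ -]?){13,16}\b'),
    }

    def mask(record):
        for label, pattern in PATTERNS.items():
            record = pattern.sub('<%s REDACTED>' % label, record)
        return record

    print(mask('Customer jane.doe@example.com paid with 4111 1111 1111 1111.'))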

4. Pick the right tools and languages.

With no real standards in place yet, there are many different languages and tools used to collect, store, transport, and analyze big data. Languages include R, Python, Julia, Scala, and Go (plus the Java and C++ you might need to work with your existing systems). Technologies include Apache Pig, Hadoop, and Spark, which can provide massively parallel processing on top of a file system even without Hadoop. There’s a list of 10 popular big data tools here, another 12 here, and a round-up of 45 big data tools here. 451 Research has created a map that classifies data platforms according to database type, implementation model, and technology. It’s a great resource, but its 18-color key shows how complex the landscape has become.

Not all of these tools and technologies will be right for you, but they hint at one way the developer’s core competency must change. Big data will require developers to be polyglots, conversant in perhaps five languages, who specialize in learning new tools and languages fast—not deep experts in one or two languages.

Nota bene: MapReduce and Pig are among the highest-paid technology skills in the US, and other big data skills are likely to be highly sought after as demand for them grows. Scala is a relatively new functional programming language for data preparation and analysis, and I predict it will be in high demand in the near future.

5. Forget “off-the-shelf.” Experiment and set up a big data solution that fits your needs. 

You can think of big data analytics tools like Hadoop as a car. You want to go to the showroom, pay, get in, and drive away. Instead, you’re given the wheels, doors, windows, chassis, engine, steering wheel, and a big bag of nuts and bolts. It’s your job to assemble it.

As InfoWorld notes, DevOps tools can help to create manageable Hadoop solutions. But you’re still faced with a lot of pieces to combine, diverse workloads, and scheduling challenges.

When experimenting with concepts and technologies to solve a certain business problem, also think about successful deployment in the organization. The project does not stop after the proof of concept.

6. Secure resources for changes and updates.

Apache Hadoop and Apache Spark are still evolving rapidly, and it is inevitable that the behavior of components will change over time; some may be deprecated shortly after initial release. Implementing new releases will be painful, and developers will need an overview of the big data infrastructure to ensure that as components change, their big data projects continue to perform as expected.

The developer team must plan time for updates and deprecated features, and a coordinated approach will be essential for keeping on top of the change.

7. Use infrastructure that’s ready for CPU and I/O intensive workloads.

My preferred definition of big data (and there are many – Forbes found 12) is this: "Big data is when you can no longer afford to bring the data to the processing, and you have to do the processing where the data is."

In traditional database and analytics applications, you get the data, load it onto your reporting server, process it, and post the results to the database.

With big data, you have terabytes of data, which might reside in different places—and which might not even be yours to move. Getting it to the processor is impractical. Big data technologies like Hadoop are based on the concept of data locality—doing the processing where the data resides.
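
A short PySpark sketch makes the locality idea concrete: the filter and aggregation below run as tasks on the cluster nodes that already hold the HDFS blocks, and only the small summary travels back to the driver. The path and log layout are placeholders:

    from pyspark import SparkContext

    sc = SparkContext(appName='locality-demo')

    # Terabytes can sit behind this path; the data itself is never shipped
    # to a central reporting server.
    logs = sc.textFile('hdfs:///data/clickstream/2016/*')

    def status(line):
        parts = line.split()
        return parts[8] if len(parts) > 8 else None  # assumed log layout

    errors = (logs.map(status)
                  .filter(lambda s: s is not None and s.startswith('5'))
                  .map(lambda s: (s, 1))
                  .reduceByKey(lambda a, b: a + b))

    # Only the aggregated counts leave the cluster.
    print(errors.collect())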

You can run Hadoop in a virtualized environment. Virtual servers don’t have local data, though, so the time taken to transport data between the SAN or other storage device and the server hurts the application’s performance. Noisy neighbors, unpredictable server speeds, and contested network connections can have a significant impact on performance in a virtualized environment. As a result, it’s difficult to offer service level agreements (SLAs) to end users, which makes it hard for them to depend on your big data implementations.

The answer is to use bare metal servers on demand, which let you predict and guarantee the level of performance your application can achieve, so you can offer an SLA with confidence. Clusters can be set up quickly, so you can get your project moving fast. Because performance is predictable and consistent, it’s possible to offer SLAs to business owners that will encourage them to invest in the big data project and rely on it for making business decisions.

How can I learn more?

Join me in the Twitter chat and webinar (details below) to discuss how you’re addressing big data or have your questions answered by me and my guests.  

Add our Twitter chat to your calendar. It happens Thursday, March 31 at 1 p.m. CET. Use the hashtag #SLdevchat to share your views or post your questions to me.

Register for the webinar on Wednesday, April 20, from 5 to 6 p.m. CET.


About the author

Frank Ketelaars has been Big Data Technical Leader in Europe for IBM since August 2013. As an architect, big data developer, and streaming analytics specialist, he has worked on multiple high volume projects in banking, telecommunications, manufacturing, life sciences and government. He is a specialist in Hadoop and real-time analytical processing.

October 24, 2014

SoftLayer at IBM Insight 2014

IBM will be lighting up Las Vegas next week with Insight 2014, the conference for big data and analytics. Starting this Sunday and running through Thursday, October 30 at the Mandalay Bay, this show will offer amazing opportunities to learn more about the advantages of delivering big data and analytics services, and many of those advantages involve the SoftLayer cloud platform.

To help you navigate the 700+ sessions and streams, we’ve compiled a list of must-attend SoftLayer- and cloud-based sessions.

Business Partner Summit

Breakout Session 7157: Partner with SoftLayer for Your Big Data and Analytics Workloads
Sunday, October 26 @ 2:00 p.m. – Tradewinds A (For Business Partners only)
Featured Speakers: Anand Mahurkar, founder and CEO of Findability Sciences, and Guy Kurtz, IBM North America Channel Sales Leader

General Conference Sessions

BPM-6838A: Experience Faster Time to Value with IBM Cognos TM1 on Cloud
Monday, October 27 @ 10:15 a.m. – Mandalay Bay J
Learn how the SoftLayer infrastructure with IBM Cognos TM1 can help you gain better performance, operational savings, reliability, and scalability.

IIS-5758A: How Joy Global is Using Big Data as a Differentiator in the Mining Industry
Monday, October 27 @ 10:15 a.m. – Jasmine B
We’ll dig deep to learn how Joy Global runs one of the most sophisticated big data platforms in the industry hosted by a combination of SoftLayer and IBM Global Business Services.

IDB-4741C: Accelerate Social Media Analytics for Big Insight with IBM DB2 BLU and IBM InfoSphere Optim Database Tools
Monday, October 27 @ 3:30 p.m. – Jasmine F
Can the right combination of technologies help accelerate a social media analytics application hosted on SoftLayer? Yes.

EEP-5498A: Industry Leaders, IBM ECM and SoftLayer Deliver Trusted Content Anywhere with IBM Navigator on Cloud
Tuesday, October 28 @ 1:45 p.m. – Lagoon H
Extend ECM to the SoftLayer cloud platform by leveraging IBM’s pervasive ECM experience platform, IBM Content Navigator.

FTC-4285A: Data Warehousing and Analytics in the Cloud: IBM's New Data Warehousing Service
Tuesday, October 28 @ 3:00 p.m. – Islander E
Come see how the Data Warehousing Service, combining the best of BLU Acceleration, Netezza technology, and SoftLayer, can provide analytics for existing cloud-based data stores.

LCE-5575A: Building a Robust ECM Solution Step-by-Step
Wednesday, October 29 @ 2:00 p.m. – Shorelines B Lab Room 11
A step-by-step guide to building an ECM solution on a SoftLayer platform.

EEP-7001A: Expert Exchange: ECM in the Cloud
Wednesday, October 29 @ 4:30 p.m. – Breakers E
Meet the ECM development team and learn how they designed and deployed Navigator Cloud Edition on SoftLayer.

III-5198A: Using IBM Bluemix and SoftLayer to Run IBM InfoSphere Information Server on an Intel Technology-Powered Cloud
Thursday, October 30 @ 10:00 a.m. – Jasmine E
Learn how InfoSphere Information Server works in the cloud, and how SoftLayer bare metal and virtualization options contribute to scaling performance.

LCI-5234A: On-Demand Data Archiving with Cloud-based Data Warehousing Services
Thursday, October 30 @ 10:00 a.m. – Shorelines B Lab Room 2
This lab will showcase the entire BLU Acceleration for Cloud solution using SoftLayer.

If you’re a registered attendee and haven’t already done so, visit the IBM Insight 2014 website for complete descriptions of all sessions, and start building your agenda.

And don’t forget to stop by the SoftLayer pedestal in the IBM Cloud booth #515. We look forward to seeing you.

-Ted
