Big Data is typically defined as high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.
Yet, despite a clear business need to analyze and react to real-time data, the "High Velocity" aspect has been mostly missing from currently available solutions. For a variety of reasons, neither existing Data Warehouse platforms nor Hadoop-based platforms can satisfy the need for real-time access to, and analysis of, current operational data.
A new generation of real-time, operational data virtualization and stream processing platforms is needed.
Compelling Business Case
The business need for Operational Intelligence is not new, but so far it has not really been satisfied. Being able to analyze an ever-changing stream of business activities (orders, trades, etc.) and external stimuli (market data, weather conditions, customer behavior/sentiment, etc.) on the fly, and to react to them instantaneously, can provide a significant time-based competitive advantage (for example, being the first mover when market conditions or customer wishes change).
There are many examples of applications where Fast Data can drive real-time business decision making:
- Dynamic pricing (e-commerce)
- High-frequency trading
- Network security threats
- Credit card fraud prevention
- Factory floor data collection, RFID
- Mobile infrastructure, machine to machine (M2M) applications
- Prescriptive or Location-based applications
- Real-time dashboards, alerts, and reports
The need to process a real-time data stream on the fly, to predict whether a given transaction is fraudulent or whether there is a network security threat, is critical: if decisions to address the threats are not taken in real time, the opportunity to mitigate the damage is lost.
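To make the idea of deciding on the fly concrete, here is a minimal sketch of a per-card "velocity check" over a sliding time window. It is an illustration only, not a description of any particular product: the class name, the 60-second window, and the threshold of 3 transactions are all hypothetical, and events are assumed to arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque

# Hypothetical tuning values, chosen only for illustration.
WINDOW_SECONDS = 60
MAX_TXNS_PER_WINDOW = 3


class VelocityCheck:
    """Flags a card that transacts too often within a sliding time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, max_txns=MAX_TXNS_PER_WINDOW):
        self.window_seconds = window_seconds
        self.max_txns = max_txns
        # card_id -> timestamps of that card's transactions still inside the window
        self._recent = defaultdict(deque)

    def is_suspicious(self, card_id, timestamp):
        """Record one transaction and decide immediately, as the event arrives."""
        recent = self._recent[card_id]
        # Evict timestamps that have fallen out of the window.
        while recent and timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        recent.append(timestamp)
        return len(recent) > self.max_txns


def flag_stream(events, check=None):
    """Return the (card_id, timestamp) events flagged while scanning the stream."""
    check = check or VelocityCheck()
    return [(card, ts) for card, ts in events if check.is_suspicious(card, ts)]
```

The decision is made per event, using only state kept in memory, so no batch job has to run before a suspicious transaction can be blocked. A real fraud model would use far richer features; the sketch only shows the shape of the computation.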
Characteristics of a Fast Data platform
Here are the necessary traits of a successful Fast Data platform:
- Architectural simplicity and elegance. This pays off in a lower Total Cost of Ownership (TCO).
- Elastic scalability. Additional compute nodes should be allocated as needed.
- Low Transactional Latency. In-memory transaction speeds, response in milliseconds.
- Low Data Latency. Decision-making based on real-time data rather than stale data that is weeks old.
- High Throughput Data Stream/Event Processing.
- High Scalability. Both horizontal and vertical.
- High Availability and Fault Tolerance. No single point of failure, partitioned data with replicas.
- Accessibility & Interactivity. All authorized business users should be able to interact with the data and issue queries on demand.
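The "partitioned data with replicas" trait can be illustrated with a small sketch. It assumes a simple scheme in which keys are hashed to partitions and each partition is copied to consecutive nodes; the function names, the node names, and the replication factor of 2 are hypothetical, and real platforms typically use more elaborate placement strategies.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (same key, same partition)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replica_nodes(partition, nodes, replication_factor=2):
    """Return the nodes holding a partition: a primary followed by its replicas."""
    if replication_factor > len(nodes):
        raise ValueError("replication_factor cannot exceed the number of nodes")
    start = partition % len(nodes)
    # Place copies on consecutive nodes, wrapping around the end of the list.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def surviving_owner(partition, nodes, failed, replication_factor=2):
    """Return the first node that still holds the partition, or None if all copies are lost."""
    for node in replica_nodes(partition, nodes, replication_factor):
        if node not in failed:
            return node
    return None
```

With nodes `["n1", "n2", "n3"]` and two copies per partition, losing any single node still leaves every partition readable, which is the sense in which there is no single point of failure. Adding a node to the list is the simplest form of the elastic scaling mentioned above, although this naive modulo placement would move many partitions when the node count changes.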
Current solutions supporting Fast Data
Most solutions available today do not meet the Fast Data criteria because of both high data latency and high transactional latency. A number of vendors are currently building their Fast Data stacks. An analysis of current Fast Data offerings will follow in an upcoming blog post.