The marketing departments of software vendors have done a good job of making Big Data go mainstream, whatever that means. The promise is that we can achieve anything if we make use of Big Data: business insight, and beating our competitors into submission. Yet there is no well-publicised successful Big Data implementation. The question is: why not? Businesses have invested billions of dollars in this supposed silver bullet, with no visible return on investment. Who is to blame? One could argue that businesses do not have to publicise their internal processes or projects. I take a different view: the cause lies with the IT department. Most Big Data projects are driven by technologists, not by the business, and there is a great lack of understanding of how to align the architecture with the business vision for the future.
The Preliminary Phase
Big Data projects are no different from any other IT project. All projects spring from business needs and requirements. This is not The Matrix; we cannot answer questions which have not been asked yet. Before any work begins, or any discussion about which technology to use, all stakeholders need to have an understanding of:
- The organisational context
- The key drivers and elements of the organisation
- The requirements for architecture work
- The architecture principles
- The framework to be used
- The relationships between management frameworks
- The enterprise architecture maturity
- Strategies and business plans
- Business principles, goals, and drivers
- Major frameworks currently implemented in the business
- Governance and legal frameworks
- IT strategy
- Pre-existing Architecture Framework, Organisational Model, and Architecture repository
The Big Data Continuum
Big Data projects are not, and should never be, executed in isolation. The simple fact that Big Data needs to feed from other systems means there should be a channel of communication open across teams. To support a successful architecture, I came up with five simple layers (a stack) for a Big Data implementation. To the more technically inclined architect, this will seem obvious:
- Data sources
- Big Data ETL
- Data Services API
- Application
- User Interface Services
[Figure: Big Data Protocol Stack]
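The five layers can be written down as an ordered stack. The sketch below is only an illustration: the bottom-up numbering, the idea that data flows upward from the sources to the user interface, and the `path_between` helper are my assumptions, not part of any framework.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The five layers listed above, numbered bottom-up (assumed ordering)."""
    DATA_SOURCES = 1
    BIG_DATA_ETL = 2
    DATA_SERVICES_API = 3
    APPLICATION = 4
    USER_INTERFACE_SERVICES = 5


def path_between(lower: Layer, upper: Layer) -> list[str]:
    """Return the layers data passes through going from `lower` up to `upper`."""
    if lower > upper:
        raise ValueError("lower must not sit above upper in the stack")
    return [layer.name for layer in Layer if lower <= layer <= upper]


# Which teams need to talk to each other to get source data behind an API?
print(path_between(Layer.DATA_SOURCES, Layer.DATA_SERVICES_API))
```

Listing the layers a piece of data crosses is a quick way to see which teams need that open channel of communication.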
Data Sources
Current and future applications will produce more and more data, which will need to be processed in order to gain any competitive advantage from it. Data comes in all sorts, but we can categorise it into two types:
- Structured data – usually stored following a predefined format, for example using known and proven database techniques. Not all structured data is stored in databases, as many businesses use flat files such as Microsoft Excel or tab-delimited files for storing data
- Unstructured data – businesses generate a great amount of unstructured data such as emails, instant messaging, video conferencing, internet content, and flat files such as documents and images; the list is endless. We call the data "unstructured" because it does not follow a format that makes it easy for a user to query its content.
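To make the distinction concrete, here is a rough sketch that tells a tab-delimited flat file apart from free text. The `looks_structured` heuristic (every row has the same number of fields, and more than one) and the sample data are assumptions made for illustration; real classification is rarely this simple.

```python
import csv
import io


def looks_structured(text: str, delimiter: str = "\t") -> bool:
    """Heuristic: all non-empty rows share one field count, greater than 1."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return False
    widths = {len(row) for row in rows}
    return len(widths) == 1 and widths.pop() > 1


# Made-up samples: a tab-delimited flat file and the body of an email.
tab_file = "id\tname\tcountry\n1\tAda\tUK\n2\tLinus\tFI\n"
email = "Hi team,\nThe quarterly figures are attached.\nRegards,\nSam\n"

print(looks_structured(tab_file))  # True
print(looks_structured(email))     # False

# Because the flat file follows a format, its content can be queried by field.
records = list(csv.DictReader(io.StringIO(tab_file), delimiter="\t"))
print([r["name"] for r in records if r["country"] == "UK"])  # ['Ada']
```

The email offers no such handle: there is no field to filter on until some processing step extracts one.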
Big Data ETL
This is the part that excites technologists, and especially the development teams. So many blogs and articles are published every day about Big Data tools that they create confusion among non-technical people. Everybody is excited about processing petabytes of data using the coolest kid on the block: Hadoop and its ecosystem. Before we get carried away, we first need to put a baseline in place:
- Real-time processing
- Batch processing
[Figure: Big Data - Data Consolidation]
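The difference between the two processing styles is easiest to see on a toy example. The sketch below counts events both ways in plain Python; the function names and the event list are invented for illustration and no Hadoop API is involved.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_count(events: list[str]) -> Counter:
    """Batch: the whole data set is available, so produce one result at the end."""
    return Counter(events)


def streaming_count(events: Iterable[str]) -> Iterator[Counter]:
    """Real-time: update a running result per event and emit a snapshot each time."""
    running: Counter = Counter()
    for event in events:
        running[event] += 1
        yield running.copy()


events = ["login", "purchase", "login"]

print(batch_count(events))
for snapshot in streaming_count(events):
    print(snapshot)
```

Both approaches end on the same totals; what changes is when an answer becomes available, which is the question the business has to settle before a tool is chosen.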
Data Services API
Because most of the limelight goes to the ETL tools, a very important area is usually overlooked until later, almost as an afterthought. Master data (MDM) needs to be stored in a repository so that the information can be retrieved when needed. In true Service Oriented Architecture spirit, the data repository should expose interfaces to external third-party applications for data retrieval and manipulation. In the past, MDM repositories were mostly built on an RDBMS, and retrieval and manipulation were carried out using the Structured Query Language (SQL). This does not have to change, but architects should be aware of other forms of database, such as NoSQL types. The following questions should be asked when choosing a database solution:
- Is there a standard query language?
- How do we connect to the database: DB drivers or available web services?
- Will the database scale when the data grows?
- What security mechanisms are in place for protecting some or all of the data?
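One way to keep these questions from leaking into every application is to put the repository behind a small service interface. The following is a minimal sketch under stated assumptions: `CustomerRepository`, its two methods, and the `customer` table are hypothetical, and SQLite merely stands in for whichever database is eventually chosen.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerRepository(Protocol):
    """Hypothetical service interface; callers never see the storage engine."""

    def get(self, customer_id: int) -> Optional[dict]: ...

    def upsert(self, customer_id: int, name: str) -> None: ...


class SqliteCustomerRepository:
    """One possible backend: an RDBMS queried with SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def get(self, customer_id: int) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]} if row else None

    def upsert(self, customer_id: int, name: str) -> None:
        # INSERT OR REPLACE is SQLite syntax; another engine would differ here only.
        self._conn.execute(
            "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
            (customer_id, name),
        )
        self._conn.commit()


repo: CustomerRepository = SqliteCustomerRepository(sqlite3.connect(":memory:"))
repo.upsert(1, "Acme Ltd")
print(repo.get(1))  # {'id': 1, 'name': 'Acme Ltd'}
print(repo.get(2))  # None
```

If the business later moves to a NoSQL store, only the class behind the interface needs to change; the applications calling `get` and `upsert` stay as they are.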