New Big Data Storage Technologies: From Hadoop to Graph Databases, and from NoSQL to NewSQL – Live Streaming only
Speaker: Rick van der Lans
10 November 2020
This course has already taken place
LIVE STREAMING - £695 + VAT (£139) = £834
All public courses are available as in-house training. Contact us for more information.
Overview
Part of the Data Ed Week Europe
Big data, analytical database servers, Hadoop, NoSQL, Spark, MapReduce, SQL-on-Hadoop, translytical databases, and appliances are all immensely popular terms in the IT industry today. Due to this avalanche of new developments, it’s becoming harder and harder for organisations to select the right tools. Which technologies are relevant? Are they mature? What are their use cases? Are they worthy replacements for the more traditional SQL products? How should they be incorporated in the existing data warehouse architecture? These are all valid questions, but difficult ones to answer. This tutorial explains these new data storage technologies clearly and shows why and how they can be relevant for any organisation. Market overviews are presented, strengths and weaknesses are discussed, and guidelines and best practices are given. It is intended for anyone who has to stay up to date with and implement the new developments, including data warehouse designers, business intelligence experts, database specialists, consultants, and technology planners.
Learning Objectives
- Why traditional database technology is not “big” enough
- How Hadoop and NoSQL differ from traditional technology
- How new and existing technologies such as Hadoop, NoSQL, and NewSQL can help develop BI and big data systems
- How to embed Hadoop technologies in existing BI systems
- How Spark can boost performance for analytics
- How to distinguish between three NoSQL subcategories: key-value, document, and column-family stores
- Why graph databases are very different from all other systems
- When to use NewSQL or NoSQL for developing transactional systems
- How to simplify data access through SQL-on-Hadoop engines
- When to use which new data storage technology and the pros and cons of each solution
- Which products and technologies are winners and which are losers
Course Outline
Big Data: State of the Art
- What exactly do we mean by big data?
- The key application area of big data: business analytics
- Differences between semi-structured, poly-structured, multi-structured, and unstructured data
- Examples of big data: sensor data, (micro-)event data, textual data, and clickstream data
Analytical SQL Database Servers
- Classification of analytical SQL database servers, and can they compete with NoSQL products?
- The advantages and disadvantages of column-based database servers
- How important is in-database analytics?
- Is loading databases into internal memory the solution? Is it feasible?
- Market overview, including Exasol, HP/Vertica, IBM PureData Systems for Analytics, Actian Matrix and Vector, Kognitio WX2, Oracle Exalytics, SAP HANA, Teradata Appliances, and Teradata Aster Database
The World of Hadoop
- The Hadoop stack explained: HDFS, MapReduce, Spark, Hive, HBase, YARN, ZooKeeper, Pig, HCatalog, and so on
- Characteristics and consequences of HDFS and file formats
- Alternative implementations by MapR, Amazon, and ScaleOut (Hadoop in-memory)
- Use of MapReduce for analytics and reporting
- Storm for streaming data
- The role of Cloudera, Hortonworks, and MapR
NoSQL Database Stores
- Classification of NoSQL products: key-value stores, document stores, column-family stores, and graph data stores
- It’s all about data scalability and performance
- Why is schema-on-read more flexible than schema-on-write?
- Are NoSQL products really database servers?
- Market overview, including Apache HBase and CouchDB, Cassandra, Cloudera, DataStax, InfiniteGraph, Riak, MongoDB, and Neo4j
Exploring Data in Hadoop Using SQL
- Making Hadoop data available for reporting and analysis through SQL-on-Hadoop engines
- Examples of SQL-on-Hadoop engines, including Apache Drill, Apache Hive, Apache Phoenix, Cloudera Impala, HP Vertica, JethroData, MemSQL, Pivotal HawQ, Spark SQL, and Splice Machine
- Data virtualization for unleashing the information hidden in NoSQL and SQL systems
NewSQL Database Servers for Transaction Workloads
- NewSQL database servers are designed for high-performance transactional systems
- Simpler transaction mechanisms
- The challenge of multi-table joins
- Market overview, including Akiban, CitusDB, Clustrix, MariaDB, NuoDB, TransLattice, VMware SQLFire, and VoltDB
Concluding Remarks
Who It's For
- IT Architects
- Database Specialists
- Big Data Specialists
- BI Specialists
- Data Warehouse Designers
- Technology Planners
- Technical Architects
- Enterprise Architects
- IT Consultants
- IT Strategists
- Systems Analysts
- Database Developers
- Database Administrators
- Solutions Architects
- Data Architects
Speaker
Rick van der Lans
Independent Analyst, Consultant, Author and Lecturer
R20/Consultancy
Rick van der Lans is a highly respected independent analyst, consultant, author, and internationally acclaimed lecturer specialising in data architectures, data warehousing, business intelligence, big data, and database technology. In 2018 he was selected as the sixth most influential BI analyst worldwide by onalytica.com. He has presented countless seminars, webinars, and keynotes at industry-leading conferences. For many years, he served as the chairman of the annual European Enterprise Data and Business Intelligence Conference in London and the annual Data Warehousing and Business Intelligence Summit in The Netherlands. Rick helps clients worldwide to design their data warehouse, big data, and business intelligence architectures and solutions and assists them with selecting the right products. He has been influential in introducing the new logical data warehouse architecture worldwide, which helps organisations to develop more agile business intelligence systems. Over the years, Rick has written hundreds of articles and blogs for newspapers and websites and has authored many educational and popular white papers for a long list of vendors. He was the author of the first available book on SQL, Introduction to SQL, which has been translated into several languages with more than 100,000 copies sold. Recently published books are Data Virtualisation for Business Intelligence Systems and Data Virtualization: Selected Writings. He presents seminars, keynotes, and in-house sessions on data architectures, big data and analytics, data virtualization, the logical data warehouse, data warehousing, and business intelligence.
IRM UK Public Courses via Live Streaming:
Practical Guidelines for Designing Modern Data Architectures
New Big Data Storage Technologies: From Hadoop to Graph Databases, and from NoSQL to NewSQL
Fees
- 1 day
- £695
- LIVE STREAMING - £695 + VAT (£139) = £834
Group Booking Discounts
Delegates | Discount |
---|---|
2-3 | 10% discount |
4-5 | 20% discount |
6+ | 25% discount |
Cancellation Policy:
Cancellations must be received in writing at least two weeks before the commencement of the seminar and will be subject to a 10% administration fee. It is regretted that cancellations received within two weeks of the seminar date will be liable for the full seminar fee. Substitutions can be made at any time.
Cancellation Liability:
In the unlikely event of cancellation of the seminar for any reason, IRM UK’s liability is limited to the return of the registration fee only. It may be necessary, for reasons beyond the control of IRM UK, to change the content, timings, speakers and date of the seminar.