Apache Spark 2.0.0 has been released. Apache Spark is an open-source cluster computing framework similar to Hadoop, but the two differ in ways that make Spark better suited to certain workloads: Spark keeps distributed datasets in memory, which supports interactive queries and also speeds up iterative workloads.
This release mainly updates the APIs, adds SQL 2003 support, adds R UDF support, and improves performance. 300 developers contributed 2,500 patches.
Apache Spark 2.0.0 API changes:
Unifying DataFrame and Dataset: In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. In Python and R, given the lack of type safety, DataFrame is the main programming interface.
SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. SQLContext and HiveContext are kept for backward compatibility.
A new, streamlined configuration API for SparkSession
Simpler, more performant accumulator API
A new, improved Aggregator API for typed aggregation in Datasets
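As a rough illustration of the SparkSession entry point and its configuration API, here is a minimal PySpark sketch. It assumes PySpark 2.0 or later is installed; the local master, the application name, the `SETTINGS` values, and the sample rows are all hypothetical choices made for this example.

```python
# Illustrative settings; the key is a real Spark SQL option, the value is arbitrary.
SETTINGS = {
    "spark.sql.shuffle.partitions": "4",
}


def build_session(app_name, settings):
    """Create (or reuse) a SparkSession, applying each setting via the builder."""
    # Imported here so this module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[2]").appName(app_name)
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def main():
    spark = build_session("spark-2.0-demo", SETTINGS)

    # Runtime configuration is read and written through spark.conf.
    spark.conf.set("spark.sql.shuffle.partitions", "8")
    print(spark.conf.get("spark.sql.shuffle.partitions"))

    # The same session serves both the DataFrame API and SQL,
    # with no separate SQLContext or HiveContext.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.createOrReplaceTempView("items")
    spark.sql("SELECT count(*) AS n FROM items").show()

    spark.stop()
```

Calling `main()` starts a local session, changes one SQL option at runtime, and runs a query against a temporary view. In Python the result of `createDataFrame` is a DataFrame; the typed Dataset API is available only in Scala and Java.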
Apache Spark 2.0.0 SQL changes:
A native SQL parser that supports both ANSI SQL and HiveQL
Native DDL command implementations
Subquery support, including:
Uncorrelated Scalar Subqueries
Correlated Scalar Subqueries
NOT IN predicate Subqueries (in WHERE/HAVING clauses)
IN predicate subqueries (in WHERE/HAVING clauses)
(NOT) EXISTS predicate subqueries (in WHERE/HAVING clauses)
View canonicalization support
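The subquery forms listed above are standard SQL shapes. The sketch below shows one query of each kind, using Python's built-in `sqlite3` module as a stand-in engine so that it runs without a cluster. The `dept` and `emp` tables and their rows are hypothetical; on Spark 2.0 the same statements would presumably be passed to `spark.sql(...)` against registered views, which is an assumption rather than something tested here.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    CREATE TABLE emp (id INTEGER, dept_id INTEGER, salary INTEGER);
    INSERT INTO dept VALUES (1, 'eng'), (2, 'ops'), (3, 'empty');
    INSERT INTO emp VALUES (10, 1, 100), (11, 1, 300), (12, 2, 200);
""")

QUERIES = {
    # Uncorrelated scalar subquery: the inner query is evaluated once.
    "uncorrelated_scalar":
        "SELECT id FROM emp "
        "WHERE salary > (SELECT avg(salary) FROM emp) ORDER BY id",
    # Correlated scalar subquery: the inner query refers to the outer row.
    "correlated_scalar":
        "SELECT id FROM emp e "
        "WHERE salary = (SELECT max(salary) FROM emp WHERE dept_id = e.dept_id) "
        "ORDER BY id",
    # IN / NOT IN predicate subqueries in a WHERE clause.
    "in_predicate":
        "SELECT name FROM dept WHERE id IN (SELECT dept_id FROM emp) ORDER BY name",
    "not_in_predicate":
        "SELECT name FROM dept WHERE id NOT IN (SELECT dept_id FROM emp) ORDER BY name",
    # EXISTS predicate subquery in a WHERE clause.
    "exists_predicate":
        "SELECT name FROM dept d "
        "WHERE EXISTS (SELECT 1 FROM emp WHERE dept_id = d.id) ORDER BY name",
}

# Run every query and keep the first column of each result row.
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in QUERIES.items()}

for name, rows in results.items():
    print(name, rows)
```

The correlated scalar subquery uses an aggregate (`max`) so that it returns exactly one value per outer row.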
Some new features:
Native CSV data source, based on Databricks' spark-csv module
Off-heap memory management for both caching and runtime execution
Hive style bucketing support
Approximate summary statistics using sketches, including approximate quantile, Bloom filter, and count-min sketch.
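To give a feel for what the sketch-based statistics mentioned above trade off, here is a small stand-alone illustration of a Bloom filter and a count-min sketch in plain Python. It is not Spark's implementation; the class names, default sizes, and the SHA-256-based hashing are assumptions chosen for readability.

```python
import hashlib


def _hash(seed, item, modulus):
    """Derive a bucket index in [0, modulus) from a seed and an item."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class BloomFilter:
    """Set membership with possible false positives but no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def add(self, item):
        for seed in range(self.num_hashes):
            self.bits[_hash(seed, item, self.num_bits)] = 1

    def might_contain(self, item):
        return all(self.bits[_hash(seed, item, self.num_bits)]
                   for seed in range(self.num_hashes))


class CountMinSketch:
    """Frequency estimates that may overcount but never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][_hash(row, item, self.width)] += count

    def estimate(self, item):
        # The smallest counter across rows is the least inflated by collisions.
        return min(self.table[row][_hash(row, item, self.width)]
                   for row in range(self.depth))


bloom = BloomFilter()
bloom.add("spark")
print(bloom.might_contain("spark"))

sketch = CountMinSketch()
for word in ["a", "a", "a", "b"]:
    sketch.add(word)
print(sketch.estimate("a"))
```

Both structures use fixed memory regardless of how many items are added, which is what makes them attractive for summarizing large distributed datasets; the price is a bounded, one-sided error.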
Performance enhancements:
Substantial (2-10x) performance speedups for common operators in SQL and DataFrames via a new technique called whole-stage code generation
Improved Parquet scan throughput through vectorization
Improved ORC performance
Many improvements in the Catalyst query optimizer for common workloads
Improved window function performance via native implementations for all window functions
Automatic file coalescing for native data sources
Release details: http://spark.apache.org/releases/spark-release-2-0-0.html
Download: http://spark.apache.org/downloads.html
Source: OSChina (Open Source China) community

