
Apache Spark 1.5.0 Officially Released

Published: 2015-09-09 21:16:36  Source: Honglian (红联)  Author: empast
Spark 1.5.0 is the sixth release in the 1.x series. It reflects the work of 230+ contributors and 80+ institutions, totaling 1400+ patches. The notable improvements are listed below:

APIs: RDD, DataFrame, and SQL

Backend execution: DataFrame and SQL

Integrations: data sources, Hive, Hadoop, Mesos, and cluster management

R language

Machine learning and advanced analytics

Spark Streaming

Deprecations, removals, configs, and behavior changes

Spark Core

Spark SQL & DataFrames

Spark Streaming

MLlib

Known issues

SQL/DataFrame

Streaming

Credits

New features:

[SPARK-1855] - Provide memory-and-local-disk RDD checkpointing

[SPARK-4176] - Support decimals with precision > 18 in Parquet

[SPARK-4751] - Support dynamic allocation for standalone mode

[SPARK-4752] - Classifier based on artificial neural network

[SPARK-5133] - Feature Importance for Random Forests

[SPARK-5155] - Python API for MQTT streaming

[SPARK-5962] - [MLLIB] Python support for Power Iteration Clustering

[SPARK-6129] - Create MLlib metrics user guide with algorithm definitions and complete code examples.

[SPARK-6390] - Add MatrixUDT in PySpark

[SPARK-6487] - Add sequential pattern mining algorithm PrefixSpan to Spark MLlib

[SPARK-6813] - SparkR style guide

[SPARK-6820] - Convert NAs to null type in SparkR DataFrames

[SPARK-6833] - Extend `addPackage` so that any given R file can be sourced in the worker before functions are run.

[SPARK-6964] - Support Cancellation in the Thrift Server

[SPARK-7083] - Binary processing dimensional join

[SPARK-7254] - Extend PIC to handle Graphs directly

[SPARK-7293] - Report memory used in aggregations and joins

[SPARK-7368] - add QR decomposition for RowMatrix

[SPARK-7387] - CrossValidator example code in Python

[SPARK-7422] - Add argmax to Vector, SparseVector

[SPARK-7440] - Remove physical Distinct operator in favor of Aggregate

[SPARK-7547] - Example code for ElasticNet

[SPARK-7604] - Python API for PCA and PCAModel

[SPARK-7605] - Python API for ElementwiseProduct

[SPARK-7639] - Add Python API for Statistics.kernelDensity

[SPARK-7690] - MulticlassClassificationEvaluator for tuning Multiclass Classifiers

[SPARK-7879] - KMeans API for spark.ml Pipelines

[SPARK-7888] - Be able to disable intercept in Linear Regression in ML package

[SPARK-7988] - Mechanism to control receiver scheduling

[SPARK-8019] - [SparkR] Create worker R processes with a command other than Rscript

[SPARK-8124] - Created more examples on SparkR DataFrames

[SPARK-8129] - Securely pass auth secrets to executors in standalone cluster mode

[SPARK-8169] - Add StopWordsRemover as a transformer

[SPARK-8302] - Support heterogeneous cluster nodes on YARN

[SPARK-8313] - Support Spark Packages containing R code with --packages

[SPARK-8344] - Add internal metrics / logging for DAGScheduler to detect long pauses / blocking

[SPARK-8348] - Add in operator to DataFrame Column

[SPARK-8364] - Add crosstab to SparkR DataFrames

[SPARK-8431] - Add in operator to DataFrame Column in SparkR

[SPARK-8446] - Add helper functions for testing physical SparkPlan operators

[SPARK-8456] - Python API for N-Gram Feature Transformer

[SPARK-8479] - Add numNonzeros and numActives to linalg.Matrices

[SPARK-8484] - Add TrainValidationSplit to ml.tuning

[SPARK-8522] - Disable feature scaling in Linear and Logistic Regression

[SPARK-8538] - LinearRegressionResults class for storing LR results on data

[SPARK-8539] - LinearRegressionSummary class for storing LR training stats

[SPARK-8551] - Python example code for elastic net

[SPARK-8564] - Add the Python API for Kinesis

[SPARK-8579] - Support arbitrary object in UnsafeRow

[SPARK-8598] - Implementation of 1-sample, two-sided, Kolmogorov Smirnov Test for RDDs

[SPARK-8600] - Naive Bayes API for spark.ml Pipelines

[SPARK-8671] - Add isotonic regression to the pipeline API

[SPARK-8704] - Add missing methods in StandardScaler (ML and PySpark)

[SPARK-8706] - Implement Pylint / Prospector checks for PySpark

[SPARK-8711] - Add additional methods to JavaModel wrappers in trees

[SPARK-8774] - Add R model formula with basic support as a transformer

[SPARK-8777] - Add random data generation test utilities to Spark SQL

[SPARK-8782] - GenerateOrdering fails for NullType (i.e. ORDER BY NULL crashes)

[SPARK-8798] - Allow additional uris to be fetched with mesos

[SPARK-8807] - Add between operator in SparkR

[SPARK-8847] - String concatenation with column in SparkR

[SPARK-8867] - Show the UDF usage for user.

[SPARK-8874] - Add missing methods in Word2Vec ML

[SPARK-8882] - A New Receiver Scheduling Mechanism

[SPARK-8936] - Hyperparameter estimation in LDA

[SPARK-8967] - Implement @since as an annotation

[SPARK-8996] - Add Python API for Kolmogorov-Smirnov Test

[SPARK-9022] - UnsafeProject

[SPARK-9023] - UnsafeExchange

[SPARK-9024] - Unsafe HashJoin

[SPARK-9028] - Add CountVectorizer as an estimator to generate CountVectorizerModel

[SPARK-9112] - Implement LogisticRegressionSummary similar to LinearRegressionSummary

[SPARK-9115] - date/time function: dayInYear

[SPARK-9143] - Add planner rule for automatically inserting Unsafe <-> Safe row format converters

[SPARK-9178] - UTF8String empty string method

[SPARK-9201] - Integrate MLlib with SparkR using RFormula

[SPARK-9230] - SparkR RFormula should support StringType features

[SPARK-9231] - DistributedLDAModel method for top topics per document

[SPARK-9245] - DistributedLDAModel predict top topic per doc-term instance

[SPARK-9246] - DistributedLDAModel predict top docs per topic

[SPARK-9263] - Add Spark Submit flag to exclude dependencies when using --packages

[SPARK-9381] - Migrate JSON data source to the new partitioning data source

[SPARK-9391] - Support minus, dot, and intercept operators in SparkR RFormula

[SPARK-9440] - LocalLDAModel should save docConcentration, topicConcentration, and gammaShape

[SPARK-9464] - Add property-based tests for UTF8String

[SPARK-9471] - Multilayer perceptron classifier

[SPARK-9544] - RFormula in Python

[SPARK-9657] - PrefixSpan getMaxPatternLength should return an Int

[SPARK-10106] - Add `ifelse` Column function to SparkR

Apache Spark is an open-source cluster computing environment similar to Hadoop, but the two differ in ways that make Spark perform better on certain workloads. Specifically, Spark keeps distributed datasets in memory, which lets it serve interactive queries and also speeds up iterative workloads.
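The benefit of in-memory datasets for iterative workloads can be illustrated with a small sketch. This is plain Python, not Spark code: `CountingSource`, `iterate_mean`, `run_without_cache`, and `run_with_cache` are hypothetical names invented for this illustration, and the "slow storage" is only simulated by a read counter. The point is that an iterative job touches the same data at every step, so loading it once and keeping it in memory avoids repeating the expensive read.

```python
from typing import Callable, List


class CountingSource:
    """Hypothetical stand-in for slow storage; counts how often it is read."""

    def __init__(self, values: List[float]) -> None:
        self._values = list(values)
        self.reads = 0

    def load(self) -> List[float]:
        # Each call simulates one full read of the dataset from storage.
        self.reads += 1
        return list(self._values)


def iterate_mean(get_data: Callable[[], List[float]], steps: int,
                 rate: float = 0.5) -> float:
    """Gradient-descent style loop that needs the whole dataset at every step."""
    estimate = 0.0
    for _ in range(steps):
        data = get_data()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= rate * gradient
    return estimate


def run_without_cache(source: CountingSource, steps: int) -> float:
    # Re-reads the dataset from storage on every iteration.
    return iterate_mean(source.load, steps)


def run_with_cache(source: CountingSource, steps: int) -> float:
    # Reads the dataset once and reuses the in-memory copy afterwards.
    cached = source.load()
    return iterate_mean(lambda: cached, steps)
```

With ten iterations, the uncached run reads the source ten times and the cached run reads it once, while both arrive at the same estimate. Spark applies the same idea across a cluster, but its actual mechanism is not shown here.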

Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, so Scala code can manipulate distributed datasets as easily as it manipulates local collection objects.
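To make the "distributed dataset as a local collection" idea concrete, here is a toy sketch in plain Python. It is not Spark's API: `PartitionedDataset` and its methods are hypothetical and everything runs in one process. The class only mimics how a collection-style interface (`map`, `filter`, `reduce`) can hide the fact that the data is split into partitions.

```python
from functools import reduce
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class PartitionedDataset(Generic[T]):
    """Hypothetical toy: a dataset split into partitions, as a cluster would hold it."""

    def __init__(self, partitions: List[List[T]]) -> None:
        self._partitions = partitions

    @classmethod
    def from_items(cls, items: Iterable[T],
                   num_partitions: int) -> "PartitionedDataset[T]":
        # Deal the items round-robin into the requested number of partitions.
        all_items = list(items)
        return cls([all_items[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn: Callable[[T], U]) -> "PartitionedDataset[U]":
        # Apply fn independently inside each partition.
        return PartitionedDataset([[fn(x) for x in part]
                                   for part in self._partitions])

    def filter(self, pred: Callable[[T], bool]) -> "PartitionedDataset[T]":
        return PartitionedDataset([[x for x in part if pred(x)]
                                   for part in self._partitions])

    def reduce(self, fn: Callable[[T, T], T]) -> T:
        # Reduce each non-empty partition, then combine the partial results.
        # fn must be associative and commutative, because partitioning
        # changes the order in which elements meet.
        # Raises TypeError if the dataset has no elements at all.
        partials = [reduce(fn, part) for part in self._partitions if part]
        return reduce(fn, partials)


# The same pipeline written against a local collection and the toy dataset.
local_total = sum(x * x for x in range(1, 11) if x % 2 == 0)
partitioned_total = (
    PartitionedDataset.from_items(range(1, 11), num_partitions=3)
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .reduce(lambda a, b: a + b)
)
```

Both expressions compute the sum of the squares of the even numbers from 1 to 10, so the two totals agree. The calling code never mentions partitions, which is the property the paragraph above attributes to Spark's Scala integration.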

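Although Spark was created to support iterative jobs on distributed datasets, it is in practice a complement to Hadoop and can run in parallel on the Hadoop file system. This behavior is supported through a third-party cluster framework called Mesos. Spark was developed by the AMP Lab (Algorithms, Machines, and People Lab) at the University of California, Berkeley, and can be used to build large-scale, low-latency data analytics applications.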

Release details: http://spark.apache.org/releases/spark-release-1-5-0.html

Download: http://www.apache.org/dyn/closer.lua/spark/spark-1.5.0/spark-1.5.0.tgz

From: OSChina community (开源中国社区)