
Pachyderm 1.1 Released: A Containerized Data Lake

Published: 2016-08-12 08:36:43; Source: 红联; Author: baihuo
Pachyderm 1.1 was released back in July. Pachyderm is a containerized data lake that lets you store and analyze data using containers.

This release includes many improvements. The detailed list follows:

Features:

Data Provenance, which tracks the flow of data as it’s analyzed

FlushCommit, which tracks commits forward to the downstream results computed from them

DeleteAll, which restores the cluster to factory settings

More featureful data partitioning (map, reduce and global methods)

Explicit incrementality

Better support for dynamic membership (nodes leaving and entering the cluster)

Commit IDs are now present as env vars for jobs

Deletes and reads now work during job execution

pachctl inspect-* now returns much more information about the inspected objects

PipelineInfos now contain a count of job outcomes for the pipeline

Fixes to pachyderm and bazil.org/fuse to support writing a larger number of files

Jobs now report their end times as well as their start times

Jobs have a pulling state for when the container is being pulled

Put-file now accepts a -f flag for easier puts

Cluster restarts now work, even if Kubernetes is restarted as well

Support for json and binary delimiters in data chunking

Manifests now reference a specific Pachyderm container version, making deployment more bulletproof

Readiness checks for pachd, which make deployment more bulletproof

Kubernetes jobs are now created in the same namespace pachd is deployed in

Support for pipeline DAGs that aren’t transitive reductions.

Appending to files now works in jobs; from shell scripts you can use >>

Network traffic is reduced with object stores by taking advantage of content addressability

Transforms now have a Debug field which turns on debug logging for the job

Pachctl can now be installed via Homebrew on macOS or apt on Ubuntu

ListJob now orders jobs by creation time

OpenShift Origin is now supported as a deployment platform
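
The content addressability item in the feature list can be illustrated with a minimal sketch. This is not Pachyderm's implementation: the class name ContentAddressedStore, the choice of SHA-256, and the in-memory dictionary standing in for an object store are all assumptions made for illustration. The idea shown is only that when a blob's address is the hash of its content, a blob the store already holds never has to be sent again.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory object store keyed by the SHA-256 of each blob.

    Hypothetical illustration only; not Pachyderm's actual storage layer.
    """

    def __init__(self):
        self.blobs = {}          # digest -> bytes
        self.bytes_uploaded = 0  # bytes that actually had to be transferred

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address, so a blob the
        # store already holds does not need to be transferred a second time.
        if digest not in self.blobs:
            self.blobs[digest] = data
            self.bytes_uploaded += len(data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


store = ContentAddressedStore()
first = store.put(b"hello pachyderm\n")
second = store.put(b"hello pachyderm\n")   # same content: nothing is re-sent
third = store.put(b"different content\n")  # new content: transferred once
```

In this sketch, first and second are the same address, and bytes_uploaded counts each distinct blob only once.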

Content:

Webscraper example

Neural net example with TensorFlow

Wordcount example

Bug fixes:

False positive on running pipelines

Makefile bulletproofing to make sure things are installed when they’re needed

Races within the FUSE driver

In 1.0 it was possible to get duplicate job IDs; that should be fixed now

Pipelines could get stuck in the pulling state after being recreated several times

Map jobs no longer return when sharded unless the files are actually empty

The FUSE driver could encounter a bounds error during execution; it no longer does

Pipelines no longer get stuck in restarting state when the cluster is restarted

Failed jobs were being marked failed too early resulting in a race condition

Jobs could get stuck in running when they had failed

Pachd could panic due to membership changes

Starting a commit with a nonexistent parent now errors instead of silently failing

Previously pachd nodes would crash when deleting a watched repo

Jobs now get recreated if you delete and recreate a pipeline

Getting files from nonexistent commits gives a nicer error message

RunPipeline would fail to create a new job if the pipeline had already run

FUSE no longer chokes if a commit is closed after the mount happened

GCE/AWS backends have been made a lot more reliable

Tests:

From 1.0.0 to 1.1.0 we’ve gone from 70 tests to 120, a 71% increase.

Download: https://github.com/pachyderm/pachyderm

Via: 开源中国社区 (OSChina community)