Project: Testing 18.04
Path: dask.ipynb
Kernel: Python 3 (system-wide)
Dask in Python 3 (Ubuntu Linux)
In [1]:
Out[1]:
'2.21.0'
In [2]:
Out[2]:
'2.21.0'
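A minimal sketch of how the two version strings above could be produced, assuming they report the dask and distributed package versions:

import dask
import distributed

dask.__version__         # e.g. '2.21.0'
distributed.__version__  # e.g. '2.21.0'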
In [3]:
Out[3]:
<dask.config.set at 0x7f29fc3ea2e8>
In [4]:
Out[4]:
{'temporary-directory': '/home/user/tmp',
'dataframe': {'shuffle-compression': None},
'array': {'svg': {'size': 120}},
'optimization': {'fuse': {'active': True,
'ave-width': 1,
'max-width': None,
'max-height': inf,
'max-depth-new-edges': None,
'subgraphs': None,
'rename-keys': True}},
'distributed': {'version': 2,
'scheduler': {'allowed-failures': 3,
'bandwidth': 100000000,
'blocked-handlers': [],
'default-data-size': '1kiB',
'events-cleanup-delay': '1h',
'idle-timeout': None,
'transition-log-length': 100000,
'work-stealing': True,
'work-stealing-interval': '100ms',
'worker-ttl': None,
'pickle': True,
'preload': [],
'preload-argv': [],
'unknown-task-duration': '500ms',
'default-task-durations': {'rechunk-split': '1us', 'shuffle-split': '1us'},
'validate': False,
'dashboard': {'status': {'task-stream-length': 1000},
'tasks': {'task-stream-length': 100000},
'tls': {'ca-file': None, 'key': None, 'cert': None},
'bokeh-application': {'allow_websocket_origin': ['*'],
'keep_alive_milliseconds': 500,
'check_unused_sessions_milliseconds': 500}},
'locks': {'lease-validation-interval': '10s', 'lease-timeout': '30s'},
'http': {'routes': ['distributed.http.scheduler.prometheus',
'distributed.http.scheduler.info',
'distributed.http.scheduler.json',
'distributed.http.health',
'distributed.http.proxy',
'distributed.http.statics']}},
'worker': {'blocked-handlers': [],
'multiprocessing-method': 'spawn',
'use-file-locking': True,
'connections': {'outgoing': 50, 'incoming': 10},
'preload': [],
'preload-argv': [],
'daemon': True,
'validate': False,
'lifetime': {'duration': None, 'stagger': '0 seconds', 'restart': False},
'profile': {'interval': '10ms', 'cycle': '1000ms', 'low-level': False},
'memory': {'target': 0.6, 'spill': 0.7, 'pause': 0.8, 'terminate': 0.95},
'http': {'routes': ['distributed.http.worker.prometheus',
'distributed.http.health',
'distributed.http.statics']}},
'nanny': {'preload': [], 'preload-argv': []},
'client': {'heartbeat': '5s', 'scheduler-info-interval': '2s'},
'deploy': {'lost-worker-timeout': '15s', 'cluster-repr-interval': '500ms'},
'adaptive': {'interval': '1s',
'target-duration': '5s',
'minimum': 0,
'maximum': inf,
'wait-count': 3},
'comm': {'retry': {'count': 0, 'delay': {'min': '1s', 'max': '20s'}},
'compression': 'auto',
'offload': '10MiB',
'default-scheme': 'tcp',
'socket-backlog': 2048,
'recent-messages-log-length': 0,
'zstd': {'level': 3, 'threads': 0},
'timeouts': {'connect': '10s', 'tcp': '30s'},
'require-encryption': None,
'tls': {'ciphers': None,
'ca-file': None,
'scheduler': {'cert': None, 'key': None},
'worker': {'key': None, 'cert': None},
'client': {'key': None, 'cert': None}}},
'dashboard': {'link': '{scheme}://{host}:{port}/status',
'export-tool': False,
'graph-max-items': 5000},
'admin': {'tick': {'interval': '20ms', 'limit': '3s'},
'max-error-length': 10000,
'log-length': 10000,
'log-format': '%(name)s - %(levelname)s - %(message)s',
'pdb-on-err': False}},
'rmm': {'pool-size': None},
'ucx': {'tcp': None,
'nvlink': None,
'infiniband': None,
'rdmacm': None,
'cuda_copy': None,
'net-devices': None,
'reuse-endpoints': True},
'scheduler': {'work-stealing': True}}
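A minimal sketch of how configuration values like those above can be set and inspected; the specific keys are taken from the output shown, but the exact calls in the original cells are an assumption:

import dask

# Override selected configuration values; dask.config.set returns the object
# seen in Out[3] and can also be used as a context manager.
dask.config.set({'temporary-directory': '/home/user/tmp',
                 'array.svg.size': 120})

# The merged configuration dictionary, as dumped above
dask.config.config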
In [6]:
Out[6]:
Start dask-scheduler and dask-worker
with --dashboard-prefix b9bacd7b-6cee-402c-88ed-9d74b07f29a1/port/8787
The dashboard is then available at https://cocalc.com/{{ THE PROJECT UUID }}/port/8787/status.
Websocket forwarding does not work through that proxy, though.
Alternatively, start an X11 desktop in CoCalc and run google-chrome
pointed at http://127.0.0.1:8787/status.
Dask Arrays: data arrays similar to NumPy arrays
In [7]:
In [8]:
Out[8]:
In [9]:
Out[9]:
In [10]:
Out[10]:
In [11]:
Out[11]:
In [12]:
Out[12]:
(20,)
In [12]:
Out[12]:
array([0.99706421, 1.03277113, 0.9993643 , 0.99231638, 1.02139168,
0.98631926, 0.99280159, 0.97743904, 0.99200793, 1.00457694,
1.01779522, 0.98179355, 1.01977627, 1.01330775, 1.00401255,
0.98929948, 0.98495306, 0.99648525, 0.98166991, 1.01806776])
In [13]:
Out[13]:
0.9974348104666977
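A sketch of a chunked Dask array reduced along one axis; the array shape, distribution, and chunk sizes are illustrative assumptions, not the original cell's values:

import dask.array as da

# A lazily evaluated random array split into chunks
x = da.random.normal(1.0, 0.1, size=(10000, 20), chunks=(1000, 20))

col_means = x.mean(axis=0)   # still lazy, shape (20,)
col_means.shape              # (20,)
col_means.compute()          # concrete NumPy array of 20 values near 1.0
x.mean().compute()           # overall mean, a single float near 1.0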
In [14]:
Summing 1000 ints
In [12]:
Out[12]:
760761
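One way to sum 1000 integers lazily with a Dask array; the value range here is an illustrative assumption:

import dask.array as da

ints = da.random.randint(0, 1500, size=1000, chunks=100)  # 1000 random ints in 10 chunks
ints.sum().compute()                                      # a single Python int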
Functions and native lists
In [17]:
Out[17]:
In [18]:
Out[18]:
In [18]:
Out[18]:
8212721
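A sketch of applying an ordinary Python function across a native list with dask.delayed; the function and the list contents are illustrative assumptions:

from dask import delayed

def square(x):            # an ordinary, eagerly defined function
    return x * x

data = list(range(1000))                   # a plain Python list
lazy = [delayed(square)(x) for x in data]  # nothing has run yet
delayed(sum)(lazy).compute()               # executes the whole task graph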
Loops?
In [19]:
In [20]:
Out[20]:
([1, 1, 2, 3, 5], 10)
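A sketch of building up delayed results inside a regular for loop and evaluating them in one pass with dask.compute; the helper function and input values are illustrative assumptions:

from dask import delayed, compute

def inc(x):
    return x + 1

lazy_results = []
for x in [0, 1, 2, 3]:
    lazy_results.append(delayed(inc)(x))

# compute() evaluates several lazy objects together and returns a tuple,
# e.g. ([1, 2, 3, 4], 10) for these inputs
compute(lazy_results, delayed(sum)(lazy_results))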
Dask Bags
In [21]:
Out[21]:
dask.bag<from_sequence, npartitions=50>
In [22]:
In [23]:
Out[23]:
[(False, -16256), (True, -16240)]
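A sketch of a Dask bag built from a plain sequence and reduced group-wise with foldby; the sequence and the even/odd key are illustrative assumptions, though the 50 partitions match the repr above:

import dask.bag as db
from operator import add

b = db.from_sequence(range(-4000, 4000), npartitions=50)   # dask.bag<from_sequence, npartitions=50>

# Group elements by a boolean key and reduce each group with add,
# yielding a list of (key, reduced value) pairs like the output above
b.foldby(lambda x: x % 2 == 0, add).compute()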
Dask Delayed
In [14]:
In [15]:
In [16]:
Out[16]:
30
In [17]:
Out[17]:
Delayed('vizualize-ad3fba49-bb55-4757-a0ce-668e7fb3aac8')
In [18]:
Out[18]:
Delayed('vizualize-3dde23e6-bc93-4e01-a407-6b5d33a0ddbe')
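A sketch of composing delayed calls into a small task graph; the helper functions and arguments are illustrative assumptions chosen so the final result is 30:

from dask import delayed

def inc(x):
    return x + 1

def add(x, y):
    return x + y

a = delayed(inc)(9)          # lazily 10
b = delayed(inc)(19)         # lazily 20
total = delayed(add)(a, b)   # a Delayed object; nothing has executed yet
total.compute()              # 30
total.visualize()            # draws the task graph (requires graphviz)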
In [34]:
Out[34]:
In [0]:
Ad-hoc Local Cluster
In [34]:
Out[34]:
/usr/local/lib/python3.6/dist-packages/distributed/dashboard/core.py:79: UserWarning:
Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the diagnostics dashboard on a random port instead.
warnings.warn("\n" + msg)
7.34683839787855e-05
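A sketch of spinning up an ad-hoc local cluster and running a small computation on it; the array and the reduction are illustrative assumptions. If port 8787 is already taken, the dashboard moves to a random port, which is exactly the warning shown above:

from dask.distributed import Client
import dask.array as da

client = Client()            # in-process scheduler plus local worker processes

x = da.random.normal(0, 1, size=(10000, 10000), chunks=(1000, 1000))
x.mean().compute()           # a value close to zero, computed on the local cluster

client.close()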