{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# [git-annex](https://git-annex.branchable.com)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[git-annex] is a tool for managing large data files with [git](http://git-scm.com). The idea is to store the information *about* the file in a git repository that can be synchronized, but to store the actual data separately. The annex keeps track of where the file actually resides (which may be in a different repository, or on another compute) and allows you to control the file (renaming, moving, etc.) without having to have the actual file present.\n",
"\n",
"Here we explore [git-annex] as a mechanism for replacing *and interacting* with Dropbox, Google Drive, One Drive etc. with the following goals:\n",
"\n",
"1. Multiple users can share data.\n",
"2. Data shared across many platforms: HPC clusters, laptops, desktops, Mac, Windows, Linux, [CoCalc], etc.\n",
"3. Allow only a subset of data to be stored on any particular device (esp. laptops) if memory on that device is limited.\n",
"4. Utilize cloud storage options including Google Cloud, Dropbox, Microsoft One Drive both as redundant backups, but also as a mechanism for sharing data with others who need to be able to use only once of these services.\n",
"5. Automatic and manual sync options.\n",
"\n",
"[git-annex]: https://git-annex.branchable.com\n",
"[CoCalc]: https://cocalc.com\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"My use-case is that I run a research group at WSU with ~10 collaborators. We need to share source code, experimental data, papers, plots, and simulation data on a regular basis. Some of the collaborators are used to using Dropbox or Google Drive, which work for syncing, but have issues (listed below). WSU provides 1TB of storage through Microsoft One Drive for all students and faculty, so this would be a natural storage too, but few use it yet. We run simulations on our local machines, office desktops, a local HPC cluster [Kamiak] and online using [CoCalc].\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"toc": true
},
"source": [
"