Search and massive download - Pygeodes documentation

In this example, we search for items using pygeodes, filter them using geopandas dataframes, then use a download queue to download them and monitor the download progress.

Imports

Let’s start by importing the Geodes and Config classes from pygeodes

from pygeodes import Geodes, Config

Configuration

We configure pygeodes using a config file located in our current working directory

conf = Config.from_file("config.json")
geodes = Geodes(conf=conf)
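The contents of config.json are not shown here. As a minimal sketch, a JSON config file can be written and checked with the standard library as below. The key names api_key and download_dir are assumptions made for illustration, so check the pygeodes configuration reference for the settings it actually accepts. The sketch writes to a temporary directory to avoid overwriting a real config file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical settings: the key names are assumptions, not a confirmed
# list of what pygeodes reads from its config file.
settings = {
    "api_key": "YOUR_API_KEY",
    "download_dir": "downloads",
}

# Write the file to a temporary directory (in the tutorial it sits in the cwd).
config_path = Path(tempfile.mkdtemp()) / "config.json"
config_path.write_text(json.dumps(settings, indent=2), encoding="utf-8")

# Read it back to confirm that it is valid JSON with the expected keys.
loaded = json.loads(config_path.read_text(encoding="utf-8"))
print(sorted(loaded))
```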

Searching products

We search for products in the T31TCK tile whose acquisition ended on or after 2023-01-01

from pygeodes.utils.datetime_utils import complete_datetime_from_str

query = {
    "grid:code": {"eq": "T31TCK"},
    "end_datetime": {"gte": complete_datetime_from_str("2023-01-01")},
}
items, dataframe = geodes.search_items(query=query)

/work/scratch/data/fournih/test_env/lib/python3.11/site-packages/urllib3/connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'geodes-portal.cnes.fr'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(

Found 539 items matching your query, returning 80 as get_all parameter is set to False
80 item(s) found for query : {'grid:code': {'eq': 'T31TCK'}, 'end_datetime': {'gte': '2023-01-01T00:00:00.000000Z'}}
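To make the meaning of the query explicit, here is a small local sketch of how the "eq" and "gte" conditions select items. It is not how the Geodes server evaluates queries: the helpers to_full_datetime and matches are hypothetical, and the sample items are invented. The timestamp format is the one shown in the output above, which compares correctly as plain strings because every field has a fixed width.

```python
from datetime import datetime


def to_full_datetime(date_str: str) -> str:
    """Hypothetical stand-in: expand 'YYYY-MM-DD' to the full timestamp format."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


# Only the two operators used in this example are sketched here.
OPERATORS = {
    "eq": lambda value, target: value == target,
    "gte": lambda value, target: value >= target,
}


def matches(properties: dict, query: dict) -> bool:
    """Return True when the properties satisfy every condition of the query."""
    for key, conditions in query.items():
        if key not in properties:
            return False
        for operator, target in conditions.items():
            if not OPERATORS[operator](properties[key], target):
                return False
    return True


query = {
    "grid:code": {"eq": "T31TCK"},
    "end_datetime": {"gte": to_full_datetime("2023-01-01")},
}

# Invented sample properties, for illustration only.
sample_items = [
    {"grid:code": "T31TCK", "end_datetime": "2023-06-15T10:46:29.024000Z"},
    {"grid:code": "T31TCK", "end_datetime": "2022-11-02T10:51:41.024000Z"},
    {"grid:code": "T30TYN", "end_datetime": "2023-03-01T10:56:21.024000Z"},
]

selected = [item for item in sample_items if matches(item, query)]
print(len(selected), "of", len(sample_items), "sample items match")
```

Only the first sample item satisfies both conditions: the second ends before 2023-01-01 and the third lies in another tile.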

Exploring results

We get a list of items and a dataframe. We can work with the dataframe, for instance:

dataframe

Adding columns

We want to filter on cloud cover, so we need to add the corresponding column to the dataframe. We first list the keys available on an item:

items[0].list_available_keys()

{'area',
 'continent_code',
 'dataset',
 'datetime',
 'end_datetime',
 'endpoint_description',
 'endpoint_url',
 'eo:cloud_cover',
 'grid:code',
 'hydrology.rivers',
 'id',
 'identifier',
 'instrument',
 'keywords',
 'latest',
 'platform',
 'political.continents',
 'processing:level',
 'product:timeliness',
 'product:type',
 'proj:bbox',
 'references',
 's2:datatake_id',
 'sar:instrument_mode',
 'sat:absolute_orbit',
 'sat:orbit_state',
 'sat:relative_orbit',
 'sci:doi',
 'start_datetime',
 'version'}

We find we can use eo:cloud_cover, so we add it to the dataframe:

from pygeodes.utils.formatting import format_items

dataframe_new = format_items(dataframe, {"eo:cloud_cover"})

Filtering our results

Now that the cloud cover is in our dataframe, we can filter on it.

dataframe_filtered = dataframe_new[dataframe_new["eo:cloud_cover"] < 30]

dataframe_filtered
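Two details of this filter are easy to miss: the comparison is strict, so an item with a cloud cover of exactly 30 is excluded, and a comparison with a missing value (NaN) is false, so items without a cloud cover value are dropped as well. The plain-Python sketch below mirrors that behaviour on invented rows. It does not use pandas, and the ids and values are made up for illustration.

```python
import math

# Invented rows standing in for dataframe rows.
rows = [
    {"id": "item-a", "eo:cloud_cover": 12.5},
    {"id": "item-b", "eo:cloud_cover": 30.0},      # exactly at the threshold
    {"id": "item-c", "eo:cloud_cover": math.nan},  # missing value
    {"id": "item-d", "eo:cloud_cover": 0.0},
]

THRESHOLD = 30

# Strict comparison: 30.0 < 30 is False, and NaN < 30 is False too.
kept = [row for row in rows if row["eo:cloud_cover"] < THRESHOLD]
kept_ids = [row["id"] for row in kept]
print(kept_ids)
```

If items at exactly 30 should be kept, the comparison would need to be `<=` instead.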

Plotting

We can plot our results on a map:

dataframe_filtered.explore()

Downloading our items

We can download our results using a DownloadQueue, whose downloads are tracked by the Profile system

from pygeodes.utils.profile import DownloadQueue, Profile

We reset our Profile to be sure it tracks only the downloads from this queue, then build the queue from the filtered items

Profile.reset()
items = dataframe_filtered["item"].values
queue = DownloadQueue(items)

In a separate cell, we run our queue

queue.run()
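Conceptually, a download queue processes its items one after another and records what happened to each, which is what makes progress monitoring possible. The toy sketch below illustrates that idea only. It is not the pygeodes implementation: ToyDownloadQueue and fake_fetch are hypothetical names, and nothing is actually downloaded.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyDownloadQueue:
    """Hypothetical stand-in for a download queue; not the pygeodes class."""

    item_ids: list[str]
    fetch: Callable[[str], int]  # returns the number of bytes "downloaded"
    completed: dict[str, int] = field(default_factory=dict)
    failed: dict[str, str] = field(default_factory=dict)

    def run(self) -> None:
        total = len(self.item_ids)
        for position, item_id in enumerate(self.item_ids, start=1):
            try:
                self.completed[item_id] = self.fetch(item_id)
            except OSError as error:
                # Record the failure and carry on with the next item.
                self.failed[item_id] = str(error)
            print(f"[{position}/{total}] processed {item_id}")


def fake_fetch(item_id: str) -> int:
    """Simulated download: fails for ids ending in 'broken'."""
    if item_id.endswith("broken"):
        raise OSError("simulated network failure")
    return 1024 * len(item_id)


toy_queue = ToyDownloadQueue(["item-a", "item-b", "item-broken"], fake_fetch)
toy_queue.run()
print(len(toy_queue.completed), "completed,", len(toy_queue.failed), "failed")
```

Keeping failures in a separate record, rather than stopping at the first error, lets the remaining items finish and leaves a list of items to retry.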
