Search and massive download
In this example, we search for items using pygeodes, filter them using geopandas dataframes, and use a download queue to download them and monitor the progress of the download.
Imports
Let’s start by importing geodes
from pygeodes import Geodes, Config
Configuration
We configure the client using a config file located in the current working directory:
conf = Config.from_file("config.json")
geodes = Geodes(conf=conf)
Searching products
We search for products in the T31TCK tile whose acquisition ended on or after 2023-01-01:
from pygeodes.utils.datetime_utils import complete_datetime_from_str
query = {
"grid:code": {"eq": "T31TCK"},
"end_datetime": {"gte": complete_datetime_from_str("2023-01-01")},
}
items, dataframe = geodes.search_items(query=query)
/work/scratch/data/fournih/test_env/lib/python3.11/site-packages/urllib3/connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'geodes-portal.cnes.fr'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
Found 539 items matching your query, returning 80 as get_all parameter is set to False
80 item(s) found for query : {'grid:code': {'eq': 'T31TCK'}, 'end_datetime': {'gte': '2023-01-01T00:00:00.000000Z'}}
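The echoed query shows that the date string "2023-01-01" was expanded to '2023-01-01T00:00:00.000000Z' before being sent. The following is a minimal sketch of that kind of expansion using only the standard library. The function name complete_datetime_sketch is hypothetical, and this is not the pygeodes implementation of complete_datetime_from_str; it only reproduces the format visible in the output above.

```python
from datetime import datetime


def complete_datetime_sketch(date_str: str) -> str:
    """Hypothetical stand-in: expand 'YYYY-MM-DD' into a full timestamp string.

    Assumption: midnight is used for the missing time part and the result
    is labeled as UTC with a trailing 'Z', as in the echoed query.
    """
    parsed = datetime.strptime(date_str, "%Y-%m-%d")
    # %f renders microseconds on six digits, which gives the '.000000' part
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


expanded = complete_datetime_sketch("2023-01-01")
print(expanded)
```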
Exploring results
We get a list of items and a dataframe. We can work with the dataframe directly, for instance:
dataframe
Adding columns
We want to filter on cloud cover, so we need to add the corresponding column to the dataframe. We first list the keys available on an item:
items[0].list_available_keys()
{'area',
'continent_code',
'dataset',
'datetime',
'end_datetime',
'endpoint_description',
'endpoint_url',
'eo:cloud_cover',
'grid:code',
'hydrology.rivers',
'id',
'identifier',
'instrument',
'keywords',
'latest',
'platform',
'political.continents',
'processing:level',
'product:timeliness',
'product:type',
'proj:bbox',
'references',
's2:datatake_id',
'sar:instrument_mode',
'sat:absolute_orbit',
'sat:orbit_state',
'sat:relative_orbit',
'sci:doi',
'start_datetime',
'version'}
The key we need is eo:cloud_cover, so we add it to the dataframe:
from pygeodes.utils.formatting import format_items
dataframe_new = format_items(dataframe, {"eo:cloud_cover"})
Filtering our results
Now that the cloud cover is in our dataframe, we can filter on it.
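The filter relies on the usual pandas boolean-mask idiom, which geopandas dataframes inherit. As a minimal sketch, it can be tried on a small stand-in table first. The rows below are invented for illustration (real rows come from the search), and only the column names "item" and "eo:cloud_cover" are taken from this example.

```python
import pandas as pd

# Hypothetical stand-in rows; in the real workflow they come from the search
dataframe_new = pd.DataFrame(
    {
        "item": ["item_a", "item_b", "item_c"],
        "eo:cloud_cover": [12.5, 30.0, 71.0],
    }
)

# Comparing a column to a scalar yields a boolean Series (the mask)
mask = dataframe_new["eo:cloud_cover"] < 30

# Indexing with the mask keeps only the rows where it is True
dataframe_filtered = dataframe_new[mask]

# The surviving items can then be pulled out of the "item" column
selected_items = list(dataframe_filtered["item"].values)
print(selected_items)
```

Note that the comparison is strict, so a row with a cloud cover of exactly 30 is excluded.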
dataframe_filtered = dataframe_new[dataframe_new["eo:cloud_cover"] < 30]
dataframe_filtered
Plotting
We can plot our results on a map:
dataframe_filtered.explore()
Downloading our items
We can download our results using the Profile system:
from pygeodes.utils.profile import DownloadQueue, Profile
We reset our Profile to be sure to track only the downloads from the queue:
Profile.reset()
items = dataframe_filtered["item"].values
queue = DownloadQueue(items)
In a separate cell, we run our queue:
queue.run()
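To give an intuition for what running a queue and monitoring its progress involves, here is a purely conceptual sketch built on the standard library. It is not how pygeodes implements DownloadQueue. The names fake_download and run_queue are hypothetical, and no network access takes place: the "download" just returns a made-up path.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(name: str) -> str:
    """Hypothetical stand-in for a real download; returns a made-up local path."""
    return f"/tmp/{name}.zip"


def run_queue(names, max_workers=2):
    """Process every name with a small worker pool and report progress."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the name it was submitted for
        futures = {pool.submit(fake_download, name): name for name in names}
        # as_completed yields futures as they finish, in completion order
        for done, future in enumerate(as_completed(futures), start=1):
            name = futures[future]
            results[name] = future.result()
            print(f"{done}/{len(names)} finished: {name}")
    return results


paths = run_queue(["item_a", "item_b", "item_c"])
```

The progress lines may appear in any order, since completion order depends on thread scheduling.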