Manifests

A manifest is a source from which metadata is read. By default, Spinta reads metadata from tabular manifests, such as CSV or XLSX files, but other metadata sources are possible: for example, metadata could be read from SQL databases or from RDFS or OWL ontologies.

Spinta works with the following metadata:

[Figure: ../_images/manifest-model.png]

Adding a new manifest type

New manifests are added under the spinta/manifests directory.

The components.py file should contain your manifest component class, for example:

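from spinta.manifests.components import Manifest


class MyManifest(Manifest):
    # Name identifying this manifest type.
    type: str = 'my'

    @staticmethod
    def detect_from_path(path: str) -> bool:
        # Recognize this manifest type by its file extension.
        return path.endswith('.my')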
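To illustrate how a detect_from_path hook of this kind can be used, here is a minimal, self-contained sketch. The Manifest base class below is a stand-in so that the snippet runs without Spinta installed, and CsvManifest and detect_manifest_type are hypothetical names introduced only for this example; Spinta's own detection logic may differ.

```python
from typing import List, Optional, Type


class Manifest:
    """Stand-in for spinta.manifests.components.Manifest (assumption)."""
    type: str = ''

    @staticmethod
    def detect_from_path(path: str) -> bool:
        return False


class MyManifest(Manifest):
    type: str = 'my'

    @staticmethod
    def detect_from_path(path: str) -> bool:
        return path.endswith('.my')


class CsvManifest(Manifest):
    # Hypothetical second manifest type, for comparison only.
    type: str = 'csv'

    @staticmethod
    def detect_from_path(path: str) -> bool:
        return path.endswith('.csv')


def detect_manifest_type(
    path: str,
    candidates: List[Type[Manifest]],
) -> Optional[str]:
    """Return the type of the first candidate that recognizes the path."""
    for candidate in candidates:
        if candidate.detect_from_path(path):
            return candidate.type
    return None
```

Because each class only answers for its own file extension, the order of candidates matters only when two manifest types claim the same path.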

Manifests must implement the following commands:

# commands/configure.py

from spinta import commands
from spinta.components import Context
from spinta.core.config import RawConfig
from spinta.manifests.my.components import MyManifest


@commands.configure.register(Context, MyManifest)
def configure(context: Context, manifest: MyManifest):
    rc: RawConfig = context.get('rc')
    manifest.path = rc.get('manifests', manifest.name, 'path')

# commands/load.py

from spinta import commands
from spinta.components import Context
from spinta.manifests.components import Manifest
from spinta.manifests.helpers import load_manifest_nodes
from spinta.manifests.my.components import MyManifest
from spinta.manifests.my.helpers import read_my_manifest


@commands.load.register(Context, MyManifest)
def load(
    context: Context,
    manifest: MyManifest,
    *,
    into: Manifest = None,
    freezed: bool = True,
    rename_duplicates: bool = False,
    load_internal: bool = True,
):
    # Load the internal manifest first, unless it is already loaded.
    if load_internal:
        target = into or manifest
        if '_schema' not in target.models:
            store = context.get('store')
            commands.load(context, store.internal, into=target)

    # Nothing to read if no path was configured.
    if manifest.path is None:
        return

    schemas = read_my_manifest(manifest.path, rename_duplicates=rename_duplicates)

    # Load nodes either into the given target manifest or into this one.
    if into:
        load_manifest_nodes(context, into, schemas, source=manifest)
    else:
        load_manifest_nodes(context, manifest, schemas)

    # Load every manifest this one is synced with.
    for source in manifest.sync:
        commands.load(
            context, source,
            into=into or manifest,
            freezed=freezed,
            rename_duplicates=rename_duplicates,
            load_internal=load_internal,
        )

Here you need to implement the read_my_manifest function, which is responsible for producing an iterator of items of the following form:

(location, schema)

Here, location should be a line number, database primary key, or other identifier by which this manifest item can be located.

schema is a dict that must contain at least a type key. The type must be defined in spinta.config:CONFIG['components']['nodes'], which at the time of writing looks like this:

{
    'ns': 'spinta.components:Namespace',
    'dataset': 'spinta.datasets.components:Dataset',
    'base': 'spinta.components:Base',
    'model': 'spinta.components:Model',
}
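The requirement on type can be expressed as a small check. This is only a sketch: NODES copies the mapping shown above, and check_schema_type is a hypothetical helper written for illustration, not part of Spinta.

```python
from typing import Any, Dict

# Copy of the mapping shown above; in Spinta it lives in the configuration.
NODES: Dict[str, str] = {
    'ns': 'spinta.components:Namespace',
    'dataset': 'spinta.datasets.components:Dataset',
    'base': 'spinta.components:Base',
    'model': 'spinta.components:Model',
}


def check_schema_type(
    schema: Dict[str, Any],
    nodes: Dict[str, str] = NODES,
) -> str:
    """Return the component path for a schema, or raise ValueError."""
    if 'type' not in schema:
        raise ValueError("schema must contain a 'type' key")
    node_type = schema['type']
    if node_type not in nodes:
        known = ', '.join(sorted(nodes))
        raise ValueError(f"unknown type {node_type!r}, expected one of: {known}")
    return nodes[node_type]
```

Running such a check inside your reader, before yielding a schema, makes it possible to report the offending location early.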

Components can be added via configuration, for example, using environment variables:

SPINTA_COMPONENTS__NODES__MY=myproject.components:My

See Configuration for more information.
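The environment variable name above appears to mirror the nested configuration keys. The sketch below shows that mapping under two assumptions that should be checked against the Configuration documentation: the SPINTA_ prefix is dropped, and a double underscore separates nesting levels, with keys lowercased. env_to_keys and set_nested are hypothetical helpers written for this example, not Spinta functions.

```python
from typing import Any, Dict, Tuple


def env_to_keys(name: str, prefix: str = 'SPINTA_') -> Tuple[str, ...]:
    """Split an environment variable name into a nested config key path."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return tuple(part.lower() for part in name[len(prefix):].split('__'))


def set_nested(config: Dict[str, Any], keys: Tuple[str, ...], value: Any) -> None:
    """Store value under the given key path, creating dicts as needed."""
    for key in keys[:-1]:
        config = config.setdefault(key, {})
    config[keys[-1]] = value


# Simplified stand-in for the real configuration (assumption).
config: Dict[str, Any] = {
    'components': {'nodes': {'model': 'spinta.components:Model'}},
}
keys = env_to_keys('SPINTA_COMPONENTS__NODES__MY')
set_nested(config, keys, 'myproject.components:My')
```

Under these assumptions the new component ends up next to the built-in node types, so a schema with type my would pass the same lookup as model.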

Each component has a schema attribute, which shows what can be used in the schema for each type.

Here is an example for type: model:

from spinta.utils.schema import NA

{
    'type': 'model',
    'name': 'City',
    'title': 'City',
    'description': '',
    'external': {
        'dataset': 'example/dataset',
        'resource': 'myres',
        'pk': ['id'],
        'name': 'my://cities',
        'prepare': NA,
    },
    'properties': {
        'id': {
            'type': 'integer',
            'type_args': None,
            'required': True,
            'unique': True,
            'external': {
                'name': 'ID',
                'prepare': NA,
            }
        },
    },
}

A good place to look for examples is spinta/manifests/tabular/helpers.py; look for the *Reader classes.

Once you produce schemas as described above, the load_manifest_nodes function will take care of everything else.
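As a rough illustration of the reader contract, here is a minimal sketch of read_my_manifest that yields (location, schema) pairs. The '.my' file format used here is invented for the example (one model per line, followed by name:type property pairs), and the way rename_duplicates renames models by appending a counter is likewise an assumption, not Spinta's actual behavior.

```python
from typing import Any, Dict, Iterator, Tuple


def read_my_manifest(
    path: str,
    *,
    rename_duplicates: bool = False,
) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (location, schema) pairs from a hypothetical '.my' file.

    Assumed line format: ``<ModelName> <prop>:<type> <prop>:<type> ...``
    """
    seen: Dict[str, int] = {}
    with open(path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line or line.startswith('#'):
                continue  # skip blank lines and comments
            name, *props = line.split()
            count = seen.get(name, 0)
            seen[name] = count + 1
            if rename_duplicates and count:
                # Assumed renaming scheme: append a counter to the name.
                name = f'{name}_{count}'
            properties = {}
            for prop in props:
                prop_name, _, prop_type = prop.partition(':')
                # Properties without an explicit type default to string.
                properties[prop_name] = {'type': prop_type or 'string'}
            # The line number serves as the location of this item.
            yield lineno, {
                'type': 'model',
                'name': name,
                'properties': properties,
            }
```

Writing the reader as a generator keeps it lazy: items are parsed only as load_manifest_nodes consumes them, and each one carries the line number needed to point back at its origin in the file.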