avocado.Dataset

class avocado.Dataset(name, metadata, observations=None, objects=None, chunk=None, num_chunks=None, object_class=<class 'avocado.astronomical_object.AstronomicalObject'>)

A dataset of many astronomical objects.

Parameters:
  • name (str) – Name of the dataset. This will be used to determine the filenames of various outputs such as computed features and predictions.
  • metadata (pandas.DataFrame) – DataFrame where each row is the metadata for an object in the dataset. See AstronomicalObject for details.
  • observations (pandas.DataFrame) – Observations of all of the objects’ light curves. See AstronomicalObject for details.
  • objects (list) – A list of AstronomicalObject instances. Either this or observations can be specified but not both.
  • chunk (int (optional)) – If the dataset was loaded in chunks, this indicates the chunk number.
  • num_chunks (int (optional)) – If the dataset was loaded in chunks, this is the total number of chunks used.
__init__(name, metadata, observations=None, objects=None, chunk=None, num_chunks=None, object_class=<class 'avocado.astronomical_object.AstronomicalObject'>)

Create a new Dataset from a set of metadata and observations.
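
As a minimal sketch of the constructor, the example below assembles toy metadata and observations and passes them to Dataset. The column names (object_id, redshift, time, flux, flux_error, band), the object IDs, the values, and the choice to index the metadata by object_id are all assumptions made for illustration; see AstronomicalObject for the fields that are actually expected. The helper build_dataset is hypothetical, and it imports pandas and avocado only when it is called.

```python
# Toy rows, invented for illustration. One metadata row per object.
metadata_rows = [
    {"object_id": "obj_a", "redshift": 0.12},
    {"object_id": "obj_b", "redshift": 0.45},
]

# One row per observation; object_id ties each row back to the metadata.
observation_rows = [
    {"object_id": "obj_a", "time": 59000.0, "flux": 12.3, "flux_error": 1.1, "band": "g"},
    {"object_id": "obj_a", "time": 59003.0, "flux": 15.8, "flux_error": 1.2, "band": "r"},
    {"object_id": "obj_b", "time": 59001.0, "flux": 4.2, "flux_error": 0.9, "band": "g"},
]


def build_dataset(name="toy_dataset"):
    """Hypothetical helper: build a Dataset from the toy rows above."""
    import pandas as pd
    import avocado

    # Assumption: the metadata DataFrame is indexed by object_id.
    metadata = pd.DataFrame(metadata_rows).set_index("object_id")
    observations = pd.DataFrame(observation_rows)

    # Pass either observations or objects, not both.
    return avocado.Dataset(name, metadata, observations=observations)
```

Calling build_dataset() requires both pandas and avocado to be installed. The name argument matters beyond labeling, since it determines the filenames of outputs such as computed features and predictions.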

Methods

__init__(name, metadata[, observations, …]) – Create a new Dataset from a set of metadata and observations.
extract_raw_features(featurizer[, keep_models]) – Extract raw features from the dataset.
from_objects(name, objects, **kwargs) – Load a dataset from a list of AstronomicalObject instances.
get_bands() – Return a list of all of the bands in the dataset.
get_models_path([tag]) – Return the path to where the models for this dataset should lie on disk.
get_object([index, object_class, object_id]) – Parse keywords to pull a specific object out of the dataset.
get_predictions_path([classifier]) – Return the path to where this dataset's predictions for a given classifier should lie on disk.
get_raw_features_path([tag]) – Return the path to where the raw features for this dataset should lie on disk.
label_folds([num_folds, random_state]) – Separate the dataset into groups for k-fold cross-validation.
load(name[, metadata_only, chunk, …]) – Load a dataset that has been saved in HDF5 format in the data directory.
load_predictions([classifier]) – Load the predictions for a classifier from disk.
load_raw_features([tag]) – Load the raw features from disk.
plot_interactive() – Make an interactive plot of the light curves in the dataset.
plot_light_curve(*args, **kwargs) – Plot the light curve for an object in the dataset.
predict(classifier) – Generate predictions using a classifier.
read_object(object_id[, object_class]) – Read an object with a given object_id.
select_features(featurizer) – Select features from the dataset for classification.
write([overwrite]) – Write the dataset out to disk.
write_models([tag]) – Write the models of the light curves to disk.
write_predictions([classifier]) – Write predictions for this classifier to disk.
write_raw_features([tag]) – Write the raw features out to disk.
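
To make the idea behind label_folds concrete, the sketch below assigns each object to one of num_folds groups using only the standard library. It is not avocado's implementation: the function assign_folds, the choice to balance folds by class label, and the class names in the usage lines are all assumptions for illustration, and this page does not say how label_folds actually splits the dataset.

```python
import random
from collections import defaultdict


def assign_folds(labels, num_folds=5, random_state=None):
    """Return a dict mapping each object ID to a fold index in [0, num_folds).

    labels: dict mapping object ID -> class label.
    """
    rng = random.Random(random_state)

    # Group object IDs by class so each class is spread across the folds.
    by_class = defaultdict(list)
    for object_id, label in labels.items():
        by_class[label].append(object_id)

    folds = {}
    for label in sorted(by_class):
        # Sort before shuffling so the seed fully determines the result.
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        # Deal the shuffled IDs out to the folds in turn.
        for position, object_id in enumerate(ids):
            folds[object_id] = position % num_folds
    return folds


# Usage with made-up object IDs and class names.
labels = {f"obj_{i}": ("SNIa" if i % 2 == 0 else "SNII") for i in range(10)}
folds = assign_folds(labels, num_folds=5, random_state=0)
```

Fixing random_state makes the assignment repeatable, so the same objects land in the same folds each time the split is regenerated.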

Attributes

path – Return the path to where this dataset should lie on disk.
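
As a closing sketch, the methods listed above can be chained into a featurize-then-classify workflow. The two helper functions below are hypothetical and use only method names and arguments shown on this page. They assume that dataset is a Dataset (for example one returned by load), that featurizer and classifier are suitable objects obtained elsewhere in avocado, and that extract_raw_features and predict keep their results on the dataset so that the write_* calls have something to save. None of these assumptions is confirmed here.

```python
def featurize_and_save(dataset, featurizer, tag=None):
    """Extract raw features for a dataset, then write them to disk."""
    dataset.extract_raw_features(featurizer)
    if tag is None:
        dataset.write_raw_features()
    else:
        dataset.write_raw_features(tag)


def classify_and_save(dataset, classifier):
    """Generate predictions with a classifier, then write them to disk.

    Assumes the dataset already holds raw features, either freshly
    extracted or loaded with load_raw_features.
    """
    dataset.predict(classifier)
    dataset.write_predictions()
```

Because the helpers only call methods on the objects they are given, they can be exercised with a stand-in dataset before being run against real data.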